Repository: onlyphantom/cvessentials Branch: master Commit: e32691a5f1af Files: 48 Total size: 459.3 KB Directory structure: gitextract_nnlovcxn/ ├── .gitignore ├── README.md ├── digitrecognition/ │ ├── contourarea_01.py │ ├── contourarea_02.py │ ├── contourarea_03.py │ ├── digit_01.py │ ├── digitrec.html │ ├── digitrec.md │ ├── morphological_01.py │ ├── morphological_02.py │ ├── roi_01.py │ ├── roi_02.py │ └── utils/ │ └── enumerate.py ├── edgedetect/ │ ├── adaptivethresholding_01.py │ ├── canny_01.py │ ├── contour_01.py │ ├── contourapprox.py │ ├── edgedetect.html │ ├── edgedetect.md │ ├── gaussianblur_01.py │ ├── gradient.py │ ├── img2surface.py │ ├── intensitythresholding_01.py │ ├── kernel.html │ ├── kernel.md │ ├── meanblur_01.py │ ├── meanblur_02.py │ ├── meanblur_03.py │ ├── sharpening_01.py │ ├── sharpening_02.py │ ├── sobel_01.py │ ├── sobel_02.py │ ├── sobel_03.py │ ├── unsharpmask_01.py │ ├── unsharpmask_02.py │ └── utils/ │ └── gaussiancurve.r ├── quiz.md ├── requirements.txt ├── summarynotes/ │ └── class2201.md └── transformation/ ├── lecture_affine.html ├── lecture_affine.md ├── rotate_01.py ├── scale_01.py ├── scale_02.py ├── scale_03.py ├── scale_04.py ├── scale_05.py └── translate_01.py ================================================ FILE CONTENTS ================================================ ================================================ FILE: .gitignore ================================================ solutions/ .DS_Store .vscode/ answers.md ================================================ FILE: README.md ================================================ # Essentials of Computer Vision  A math-first approach to learning computer vision in Python. The repository will contain all HTML, PDF, Markdown, Python Scripts, data, and media assets (images or links to supplementary videos). If you wish to contribute, I need translations for Bahasa Indonesia. Please submit a Pull Request. ## Study Guide ### Chapter 1 - Affine Transformation - [Definition](transformation/lecture_affine.html#definition) - [Mathematical Definitions](transformation/lecture_affine.html#mathematical-definitions) - [Practical Examples](transformation/lecture_affine.html#practical-examples) - [Motivation](transformation/lecture_affine.html#motivation) - [Getting Affine Transformation](transformation/lecture_affine.html#getting_affine-transformation) - [Trigonometry Proof](transformation/lecture_affine.html#trigonometry-proof) - [Code Illustrations](transformation/lecture_affine.html#code-illustrations) - [Summary and Key Points](transformation/lecture_affine.html#summary-and-key-points) - Optional video - [Rotation Matrix Explained Visually](https://www.youtube.com/watch?v=tIixrNtLJ8U) - [w/ Bahasa Indonesia voiceover](https://www.youtube.com/watch?v=pWfXR_HmyUw) - References and learn-by-building modules ### Chapter 2 - Kernel Convolutions - [Definition](edgedetect/kernel.html#definition) - Optional video - [Kernel Convolutions Explained Visually](https://www.youtube.com/watch?v=WMmHcrX4Obg) - [Mathematical Definitions](edgedetect/kernel.html#mathematical-definitions) - [Padding](edgedetect/kernel.html#a-note-on-padding) - [Smoothing and Blurring](edgedetect/kernel.html#smoothing-and-blurring) - [A Note on Terminology](edgedetect/kernel.html#a-note-on-terminology) - Kernels or Filters? - Correlations vs Convolutions? 
- [Code Illustrations: Mean Filtering](edgedetect/kernel.html#code-illustrations-mean-filtering) - [Role in Convolution Neural Networks](edgedetect/kernel.html#role-in-convolutional-neural-networks) - [Handy Kernels for Image Processing](edgedetect/kernel.html#handy-kernels-for-image-processing) - [Gaussian Filtering](edgedetect/kernel.html#gaussian-filtering) - [Sharpening Kernels](edgedetect/kernel.html#sharpening-kernels) - [Gaussian Kernels for Sharpening](edgedetect/kernel.html#approximate-gaussian-kernel-for-sharpening) - [Unsharp Masking](edgedetect/kernel.html#unsharp-masking) - [Summary and Key Points](edgedetect/kernel.html#summary-and-key-points) - References and learn-by-building modules ### Chapter 3 - Edge Detection - [Definition](edgedetect/edgedetect.html#definition) - [Gradient-based Edge Detection](edgedetect/edgedetect.html#gradient-based-edge-detection) - [Sobel Operator](edgedetect/edgedetect.html#sobel-operator) - [Discrete Derivative](edgedetect/edgedetect.html#intuition-discrete-derivative) - [Code Illustrations: Sobel Operator](edgedetect/edgedetect.html#code-illustrations-sobel-operator) - [Gradient Orientation & Magnitude](edgedetect/edgedetect.html#dive-deeper-gradient-orientation-magnitude) - [Image Segmentation](edgedetect/edgedetect.html#image-segmentation) - [Intensity-based Segmentation](edgedetect/edgedetect.html#intensity-based-segmentation) - [Simple Thresholding](edgedetect/edgedetect.html#simple-thresholding) - [Adaptive Thresholding](edgedetect/edgedetect.html#adaptive-thresholding) - [Edge-based Contour Estimation](edgedetect/edgedetect.html#edge-based-contour-estimation) - [Contour Retrieval and Approximation](edgedetect/edgedetect.html#contour-retrieval-and-approximation) - [Canny Edge Detector](edgedetect/edgedetect.html#canny-edge-detector) - [Edge Thinning](edgedetect/edgedetect.html#edge-thinning) - [Hysteresis Thresholding](edgedetect/edgedetect.html#hysteresis-thresholding) - References and learn-by-building modules ### Chapter 4 - Digit Classification - [A Note on Deep Learning](digitrecognition/digitrec.html#what-about-deep-learning) - [Why not MNIST?](digitrecognition/digitrec.html#region-of-interest) - Region of Interest - [ROI identification](digitrecognition/digitrec.html#selecting-region-of-interest) - [Arc Length and Area Size](digitrecognition/digitrec.html#arc-length-and-area-size) - [Dive Deeper: ROI](digitrecognition/digitrec.html#dive-deeper-roi) - [ROI extraction](digitrecognition/digitrec.html#roi-extraction) - [Morphological Transformations](digitrecognition/digitrec.html#morphological-transformations) - [Erosion](digitrecognition/digitrec.html#erosion) - [Dilation](digitrecognition/digitrec.html#dilation) - [Opening and Closing](digitrecognition/digitrec.html#opening-and-closing) - [Learn-by-building: Morphological Transformation](digitrecognition/digitrec.html#learn-by-building-morphological-transformation) - [Seven-segment display](digitrecognition/digitrec.html#seven-segment-display) - [Practical Strategies](digitrecognition/digitrec.html#practical-strategies) - [Contour Properties](digitrecognition/digitrec.html#contour-properties) - [References and learn-by-building modules](digitrecognition/digitrec.html#references) ### Chapter 5 - Facial Recognition ## Approach and Motivation The course is foundational to anyone who wish to work with computer vision in Python. 
It covers some of the most common image processing routines, and have in-depth coverage on mathematical concepts present in the materials: - Math-first approach - Tons of sample python scripts (.py) - 45+ python scripts from chapter 1 to 4 for plug-and-play experiments - Multimedia (image illustrations, video explanation, quiz) - 57 image assets from chapter 1 to 4 for practical illustrations - 4 PDFs, and 4 HTMLs, one for each chapter - Practical tips on real-world applications The course's **only dependency** is `OpenCV`. Getting started is as easy as `pip install opencv-contrib-python` and you're set to go. ##### Question: What about deep learning libraries? No; While using deep learning for images made for interesting topics, they are probably better suited as an altogether separate course series. This course series (tutorial series) focused on the **essentials of computer vision** and, for pedagogical reasons, try not to be overly ambitious with the scope it intends to cover. There will be similarity in concepts and principles, as modern neural network architectures draw plenty of inspirations from "classical" computer vision techniques that predate it. By first learning how computer vision problems are solved, the student can compare that to the deep learning equivalent, which result in a more comprehensive appreciation of what deep learning offer to modern day computer scientists. ## Course Materials Preview: ### Python scripts  ### PDF and HTML  # Workshops I conduct in-person lectures using the materials you find in this repository. These workshops are usually paid because there are upfront costs to afford a venue and crew. Not just any venue, but a learning environment that is fully equipped (audio, desks, charging points for everyone, massive screen projector, walking space fo teaching assistants, dinner). You can follow me [on LinkedIn](http://linkedin.com/in/chansamuel/) to be updated about the latest workshops. I also make long-form programming tutorials and lessons on computer vision on [my YouTube channel](https://www.youtube.com/@SamuelChan) ### Introduction to AI in Computer Vision - 4th January 2020, Jakarta - Kantorkuu, Citywalk sudirman, Jakarta Pusat - Time: 1300-1600 - 3 hour - Fee: Free for Algoritma Alumni, 100k IDR for public ### Computer Vision: Principles and Practice - 21st and 22nd January 2020, Jakarta - Accelerice, Jl. Rasuna Said, Jakarta Selatan - Time: 1830-2130 - 6 Hour - Fee: Free for Algoritma Alumni, 1.5m IDR for public - 24th and 25th Feburary 2020, Bangkok - JustCo, Samyan Mitrtown - Time: 1830-2130 - 6 Hour - Fee: Free for Algoritma Alumni, 9000 THB for public ## Image Assets - `car2.png`, `pen.jpg`, `lego.jpg` and `sudoku.jpg` are under Creative Commons (CC) license. - `sarpi.jpg`, `castello.png`, `canal.png` and all other photography used are taken during my trip to Venice and you are free to use them. - All assets in Chapter 4 (the `digitrecognition` folder) are mine and you are free to use them. - All other illustrations are created by me in Keynote. - Videos are created by me, and Bahasa Indonesia voice over on my videos is by [Tiara Dwiputri](https://github.com/tiaradwiputri) ## New to programming? 50-minute Quick Start Here's a video: [Computer Vision Essentials 1](https://youtu.be/NWXY4ASRlgA) I created to get you through the installation and taking the first step into this lesson path. If you need help in the course, attend my in-person workshops on this topic (Computer Vision Essentials, free) throughout the course of the year. 
## Follow me - [YouTube](https://www.youtube.com/@SamuelChan) - [LinkedIn](http://linkedin.com/in/chansamuel/) - [GitHub](https://github.com/onlyphantom) ================================================ FILE: digitrecognition/contourarea_01.py ================================================ import cv2 BCOLOR = (75, 0, 130) THICKNESS = 4 img_color = cv2.imread("assets/ocbc.jpg") img_color = cv2.resize(img_color, None, None, fx=0.5, fy=0.5) img = cv2.cvtColor(img_color, cv2.COLOR_BGR2GRAY) blurred = cv2.GaussianBlur(img, (7, 7), 0) blurred = cv2.bilateralFilter(blurred, 5, sigmaColor=50, sigmaSpace=50) edged = cv2.Canny(blurred, 130, 150, 255) cv2.imshow("Outline of device", edged) cv2.waitKey(0) cnts, _ = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) # sort contours by area, and get the first 10 cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:9] cv2.drawContours(img_color, cnts, 0, BCOLOR, THICKNESS) cv2.imshow("Target Contour", img_color) cv2.waitKey(0) for i, cnt in enumerate(cnts): cv2.drawContours(img_color, cnts, i, BCOLOR, THICKNESS) print(f"ContourArea:{cv2.contourArea(cnt)}") cv2.imshow("Contour one by one", img_color) cv2.waitKey(0) ================================================ FILE: digitrecognition/contourarea_02.py ================================================ import cv2 PURPLE = (75, 0, 130) YELLOW = (0, 255, 255) THICKNESS = 4 FONT = cv2.FONT_HERSHEY_SIMPLEX img_color = cv2.imread("assets/ocbc.jpg") img_color = cv2.resize(img_color, None, None, fx=0.5, fy=0.5) img = cv2.cvtColor(img_color, cv2.COLOR_BGR2GRAY) blurred = cv2.GaussianBlur(img, (7, 7), 0) blurred = cv2.bilateralFilter(blurred, 5, sigmaColor=50, sigmaSpace=50) edged = cv2.Canny(blurred, 130, 150, 255) cv2.imshow("Outline of device", edged) cv2.waitKey(0) cnts, _ = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) # sort contours by area, and get the first 10 cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:10] for i, cnt in enumerate(cnts): cv2.drawContours(img_color, cnts, i, PURPLE, THICKNESS) x, y, w, h = cv2.boundingRect(cnt) cv2.rectangle(img_color, (x, y), (x + w, y + h), YELLOW, THICKNESS) area = round(cv2.contourArea(cnt), 1) peri = round(cv2.arcLength(cnt, closed=True), 1) print(f"ContourArea:{area}, Peri: {peri}") cv2.putText(img_color, "Area:" + str(area), (x, y - 15), FONT, 0.4, PURPLE, 1) cv2.putText(img_color, "Perimeter:" + str(peri), (x, y - 5), FONT, 0.4, PURPLE, 1) cv2.imshow("Contours", img_color) cv2.waitKey(0) ================================================ FILE: digitrecognition/contourarea_03.py ================================================ import cv2 PURPLE = (75, 0, 130) YELLOW = (0, 255, 255) THICKNESS = 4 FONT = cv2.FONT_HERSHEY_SIMPLEX img_color = cv2.imread("assets/ocbc.jpg") img_color = cv2.resize(img_color, None, None, fx=0.5, fy=0.5) img = cv2.cvtColor(img_color, cv2.COLOR_BGR2GRAY) blurred = cv2.GaussianBlur(img, (7, 7), 0) blurred = cv2.bilateralFilter(blurred, 5, sigmaColor=50, sigmaSpace=50) edged = cv2.Canny(blurred, 130, 150, 255) cv2.imshow("Outline of device", edged) cv2.waitKey(0) cnts, _ = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) # sort contours by area, and get the first 10 cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:9] cv2.drawContours(img_color, cnts, 0, PURPLE, THICKNESS) cv2.imshow("Target Contour", img_color) cv2.waitKey(0) for i in range(len(cnts)): cv2.drawContours(img_color, cnts, i, PURPLE, THICKNESS) print(f"ContourArea:{cv2.contourArea(cnts[i])}") x, y, w, h 
= cv2.boundingRect(cnts[i]) cv2.rectangle(img_color, (x, y), (x + w, y + h), YELLOW, THICKNESS) area = round(cv2.contourArea(cnts[i]), 1) peri = round(cv2.arcLength(cnts[i], closed=True), 1) print(f"ContourArea:{area}, Peri: {peri}") cv2.putText(img_color, "Area:" + str(area), (x, y - 15), FONT, 0.4, PURPLE, 1) cv2.putText(img_color, "Perimeter:" + str(peri), (x, y - 5), FONT, 0.4, PURPLE, 1) cv2.imshow("Contour one by one", img_color) cv2.waitKey(0) ================================================ FILE: digitrecognition/digit_01.py ================================================ import cv2 import numpy as np FONT = cv2.FONT_HERSHEY_SIMPLEX CYAN = (255, 255, 0) DIGITSDICT = { (1, 1, 1, 1, 1, 1, 0): 0, (0, 1, 1, 0, 0, 0, 0): 1, (1, 1, 0, 1, 1, 0, 1): 2, (1, 1, 1, 1, 0, 0, 1): 3, (0, 1, 1, 0, 0, 1, 1): 4, (1, 0, 1, 1, 0, 1, 1): 5, (1, 0, 1, 1, 1, 1, 1): 6, (1, 1, 1, 0, 0, 1, 0): 7, (1, 1, 1, 1, 1, 1, 1): 8, (1, 1, 1, 1, 0, 1, 1): 9, } # roi_color = cv2.imread("inter/dbs-roi.png") roi_color = cv2.imread("inter/ocbc-roi.png") roi = cv2.cvtColor(roi_color, cv2.COLOR_BGR2GRAY) RATIO = roi.shape[0] * 0.2 roi = cv2.bilateralFilter(roi, 5, 30, 60) trimmed = roi[int(RATIO) :, int(RATIO) : roi.shape[1] - int(RATIO)] roi_color = roi_color[int(RATIO) :, int(RATIO) : roi.shape[1] - int(RATIO)] cv2.imshow("Blurred and Trimmed", trimmed) cv2.waitKey(0) edged = cv2.adaptiveThreshold( trimmed, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 5, 5 ) cv2.imshow("Edged", edged) cv2.waitKey(0) kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 5)) dilated = cv2.dilate(edged, kernel, iterations=1) cv2.imshow("Dilated", dilated) cv2.waitKey(0) kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 1)) dilated = cv2.dilate(dilated, kernel, iterations=1) cv2.imshow("Dilated x2", dilated) cv2.waitKey(0) kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2, 1),) eroded = cv2.erode(dilated, kernel, iterations=1) cv2.imshow("Eroded", eroded) cv2.waitKey(0) h = roi.shape[0] ratio = int(h * 0.07) eroded[-ratio:,] = 0 eroded[:, :ratio] = 0 cv2.imshow("Eroded + Black", eroded) cv2.waitKey(0) cnts, _ = cv2.findContours(eroded, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) digits_cnts = [] canvas = trimmed.copy() cv2.drawContours(canvas, cnts, -1, (255, 255, 255), 1) cv2.imshow("All Contours", canvas) cv2.waitKey(0) canvas = trimmed.copy() for cnt in cnts: (x, y, w, h) = cv2.boundingRect(cnt) if h > 20: digits_cnts += [cnt] cv2.rectangle(canvas, (x, y), (x + w, y + h), (0, 0, 0), 1) cv2.drawContours(canvas, cnt, 0, (255, 255, 255), 1) cv2.imshow("Digit Contours", canvas) cv2.waitKey(0) print(f"No. 
of Digit Contours: {len(digits_cnts)}") cv2.imshow("Digit Contours", canvas) cv2.waitKey(0) sorted_digits = sorted(digits_cnts, key=lambda cnt: cv2.boundingRect(cnt)[0]) canvas = trimmed.copy() for i, cnt in enumerate(sorted_digits): (x, y, w, h) = cv2.boundingRect(cnt) cv2.rectangle(canvas, (x, y), (x + w, y + h), (0, 0, 0), 1) cv2.putText(canvas, str(i), (x, y - 3), FONT, 0.3, (0, 0, 0), 1) cv2.imshow("All Contours sorted", canvas) cv2.waitKey(0) digits = [] canvas = roi_color.copy() for cnt in sorted_digits: (x, y, w, h) = cv2.boundingRect(cnt) roi = eroded[y : y + h, x : x + w] print(f"W:{w}, H:{h}") # convenience units qW, qH = int(w * 0.25), int(h * 0.15) fractionH, halfH, fractionW = int(h * 0.05), int(h * 0.5), int(w * 0.25) # seven segments in the order of wikipedia's illustration sevensegs = [ ((0, 0), (w, qH)), # a (top bar) ((w - qW, 0), (w, halfH)), # b (upper right) ((w - qW, halfH), (w, h)), # c (lower right) ((0, h - qH), (w, h)), # d (lower bar) ((0, halfH), (qW, h)), # e (lower left) ((0, 0), (qW, halfH)), # f (upper left) # ((0, halfH - fractionH), (w, halfH + fractionH)) # center ( (0 + fractionW, halfH - fractionH), (w - fractionW, halfH + fractionH), ), # center ] # initialize to off on = [0] * 7 for (i, ((p1x, p1y), (p2x, p2y))) in enumerate(sevensegs): region = roi[p1y:p2y, p1x:p2x] print( f"{i}: Sum of 1: {np.sum(region == 255)}, Sum of 0: {np.sum(region == 0)}, Shape: {region.shape}, Size: {region.size}" ) if np.sum(region == 255) > region.size * 0.5: on[i] = 1 print(f"State of ON: {on}") digit = DIGITSDICT[tuple(on)] print(f"Digit is: {digit}") digits += [digit] cv2.rectangle(canvas, (x, y), (x + w, y + h), CYAN, 1) cv2.putText(canvas, str(digit), (x - 5, y + 6), FONT, 0.3, (0, 0, 0), 1) cv2.imshow("Digit", canvas) cv2.waitKey(0) print(f"Digits on the token are: {digits}") ================================================ FILE: digitrecognition/digitrec.html ================================================
In Chapter 4: Digit Recognition, we'll add a few new techniques to our image processing toolset by attempting to build a digit recognition pipeline from start to finish. Throughout the exercise, we will get to practice the image preprocessing tricks we've picked up from previous chapters:
New methods and strategies that you'll be learning include:
To be clear, specialised deep learning libraries that have sprung up in recent years are a lot more robust in their approach. By utilizing machine learning principles (cost functions, gradient descent, etc.), these specialised libraries can handle highly complex object recognition and OCR (optical character recognition) tasks at the cost of brute computing power.
The overarching motivation of this free course, however, is to make clear to beginners what constitutes artificial intelligence, and to illustrate the principal benefits of machine learning. I try to achieve that by demonstrating, over multiple chapters of this course, how computer vision tasks were traditionally, or rather "classically", performed prior to the emergence of deep learning.
By learning the classical approaches to computer vision, the student (you) can see the effort it takes to hand-tune parameters, and this adds a new dimension of appreciation towards the self-learning methods that we'll discuss in the near future.
Do a quick Google search on "digit recognition" or "digit classification" and it's hard to find an introductory deep learning course that doesn't use the famous MNIST (Modified National Institute of Standards and Technology)[1] database. This is a handwritten digit database that has long been the de facto standard in machine learning tutorials:

But I'd argue that, for a budding computer vision developer, your learning objectives are better served by taking a different approach.
By choosing real-life images, you are confronted with a few key challenges that are not present when using a well-curated database such as MNIST. These challenges present new opportunities to learn about key concepts, such as regions of interest and morphological operations, that you will come to rely upon greatly in the future.
First, take a look at 4 real-life pictures of security tokens issued by banks and institutional agencies (left-to-right: Bank Central Asia, DBS, OCBC Bank, OneKey for Singapore Government e-services):

Notice how noisy these images are: each image is shot against a different background and under different lighting conditions, and each token differs in size, shape, and color.
Your task, as a computer vision developer, is to develop a pipeline that, in each phase, takes you closer to the goal. Roughly speaking, given the above task, we would formulate a pipeline that looks like the following:
In practice, steps (1) and (2) above are the "application" of the methods you've learned in previous chapters of this series. As we'll soon observe, we will use a combination of blurring operations and edge detection to draw our contours. One of those contours will be the LCD display containing the digits to be classified. That is our Region of Interest.

The GIF above demonstrates the code in roi_01.py; essentially, it shows the selectROI method in action. You'll commonly combine the selectROI method with either a slicing operation to crop your region of interest, or a drawing operation to call attention to a specific region of the image.
x, y, w, h = cv2.selectROI("Region of interest", img)
cropped = img[y:y + h, x:x + w]
# draw rectangle
cv2.rectangle(img_color, (x, y), (x + w, y + h), (255, 0, 0), 2)
In most cases, it simply wouldn't be realistic to render an image before manually specifying our region of interest. We'll need this operation to be as close to automatic as possible. But how exactly? That depends greatly on the specific problem set.
In some cases, the obvious choice of strategy would simply be shape recognition, say by counting the number of vertices of each contour. The following code is an example implementation of that:
# cnt = contour
peri = cv2.arcLength(cnt, True)
# contour approximation
cnt_approx = cv2.approxPolyDP(cnt, 0.03 * peri, True)
if len(cnt_approx) == 3:
    est_shape = 'triangle'
    ...
elif len(cnt_approx) == 5:
    est_shape = 'pentagon'
    ...
In other cases, you may employ a strategy that tries to match contours based on Hu moments (which we'll study in detail in future chapters).
Other methods may involve a saliency map, or a visual attention map, for ROI extraction. These methods create a new representation of the original image where each pixel's unique quality is amplified or emphasized. One example implementation on Wikipedia[2] demonstrates how straightforward this concept really is:
As you add new tools and strategies to your computer vision toolbox, you will pick up new approaches to ROI extraction. It is an interesting field of research that has been gaining a lot in popularity with the emergence of deep learning.
As for the images of bank security tokens, can you think of an approach that may be a good fit? Our region of interest is the LCD screen at the top of the button pad on each device, and they all seem to be rather consistent in shape and size. Give it some thought and read on to find out.
I've hinted at shape and size being a factor, so maybe that would be a good starting point. The good news is that OpenCV makes this incredibly easy through the contourArea() and arcLength() functions.
The following snippet of code, lifted from contourarea_01.py, finds all contours and sorts them by area in descending order before storing the first 10 in cnts:
cnts, _ = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# sort contours by contourArea, and keep the first 10
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:10]
We can also obtain the contour area and perimeter iteratively in a for-loop, like the following:
cnts, _ = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for i in range(len(cnts)):
    area = cv2.contourArea(cnts[i])
    peri = cv2.arcLength(cnts[i], closed=True)
    print(f'Area:{area}, Perimeter:{peri}')
In effect, we're looping through each contour that the findContours() operation found, and computing two values each time, area and peri.
Note that the contour perimeter is also known as the arc length. The second argument closed specifies whether the shape is a closed contour (True) or just a curve (closed=False).
Execute contourarea_01.py and observe how each contour is displayed, from the one with the largest area to the one with the least, for a total of 10 contours. As you run the script on different pictures of bank security tokens, you'll see that it does a reliable job of finding the contours, sorting them, and returning our LCD display screen as the first in the list. This makes sense, because visually it is apparent that the LCD display occupies the largest area among the closed shapes in our picture.
Use assets/dbs.jpg instead of assets/ocbc.jpg in contourarea_01.py. Were you able to extract the region of interest (LCD Display) successfully without any changes to the script?
Could we have successfully extracted our region of interest had we used arcLength in our strategy?
Suppose we only wanted to extract the region of interest and not the rest; which line of code would you change? Reflect the change in the code and execute it to confirm that you have performed this exercise correctly.
Suppose we wanted the contours sorted according to their respective areas, from smallest to largest; which line of code would you change? Reflect the change in the code and execute it to confirm that you have performed this exercise correctly.
While working through the exercises above, you may find it helpful to also draw the text describing the area size and perimeter next to each contour. I've shown you how this can be done in contourarea_02.py but the essential addition we make to the earlier code is the two calls to putText():
PURPLE = (75, 0, 130)
THICKNESS = 1
FONT = cv2.FONT_HERSHEY_SIMPLEX
cv2.putText(img_color, "Area:" + str(area), (x, y - 15), FONT, 0.4, PURPLE, THICKNESS)
cv2.putText(img_color, "Perimeter:" + str(peri), (x, y - 5), FONT, 0.4, PURPLE, THICKNESS)

With these foundations, we are now ready to write a simple utility script that:
- finds the region of interest (the LCD display) in a given image
- crops the ROI out of the original image
- saves the cropped ROI into /inter (intermediary) for the actual digit recognition later

Much of what you need to do has already been presented so far, but the core pieces, lifted from roi_02.py, are the following few lines of code:
img = cv2.imread(...)
blurred = cv2.GaussianBlur(img, (7, 7), 0)
edged = cv2.Canny(blurred, 130, 150, 255)
cnts, _ = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:1]
x, y, w, h = cv2.boundingRect(cnts[0])
roi = img[y:y + h, x:x + w]
cv2.imwrite("roi.png", roi)
The roi_02.py utility script uses the argparse library so the user can specify a file path with the -p (or --path) flag, like so:
python roi_02.py -p assets/ocbc.jpg
# equivalent:
python roi_02.py --path assets/ocbc.jpg
If the user does not specify a file path using the -p flag, the default value is assets/ocbc.jpg. If you wish to change this, edit roi_02.py and specify a different value for the default parameter.
parser = argparse.ArgumentParser()
parser.add_argument("-p", "--path", default="assets/ocbc.jpg")
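The parsed value is then used to read the image; a minimal sketch of how that might look (the variable names here are illustrative, not necessarily those used in roi_02.py):

args = parser.parse_args()
img = cv2.imread(args.path)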
You should run this exercise using dbs.jpg, ocbc2.jpg, or onekey.jpg at least once. Execute the script and check the inter folder to confirm that the ROI has been saved. When you're done, you are ready to move on to the next phase of the digit recognition pipeline.
Once the region of interest is obtained, we have an image that may still contain noise. This is especially the case when our ROI is obtained by means of thresholding methods, since you can expect some "non-features" (noise) to also be included in the resulting image.
To account for these imperfections, we will now perform a series of operations on our image. We'll learn what they are formally, but let's begin by seeing what it is that they offer to our image processing pipeline. I've included a picture with some random noise, as follows:

The digits "0417" are clearly discernible to the human eye despite the presence of noise. However, consider the perspective of a global thresholding operation: these pixel values are "noise" to us, but a computer has no such notion of which pixel values are meaningful and which are not. A threshold value such as the global mean will take all values into account indiscriminately. A contour finding operation will, instead of 4, return thousands of tiny round segments (they may be tiny, but they are completely valid contours).
An image processing pipeline that fails to account for these may result in sub-optimal performance or, very often, completely undesired results.
Enter two of the most fundamental morphological transformations: erosion and dilation.
Erosion "erodes away the boundaries of foreground object"[3] by sliding a kernel through the image and setting a pixel to 1 only if all the pixels under the kernel are 1.
This in effect discards pixels near the boundary and any floating pixels that are not part of a larger blob (which is what the human eye is interested in). Because pixels are eroded, your foreground object will shrink in size.
The opposite of erosion, Dilation sets a pixel to 1 if at least one pixel under the kernel is 1, essentially "growing" the foreground object.
Because of how these operations work, there are a couple of things to note:

As we read our image in grayscale mode (flags=0), we obtain a white background and a mostly-black foreground. This is illustrated in the subplot titled "Original" above. We begin our preprocessing steps by first binarizing the image (step 1), followed by inverting the colors (step 2) to get a white-on-black image.
An erosion operation is then performed (step 3). This works by creating our kernel (either through numpy or through opencv's structuring element) and sliding that kernel across our image to remove white noises in our image.
The side-effect is that our foreground object has now shrunk in size as its boundaries are eroded away. We grow it back by applying a dilation (step 4) and finally show the output, as illustrated in the bottom-right pane of the image above.
import cv2
import numpy as np

# read as grayscale
roi = cv2.imread("assets/0417s.png", flags=0)
# step 1:
_, thresh = cv2.threshold(roi, 170, 255, cv2.THRESH_BINARY)
# step 2:
inv = cv2.bitwise_not(thresh)
# step 3 (option 1):
kernel = np.ones((5, 5), np.uint8)
# step 3 (option 2):
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
eroded = cv2.erode(inv, kernel, iterations=1)
# step 4:
dilated = cv2.dilate(eroded, kernel, iterations=1)
cv2.imshow("Transformed", dilated)
cv2.waitKey(0)
OpenCV provides three shapes for our kernel:
- MORPH_RECT
- MORPH_CROSS
- MORPH_ELLIPSE

They are fed as the first argument into cv2.getStructuringElement(), with the second being the kernel size (ksize) itself. The third argument is the anchor point, which defaults to the center.
Erosion followed by Dilation is also known as Opening. It is useful for removing noise in our image. The reverse of Opening is Closing, where we first perform Dilation followed by Erosion; it is particularly suited for closing small holes inside foreground objects.
OpenCV includes the more generic morphologyEx method for all other morphological operations beyond Erosion and Dilation. The function takes an image as the first argument, an operation as the second argument, and finally the kernel. Compare how your code will differ between cv2.erode and cv2.dilate, and their respective equivalents in cv2.morphologyEx():
import cv2
import numpy as np

img = cv2.imread('image.png', 0)
kernel = np.ones((5, 5), np.uint8)

erosion = cv2.erode(img, kernel, iterations=1)
# Equivalent:
# cv2.morphologyEx(img, cv2.MORPH_ERODE, kernel, iterations=1)

dilation = cv2.dilate(img, kernel, iterations=1)
# Equivalent:
# cv2.morphologyEx(img, cv2.MORPH_DILATE, kernel, iterations=1)

opening = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
closing = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
In the homework directory, you'll find 0417h.png. Your job is to apply what you've learned in this lesson to clean up the image. Your output should have these qualities:
- when you perform findContours() on the output, you should have exactly 4 contours
You are free to pick your strategy, but a reference solution would look like the following:

The seven-segment display (known also as "seven-segment indicator") is a form of electronic display device for displaying decimal numerals[4] widely used in digital clocks, electronic meters, calculators and banking security tokens.

This is relevant because it is the character representation of our digits in each of these security tokens. If we can isolate each digit from each other, we can iteratively predict the "class" of each digit (0 to 9). Specifically, we are going to perform a classification task based on the state of each segment.
To ease our understanding, let's refer to each segment using the letters A to G:

We can then create a lookup table that match the collective states to the corresponding class:
| Class | a | b | c | d | e | f | g |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| 2 | 1 | 1 | 0 | 1 | 1 | 0 | 1 |
| 3 | 1 | 1 | 1 | 1 | 0 | 0 | 1 |
| 4 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| 5 | 1 | 0 | 1 | 1 | 0 | 1 | 1 |
| 6 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
| 7 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |
| 8 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 9 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
How would we represent such a lookup table in our Python code and how would we use it? The obvious answer to the first question is a dictionary. Notice that DIGITSDICT is just a representation of the "binary state" of each segment. The digit "8", for example, corresponds to all seven segments being activated, or "on" (a state of 1).
DIGITSDICT = {
    (1, 1, 1, 1, 1, 1, 0): 0,
    (0, 1, 1, 0, 0, 0, 0): 1,
    (1, 1, 0, 1, 1, 0, 1): 2,
    (1, 1, 1, 1, 0, 0, 1): 3,
    (0, 1, 1, 0, 0, 1, 1): 4,
    (1, 0, 1, 1, 0, 1, 1): 5,
    (1, 0, 1, 1, 1, 1, 1): 6,
    (1, 1, 1, 0, 0, 1, 0): 7,
    (1, 1, 1, 1, 1, 1, 1): 8,
    (1, 1, 1, 1, 0, 1, 1): 9,
}
Then, for each digit, we would look at the pixel values in each of the seven segments, and if the majority of pixels are white, we would classify that segment as being in an activated state (1), otherwise in a state of 0. As we iterate over the 7 segments, we end up with an array of length 7, each element a binary value (0 or 1).
We would then find the corresponding value in our dictionary using that array. Your code would resemble the following:
# define the rectangle areas corresponding to each segment
sevensegs = [
    ((x0, y0), (x1, y1)),
    ((x2, y2), (x3, y3)),
    ...  # 7 of them
]
# initialize the state to OFF
on = [0] * 7
# set each segment to ON / OFF based on majority
for (i, ((p1x, p1y), (p2x, p2y))) in enumerate(sevensegs):
    # numpy slicing to extract only one region
    region = roi[p1y:p2y, p1x:p2x]
    # if majority pixels are white, set state to ON
    if np.sum(region == 255) > region.size * 0.5:
        on[i] = 1
# lookup on dictionary
digit = DIGITSDICT[tuple(on)]
# digit is one of 0-9
There are multiple ways to write a for-loop, but it's important that you are aware of the order in which your for-loop is executing. Referring to our seven-segment illustration below, the first iteration is only concerned with the state of 'A' while the second iteration handles the state of 'B', and so on.

Using enumerate, we obtain an additional counter (i) over our iterable (sevensegs); this is convenient for the purpose of setting states. At the first iteration, the first element in our list is conditionally set to 1 if more than half of the pixels in segment 'A' are white. A more detailed example of Python's enumerate is in utils/enumerate.py.
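As a quick illustration of that pattern (a simplified sketch, not the exact contents of utils/enumerate.py):

segments = ["a", "b", "c", "d", "e", "f", "g"]
on = [0] * 7
for i, segment in enumerate(segments):
    # i counts 0, 1, 2, ... while segment walks through the letters
    print(i, segment)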
If you pay close attention to the digit '0' in our LCD display, you will notice that the absence of the 'G' segment causes a pretty visible and significant gap. When you test your digit recognition script without special consideration for this attribute, you will find it consistently failing to account for the numbers "0", "1" and "7". In fact, you may not even be able to isolate the aforementioned numbers using the findContours operation, because they are treated as two disjoint pieces instead of a whole.
A reasonable strategy to handle this is the Dilation or Closing (Dilation followed by Erosion) operation that you've learned earlier.
Similarly, your ROI may necessitate other pre-processing, and the specific tactical solution varies greatly depending on the problem set at hand.
As I inspected the bounding boxes we retrieved around the LCD screens, the observation that these bounding boxes often have their digits centered around the bottom half of the display led me to insert an additional step prior to the morphological transformation in the final code solution. The step uses numpy subsetting to trim away the top 20% as well as 20% on each side of the image:
roi = cv2.imread("roi.png", flags=0)
RATIO = roi.shape[0] * 0.2
trimmed = roi[int(RATIO):, int(RATIO): roi.shape[1] - int(RATIO)]
That said, whenever possible, you want to be careful not to hand-tune your solution in a way that is overly specific to the images you have at hand, lest it only works on those specific images and not others, a phenomenon fondly termed "overfitting" in the machine learning community.
I've re-executed the solution code against some sample image sets, once with the "trimming" in-place and then without the trimming, before settling on the decision. As you will see later, the trimming improves our accuracy and is a relatively safe strategy given how every LCD screen regardless of the issuer (bank) has the same asymmetry with more "blank space" at the top half compared to the bottom half.
Furthermore, in many cases of digit recognition / digit classification, you will want to predict the class of each digit in an ordered fashion. Suppose the LCD screen contains the digits "40710382"; our algorithm should correctly isolate these digits and classify them iteratively, but do so from the leftmost digit to the rightmost. Failing to account for this may result in your algorithm correctly classifying each digit but producing an unreasonable output such as "1740238".
There are a few strategies you can employ here. We've seen in contourarea_01.py and contourarea_02.py how contours have attributes that can be retrieved using the contourArea() and arcLength() functions. Inspect the following snippet and it should help jog your memory:
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:10]
for i, cnt in enumerate(cnts):
    cv2.drawContours(img_color, cnts, i, BCOLOR, THICKNESS)
    area = cv2.contourArea(cnt)
    peri = cv2.arcLength(cnt, closed=True)
    print(f"Area:{area}; Perimeter: {peri}")
Indeed, we're using contour area as a good indicator to search for our region of interest. When we take this idea a little further, we can place an additional constraint on our search criteria. In the following code, we draw a bounding rectangle and, for an extra layer of precaution, only keep bounding boxes that are taller than 20 pixels (step 1).
Calling boundingRect() on a contour returns 4 values, respectively the x and y coordinate along with the width and height of the contour.
We then use another property of the contour, its top-left coordinate to determine the logical order of our digits. Specifically, we use the first returned value (cv2.boundingRect(cnt)[0]) since that's the x value for the top-left coordinate of each region. By sorting against this value, our digits are stored in the Python list in an ordered fashion, determined by their respective coordinate value.
digits_cnts = []
cnts, _ = cv2.findContours(eroded, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for cnt in cnts:
    (x, y, w, h) = cv2.boundingRect(cnt)
    # step 1
    if h > 20:
        digits_cnts += [cnt]
# step 2
sorted_digits = sorted(digits_cnts, key=lambda cnt: cv2.boundingRect(cnt)[0])
When we put these together, we now have a complete pipeline:

The full solution code is in digit_01.py but the essential parts are as follow:
import cv2
import numpy as np

# step 1:
DIGITSDICT = {
    (1, 1, 1, 1, 1, 1, 0): 0,
    (0, 1, 1, 0, 0, 0, 0): 1,
    (1, 1, 0, 1, 1, 0, 1): 2,
    (1, 1, 1, 1, 0, 0, 1): 3,
    (0, 1, 1, 0, 0, 1, 1): 4,
    (1, 0, 1, 1, 0, 1, 1): 5,
    (1, 0, 1, 1, 1, 1, 1): 6,
    (1, 1, 1, 0, 0, 1, 0): 7,
    (1, 1, 1, 1, 1, 1, 1): 8,
    (1, 1, 1, 1, 0, 1, 1): 9,
}

# step 2
roi = cv2.imread("inter/ocbc-roi.png", flags=0)

# step 3
RATIO = roi.shape[0] * 0.2
roi = cv2.bilateralFilter(roi, 5, 30, 60)
trimmed = roi[int(RATIO):, int(RATIO): roi.shape[1] - int(RATIO)]

# step 4
edged = cv2.adaptiveThreshold(
    trimmed, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 5, 5
)

# step 5
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 5))
dilated = cv2.dilate(edged, kernel, iterations=1)
eroded = cv2.erode(dilated, kernel, iterations=1)

# step 6
cnts, _ = cv2.findContours(eroded, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
digits_cnts = []
for cnt in cnts:
    (x, y, w, h) = cv2.boundingRect(cnt)
    if h > 20:
        digits_cnts += [cnt]

# step 7
sorted_digits = sorted(digits_cnts, key=lambda cnt: cv2.boundingRect(cnt)[0])

# step 8
digits = []
for cnt in sorted_digits:
    # step 8a
    (x, y, w, h) = cv2.boundingRect(cnt)
    roi = eroded[y: y + h, x: x + w]
    qW, qH = int(w * 0.25), int(h * 0.15)
    fractionH, halfH, fractionW = int(h * 0.05), int(h * 0.5), int(w * 0.25)
    # step 8b
    sevensegs = [
        ((0, 0), (w, qH)),  # a (top bar)
        ((w - qW, 0), (w, halfH)),  # b (upper right)
        ((w - qW, halfH), (w, h)),  # c (lower right)
        ((0, h - qH), (w, h)),  # d (lower bar)
        ((0, halfH), (qW, h)),  # e (lower left)
        ((0, 0), (qW, halfH)),  # f (upper left)
        # ((0, halfH - fractionH), (w, halfH + fractionH))  # center
        (
            (0 + fractionW, halfH - fractionH),
            (w - fractionW, halfH + fractionH),
        ),  # center
    ]
    # step 8c
    on = [0] * 7
    for (i, ((p1x, p1y), (p2x, p2y))) in enumerate(sevensegs):
        region = roi[p1y:p2y, p1x:p2x]
        print(
            f"{i}: Sum of 1: {np.sum(region == 255)}, Sum of 0: {np.sum(region == 0)}, Shape: {region.shape}, Size: {region.size}"
        )
        if np.sum(region == 255) > region.size * 0.5:
            on[i] = 1
    print(f"State of ON: {on}")
    # step 8d
    digit = DIGITSDICT[tuple(on)]
    print(f"Digit is: {digit}")
    digits += [digit]
    # step 9
    # FONT, CYAN and canvas (a copy of the color ROI) are defined in the full digit_01.py
    cv2.rectangle(canvas, (x, y), (x + w, y + h), CYAN, 1)
    cv2.putText(canvas, str(digit), (x - 5, y + 6), FONT, 0.3, (0, 0, 0), 1)
    cv2.imshow("Digit", canvas)
    cv2.waitKey(0)

print(f"Digits on the token are: {digits}")
- step 8a: retrieve the bounding rectangle (x, y, w, h) of our rectangular box and crop that region out of the eroded image
- step 8b: define the seven segment regions; the top bar (a), for example, is w in width and 15% the height of the full digit contour (int(h * 0.15)), starting from position (0, 0)
- step 8c: initialize the state to 0 for each of the 7 segments, then conditionally set regions with more white than black pixels to 1
- step 8d: look up the digit matching the tuple of states and append it to the digits list created at the beginning of step 8
- step 9: draw the bounding box and predicted digit on the canvas, and finally print the digits list.

An edge can be defined as a boundary between regions in an image[1]. The edge detection techniques we'll learn in this course build upon what we've learned from our lessons on kernel convolution. Edge detection is the process of using kernels to reduce the information in our data, preserving only the necessary structural properties in our image[1:1].
The gradient points in the direction of the most rapid increase in intensity. When we apply a gradient-based edge detection method, we are searching for the maxima and minima in the first derivative of the image.
When we apply our convolution to the image, we are looking for regions where there is a sharp change in intensity or color. Arguably the most common edge detection method using this approach is the Sobel Operator.
The Sobel operator applies a filtering operation to produce an image output where the edge is emphasized. It convolves our original image using two 3x3 kernels to capture approximations of the derivatives in both the horizontal and vertical directions.
The x-direction and y-direction kernels would be:
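Following the same sign convention as the numpy construction shown later in this chapter:

$$G_x = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} \qquad G_y = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}$$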
Each kernel is applied separately to obtain the gradient component in each orientation, $G_x$ and $G_y$. Expressed in formula, the gradient magnitude is:
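$$G = \sqrt{G_x^2 + G_y^2}$$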
Where the slope $\theta$ of the gradient is calculated as follows:
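$$\theta = \arctan\left(\frac{G_y}{G_x}\right)$$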
If the two formulas above confuse you, read on as we unpack these ideas one at a time.
In computer vision literature, you'll often hear about "taking the derivative", and this may serve as a source of confusion for beginning practitioners since "derivatives" are often thought of in the context of a continuous function. Images are 2D matrices of discrete values, so how do we wrap our heads around the idea of finding a derivative?
But why do we even bother with derivatives when this course is supposed to be about edge detection in images?

Among the many ways to answer the question, my favorite is that an image is really just a function. When we treat an image as a function, the utility of taking derivatives becomes a little more obvious. In the image below, suppose you want to count the number of windows in this area of Venezia Sestiere Cannaregio; your program can look for large derivatives, since there are sharp changes in pixel intensity from the windows to the surrounding wall:

The code to generate the surface plot above is in img2surface.py.
Going back to our x-direction kernel in the Sobel Operator.
This kernel has all zeros in its middle column, which is quite easy to intuit about. Essentially, for each pixel in our image, we want to compute its derivative in the x-direction by approximating a formula that you may have come across in your calculus class:
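$$f'(x) \approx \frac{f(x+h) - f(x)}{h}$$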
This approximation is also called 'forward difference', because we're taking a value of $x$, and computing the difference in $f(x)$ as we increment it by a small amount forward, denoted as $h$.
And as it turns out, using the 'central difference' to compute the derivative of our discrete signal can deliver better results[2]:
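$$f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}$$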
To make this more concrete, we can plug the formula into an actual array of pixels:
when we set $h=2$ at the center pixel (index of value 180), we have the following:
Notice that a large part of the calculation we just performed is equivalent to a 1D convolution operation using a $\begin{bmatrix} -1 & 0 & 1 \end{bmatrix}$ kernel.
When the same 1x3 kernel $\begin{bmatrix} -1 & 0 & 1 \end{bmatrix}$ is applied on the right-most part of the image where it's just white space ([..., 255, 255, 255]), the kernel evaluates to 0. In other words, our derivative filter returns no response where it can't detect a sharp change in pixel intensity.
As a reminder, the x-direction kernel in our Sobel Operator is the following:
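$$G_x = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix}$$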
This takes our 1x3 kernel and instead of convolving one row of pixels at a time, extends it to convolve at 3x3 neighborhoods at a time using a weighted average approach.
The two kernels (one for horizontal and another for vertical edge detection) can be constructed, respectively, like the following:
sobel_x = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]])
sobel_y = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]])
You may have guessed that, given its role in digital image processing, OpenCV includes a method that performs the Sobel Operator for us, and thankfully it does. Here's an example of using the cv2.Sobel(src, ddepth, dx, dy, dst=None, ksize) method:
gradient_x = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
gradient_y = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
print(f"Range: {np.min(gradient_x)} | {np.max(gradient_x)}")
# Range: -177.0 | 204.0
gradient_x = np.uint8(np.absolute(gradient_x))
gradient_y = np.uint8(np.absolute(gradient_y))
print(f"Range uint8: {np.min(gradient_x)} | {np.max(gradient_x)}")
# Range uint8: 0 | 204
cv2.imshow("Gradient X", gradient_x)
cv2.imshow("Gradient Y", gradient_y)

The code above, extracted from sobel_01.py reinforces a couple of ideas that we've been working on. It shows that:
- the raw Sobel output can contain negative values (and values above 255), which is why we keep a higher output datatype (cv2.CV_64F). OpenCV suggests to "keep the output datatype to some higher form such as cv2.CV_64F, take its absolute value and then convert back to cv2.CV_8U"[3].

While the code above certainly works, OpenCV also has a method that scales, calculates absolute values and converts the result to 8-bit. cv2.convertScaleAbs(src, dst, alpha=1, beta=0) performs the following:
gradient_x = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
gradient_y = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
gradient_x = cv2.convertScaleAbs(gradient_x)
gradient_y = cv2.convertScaleAbs(gradient_y)
print(f"Range: {np.min(gradient_x)} | {np.max(gradient_x)}")
At the beginning of this course I said that images are really just 2d functions before showing you the intricacies of our Sobel kernels. We saw the clever design of both the x- and y-direction kernels, by borrowing from the concept of "taking the derivatives" you often see in calculus text books.
But on a really basic level, these kernels only return the x and y edge responses. These are not the image gradient, just pure arithmetic values from following the convolution process. To get to the final form (where the edges in our image are emphasized) we still need to compute the gradient direction and magnitude for each point in our image.
This brings us back to our original formula. Recall that the x-direction and y-direction kernels are:
We understand that each kernel is applied separately to obtain the gradient component in each orientation, $G_x$ and $G_y$. What is the significance of this? Well, as it turns out, if we know the shift in the x-direction and the corresponding change in value in the y-direction, then we can use the Pythagorean theorem to approximate the "length of the slope", a concept that many of you are familiar with.
Expressed in formula, the gradient magnitude is hence:
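$$G = \sqrt{G_x^2 + G_y^2}$$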
Along with the well-known mathematical formula that is Pythagorean theorem, some of you may also have some familiarity with the three trigonometric functions. Particularly, the tangent function tells us that in a right triangle, the tangent of an angle is the length of the opposite side divided by the length of the adjacent side.
This leads us to the following expression:
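$$\tan(\theta) = \frac{G_y}{G_x}$$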
Rewriting the expression above, we arrive at the formula that captures the gradient's direction:
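$$\theta = \arctan\left(\frac{G_y}{G_x}\right)$$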

This whole idea is also illustrated in code, and the script is provided to you:
- gradient.py generates the vector field in the picture above (right)
- img2surface.py, run on the penguin image in the assets folder, generates the surface plot

Succinctly, suppose the two 3x3 kernels do not fire a response (for example, when no edges are detected in the white background of our penguin): both $G_x$ and $G_y$ will be 0, which leads to a gradient magnitude of 0. You can compute these by hand, let OpenCV's implementation handle that for you, or use numpy as illustrated in gradient.py:
dY, dX = np.gradient(img)
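From these two components, the magnitude and direction at every pixel can then be computed with plain numpy; a minimal sketch (assuming img has already been read as a grayscale array):

import numpy as np

dY, dX = np.gradient(img.astype(float))
magnitude = np.sqrt(dX ** 2 + dY ** 2)
orientation = np.arctan2(dY, dX)  # gradient direction in radians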
Image segmentation is the process of decomposing an image into parts for further analysis. This has many uses:
Current literature on image segmentation techniques can be classified into[4]:
It's important to note, however, that the rise in popularity of deep learning frameworks and techniques has ushered in a proliferation of new methods to perform what was once a highly difficult task. In future lectures, we'll explore image segmentation in far greater detail. In this course, we'll study intensity-based segmentation and edge-based segmentation methods.
Intensity-based methods are perhaps the simplest, as intensity is the simplest property that pixels can share.
To make a more concrete case of this, let's assume you're working with a team of researchers to build an AI-based "sudoku solver" that, unimaginatively, will compete against human sudoku players in an attempt to further stake the claim in an ongoing debate of AI superiority.
While your teammates work on the algorithmic design for the actual solver, your task is comparatively straightforward: write a script to scan newspaper images (or print magazines), binarize them to discard everything except the digits in the sudoku puzzle.
This presents a great opportunity to use an intensity-based segmentation technique we spoke about earlier.
In intensitythresholding_01.py, you'll find a code demonstration of the numerous thresholding methods provided by OpenCV. In total, there are 5 simple thresholding methods: THRESH_BINARY, THRESH_BINARY_INV, THRESH_TRUNC, THRESH_TOZERO and THRESH_TOZERO_INV[5].
The method call for all of them is identical:
cv2.threshold(img, thresh, maxval, type)
We specify our source image img (usually in grayscale), a threshold value thresh used to binarize the image pixels, and a max value maxval for the pixel value to use for any pixel that crosses our threshold.
The mathematical functions for each one of them:

They're collectively known as simple thresholding in OpenCV because they use a global threshold value; any pixel smaller than the threshold is set to 0, otherwise it is set to the maxval value.
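As a minimal sketch of the call in practice (the file path here is an assumption; see intensitythresholding_01.py for the full demonstration on the sudoku image):

import cv2

img = cv2.imread("assets/sudoku.jpg", flags=0)  # read as grayscale
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
_, binary_inv = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)
cv2.imshow("THRESH_BINARY", binary)
cv2.waitKey(0)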
These probably sound too simplistic for anything beyond the simplest of real-world images, and in the majority of cases they are. They call for proper judgment of the task at hand.
Applying the various types of simple thresholding methods to our sudoku image, we observe that the digits are for the most part extracted successfully while the background information is greatly reduced:

Refer to intensitythresholding_01.py for the full code.
As a simple homework, try to practice simple thresholding on the car2.png located in your homework folder. To reduce noise, you may have to combine a blurring operation prior to thresholding. As you practice, pay attention to the interaction between your threshold values and the output. Later in the course, you'll learn how to draw contours, which would come in handy in producing the final output:

As you work on your homework, you will notice that, given the varying lighting conditions across the different regions of our image, regardless of the global value we pick, we end up with a threshold that is either too low or too high.
Using a global value as an intensity threshold may work in particular cases but may be overly naive to perform well when, say, an image has different lighting conditions in different areas. A great example of this case is the object extraction exercise you performed using car2.png.
Adaptive thresholding is not a lot different from the aforementioned thresholding techniques, except it determines the threshold for each pixel based on its neighborhood. This in effect means that the image is assigned different thresholds across the different regions, leading to a cleaner output when our image has different degrees of illumination.

The method is called with the source image (src), a max value (maxValue), the method (adaptiveMethod), a threshold type (thresholdType), the size of the neighborhood (blockSize) and a constant (C) that is subtracted from the mean or the weighted sum of the neighborhood pixels.
mean_adaptive = cv2.adaptiveThreshold(
    img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 2
)
gaussian_adaptive = cv2.adaptiveThreshold(
    img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2
)
The code, taken from adaptivethresholding_01.py produces the following:

Edge-based segmentation separates foreground objects by first identifying all edges in our image. The Sobel Operator and other gradient-based filter functions are good and well-known candidates for such an operation.[6]
Once we obtain the edges, we perform the contour approximation operation using the findContours method in OpenCV. But what exactly are contours?
In OpenCV's words[7],
Contours can be explained simply as a curve joining all the continuous points (along the boundary), having same color or intensity. The contours are a useful tool for shape analysis and object detection and recognition.
If we have "a curve joining all the continuous points along the boundary", then we are able to extract this object. If we wish to count the number of contours in our image, the method also conveniently returns a list of all the found contours, making it easy to call len() on the list to retrieve the count.
There are three arguments to the findContours() function: the first is the source image, the second is the retrieval mode, and the last is the contour approximation method. Both the contour retrieval mode and the approximation method are discussed in the next sub-section.
(cnts, hierarchy) = cv2.findContours(
    img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE,
)
The function returns the contours and hierarchy, with contours being a list of all the contours in the image. Each contour is a Numpy array of (x,y) coordinates of boundary points of the object, giving each contour a shape of (n, 1, 2).
What this allows us to do is to combine the contours we retrieved with the cv2.drawContours() function: either individually, exhaustively in a for-loop fashion, or everything in one go.
Assuming img being the image we want to draw our contours on, the following code demonstrates these different methods:
# draw all contours
cv2.drawContours(img, cnts, -1, (0, 255, 0), 3)

# draw the 3rd contour
cv2.drawContours(img, cnts, 2, (0, 255, 0), 3)

# draw the first, fourth and fifth contour
cnt_selected = [cnts[0], cnts[3], cnts[4]]
cv2.drawContours(img, cnt_selected, -1, (0, 255, 255), 1)

# draw the fourth contour
cv2.drawContours(img, cnts, 3, (0, 255, 0), 3)
The first argument to this function is the source image, the second is the contours as a Python list, the third is the index of the contour to draw (-1 to draw them all), and the remaining arguments are the color and thickness of the contour lines respectively.
One common mistake beginners run into is performing the findContours operation on the grayscale image instead of a binary image, leading to poorer accuracy.
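To illustrate, below is a minimal sketch of the usual pipeline (the image path is hypothetical, purely for illustration): binarize the grayscale image first, then look for contours on the binary result:
import cv2

img = cv2.imread("assets/penguins.png")  # hypothetical asset for illustration
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# binarize before findContours; passing the raw grayscale image often merges nearby objects
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)

cnts, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print(f"Contours found: {len(cnts)}")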
When we execute contour_01.py, we notice that the drawContour operation yields the following output:

There are 5 occurrences where our findContours function merged two penguins into a single contour because they were too close to each other. When we execute len(cnts), we will find that the returned value is 5 less than the actual count.
Try to fix contour_01.py by performing the contour approximation on our binary image using the thresholding technique you've learned in previous section.
In the findContours() function call, we passed our image as src in the first argument. The second argument is the contour retrieval mode, and the documentation lists 4 of them[8]:
- RETR_EXTERNAL: retrieves only the extreme outer contours (see image below for reference)
- RETR_LIST: retrieves all contours without establishing any hierarchical relationships
- RETR_CCOMP: retrieves all contours and organizes them into a two-level hierarchy (external boundary + boundaries of the holes)
- RETR_TREE: retrieves all of the contours and reconstructs a full hierarchy of nested contours
In our case, we don't particularly care about the hierarchy, so the second to fourth modes all have the same effect. In other cases, you may experiment with a different contour retrieval mode to obtain both the contours and the hierarchy for further processing.
What about the last parameter passed to our findContours method?
Recall that contours are just boundaries of a shape. In a sense, a contour is an array of (x, y) coordinates used to "record" the boundary of a shape. Given this collection of coordinates, we can then recreate the boundary of our shape. This begs the next question: how many sets of coordinates do we need to store to recreate our boundary?
Suppose we perform the findContours operation on an image of two rectangles. One way to record the boundary is to store as many points around these rectangle boxes as possible. When we set cv2.CHAIN_APPROX_NONE, that is in fact what the algorithm does, resulting in 658 points around the border of the top rectangle:

However, notice that a more efficient solution would be to store only the 4 coordinates at the corners of each rectangle. The contour is perfectly represented and recreated using just 4 points per rectangle, for a total of 8 points compared to 1,316. cv2.CHAIN_APPROX_SIMPLE[9] is an implementation of this idea, and you can find the sample code below:

cnts, _ = cv2.findContours(  # does this need to be changed?
    edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE,
)
print(f"Cnts Simple Shape (1): {cnts[0].shape}")
# return: Cnts Simple Shape (1): (4, 1, 2)
# output of cnts[0]:
# array([[[ 47, 179]],
#        [[ 47, 259]],
#        [[296, 259]],
#        [[296, 179]]], dtype=int32)

cnts2, _ = cv2.findContours(  # does this need to be changed?
    edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE,
)
print(f"Cnts NoApprox Shape:{cnts2[0].shape}")
# Cnts NoApprox Shape:(658, 1, 2)
The full script for the experiment above is in contourapprox.py.
You may, at this point, hop to the Learn By Building section to attempt your homework.
John Canny developed a multi-stage procedure that, some 30 years later, is "still a state-of-the-art edge detector"[10]. Better edge detection algorithms usually require greater computational resources -- and consequently longer processing times -- or a greater number of parameters, in an area where algorithm speed is oftentimes the most important criterion. For these reasons, along with its general robustness, the Canny edge algorithm has become one of the "most important methods to find edges" even in modern literature[1:2].
I said it's a multi-stage procedure because the technique, as described in his original paper, A Computational Approach to Edge Detection, works as follows[11]:
1. Compute the gradient component $G_x$ along the horizontal direction
2. Compute the gradient component $G_y$ along the vertical direction, then derive the gradient magnitude and orientation
3. Thin the edges by applying non-maximum suppression
4. Apply hysteresis thresholding to decide which of the remaining candidate pixels are really edges
Step (1) and (2) in the procedure above can be achieved using code we've written so far in our Sobel Operator scripts. We use the Sobel mask filters to compute $G_x$ and $G_y$, respectively the gradient component in each orientation. We then compute the gradient magnitude and the angle $\theta$:
Gradient magnitude: $G = \sqrt{G_x^2 + G_y^2}$
And recall that the slope $\theta$ of the gradient is calculated as follows: $\theta = \arctan\left(\frac{G_y}{G_x}\right)$
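As a minimal sketch (reusing the sudoku asset from our earlier Sobel scripts), the two quantities above can be computed directly from the Sobel gradients with numpy:
import numpy as np
import cv2

img = cv2.imread("assets/sudoku.jpg", 0)           # read as grayscale
gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)     # gradient along x
gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)     # gradient along y
magnitude = np.sqrt(gx ** 2 + gy ** 2)             # gradient magnitude
theta = np.arctan2(gy, gx)                         # gradient orientation, in radians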
Step (3) in the procedure is another common technique in computer vision known as non-maximum suppression (NMS). Let's begin by taking a look at the output of our Sobel edge detector from earlier exercises:

Notice as we zoom in on the output image that the gradient-based method did produce our strong edges, but it also produced the "weak" edges it finds in our image. Because it is not a parameterized function -- the edge is computed using the values of the gradient magnitude and direction -- we have to rely on an additional mechanism for the edge thinning operation, with the criterion being one accurate response to any given edge[12].
Non-maximum suppression helps us obtain the strongest edge by suppressing all the gradient values, i.e. setting them to 0, except for the local maxima, which indicate locations with the sharpest change of intensity value. In the words of OpenCV:
After getting gradient magnitude and direction, a full scan of image is done to remove any unwanted pixels which may not constitute the edge. For this, at every pixel, pixel is checked if it is a local maximum in its neighborhood in the direction of gradient. If point A is on the edge, and point B and C are in gradient directions, point A is checked with point B and C to see if it forms a local maximum. If so, it is considered for next stage, otherwise, it is suppressed (put to zero).
The output of step (3) is a binary image with thin edges.
The code[13] demonstrates how you would implement such an NMS for the purpose of Canny edge detection.
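For intuition only, here is a simplified, unoptimized sketch of the idea (not how OpenCV implements it internally): quantize the gradient direction into four bins and keep a pixel only if it is the local maximum along that direction.
import numpy as np

def non_max_suppression(magnitude, angle):
    # keep a pixel only if it is the local maximum along the (quantized) gradient direction
    h, w = magnitude.shape
    out = np.zeros_like(magnitude)
    angle = np.rad2deg(angle) % 180                   # fold directions into [0, 180)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            a = angle[i, j]
            if a < 22.5 or a >= 157.5:                # ~0 degrees: compare left/right
                neighbors = (magnitude[i, j - 1], magnitude[i, j + 1])
            elif a < 67.5:                            # ~45 degrees: one diagonal
                neighbors = (magnitude[i - 1, j + 1], magnitude[i + 1, j - 1])
            elif a < 112.5:                           # ~90 degrees: compare up/down
                neighbors = (magnitude[i - 1, j], magnitude[i + 1, j])
            else:                                     # ~135 degrees: the other diagonal
                neighbors = (magnitude[i - 1, j - 1], magnitude[i + 1, j + 1])
            if magnitude[i, j] >= max(neighbors):
                out[i, j] = magnitude[i, j]           # local maximum: keep it
    return out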
The final step of this multi-stage algorithm decides which among all edges are really edges and which of them are not. It accomplishes this using two threshold values, specified when we call the cv2.Canny() function:
canny = cv2.Canny(img, threshold1=50, threshold2=180)
Any edges with an intensity gradient above threshold2 are considered edges and any edges below threshold1 are considered non-edges and so are suppressed.
The edges that lie between these two values (in our code above, edges with an intensity gradient between 50 and 180) are classified as edges if they are connected to sure-edge pixels (the ones above 180); otherwise, they are also discarded.
This stage also removes small, isolated pixel groups ("noise") on the assumption that true edges form long, connected lines.
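If it helps to see the decision rule spelled out, the following is a rough sketch (again, not OpenCV's internal implementation) using connected-component labeling from scipy: weak candidates survive only when their group also contains a strong pixel.
import numpy as np
from scipy import ndimage

def hysteresis(magnitude, low=50, high=180):
    strong = magnitude >= high                   # sure edges
    weak = (magnitude >= low) & ~strong          # candidates in between
    labels, _ = ndimage.label(strong | weak)     # connected groups of candidate pixels
    keep = np.unique(labels[strong])             # groups that touch a sure edge
    keep = keep[keep != 0]
    return np.isin(labels, keep)                 # boolean edge map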
The full procedure is implemented in a single function, cv2.Canny(), whose first three parameters are required: the input image, the first threshold value and the second threshold value. canny_01.py implements this and compares it to the Sobel edge detector we developed earlier:

In the homework directory, you'll find a picture of scattered lego bricks lego.jpg. Exactly the kind of stuff you don't want on your bedroom floor, as anyone living with kids at home would testify.
Your job is to combine what you've learned in this lesson with what you've learned in the kernel convolutions chapter (kernel.md) to build a lego brick counter.
Note that there are many ways you can build an edge detector. Given what you've learned so far, there are at least 3 equally adequate routines you can apply to this particular problem set.
For the sake of this exercise, your script should feature the use of a Sobel Operator (or a similar gradient-based edge detection method) since this is the main topic of this chapter.

S.Kaur, I.Singh, Comparison between Edge Detection Techniques, International Journal of Computer Applications, July 2016 ↩︎ ↩︎ ↩︎
Carnegie Mellon University, Image Gradients and Gradient Filtering (16-385 Computer Vision) ↩︎
Image Gradients, OpenCV Documentation ↩︎
University of Victoria, Electrical and Computer Engineering, Computer Vision: Image Segmentation ↩︎
Image Thresholding, OpenCV Documentation ↩︎
C.Leubner, A Framework for Segmentation and Contour Approximation in Computer-Vision Systems, 2002 ↩︎
Contours: Getting Started, OpenCV Documentation ↩︎
Structural Analysis and Shape Descriptors, OpenCV Documentation ↩︎
Contours Hierarchy, OpenCV Documentation ↩︎
Shapiro, L. G. and Stockman, G. C, Computer Vision, London etc, 2001 ↩︎
Bastan, M., Bukhari, S., and Breuel, T., Active Canny: Edge Detection and Recovery with Open Active Contour Models, Technical University of Kaiserslautern, 2016 ↩︎
Maini, R. and Aggarwal, H., Study and Comparison of various Image Edge Detection Techniques, International Journal of Image Processing (IJIP) ↩︎
When performing an arithmetic computation on a given image, one approach is to apply said computation in a neighborhood-by-neighborhood manner. This approach is very broadly termed a convolution. In other words, convolution is an operation between every part of an image ("pixel neighborhood") and an operator ("kernel")[1][2].
As the computation slides over each pixel neighborhood, we perform some arithmetic using the kernel, with the kernel typically being represented as a matrix or a fixed size array.
This kernel describes how the pixels in that neighborhood are combined or transformed to yield a corresponding output.
You will notice from the video that the output image now has a shape that is smaller than the original input. Mathematically, the shape of this output would be:
$\left(\frac{X_m - M_i}{s_x} + 1,\ \frac{X_n - M_j}{s_y} + 1\right)$
Where the input matrix has a size of $(X_m, X_n)$, the kernel $M$ is of size $(M_i, M_j)$, $s_x$ represents the stride over rows while $s_y$ represents the stride over columns.
In the linked video, we are sliding the kernel in both the x- and y- direction by 1 pixel at a time after each computation, giving a value of 1 for $s_x$ and $s_y$. The input matrix in our video is of size 5x5, and our kernel is of size 3x3, giving us an output size of $\frac{5 - 3}{1} + 1 = 3$, i.e. a 3x3 output.
Expressed mathematically, the full procedure as implemented in opencv looks like this for a convolution:
$H(x, y) = \sum^{M_i-1}_{i=0}\sum^{M_j-1}_{j=0} I(x+i-a_i, y+j-a_j)K(i,j)$
We'll walk through the procedure step by step, given a kernel represented by the matrix M:
Place the kernel anchor (in this case, $3$) on top of a determined pixel, with the rest of the kernel overlaying the corresponding local pixels in the image
Multiply the kernel coefficients by the corresponding image pixel values and sum the result
Replace the value at the location of the anchor in the input image with the result
Repeat the process for all pixels by sliding the kernel across the entire image, as specified by the stride (a minimal numpy sketch of this procedure follows below)
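As a minimal sketch of the four steps above (no padding, a configurable stride in both directions, written with plain numpy rather than opencv; the output is indexed by window position):
import numpy as np

def convolve2d(image, kernel, stride=1):
    # slide the kernel over every neighborhood, multiply coefficients and sum
    m_i, m_j = kernel.shape
    out_h = (image.shape[0] - m_i) // stride + 1
    out_w = (image.shape[1] - m_j) // stride + 1
    out = np.zeros((out_h, out_w))
    for x in range(out_h):
        for y in range(out_w):
            region = image[x * stride : x * stride + m_i,
                           y * stride : y * stride + m_j]
            out[x, y] = np.sum(region * kernel)
    return out

image = np.arange(25).reshape(5, 5).astype(float)   # a 5x5 input, as in the video
kernel = np.ones((3, 3)) / 9.0                      # a 3x3 mean kernel
print(convolve2d(image, kernel).shape)              # (3, 3), matching the formula above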
Keen readers may observe from executing meanblur_02.py that the original dimension of our image is preserved after the convolution. This may seem unexpected given what we know about the formula to derive the output dimension.
As it turns out, to preserve the dimension between the input and output images, a common technique known as "padding" is applied. From the documentation itself,
For example, if you want to smooth an image using a Gaussian 3 * 3 filter, then, when processing the left-most pixels in each row, you need pixels to the left of them, that is, outside of the image. You can let these pixels be the same as the left-most image pixels (“replicated border” extrapolation method), or assume that all the non-existing pixels are zeros (“constant border” extrapolation method), and so on.
The various border interpolation techniques available in opencv are as below (image boundaries are denoted with '|'):
BORDER_REPLICATE: aaaaaa|abcdefgh|hhhhhhh
BORDER_REFLECT: fedcba|abcdefgh|hgfedcb
BORDER_REFLECT_101: gfedcb|abcdefgh|gfedcba
BORDER_WRAP: cdefgh|abcdefgh|abcdefg
BORDER_CONSTANT: iiiiii|abcdefgh|iiiiiii with some specified 'i'

It is useful to remember that OpenCV only supports convolving an image where the dimension of its output matches that of the input, so in almost all cases we need a way to extrapolate an extra layer of pixels around the borders. To specify an extrapolation method, supply the filtering method with an extra argument:
cv2.GaussianBlur(..., borderType=cv2.BORDER_CONSTANT)

Given what we've just learned, we can rewrite our formula to determine the output dimensions more generally, this time incorporating a padding of $P$ pixels on each border:
$\left(\frac{X_m - M_i + 2P}{s_x} + 1,\ \frac{X_n - M_j + 2P}{s_y} + 1\right)$
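To make the extrapolation explicit, here is a small sketch (using a randomly generated stand-in image) with cv2.copyMakeBorder, which adds exactly the padding a 3x3 "same" filter needs at a stride of 1:
import numpy as np
import cv2

img = np.random.randint(0, 256, (100, 100), dtype=np.uint8)   # a stand-in 100x100 image

# pad one pixel of zeros on every side (constant border extrapolation)
padded = cv2.copyMakeBorder(img, 1, 1, 1, 1, cv2.BORDER_CONSTANT, value=0)
print(img.shape, padded.shape)   # (100, 100) (102, 102)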
Before moving on to the next section, try and think through the following problem:
In the case of a 333x333 input image, with a stride of 1 and a kernel of size 5x5, what is the amount of zero-padding you should add to the borders of your image such that the output image is also 333x333?
To fully appreciate the idea of kernel convolutions, we'll see some real examples. We'll use cv2.filter2D to convolve over our image using the following kernel:
$K = \frac{1}{25}\begin{bmatrix}1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1\end{bmatrix}$
The kernel we specified above is equivalent to a normalized box filter of size 5. Having watched the video earlier, you may intuit that the outcome of such a convolution is that each pixel in the input image is replaced by the average of the 5x5 pixels around it. You are in fact correct, and if you are skeptical, we'll see proof of this in the Code Illustrations: Mean Filtering section of this coursebook.
Mathematically, by dividing our matrix by 25 (normalizing) we apply a control that stops our pixel values from being artificially inflated, since each output pixel is now a weighted average of its neighborhood.
A Note on Terminology
Kernels or Filters?
When all we've been talking about is kernels, why is it that we're using the "filter" terminology in opencv code instead? That depends on the context. In the case of a convolutional neural network, kernel and filter are used interchangeably: they both refer to the same thing.
Some computer vision researchers have proposed a stricter definition, preferring to use the term "kernel" for a 2D array of weights, like our matrix above, and the term "filter" for the 3D structure of multiple kernels stacked together[3], a concept we'll explore further in the Convolutional Neural Network part of this course.
Correlations vs Convolutions
Imaging specialists may point to the fact that opencv does not mirror / flip the kernel around the anchor point, and hence the operation doesn't qualify as a convolution under strict definitions of digital imaging theory. For a pure implementation of a "convolution", you should instead use scipy.ndimage.convolve(src, kernel), or use cv2.filter2D in conjunction with a flip on the kernel[4]. This is in large part owed to the difference in scientific parlance adopted by the various scientific communities, a phenomenon more common than you'd expect. As an additional example, deep learning scientists using convolutional neural networks (CNN) generally refer to a non-flipped kernel when performing convolution.
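As a quick sketch of this equivalence (on a small random array, purely for illustration), flipping the kernel before cv2.filter2D should reproduce scipy's textbook convolution:
import numpy as np
import cv2
from scipy import ndimage

img = np.random.randint(0, 256, (6, 6)).astype(np.float32)
kernel = np.array([[1, 0, -1],
                   [2, 0, -2],
                   [1, 0, -1]], dtype=np.float32)

# cv2.filter2D computes a correlation; flipping the kernel in both directions turns it into a convolution
conv_cv = cv2.filter2D(img, -1, cv2.flip(kernel, -1), borderType=cv2.BORDER_CONSTANT)

# scipy.ndimage.convolve implements the textbook definition (the kernel is flipped internally)
conv_scipy = ndimage.convolve(img, kernel, mode="constant", cval=0.0)

print(np.allclose(conv_cv, conv_scipy))   # expected: True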
meanblur_01.py demonstrates the construction of a 5x5 mean average filter using np.ones((5,5))/25. Because every coefficient is basically the same, this merely replaces the value of each pixel in our input image with the average of the values in its 5x5 neighborhood.
img = cv2.imread("assets/canal.png")
mean_blur = np.ones((5, 5), dtype="float32") * (1.0 / (5 ** 2))
smoothed_col = cv2.filter2D(img, -1, mean_blur)
Alternatively, we can be explicit in our creation of the 5x5 kernel using numpy's array:
mean_blur = np.array(
    [[0.04, 0.04, 0.04, 0.04, 0.04],
     [0.04, 0.04, 0.04, 0.04, 0.04],
     [0.04, 0.04, 0.04, 0.04, 0.04],
     [0.04, 0.04, 0.04, 0.04, 0.04],
     [0.04, 0.04, 0.04, 0.04, 0.04]])
To be fully convinced that the mean filtering operation is doing what we expect it to do, we can inspect the pixel values before and after the convolution, to verify that the math checks out by hand. We do this in meanblur_02.py.
img = cv2.imread("assets/canal.png") gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) print(f'Gray: {gray[:5, :5]}') # [[ 31 27 21 17 21] # [ 77 85 86 87 90] # [205 205 215 227 222] # [224 230 222 243 249] # [138 210 206 218 242]] for i in range(3): newval = np.round(np.mean(gray[:5, i:i+5])) print(f'Mean of 25x25 pixel #{i+1}: {np.int(newval)}') # output: # Mean of 25x25 pixel #1: 152 # Mean of 25x25 pixel #2: 158 # Mean of 25x25 pixel #3: 160
The code above shows that the output of such a convolution operation beginning at the top-left region of the image would be 152. As we slide along the horizontal direction and re-compute the mean of the neighborhood, we get 158. As we slide our kernel along the horizontal direction for a second time and re-compute the mean of the neighborhood we obtain the value of 160.
If you prefer you can verify these values by hand, using the raw pixel values from gray[:5, :5] (5x5 top-left region of the image).
mean_blur = np.ones(KERNEL_SIZE, dtype="float32") * (1.0 / (5 ** 2))
smoothed_gray = cv2.filter2D(gray, -1, mean_blur)
print(f'Smoothed: {smoothed_gray[:5, :5]}')
# output:
# [[122 123 125 127 128]
#  [126 127 128 131 132]
#  [148 149 152 158 160]
#  [177 179 184 196 202]
#  [197 199 204 222 229]]
Notice from the output of our mean filter that the first anchor (center of the 5x5 neighborhood) has transformed from 215 to 152, the one to its right from 227 to 158, and so on. The math does work out, and you can observe the blur effect directly by running meanblur_02.py.
As it turns out, opencv provides a set of convenience functions to apply filtering onto our images. All three approaches below yield the same output, as can be verified from the output pixel values after executing meanblur_03.py:
# approach 1
mean_blur = np.ones(KERNEL_SIZE, dtype="float32") * (1.0 / (5 ** 2))
smoothed_gray = cv2.filter2D(gray, -1, mean_blur)

# approach 2
smoothed_gray = cv2.blur(gray, KERNEL_SIZE)

# approach 3
smoothed_gray = cv2.boxFilter(gray, -1, KERNEL_SIZE)
There are several types of kernels we can apply to achieve a blur filter on our image. The averaging filter method serves as a good introductory point because it is easy to intuit about, but it is good to know that opencv provides a collection of convenience functions, each being an implementation of some blurring filter. See Handy kernels for image processing for a list of smoothing kernels implemented in opencv.
Earlier, it was said that kernels play an integral role in all modern convolutional neural network architectures. Using TensorFlow, one would rely on the tf.nn.conv2d function to perform a 2D convolution. The syntax looks like this:
tf.nn.conv2d(
    input,
    filter,
    strides,
    padding,
    use_cudnn_on_gpu=None,
    data_format=None,
    name=None
)
Where:
- input is assumed to be a tensor of shape (batch, height, width, channels), where batch is the number of images in a minibatch
- filter is a tensor of shape (filter_height, filter_width, channels, out_channels) that specifies the learnable weights for the nonlinear transformation learned in the convolutional kernel
- strides contains the filter strides and is a list of length 4 (one for each input dimension)
- padding determines whether the input tensors are padded (with extra zeros) to guarantee that the output from the convolutional layer has the same shape as the input. padding="SAME" adds padding to the input and padding="VALID" results in no padding

Worth noting is that the input and filter parameters follow what we've implemented using opencv thus far. When we're applying a filter like the mean blur example earlier, we slide our kernel along with a stride of 1. In TensorFlow code, we would set strides=[1,1,1,1] so that the kernel slides by 1 unit across all 4 dimensions (x, y, channel, and image index).
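For concreteness, a minimal sketch follows. Note the assumptions: TensorFlow is not part of this repository's requirements.txt, and in TensorFlow 2.x the second parameter is named filters rather than filter:
import numpy as np
import tensorflow as tf   # assumption: TensorFlow 2.x installed separately

x = tf.constant(np.random.rand(1, 32, 32, 1), dtype=tf.float32)    # one 32x32 single-channel image
w = tf.constant(np.ones((5, 5, 1, 1)) / 25.0, dtype=tf.float32)    # a single 5x5 mean filter

valid = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="VALID")  # no padding
same = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="SAME")    # zero-padded to preserve shape
print(valid.shape, same.shape)   # (1, 28, 28, 1) (1, 32, 32, 1)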
Example of a Convolutional Neural Network architecture[5]:

Notice from the image that the dimension of our output from the first convolution layer is smaller (28x28) than its input (32x32) when we perform the operation without padding. C1 and C3 are examples of this in the above illustration.
In S1 and S2, we're applying a max-pooling filter to down-sample our image representation, allowing our network to learn the parameters from the higher-order representations in each region of the image. An example operation is depicted below:

- cv2.blur(img, KERNEL_SIZE): meanblur_03.py, replaces each pixel with the mean of its neighboring pixels
- cv2.medianBlur(img, ksize): replaces each pixel with the median of its neighboring pixels (ksize is a single odd integer)
- cv2.GaussianBlur(img, KERNEL_SIZE, 0): weighs the neighborhood with a Gaussian kernel (discussed in the next section)
- cv2.bilateralFilter(img, d, sigmaColor, sigmaSpace): smooths while preserving edges by also weighing pixels by color similarity
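A short sketch of how these convenience functions might be called on the canal image used throughout this chapter (the parameter values below are just reasonable illustrative choices, not prescriptions):
import cv2

img = cv2.imread("assets/canal.png")
mean = cv2.blur(img, (5, 5))                      # box / mean filter
median = cv2.medianBlur(img, 5)                   # kernel size is a single odd integer
gaussian = cv2.GaussianBlur(img, (5, 5), 0)       # sigma derived from the kernel size when 0
bilateral = cv2.bilateralFilter(img, 9, 75, 75)   # d, sigmaColor, sigmaSpace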
The Gaussian filter deserves its own section given its prevalence in image processing. It is achieved by convolving each point in the input array (read: each pixel in our image) with a Gaussian kernel and taking the sum to produce the output array.
If you remember your lessons from statistics, you may recall a 1D gaussian distribution looks like this:

For completeness' sake, the code to graph the distribution above is in utils/gaussiancurve.r.
For a 1-dimensional image, the pixel located in the middle would be assigned the largest weight, with the weight of its neighbours decreasing as the spatial distance between them and the center pixel increases.
For the mathematically inclined, the graphed distribution above is generated from the Gaussian function[6]: $g(x) = e^{\frac{-x^2}{2\sigma^2}}$
Where $x$ is the spatial distance between the center pixel and the corresponding neighbor unit.
For a 1D kernel of size 7, each pixel would therefore be weighted accordingly: $g(x) = \begin{bmatrix}.011 & .13 & .6 & 1 & .6 & .13 & .011\end{bmatrix}$
The above should not be hard to intuit about: if we refer back to the graphed distribution, we can see that for the center pixel (at position x=0), $g(x)$ evaluates to a value of $1$.
import numpy as np
weights = []
sd = 1
for i in range(4):
    weights += [np.round(np.exp((-i**2)/(2*sd**2)), 3)]
print(weights)
# output:
# [1.0, 0.607, 0.135, 0.011]
For a 2D kernel, the formula would take the form of: $g(x,y) = e^{\frac{-(x^2+y^2)}{2\sigma^2}}$
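As a small illustrative sketch (the size and sigma below are arbitrary choices), we can build such a 2D kernel directly from the formula and normalize it so the weights sum to 1:
import numpy as np

sd = 1.0
size = 5
ax = np.arange(size) - size // 2                          # [-2, -1, 0, 1, 2]
xx, yy = np.meshgrid(ax, ax)
kernel = np.exp(-(xx ** 2 + yy ** 2) / (2 * sd ** 2))     # g(x, y) from the formula above
kernel /= kernel.sum()                                    # normalize the weights
print(np.round(kernel, 3))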
When we compare the output of a mean filter to a gaussian filter, as in the example script in gaussianblur_01.py, we can then observe the difference in output visually:

This should also come as little surprise, since the mean filter just replaces each pixel with the average of its neighboring pixels, essentially giving a coefficient of 1 (before normalization) to every cell in the 5x5 grid.
Gaussian filters, on the other hand, weigh pixels using a Gaussian distribution (think: bell curve in 2D space) around the center pixel, such that farther pixels are given a lower coefficient than nearer ones.
The opposite of blurring would be sharpening. There are again several approaches to this, and we'll start by looking at specifically two of them.
The first approach relies on the familiar cv2.filter2D() function to apply the following kernel, and is implemented in sharpening_01.py: $K = \begin{bmatrix} -1 & -1 & -1 \\ -1 & 9 & -1 \\ -1 & -1 & -1 \end{bmatrix}$
The outcome:

We can apply the same principles behind a Gaussian kernel for sharpening operations (as opposed to blurring). The full script is in sharpening_02.py but the essential parts are as follows:
approx_gaussian = (
    np.array(
        [
            [-1, -1, -1, -1, -1],
            [-1, 2, 2, 2, -1],
            [-1, 2, 8, 2, -1],
            [-1, 2, 2, 2, -1],
            [-1, -1, -1, -1, -1],
        ]
    ) / 8.0
)
sharpen_col = cv2.filter2D(img, -1, approx_gaussian)
Notice how this method uses an approximate Gaussian kernel and that the result is an overall more natural smoothing:

The second approach is known as "unsharp masking", derived from the fact that the technique uses a blurred, or "unsharp", negative image to create a mask of the original image[7]. This technique is one of the oldest tools in photographic processing (tracing back to the 1930s), and popular tools such as Adobe Photoshop and GIMP have direct implementations of it named, appropriately, Unsharp Mask.
Lifted straight from the Wikipedia article itself, a "typical blending formula for unsharp masking is sharpened = original + (original - blurred) * amount". Amount represents how much contrast is added to the edges.
To rewrite the formula, we get:
$\begin{aligned} Sharpened & = O + (O-B) \cdot a \\ & = O + Oa - Ba \\ & = O(1+a) + B(-a)\end{aligned}$
Where $a$ is the amount, $B$ is the blurred image (mask) and $O$ is the original image. The final form is convenient because we can plug it into cv2.addWeighted and get an output. From OpenCV's documentation, the function addWeighted calculates the weighted sum of two arrays as follows: $dst(I) = saturate(src1(I) * alpha + src2(I) * beta + gamma)$
When you perform the arithmetic above, you will find that the values (e.g. src1(I) * alpha, when alpha is > 1.5, may produce values greater than 255) can fall outside the range of 0 to 255. Saturation clips the value in a way that is synonymous with the following: $Saturate(x) = min(max(round(x), 0), 255)$
The following code demonstrates the unsharp masking technique:
img = cv2.imread("assets/sarpi.png") amt = 1.5 blurred = cv2.GaussianBlur(img, (5,5), 10) unsharp = cv2.addWeighted(img, 1+amt, blurred, -amt, 0) unsharp_manual = np.clip(img * (1+amt) + blurred * (-amt), 0, 255) cv2.imshow("Unsharp Masking", unsharp)

You can find the sample code for this in unsharpmask_01.py (using addWeighted) and in unsharpmask_02.py (manual calculation) respectively.
Why go to such lengths on the mathematical ideas behind image filtering operations?
Filtering is perhaps the most fundamental operation of image processing and computer vision. In the broadest sense of the term "filtering", the value of the filtered image at a given location is a function of the values of the input image in a small neighborhood of the same location.[8]
It is fundamental to a host of common image processing techniques, from enhancements (sharpening, denoising, contrast adjustment) to edge detection, texture detection and, in the case of deep learning, feature detection.
To help with your recall, I made a simple illustration below:

Whenever you're ready, move on to edgedetect.md to learn the essentials of edge detection using kernel operations.
Making your own linear filters, OpenCV Documentation ↩︎
Bradski, Kaehler, Learning OpenCV ↩︎
Stack Exchange, https://stats.stackexchange.com/a/366940 ↩︎
R.Zadeh and B.Ramsundar, TensorFlow for Deep Learning, O'Reilly Media ↩︎
Wikipedia, Gaussian function, https://en.wikipedia.org/wiki/Gaussian_function ↩︎
W.Fulton, A few scanning tips, Sharpening - Unsharp Mask ↩︎
C. Tomasi and R. Manduchi, "Bilateral Filtering for Gray and Color Images", Proceedings of the 1998 IEEE International Conference on Computer Vision, Bombay, India. ↩︎
For completeness' sake, the code to graph the distribution above is in `utils/gaussiancurve.r`.
For a 1-dimensional image, the pixel located in the middle would be assigned the largest weight, with the weight of its neighbours decreasing as the spatial distance between them and the center pixel increases.
For the mathematically inclined, the graphed distribution above is generated from the Gaussian function[^6]:
$$g(x) = e^{\frac{-x^2}{2\sigma^2}}$$
Where $x$ is the spatial distance between the center pixel and the corresponding neighbor unit.
For a 1D kernel of size 7, each pixel would therefore be weighted accordingly:
$$g(x) = \begin{bmatrix}.011 & .13 & .6 & 1 & .6 & .13 & .011\end{bmatrix}$$
The above should not be hard to intuit about: if we refer back to the graphed distribution, we can see that for the center pixel (at position x=0), $g(x)$ evaluates to a value of $1$.
```py
import numpy as np
weights = []
sd = 1
for i in range(4):
    weights += [np.round(np.exp((-i**2)/(2*sd**2)),3)]
print(weights)
# output:
# [1.0, 0.607, 0.135, 0.011]
```
For a 2D kernel, the formula would take the form of:
$$g(x,y) = e^{\frac{-(x^2+y^2)}{2\sigma^2}}$$
When we compare the output of a mean filter to a gaussian filter, as in the example script in `gaussianblur_01.py`, we can then observe the difference in output visually:

This should also come as little surprise, since the mean filter just replaces each pixel with the average of its neighboring pixels, essentially giving a coefficient of 1 (before normalization) to every cell in the 5x5 grid.
Gaussian filters, on the other hand, **weigh pixels using a gaussian distribution** (think: bell curve in a 2d space) around the center pixel, such that farther pixels are given a lower coefficient than nearer ones.
#### Sharpening Kernels
The opposite of blurring would be sharpening. There are again several approaches to this, and we'll start by looking at specifically two of them.
The first approach relies on the familiar `cv2.filter2D()` function to apply the following kernel, and is implemented in `sharpening_01.py`:
$$K = \begin{bmatrix} -1 & -1 & -1 \\ -1 & 9 & -1 \\ -1 & -1 & -1 \end{bmatrix}$$
The outcome:

##### Approximate Gaussian Kernel for Sharpening
We can apply the same principles behind a Gaussian kernel for sharpening operations (as opposed to blurring). The full script is in `sharpening_02.py` but the essential parts are as follow:
```py
approx_gaussian = (
    np.array(
        [
            [-1, -1, -1, -1, -1],
            [-1, 2, 2, 2, -1],
            [-1, 2, 8, 2, -1],
            [-1, 2, 2, 2, -1],
            [-1, -1, -1, -1, -1],
        ]
    ) / 8.0
)
sharpen_col = cv2.filter2D(img, -1, approx_gaussian)
```
Notice how this method uses an approximate Gaussian kernel and that the result is an overall more natural smoothing:

##### Unsharp Masking
The second approach is known as "unsharp masking", derived from the fact that the technique uses a blurred, or "unsharp", negative image to create a mask of the original image[^7]. This technique is one of the oldest tools in photographic processing (tracing back to the 1930s), and popular tools such as Adobe Photoshop and GIMP have direct implementations of it named, appropriately, Unsharp Mask.
Lifted straight from the Wikipedia article itself, a "typical blending formula for unsharp masking is **sharpened = original + (original - blurred) * amount**". **Amount** represents how much contrast is added to the edges.
To rewrite the formula, we get:
$$\begin{aligned}
Sharpened & = O + (O-B) \cdot a \\
& = O + Oa - Ba \\
& = O (1+a) + B(-a)\end{aligned}$$
Where $a$ is the amount, $B$ is the blurred image (mask) and $O$ is the original image. The final form is convenient because we can plug it into `cv2.addWeighted` and get an output. From OpenCV's documentation, the function `addWeighted` calculates the weighted sum of two arrays as follows:
$$dst(I) = saturate(src1(I) * alpha + src2(I) * beta + gamma)$$
When you perform the arithmetic above, you will find that the values (e.g. `src1(I) * alpha`, when alpha is > 1.5, may produce values greater than 255) can fall outside the range of 0 to 255. Saturation clips the value in a way that is synonymous with the following:
$$Saturate(x) = min(max(round(r), 0), 255)$$
The following code demonstrates the unsharp masking technique:
```py
img = cv2.imread("assets/sarpi.png")
amt = 1.5
blurred = cv2.GaussianBlur(img, (5,5), 10)
unsharp = cv2.addWeighted(img, 1+amt, blurred, -amt, 0)
unsharp_manual = np.clip(img * (1+amt) + blurred * (-amt), 0, 255)
cv2.imshow("Unsharp Masking", unsharp)
```

You can find the sample code for this in `unsharpmask_01.py` (using `addWeighted`) and in `unsharpmask_02.py` (manual calculation) respectively.
## Summary and Key Points
Why go to such lengths on the mathematical ideas behind image filtering operations?
> Filtering is perhaps the most fundamental operation of image processing and computer vision. In the broadest sense of the term "filtering", the value of the filtered image at a given location is a function of the values of the input image in a small neighborhood of the same location.[^8]
It is fundamental to a host of common image processing techniques, from enhancements (sharpening, denoising, contrast adjustment) to edge detection, texture detection and, in the case of deep learning, feature detection.
To help with your recall, I made a simple illustration below:

Whenever you're ready, move on to `edgedetect.md` to learn the essentials of edge detection using kernel operations.
## References
[^1]: Making your own linear filters, [OpenCV Documentation](https://docs.opencv.org/2.4/doc/tutorials/imgproc/imgtrans/filter_2d/filter_2d.html)
[^2]: Bradski, Kaehler, Learning OpenCV
[^3]: Stack Exchange, https://stats.stackexchange.com/a/366940
[^4]: [OpenCV Documentation](http://docs.opencv.org/modules/imgproc/doc/filtering.html#filter2d)
[^5]: R.Zadeh and B.Ramsundar, TensorFlow for Deep Learning, O'Reilly Media
[^6]: Wikipedia, Gaussian function, https://en.wikipedia.org/wiki/Gaussian_function
[^7]: W.Fulton, A few scanning tips, Sharpening - Unsharp Mask
[^8]: C. Tomasi and R. Manduchi, "Bilateral Filtering for Gray and Color Images", Proceedings of the 1998 IEEE International Conference on Computer Vision, Bombay, India.
================================================
FILE: edgedetect/meanblur_01.py
================================================
import numpy as np
import cv2
KERNEL_SIZE = (5, 5)
img = cv2.imread("assets/canal.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2.imshow("Gray", gray)
cv2.waitKey(0)
# Create the following 5x5
# np.array(
# [[0.04, 0.04, 0.04, 0.04, 0.04],
# [0.04, 0.04, 0.04, 0.04, 0.04],
# [0.04, 0.04, 0.04, 0.04, 0.04],
# [0.04, 0.04, 0.04, 0.04, 0.04],
# [0.04, 0.04, 0.04, 0.04, 0.04]])
mean_blur = np.ones(KERNEL_SIZE, dtype="float32") * (1.0 / (5 ** 2))
smoothed_col = cv2.filter2D(img, -1, mean_blur)
smoothed_gray = cv2.filter2D(gray, -1, mean_blur)
cv2.imshow("Smoothed Colored", smoothed_col)
cv2.waitKey(0)
cv2.imshow("Smoothed Gray", smoothed_gray)
cv2.waitKey(0)
================================================
FILE: edgedetect/meanblur_02.py
================================================
import numpy as np
import cv2
KERNEL_SIZE = (5, 5)
img = cv2.imread("assets/canal.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
print(f'Gray: {gray[:5, :5]}')
print(f'Shape of Original: {gray.shape}')
for i in range(3):
    newval = np.round(np.mean(gray[:5, i:i+5]))
    print(f'Mean of 25x25 pixel #{i+1}: {np.int(newval)}')
cv2.imshow("Gray", gray)
cv2.waitKey(0)
mean_blur = np.ones(KERNEL_SIZE, dtype="float32") * (1.0 / (5 ** 2))
smoothed_col = cv2.filter2D(img, -1, mean_blur)
smoothed_gray = cv2.filter2D(gray, -1, mean_blur)
cv2.imshow("Smoothed Colored", smoothed_col)
cv2.waitKey(0)
cv2.imshow("Smoothed Gray", smoothed_gray)
cv2.waitKey(0)
print(f'Smoothed: {smoothed_gray[:5, :5]}')
print(f'Shape of Smoothed: {smoothed_gray.shape}')
================================================
FILE: edgedetect/meanblur_03.py
================================================
import numpy as np
import cv2
KERNEL_SIZE = (5, 5)
img = cv2.imread("assets/canal.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
print(f'Gray: {gray[:5, :5]}')
print(f'Shape of Original: {gray.shape}')
for i in range(3):
    newval = np.round(np.mean(gray[:5, i:i+5]))
    print(f'Mean of 25x25 pixel #{i+1}: {np.int(newval)}')
cv2.imshow("Gray", gray)
cv2.waitKey(0)
smoothed_col = cv2.blur(img, KERNEL_SIZE)
# equivalently:
# smoothed_gray = cv2.boxFilter(gray, -1, KERNEL_SIZE)
smoothed_gray = cv2.blur(gray, KERNEL_SIZE)
cv2.imshow("Smoothed Colored", smoothed_col)
cv2.waitKey(0)
cv2.imshow("Smoothed Gray", smoothed_gray)
cv2.waitKey(0)
print(f'Smoothed: {smoothed_gray[:5, :5]}')
print(f'Shape of Smoothed: {smoothed_gray.shape}')
================================================
FILE: edgedetect/sharpening_01.py
================================================
import numpy as np
import cv2
img = cv2.imread("assets/canal.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
for i in range(3):
    newval = np.round(np.mean(gray[:5, i : i + 5]))
    print(f"Mean of 25x25 pixel #{i+1}: {np.int(newval)}")
cv2.imshow("Gray", gray)
cv2.waitKey(0)
sharpen = np.array([[-1, -1, -1], [-1, 9, -1], [-1, -1, -1]])
sharpen_col = cv2.filter2D(img, -1, sharpen)
sharpen_gray = cv2.filter2D(gray, -1, sharpen)
cv2.imshow("Sharpen Colored", sharpen_col)
cv2.waitKey(0)
cv2.imshow("Sharpen Gray", sharpen_gray)
cv2.waitKey(0)
================================================
FILE: edgedetect/sharpening_02.py
================================================
import numpy as np
import cv2
img = cv2.imread("assets/canal.png")
cv2.imshow("Original", img)
cv2.waitKey(0)
approx_gaussian = (
np.array(
[
[-1, -1, -1, -1, -1],
[-1, 2, 2, 2, -1],
[-1, 2, 8, 2, -1],
[-1, 2, 2, 2, -1],
[-1, -1, -1, -1, -1],
]
)
/ 8.0
)
sharpen_col = cv2.filter2D(img, -1, approx_gaussian)
cv2.imshow("Sharpen (approx. Gaussian)", sharpen_col)
cv2.waitKey(0)
cv2.waitKey(0)
================================================
FILE: edgedetect/sobel_01.py
================================================
import numpy as np
import cv2
import matplotlib.pyplot as plt
img = cv2.imread("assets/sudoku.jpg", 0)
img = cv2.medianBlur(img, 5)
img = cv2.GaussianBlur(img, (7, 7), 0)
cv2.imshow("Image", img)
cv2.waitKey(0)
gradient_x = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
gradient_y = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
print(f"Range: {np.min(gradient_x)} | {np.max(gradient_x)}")
gradient_x = np.uint8(np.absolute(gradient_x))
gradient_y = np.uint8(np.absolute(gradient_y))
print(f"Range uint8: {np.min(gradient_x)} | {np.max(gradient_x)}")
cv2.imshow("Gradient X", gradient_x)
cv2.waitKey(0)
cv2.imshow("Gradient Y", gradient_y)
cv2.waitKey(0)
# plt.imshow(gradient_x, cmap="gray")
# plt.show()
================================================
FILE: edgedetect/sobel_02.py
================================================
import numpy as np
import cv2
import matplotlib.pyplot as plt
img_original = cv2.imread("assets/castello.png")
img_original = cv2.cvtColor(img_original, cv2.COLOR_BGR2RGB)
img = cv2.cvtColor(img_original, cv2.COLOR_BGR2GRAY)
img = cv2.medianBlur(img, 9)
img = cv2.GaussianBlur(img, (9, 9), 0)
gradient_x = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
gradient_y = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
gradient_x = cv2.convertScaleAbs(gradient_x)
gradient_y = cv2.convertScaleAbs(gradient_y)
print(f"Range: {np.min(gradient_x)} | {np.max(gradient_x)}")
gradient_xy = cv2.addWeighted(gradient_x, 0.5, gradient_y, 0.5, 0)
plt.subplot(2, 2, 1), plt.imshow(img_original)
plt.title("Original"), plt.xticks([]), plt.yticks([])
plt.subplot(2, 2, 2), plt.imshow(gradient_x, cmap="gray")
plt.title("Gradient X"), plt.xticks([]), plt.yticks([])
plt.subplot(2, 2, 3), plt.imshow(gradient_y, cmap="gray")
plt.title("Gradient Y"), plt.xticks([]), plt.yticks([])
plt.subplot(2, 2, 4), plt.imshow(gradient_xy, cmap="gray")
plt.title("Gradient X and Y"), plt.xticks([]), plt.yticks([])
plt.show()
================================================
FILE: edgedetect/sobel_03.py
================================================
import numpy as np
import cv2
import matplotlib.pyplot as plt
img = cv2.imread("assets/castello.png", flags=0)
img = cv2.medianBlur(img, 9)
img = cv2.GaussianBlur(img, (9, 9), 0)
gradient_x = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
gradient_y = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
gradient_x = cv2.convertScaleAbs(gradient_x)
gradient_y = cv2.convertScaleAbs(gradient_y)
print(f"Range: {np.min(gradient_x)} | {np.max(gradient_x)}")
gradient_xy = cv2.addWeighted(gradient_x, 0.5, gradient_y, 0.5, 0)
plt.imshow(gradient_xy, cmap="gray")
plt.title("Sobel Edge")
plt.show()
================================================
FILE: edgedetect/unsharpmask_01.py
================================================
import numpy as np
import cv2
KERNEL_SIZE = (5, 5)
img = cv2.imread("assets/sarpi.png")
cv2.imshow("Original", img)
cv2.waitKey(0)
amt = 1.5
blurred = cv2.GaussianBlur(img, (5,5), 10)
unsharp = cv2.addWeighted(img, 1+amt, blurred, -amt, 0)
cv2.imshow("Unsharp Masking", unsharp)
cv2.waitKey(0)
================================================
FILE: edgedetect/unsharpmask_02.py
================================================
import numpy as np
import cv2
KERNEL_SIZE = (5, 5)
img = cv2.imread("assets/sarpi.png")
cv2.imshow("Original", img)
cv2.waitKey(0)
amt = 1.5
blurred = cv2.GaussianBlur(img, (5,5), 10)
unsharp_manual = np.clip(img * (1+amt) + blurred * (-amt), 0, 255)
# unsharp_manual = img * (1+amt) + blurred * (-amt)
# unsharp_manual = np.maximum(unsharp_manual, np.zeros(unsharp_manual.shape))
# unsharp_manual = np.minimum(unsharp_manual, 255 * np.ones(unsharp_manual.shape))
unsharp_manual = unsharp_manual.round().astype(np.uint8)
cv2.imshow("Unsharp Masking Manual", unsharp_manual)
cv2.waitKey(0)
================================================
FILE: edgedetect/utils/gaussiancurve.r
================================================
x <- seq(-3, 3, length=1000000)
y <- dnorm(x, mean=0, sd=1)
plot(x, y, type="l", lwd=1, ylab="g(x)")
================================================
FILE: quiz.md
================================================
## Affine Transformation
1. Which of the following constructs the correct transformation matrix to perform a 2x scaling?
- [ ] `np.float32([[2, 0, 0], [0, 2, 0]])`
- [ ] `np.float32([[0, 2, 0], [0, 2, 0]])`
- [ ] `np.float32([[2, 2, 2], [0, 0, 0]])`
- [ ] `np.float32([[2, 1, 1], [1, 2, 1]])`
2. In the case of a 333x333 input image, with a stride of 1 and a kernel of size 5x5, what is the amount of zero-padding you should add to the borders of your image such that the output image is also 333x333?
- [ ] 1
- [ ] 2
- [ ] 3
- [ ] No zero-padding
## Kernels and Convolution
3. For an input image of size 140W (Width) x 600H (Height), suppose we perform a convolution with stride S=1 using a filter of size 7W x 7H and two pixels of constant-padding (padding our image with a constant value of 5). What would the dimension of our output image be?
- [ ] 135 Width x 595 Height
- [ ] 140 Width x 600 Height
- [ ] 138 Width x 598 Height
- [ ] None of the answers above
## Thresholding Edge Detection
4. In an image with lighting conditions that result in some parts of the image being shaded differently than the others, which of the thresholding techniques may yield a more robust output?
- [ ] Pixel-intensity based thresholding
- [ ] Otsu's global thresholding method
- [ ] Adaptive thresholding
5. We want to retrieve only the extreme outer contours. We do not need to store all the boundary points to minimise redundancy and save memory requirements. Which are the values to be passed into the findContours() function?
- [ ] RETR_EXTERNAL, CHAIN_APPROX_SIMPLE
- [ ] RETR_EXTERNAL, CHAIN_APPROX_NONE
- [ ] RETR_OUTER, CHAIN_APPROX_SIMPLE
- [ ] RETR_OUTER, CHAIN_APPROX_NONE
- [ ] RETR_LIST, CHAIN_APPROX_NONE
6. Which of the following intensity gradient values would the function call cv2.Canny(img, 50, 180) determine to be a definite edge?
- [ ] 40
- [ ] 100
- [ ] 200
7. Which of the following is NOT part of the Canny Edge procedure?
- [ ] Compute gradient in each direction
- [ ] Suppress edges that are non-maximal
- [ ] Discard pixels that are more likely noise than true edges
- [ ] Retrieve only the extreme outer contours from the edges
================================================
FILE: requirements.txt
================================================
cycler==0.10.0
decorator==4.4.1
imageio==2.6.1
imutils==0.5.3
joblib==0.14.0
kiwisolver==1.1.0
mahotas==1.4.9
matplotlib==3.1.1
networkx==2.4
numpy==1.17.4
opencv-contrib-python==4.1.1.26
Pillow==8.1.1
pip==21.1
pyparsing==2.4.5
python-dateutil==2.8.1
PyWavelets==1.1.1
scikit-image==0.16.2
scikit-learn==0.21.3
scipy==1.3.2
setuptools==41.6.0
six==1.13.0
wheel==0.33.6
================================================
FILE: summarynotes/class2201.md
================================================
# Computer Vision (Chapter 1 to 3)
## Administrative Details
- Prerequisites:
- Python 3
- OpenCV
- Numpy (automatically installed as dependency to opencv)
- Tip: Use `pip install -r requirements.txt` to install from the requirement file (`requirements.txt`) in the repo. Get help from Teaching Assistant (Tommy) or myself before the beginning of the class
- Any code editor
- Atom, VSCode, Sublime etc...
- Personally, I use VSCode (free)
- Materials
- https://github.com/onlyphantom/cvessentials
- WiFi
- Network: Accelerice
- Password: gapura19
## Day 1
1. Synonymous role to data preprocessing
Data Analysis
- Read data (usually using pandas as pd)
- Inspect your data (dat.shape)
- Data Preprocessing
- Reshape, ...
2. Basic Routine
```
import cv2
import numpy as np
img = cv2.imread("Desktop/family.png")
print(img.shape) # output: (h, w, c)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2.imshow("Gray Image", gray)
cv2.waitKey(0)
```
3. Affine Transformation
```
import cv2
import numpy as np
img = cv2.imread("Desktop/family.png")
(h, w, c) = img.shape
print(f'Height: {h}; Width: {w}')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# option 1: create 2x3 matrix
mat = np.float32([[1, 0, 0], [0, 1, 0]])
# option 2: ask for a 2x3 matrix
center = (w // 2, h // 2)
mat = cv2.getRotationMatrix2D(center, angle=180, scale=1)
# or derive from three pairs of corresponding points (src, dst are 3x2 np.float32 arrays)
# mat = cv2.getAffineTransform(src, dst)
transformed = cv2.warpAffine(gray, mat, dsize=(w, h))  # dsize is (width, height)
cv2.imshow("Transformed", transformed)
cv2.waitKey(0)
```
================================================
FILE: transformation/lecture_affine.html
================================================
Any transformation that can be expressed in the form of a matrix multiplication (linear transformation) followed by a vector addition (translation).
In which $A$ is a $2 \times 2$ matrix representing the linear transformation and $B$ is a $2 \times 1$ vector representing the translation:
$A = \begin{bmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{bmatrix}, \quad B = \begin{bmatrix} b_{00} \\ b_{10} \end{bmatrix}$
When concatenated horizontally, this can be expressed in a larger matrix: $M = \begin{bmatrix} A & B \end{bmatrix} = \begin{bmatrix} a_{00} & a_{01} & b_{00} \\ a_{10} & a_{11} & b_{10} \end{bmatrix}$
By the definition above (matmul + vector addition), affine transformation can be used to achieve:
Affine transformation preserves points, straight lines, and planes. Parallel lines will remain parallel. It does not however preserve the distance and angles between points.
We represent an Affine Transformation using a 2x3 matrix.
Consider the goal of transforming a 2D vector $X = \begin{bmatrix} x \\ y \end{bmatrix}$ using $A$ and $B$ to obtain $T$; we can do it like such: $T = A \cdot \begin{bmatrix} x \\ y \end{bmatrix} + B$
Or equivalently: $T = M \cdot \begin{bmatrix} x & y & 1 \end{bmatrix}^{T}$
In scale_04.py from the Examples and Illustrations section, you'll see that the 2x3 matrix is simply defined as such:
np.float32([[3, 0, 0], [0, 3, 0]])
When you explicitly specify a 2x3 matrix, think of the first two columns as the $A$ component, or the matrix-multiplication process. The third column, naturally, represents the $B$ component, or the vector addition process. This may sound a little abstract, so I encourage you to pause and take a look at the code below:
(h, w) = img.shape[:2]
mat = np.float32([[1, 0, -140], [0, 1, 20]])
translated = cv2.warpAffine(img, mat, (w, h))
cv2.imshow("Translated", translated)
Notice that our $A$ is an identity matrix of size 2. An identity matrix is the matrix equivalent of the scalar 1: multiplying a matrix (or vector) by the identity matrix leaves it unchanged.
Which leads to: $A \cdot \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix}$
And our $B$, the vector addition component, moves each pixel -- or more formally, translates each pixel -- on the image by -140 in the $x$ direction and 20 in the $y$ direction. Find the full code example in translate_01.py.
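To see the arithmetic on a single pixel (the coordinates below are arbitrary, purely for illustration), we can multiply the 2x3 matrix with the pixel's homogeneous coordinates by hand:
import numpy as np

mat = np.float32([[1, 0, -140], [0, 1, 20]])
point = np.array([200, 50, 1])      # a pixel at (x=200, y=50) in homogeneous form
print(mat @ point)                  # [ 60.  70.]: moved by -140 in x and +20 in y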
Imaging systems in the real-world are often subject to geometric distortion. The distortion may be introduced by perspective irregularities, physical constraints (e.g camera placements), or other reasons.
In the field of GIS (geographic information systems), routinely one would use affine transformation to "convert" geographic coordinates into screen coordinates such that it can be displayed and presented on our handheld / navigational devices.
One may also overlay coordinate data on spatial data that references a different coordinate system, or "stitch" together different sources of data using a series of transformations.
These are but a handful of examples where one may expect to see routine use of affine transformations. If you're spending any amount of time in computer vision, a high degree of familiarity with these remapping routines in OpenCV will come in very handy.
In your learn-by-building section, you will find a less-than-perfectly-digitalized map, belitung_raw.jpg. Your job is to apply the necessary affine transformations to correct its perspective distortion and resize the map accordingly.
Given the importance of such a relation between two images, it should come as no surprise that opencv packs a number of convenience methods to help us specify this transformation. The two common use-cases are:
Manually specify the 2x3 matrix using numpy:
img = cv2.imread("our_image.png")
mat = np.float32([[3, 0, 0], [0, 3, 0]])
result = cv2.warpAffine(img, M=mat, dsize=(600, 600))
cv2.imshow("Transformed", result)
img = cv2.imread("our_image.png") coords_s = np.float32([[10, 10], [80, 10], [10, 80]]) coords_d = np.float32([[10, 10], [95, 10], [10, 80]]) mat = cv2.getAffineTransform(src=coords_s, dst=coords_d) result = cv2.warpAffine(img, M=mat, dsize=(200, 200)) cv2.imshow("Transformed", result)
Had we printed out mat from the snippet of code above, we would see a 2x3 matrix that looks like this:
[[ 1.21428571  0.         -2.14285714]
 [ 0.          1.          0.        ]]
2b [Optional]. As an extension to point (2) above, consider how we would use cv2.warpAffine to achieve a 90 degree clockwise rotation. If you have attended my Unsupervised Learning course from the Machine Learning Specialization, you will undoubtedly have seen this quick reference:

To plug that directly into the $A$ of our original formula:
A 90-degree clockwise rotation could be implemented as a 270-degree anti-clockwise rotation. Let's see this implementation in opencv:
img = cv2.imread("assets/cvess.png") (h, w) = img.shape[:2] center = (w // 2, h // 2) mat3 = cv2.getRotationMatrix2D(center, angle=270, scale=1) print(f'270 degree anti-clockwise: \n {np.round(mat3, 2)}') rotated = cv2.warpAffine(img, mat, (w, h)) cv2.imshow("Rotated", rotated) # # print output: # # 270 degree anti-clockwise: # [[ -0. -1. 400.] # [ 1. -0. 0.]]
We learned earlier that $M = \begin{bmatrix} A & B \end{bmatrix}$.
So $A$ would be [[0, -1], [1, 0]] and $B$ would be [400, 0]. Fundamentally, cv2.getRotationMatrix2D is still applying an affine transformation to map the pixels from one point to another using a 2x3 matrix.
Refer to rotate_01.py to obtain the matrix for a 180-degree rotation and a 30-degree counter-clockwise rotation.
Let's also look at another application of getAffineTransform to strengthen our understanding.
Suppose we specify $M$ to be mat = np.float32([[1, 0, 0], [0, 1, 0]]); what do you expect the transformation to be?
Take a minute to discuss with your classmates or refer back to the Mathematical Definitions section above and try to internalize this before moving forward.
To verify your answer, run scale_03.py and see if your hunch was right.
For an extra challenge, let's assume our_image.png is an image of 200x200. Pay attention to the specification of mat below; what do you expect the resulting outcome to be?
Take a minute to discuss before moving forward.
img = cv2.imread("assets/our_image.png") cv2.imshow("Original", img) # custom transformation matrix mat = np.float32([[3, 0, 0], [0, 3, 0]]) print(mat) result = cv2.warpAffine(img, M=mat, dsize=(200, 200))
You may have expected the 2x3 matrix mat to have a scaling effect on our original image. However, the required argument of dsize in our warpAffine() call constrained the output to its original dimension, 200x200, thus "cropping out" only the top left corner of the image.
Suppose we'd like to see the transformed image (scaled by 3x) in its entirety; how would we change the value passed to the dsize argument?
Refer to scale_04.py to verify that you've got this right.
This section is optional; you may choose to skip this section.
Watch Rotation Matrix Explained Visually
If you're done watching the video, see the same example being presented in code:
a = np.float32([[0, -1], [1, 0]])
x = np.float32([3, 6])
np.matmul(a, x)
# output:
# array([-6.,  3.], dtype=float32)
- getRotationMatrix2D() to get a 2x3 matrix: rotate_01.py
- getAffineTransform(), obtaining a 2x3 matrix of [[1,0,0], [0,1,0]] (no transformation): scale_01.py
- np.float32([[1,0,0], [0,1,0]]): scale_02.py
- dsize parameter in cv2.warpAffine without transformation: scale_03.py
- dsize parameter accordingly: scale_04.py
- getAffineTransform(), obtaining a 2x3 matrix of [[1,0,0], [0,1,0]]: scale_05.py
- translate_01.py

Images from imaging systems and capturing systems are often "subject to geometric distortion introduced by perspective irregularities"[1] or "deformations that occur with non-ideal camera angles"[2].
In the case of translation or scaling, we typically specify our 2x3 matrix using np.float() and feed this matrix to cv2.warpAffine()
In the case of rotation, we typically use the convenience function cv2.getAffineTransform() to obtain the 2x3 matrix before feeding it to cv2.warpAffine()
cv2.getAffineTransform(src, dst)
Parameters:
- src - Coordinates of triangle vertices in the source image
- dst - Coordinates of the corresponding triangle vertices in the destination image
In the homework directory, you'll find a digital map belitung_raw.jpg. Your job is to apply what you've learned in this lesson to restore the map by correcting its skew and resize it appropriately.
