Full Code of onlyphantom/cvessentials for AI

Repository: onlyphantom/cvessentials
Branch: master
Commit: e32691a5f1af
Files: 48
Total size: 459.3 KB

Directory structure:
gitextract_nnlovcxn/

├── .gitignore
├── README.md
├── digitrecognition/
│   ├── contourarea_01.py
│   ├── contourarea_02.py
│   ├── contourarea_03.py
│   ├── digit_01.py
│   ├── digitrec.html
│   ├── digitrec.md
│   ├── morphological_01.py
│   ├── morphological_02.py
│   ├── roi_01.py
│   ├── roi_02.py
│   └── utils/
│       └── enumerate.py
├── edgedetect/
│   ├── adaptivethresholding_01.py
│   ├── canny_01.py
│   ├── contour_01.py
│   ├── contourapprox.py
│   ├── edgedetect.html
│   ├── edgedetect.md
│   ├── gaussianblur_01.py
│   ├── gradient.py
│   ├── img2surface.py
│   ├── intensitythresholding_01.py
│   ├── kernel.html
│   ├── kernel.md
│   ├── meanblur_01.py
│   ├── meanblur_02.py
│   ├── meanblur_03.py
│   ├── sharpening_01.py
│   ├── sharpening_02.py
│   ├── sobel_01.py
│   ├── sobel_02.py
│   ├── sobel_03.py
│   ├── unsharpmask_01.py
│   ├── unsharpmask_02.py
│   └── utils/
│       └── gaussiancurve.r
├── quiz.md
├── requirements.txt
├── summarynotes/
│   └── class2201.md
└── transformation/
    ├── lecture_affine.html
    ├── lecture_affine.md
    ├── rotate_01.py
    ├── scale_01.py
    ├── scale_02.py
    ├── scale_03.py
    ├── scale_04.py
    ├── scale_05.py
    └── translate_01.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
solutions/
.DS_Store
.vscode/
answers.md


================================================
FILE: README.md
================================================
# Essentials of Computer Vision  

![](assets/blurb.png)

A math-first approach to learning computer vision in Python. The repository contains all HTML, PDF, Markdown, Python scripts, data, and media assets (images or links to supplementary videos). If you wish to contribute, translations into Bahasa Indonesia are especially needed; please submit a Pull Request.

## Study Guide
### Chapter 1
- Affine Transformation
    - [Definition](transformation/lecture_affine.html#definition)
        - [Mathematical Definitions](transformation/lecture_affine.html#mathematical-definitions)
    - [Practical Examples](transformation/lecture_affine.html#practical-examples)
    - [Motivation](transformation/lecture_affine.html#motivation)
    - [Getting Affine Transformation](transformation/lecture_affine.html#getting-affine-transformation)
        - [Trigonometry Proof](transformation/lecture_affine.html#trigonometry-proof)
    - [Code Illustrations](transformation/lecture_affine.html#code-illustrations)
    - [Summary and Key Points](transformation/lecture_affine.html#summary-and-key-points)
    - Optional video 
        - [Rotation Matrix Explained Visually](https://www.youtube.com/watch?v=tIixrNtLJ8U)
            - [w/ Bahasa Indonesia voiceover](https://www.youtube.com/watch?v=pWfXR_HmyUw)
    - References and learn-by-building modules
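
For a quick numerical taste of the rotation matrix covered in this chapter, here is a minimal NumPy sketch (the angle and point are arbitrary toy values, not from the lecture itself):

```python
import numpy as np

# A 2D rotation is the affine map p' = R @ p, where
# R = [[cos t, -sin t], [sin t, cos t]].
theta = np.deg2rad(90)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
p = np.array([1.0, 0.0])
print(R @ p)  # rotates (1, 0) onto (0, 1), up to floating-point error
```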

### Chapter 2
- Kernel Convolutions
    - [Definition](edgedetect/kernel.html#definition)
        - Optional video
            -  [Kernel Convolutions Explained Visually](https://www.youtube.com/watch?v=WMmHcrX4Obg)
        - [Mathematical Definitions](edgedetect/kernel.html#mathematical-definitions)
        - [Padding](edgedetect/kernel.html#a-note-on-padding)
    - [Smoothing and Blurring](edgedetect/kernel.html#smoothing-and-blurring)
    - [A Note on Terminology](edgedetect/kernel.html#a-note-on-terminology)
        - Kernels or Filters?
        - Correlations vs Convolutions?
    - [Code Illustrations: Mean Filtering](edgedetect/kernel.html#code-illustrations-mean-filtering)
    - [Role in Convolutional Neural Networks](edgedetect/kernel.html#role-in-convolutional-neural-networks)
    - [Handy Kernels for Image Processing](edgedetect/kernel.html#handy-kernels-for-image-processing)
        - [Gaussian Filtering](edgedetect/kernel.html#gaussian-filtering)
        - [Sharpening Kernels](edgedetect/kernel.html#sharpening-kernels)
        - [Gaussian Kernels for Sharpening](edgedetect/kernel.html#approximate-gaussian-kernel-for-sharpening)
        - [Unsharp Masking](edgedetect/kernel.html#unsharp-masking)
    - [Summary and Key Points](edgedetect/kernel.html#summary-and-key-points)
    - References and learn-by-building modules
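
As a minimal preview of what this chapter covers, here is a pure-NumPy sketch of 3×3 mean filtering as a kernel convolution (toy image; "valid" region only, i.e. no padding, so the output shrinks by 2 in each dimension):

```python
import numpy as np

# 3x3 mean filter: replace each pixel with the average of its
# 3x3 neighbourhood, sliding the window over the image.
def mean_filter(img):
    h, w = img.shape
    out = np.empty((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = img[i:i + 3, j:j + 3].mean()
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
print(mean_filter(img))  # -> [[5. 6.] [9. 10.]]
```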

### Chapter 3
- Edge Detection
    - [Definition](edgedetect/edgedetect.html#definition)
    - [Gradient-based Edge Detection](edgedetect/edgedetect.html#gradient-based-edge-detection)
        - [Sobel Operator](edgedetect/edgedetect.html#sobel-operator)
            - [Discrete Derivative](edgedetect/edgedetect.html#intuition-discrete-derivative)
            - [Code Illustrations: Sobel Operator](edgedetect/edgedetect.html#code-illustrations-sobel-operator)
        - [Gradient Orientation & Magnitude](edgedetect/edgedetect.html#dive-deeper-gradient-orientation-magnitude)
    - [Image Segmentation](edgedetect/edgedetect.html#image-segmentation)
        - [Intensity-based Segmentation](edgedetect/edgedetect.html#intensity-based-segmentation)
            - [Simple Thresholding](edgedetect/edgedetect.html#simple-thresholding)
            - [Adaptive Thresholding](edgedetect/edgedetect.html#adaptive-thresholding)
        - [Edge-based Contour Estimation](edgedetect/edgedetect.html#edge-based-contour-estimation)
            - [Contour Retrieval and Approximation](edgedetect/edgedetect.html#contour-retrieval-and-approximation)
    - [Canny Edge Detector](edgedetect/edgedetect.html#canny-edge-detector)
        - [Edge Thinning](edgedetect/edgedetect.html#edge-thinning)
        - [Hysteresis Thresholding](edgedetect/edgedetect.html#hysteresis-thresholding)
    - References and learn-by-building modules
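
As a small preview of the Sobel operator covered in this chapter, here is a pure-NumPy sketch applying the horizontal Sobel kernel to one 3×3 window of a toy step-edge image (the image values are arbitrary):

```python
import numpy as np

# Horizontal Sobel kernel: approximates the derivative along x,
# so it responds strongly to vertical edges.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

# A toy image with a vertical step edge: dark left half, bright right half.
img = np.array([[0, 0, 255, 255]] * 4, dtype=float)

# Correlate the kernel with the 3x3 window at the top-left corner.
window = img[0:3, 0:3]
gx = np.sum(SOBEL_X * window)
print(gx)  # -> 1020.0, a large positive response at the dark-to-bright edge
```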

### Chapter 4
- Digit Classification
    - [A Note on Deep Learning](digitrecognition/digitrec.html#what-about-deep-learning)
        - [Why not MNIST?](digitrecognition/digitrec.html#region-of-interest)
    - Region of Interest
        - [ROI identification](digitrecognition/digitrec.html#selecting-region-of-interest)
        - [Arc Length and Area Size](digitrecognition/digitrec.html#arc-length-and-area-size)
            - [Dive Deeper: ROI](digitrecognition/digitrec.html#dive-deeper-roi)
        - [ROI extraction](digitrecognition/digitrec.html#roi-extraction)
    - [Morphological Transformations](digitrecognition/digitrec.html#morphological-transformations)
        - [Erosion](digitrecognition/digitrec.html#erosion)
        - [Dilation](digitrecognition/digitrec.html#dilation)
        - [Opening and Closing](digitrecognition/digitrec.html#opening-and-closing)
        - [Learn-by-building: Morphological Transformation](digitrecognition/digitrec.html#learn-by-building-morphological-transformation)
    - [Seven-segment display](digitrecognition/digitrec.html#seven-segment-display)
        - [Practical Strategies](digitrecognition/digitrec.html#practical-strategies)
            - [Contour Properties](digitrecognition/digitrec.html#contour-properties)
    - [References and learn-by-building modules](digitrecognition/digitrec.html#references)
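
The core lookup idea behind the seven-segment decoding in this chapter can be sketched in plain Python: each digit corresponds to a tuple of on/off segment states (segments a–g, in the same order used by `digitrecognition/digit_01.py`):

```python
# Map each seven-segment on/off pattern (segments a-g) to the
# digit it displays; e.g. all seven segments lit is an 8.
DIGITSDICT = {
    (1, 1, 1, 1, 1, 1, 0): 0,
    (0, 1, 1, 0, 0, 0, 0): 1,
    (1, 1, 0, 1, 1, 0, 1): 2,
    (1, 1, 1, 1, 0, 0, 1): 3,
    (0, 1, 1, 0, 0, 1, 1): 4,
    (1, 0, 1, 1, 0, 1, 1): 5,
    (1, 0, 1, 1, 1, 1, 1): 6,
    (1, 1, 1, 0, 0, 1, 0): 7,
    (1, 1, 1, 1, 1, 1, 1): 8,
    (1, 1, 1, 1, 0, 1, 1): 9,
}
print(DIGITSDICT[(1, 1, 1, 1, 1, 1, 1)])  # -> 8
```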

### Chapter 5
- Facial Recognition

## Approach and Motivation
The course is foundational for anyone who wishes to work with computer vision in Python. It covers some of the most common image processing routines and provides in-depth coverage of the mathematical concepts behind them:
- Math-first approach
- Tons of sample Python scripts (.py)
    - 45+ Python scripts across chapters 1 to 4 for plug-and-play experiments
- Multimedia (image illustrations, video explanations, quizzes)
    - 57 image assets across chapters 1 to 4 for practical illustrations
    - 4 PDFs and 4 HTMLs, one for each chapter
- Practical tips on real-world applications

The course's **only dependency** is `OpenCV`. Getting started is as easy as `pip install opencv-contrib-python` and you're set to go.
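
After installing, a quick sanity check confirms the setup works (note that the package you install is `opencv-contrib-python`, but the module you import is `cv2`):

```shell
pip install opencv-contrib-python
python -c "import cv2; print(cv2.__version__)"
```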

##### Question: What about deep learning libraries?

No. While deep learning on images makes for an interesting topic, it is probably better suited to an altogether separate course series. This course series (tutorial series) focuses on the **essentials of computer vision** and, for pedagogical reasons, tries not to be overly ambitious with the scope it intends to cover.

There will be similarities in concepts and principles, as modern neural network architectures draw plenty of inspiration from the "classical" computer vision techniques that predate them. By first learning how computer vision problems are solved classically, the student can compare that with the deep learning equivalent, resulting in a more comprehensive appreciation of what deep learning offers modern-day computer scientists.

## Course Materials Preview:
### Python scripts
![](digitrecognition/assets/croproi.gif)

### PDF and HTML
![](assets/ecv_caption.gif)


# Workshops
I conduct in-person lectures using the materials you find in this repository. These workshops are usually paid because there are upfront costs for a venue and crew: not just any venue, but a fully equipped learning environment (audio, desks, charging points for everyone, a massive screen projector, walking space for teaching assistants, dinner).

You can follow me [on LinkedIn](http://linkedin.com/in/chansamuel/) to stay updated about the latest workshops. I also make long-form programming tutorials and lessons on computer vision on [my YouTube channel](https://www.youtube.com/@SamuelChan).

### Introduction to AI in Computer Vision
- 4th January 2020, Jakarta
    - Kantorkuu, Citywalk sudirman, Jakarta Pusat
    - Time: 1300-1600
    - 3 hours
    - Fee: Free for Algoritma Alumni, 100k IDR for public

### Computer Vision: Principles and Practice
- 21st and 22nd January 2020, Jakarta
    - Accelerice, Jl. Rasuna Said, Jakarta Selatan
    - Time: 1830-2130
    - 6 hours
    - Fee: Free for Algoritma Alumni, 1.5m IDR for public

- 24th and 25th February 2020, Bangkok
    - JustCo, Samyan Mitrtown
    - Time: 1830-2130
    - 6 hours
    - Fee: Free for Algoritma Alumni, 9000 THB for public


## Image Assets
- `car2.png`, `pen.jpg`, `lego.jpg` and `sudoku.jpg` are under Creative Commons (CC) license.

- `sarpi.jpg`, `castello.png`, `canal.png`, and all other photographs used were taken during my trip to Venice; you are free to use them.

- All assets in Chapter 4 (the `digitrecognition` folder) are mine and you are free to use them.

- All other illustrations are created by me in Keynote. 

- Videos are created by me, and Bahasa Indonesia voice over on my videos is by [Tiara Dwiputri](https://github.com/tiaradwiputri)

## New to programming? 50-minute Quick Start
Here's a video I created, [Computer Vision Essentials 1](https://youtu.be/NWXY4ASRlgA), to get you through the installation and take your first step into this lesson path.

If you need help with the course, attend my in-person workshops on this topic (Computer Vision Essentials, free), held throughout the year.

## Follow me
- [YouTube](https://www.youtube.com/@SamuelChan)
- [LinkedIn](http://linkedin.com/in/chansamuel/)
- [GitHub](https://github.com/onlyphantom)


================================================
FILE: digitrecognition/contourarea_01.py
================================================
import cv2

BCOLOR = (75, 0, 130)
THICKNESS = 4

img_color = cv2.imread("assets/ocbc.jpg")
img_color = cv2.resize(img_color, None, None, fx=0.5, fy=0.5)
img = cv2.cvtColor(img_color, cv2.COLOR_BGR2GRAY)

blurred = cv2.GaussianBlur(img, (7, 7), 0)
blurred = cv2.bilateralFilter(blurred, 5, sigmaColor=50, sigmaSpace=50)
edged = cv2.Canny(blurred, 130, 150, 255)

cv2.imshow("Outline of device", edged)
cv2.waitKey(0)

cnts, _ = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# sort contours by area (descending) and keep the nine largest
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:9]

cv2.drawContours(img_color, cnts, 0, BCOLOR, THICKNESS)
cv2.imshow("Target Contour", img_color)
cv2.waitKey(0)

for i, cnt in enumerate(cnts):
    cv2.drawContours(img_color, cnts, i, BCOLOR, THICKNESS)
    print(f"ContourArea:{cv2.contourArea(cnt)}")
    cv2.imshow("Contour one by one", img_color)
    cv2.waitKey(0)


================================================
FILE: digitrecognition/contourarea_02.py
================================================
import cv2

PURPLE = (75, 0, 130)
YELLOW = (0, 255, 255)
THICKNESS = 4
FONT = cv2.FONT_HERSHEY_SIMPLEX

img_color = cv2.imread("assets/ocbc.jpg")
img_color = cv2.resize(img_color, None, None, fx=0.5, fy=0.5)
img = cv2.cvtColor(img_color, cv2.COLOR_BGR2GRAY)

blurred = cv2.GaussianBlur(img, (7, 7), 0)
blurred = cv2.bilateralFilter(blurred, 5, sigmaColor=50, sigmaSpace=50)
edged = cv2.Canny(blurred, 130, 150, 255)

cv2.imshow("Outline of device", edged)
cv2.waitKey(0)

cnts, _ = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# sort contours by area (descending) and keep the ten largest
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:10]

for i, cnt in enumerate(cnts):
    cv2.drawContours(img_color, cnts, i, PURPLE, THICKNESS)
    x, y, w, h = cv2.boundingRect(cnt)
    cv2.rectangle(img_color, (x, y), (x + w, y + h), YELLOW, THICKNESS)
    area = round(cv2.contourArea(cnt), 1)
    peri = round(cv2.arcLength(cnt, closed=True), 1)
    print(f"ContourArea:{area}, Peri: {peri}")
    cv2.putText(img_color, "Area:" + str(area), (x, y - 15), FONT, 0.4, PURPLE, 1)
    cv2.putText(img_color, "Perimeter:" + str(peri), (x, y - 5), FONT, 0.4, PURPLE, 1)

cv2.imshow("Contours", img_color)
cv2.waitKey(0)


================================================
FILE: digitrecognition/contourarea_03.py
================================================
import cv2

PURPLE = (75, 0, 130)
YELLOW = (0, 255, 255)
THICKNESS = 4
FONT = cv2.FONT_HERSHEY_SIMPLEX

img_color = cv2.imread("assets/ocbc.jpg")
img_color = cv2.resize(img_color, None, None, fx=0.5, fy=0.5)
img = cv2.cvtColor(img_color, cv2.COLOR_BGR2GRAY)

blurred = cv2.GaussianBlur(img, (7, 7), 0)
blurred = cv2.bilateralFilter(blurred, 5, sigmaColor=50, sigmaSpace=50)
edged = cv2.Canny(blurred, 130, 150, 255)

cv2.imshow("Outline of device", edged)
cv2.waitKey(0)

cnts, _ = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# sort contours by area (descending) and keep the nine largest
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:9]

cv2.drawContours(img_color, cnts, 0, PURPLE, THICKNESS)
cv2.imshow("Target Contour", img_color)
cv2.waitKey(0)

for i in range(len(cnts)):
    cv2.drawContours(img_color, cnts, i, PURPLE, THICKNESS)
    print(f"ContourArea:{cv2.contourArea(cnts[i])}")
    x, y, w, h = cv2.boundingRect(cnts[i])
    cv2.rectangle(img_color, (x, y), (x + w, y + h), YELLOW, THICKNESS)

    area = round(cv2.contourArea(cnts[i]), 1)
    peri = round(cv2.arcLength(cnts[i], closed=True), 1)
    print(f"ContourArea:{area}, Peri: {peri}")
    cv2.putText(img_color, "Area:" + str(area), (x, y - 15), FONT, 0.4, PURPLE, 1)
    cv2.putText(img_color, "Perimeter:" + str(peri), (x, y - 5), FONT, 0.4, PURPLE, 1)

    cv2.imshow("Contour one by one", img_color)
    cv2.waitKey(0)



================================================
FILE: digitrecognition/digit_01.py
================================================
import cv2
import numpy as np

FONT = cv2.FONT_HERSHEY_SIMPLEX
CYAN = (255, 255, 0)
DIGITSDICT = {
    (1, 1, 1, 1, 1, 1, 0): 0,
    (0, 1, 1, 0, 0, 0, 0): 1,
    (1, 1, 0, 1, 1, 0, 1): 2,
    (1, 1, 1, 1, 0, 0, 1): 3,
    (0, 1, 1, 0, 0, 1, 1): 4,
    (1, 0, 1, 1, 0, 1, 1): 5,
    (1, 0, 1, 1, 1, 1, 1): 6,
    (1, 1, 1, 0, 0, 1, 0): 7,
    (1, 1, 1, 1, 1, 1, 1): 8,
    (1, 1, 1, 1, 0, 1, 1): 9,
}


# roi_color = cv2.imread("inter/dbs-roi.png")
roi_color = cv2.imread("inter/ocbc-roi.png")
roi = cv2.cvtColor(roi_color, cv2.COLOR_BGR2GRAY)

RATIO = roi.shape[0] * 0.2

roi = cv2.bilateralFilter(roi, 5, 30, 60)

trimmed = roi[int(RATIO) :, int(RATIO) : roi.shape[1] - int(RATIO)]
roi_color = roi_color[int(RATIO) :, int(RATIO) : roi.shape[1] - int(RATIO)]
cv2.imshow("Blurred and Trimmed", trimmed)
cv2.waitKey(0)

edged = cv2.adaptiveThreshold(
    trimmed, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 5, 5
)
cv2.imshow("Edged", edged)
cv2.waitKey(0)

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 5))
dilated = cv2.dilate(edged, kernel, iterations=1)

cv2.imshow("Dilated", dilated)
cv2.waitKey(0)

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 1))
dilated = cv2.dilate(dilated, kernel, iterations=1)

cv2.imshow("Dilated x2", dilated)
cv2.waitKey(0)

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2, 1))
eroded = cv2.erode(dilated, kernel, iterations=1)

cv2.imshow("Eroded", eroded)
cv2.waitKey(0)

h = roi.shape[0]
ratio = int(h * 0.07)
eroded[-ratio:, :] = 0  # black out a thin strip along the bottom
eroded[:, :ratio] = 0  # black out a thin strip along the left

cv2.imshow("Eroded + Black", eroded)
cv2.waitKey(0)

cnts, _ = cv2.findContours(eroded, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
digits_cnts = []

canvas = trimmed.copy()
cv2.drawContours(canvas, cnts, -1, (255, 255, 255), 1)
cv2.imshow("All Contours", canvas)
cv2.waitKey(0)

canvas = trimmed.copy()
for cnt in cnts:
    (x, y, w, h) = cv2.boundingRect(cnt)
    if h > 20:
        digits_cnts += [cnt]
        cv2.rectangle(canvas, (x, y), (x + w, y + h), (0, 0, 0), 1)
        cv2.drawContours(canvas, [cnt], 0, (255, 255, 255), 1)
        cv2.imshow("Digit Contours", canvas)
        cv2.waitKey(0)

print(f"No. of Digit Contours: {len(digits_cnts)}")


cv2.imshow("Digit Contours", canvas)
cv2.waitKey(0)


sorted_digits = sorted(digits_cnts, key=lambda cnt: cv2.boundingRect(cnt)[0])

canvas = trimmed.copy()


for i, cnt in enumerate(sorted_digits):
    (x, y, w, h) = cv2.boundingRect(cnt)
    cv2.rectangle(canvas, (x, y), (x + w, y + h), (0, 0, 0), 1)
    cv2.putText(canvas, str(i), (x, y - 3), FONT, 0.3, (0, 0, 0), 1)

cv2.imshow("All Contours sorted", canvas)
cv2.waitKey(0)

digits = []
canvas = roi_color.copy()
for cnt in sorted_digits:
    (x, y, w, h) = cv2.boundingRect(cnt)
    roi = eroded[y : y + h, x : x + w]
    print(f"W:{w}, H:{h}")
    # convenience units
    qW, qH = int(w * 0.25), int(h * 0.15)
    fractionH, halfH, fractionW = int(h * 0.05), int(h * 0.5), int(w * 0.25)

    # seven segments in the order of wikipedia's illustration
    sevensegs = [
        ((0, 0), (w, qH)),  # a (top bar)
        ((w - qW, 0), (w, halfH)),  # b (upper right)
        ((w - qW, halfH), (w, h)),  # c (lower right)
        ((0, h - qH), (w, h)),  # d (lower bar)
        ((0, halfH), (qW, h)),  # e (lower left)
        ((0, 0), (qW, halfH)),  # f (upper left)
        # ((0, halfH - fractionH), (w, halfH + fractionH)) # center
        (
            (0 + fractionW, halfH - fractionH),
            (w - fractionW, halfH + fractionH),
        ),  # center
    ]

    # initialize to off
    on = [0] * 7

    for (i, ((p1x, p1y), (p2x, p2y))) in enumerate(sevensegs):
        region = roi[p1y:p2y, p1x:p2x]
        print(
            f"{i}: Sum of 1: {np.sum(region == 255)}, Sum of 0: {np.sum(region == 0)}, Shape: {region.shape}, Size: {region.size}"
        )
        if np.sum(region == 255) > region.size * 0.5:
            on[i] = 1
        print(f"State of ON: {on}")

    digit = DIGITSDICT[tuple(on)]
    print(f"Digit is: {digit}")
    digits += [digit]
    cv2.rectangle(canvas, (x, y), (x + w, y + h), CYAN, 1)
    cv2.putText(canvas, str(digit), (x - 5, y + 6), FONT, 0.3, (0, 0, 0), 1)
    cv2.imshow("Digit", canvas)
    cv2.waitKey(0)

print(f"Digits on the token are: {digits}")



================================================
FILE: digitrecognition/digitrec.html
================================================
<!DOCTYPE html><html><head>
      <title>digitrec</title>
      <meta charset="utf-8">
      <meta name="viewport" content="width=device-width, initial-scale=1.0">
      
      
        <script type="text/x-mathjax-config">
          MathJax.Hub.Config({"extensions":["tex2jax.js"],"jax":["input/TeX","output/HTML-CSS"],"messageStyle":"none","tex2jax":{"processEnvironments":false,"processEscapes":true,"inlineMath":[["$","$"],["\\(","\\)"]],"displayMath":[["$$","$$"],["\\[","\\]"]]},"TeX":{"extensions":["AMSmath.js","AMSsymbols.js","noErrors.js","noUndefined.js"]},"HTML-CSS":{"availableFonts":["TeX"]}});
        </script>
        <script type="text/javascript" async src="file:////Users/samuel/.vscode/extensions/shd101wyy.markdown-preview-enhanced-0.5.1/node_modules/@shd101wyy/mume/dependencies/mathjax/MathJax.js" charset="UTF-8"></script>
        
      
      

      
      
      
      
      
      
      

      <style>
      /**
 * prism.js Github theme based on GitHub's theme.
 * @author Sam Clarke
 */
code[class*="language-"],
pre[class*="language-"] {
  color: #333;
  background: none;
  font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace;
  text-align: left;
  white-space: pre;
  word-spacing: normal;
  word-break: normal;
  word-wrap: normal;
  line-height: 1.4;

  -moz-tab-size: 8;
  -o-tab-size: 8;
  tab-size: 8;

  -webkit-hyphens: none;
  -moz-hyphens: none;
  -ms-hyphens: none;
  hyphens: none;
}

/* Code blocks */
pre[class*="language-"] {
  padding: .8em;
  overflow: auto;
  /* border: 1px solid #ddd; */
  border-radius: 3px;
  /* background: #fff; */
  background: #f5f5f5;
}

/* Inline code */
:not(pre) > code[class*="language-"] {
  padding: .1em;
  border-radius: .3em;
  white-space: normal;
  background: #f5f5f5;
}

.token.comment,
.token.blockquote {
  color: #969896;
}

.token.cdata {
  color: #183691;
}

.token.doctype,
.token.punctuation,
.token.variable,
.token.macro.property {
  color: #333;
}

.token.operator,
.token.important,
.token.keyword,
.token.rule,
.token.builtin {
  color: #a71d5d;
}

.token.string,
.token.url,
.token.regex,
.token.attr-value {
  color: #183691;
}

.token.property,
.token.number,
.token.boolean,
.token.entity,
.token.atrule,
.token.constant,
.token.symbol,
.token.command,
.token.code {
  color: #0086b3;
}

.token.tag,
.token.selector,
.token.prolog {
  color: #63a35c;
}

.token.function,
.token.namespace,
.token.pseudo-element,
.token.class,
.token.class-name,
.token.pseudo-class,
.token.id,
.token.url-reference .token.variable,
.token.attr-name {
  color: #795da3;
}

.token.entity {
  cursor: help;
}

.token.title,
.token.title .token.punctuation {
  font-weight: bold;
  color: #1d3e81;
}

.token.list {
  color: #ed6a43;
}

.token.inserted {
  background-color: #eaffea;
  color: #55a532;
}

.token.deleted {
  background-color: #ffecec;
  color: #bd2c00;
}

.token.bold {
  font-weight: bold;
}

.token.italic {
  font-style: italic;
}


/* JSON */
.language-json .token.property {
  color: #183691;
}

.language-markup .token.tag .token.punctuation {
  color: #333;
}

/* CSS */
code.language-css,
.language-css .token.function {
  color: #0086b3;
}

/* YAML */
.language-yaml .token.atrule {
  color: #63a35c;
}

code.language-yaml {
  color: #183691;
}

/* Ruby */
.language-ruby .token.function {
  color: #333;
}

/* Markdown */
.language-markdown .token.url {
  color: #795da3;
}

/* Makefile */
.language-makefile .token.symbol {
  color: #795da3;
}

.language-makefile .token.variable {
  color: #183691;
}

.language-makefile .token.builtin {
  color: #0086b3;
}

/* Bash */
.language-bash .token.keyword {
  color: #0086b3;
}

/* highlight */
pre[data-line] {
  position: relative;
  padding: 1em 0 1em 3em;
}
pre[data-line] .line-highlight-wrapper {
  position: absolute;
  top: 0;
  left: 0;
  background-color: transparent;
  display: block;
  width: 100%;
}

pre[data-line] .line-highlight {
  position: absolute;
  left: 0;
  right: 0;
  padding: inherit 0;
  margin-top: 1em;
  background: hsla(24, 20%, 50%,.08);
  background: linear-gradient(to right, hsla(24, 20%, 50%,.1) 70%, hsla(24, 20%, 50%,0));
  pointer-events: none;
  line-height: inherit;
  white-space: pre;
}

pre[data-line] .line-highlight:before, 
pre[data-line] .line-highlight[data-end]:after {
  content: attr(data-start);
  position: absolute;
  top: .4em;
  left: .6em;
  min-width: 1em;
  padding: 0 .5em;
  background-color: hsla(24, 20%, 50%,.4);
  color: hsl(24, 20%, 95%);
  font: bold 65%/1.5 sans-serif;
  text-align: center;
  vertical-align: .3em;
  border-radius: 999px;
  text-shadow: none;
  box-shadow: 0 1px white;
}

pre[data-line] .line-highlight[data-end]:after {
  content: attr(data-end);
  top: auto;
  bottom: .4em;
}html body{font-family:"Helvetica Neue",Helvetica,"Segoe UI",Arial,freesans,sans-serif;font-size:16px;line-height:1.6;color:#333;background-color:#fff;overflow:initial;box-sizing:border-box;word-wrap:break-word}html body>:first-child{margin-top:0}html body h1,html body h2,html body h3,html body h4,html body h5,html body h6{line-height:1.2;margin-top:1em;margin-bottom:16px;color:#000}html body h1{font-size:2.25em;font-weight:300;padding-bottom:.3em}html body h2{font-size:1.75em;font-weight:400;padding-bottom:.3em}html body h3{font-size:1.5em;font-weight:500}html body h4{font-size:1.25em;font-weight:600}html body h5{font-size:1.1em;font-weight:600}html body h6{font-size:1em;font-weight:600}html body h1,html body h2,html body h3,html body h4,html body h5{font-weight:600}html body h5{font-size:1em}html body h6{color:#5c5c5c}html body strong{color:#000}html body del{color:#5c5c5c}html body a:not([href]){color:inherit;text-decoration:none}html body a{color:#08c;text-decoration:none}html body a:hover{color:#00a3f5;text-decoration:none}html body img{max-width:100%}html body>p{margin-top:0;margin-bottom:16px;word-wrap:break-word}html body>ul,html body>ol{margin-bottom:16px}html body ul,html body ol{padding-left:2em}html body ul.no-list,html body ol.no-list{padding:0;list-style-type:none}html body ul ul,html body ul ol,html body ol ol,html body ol ul{margin-top:0;margin-bottom:0}html body li{margin-bottom:0}html body li.task-list-item{list-style:none}html body li>p{margin-top:0;margin-bottom:0}html body .task-list-item-checkbox{margin:0 .2em .25em -1.8em;vertical-align:middle}html body .task-list-item-checkbox:hover{cursor:pointer}html body blockquote{margin:16px 0;font-size:inherit;padding:0 15px;color:#5c5c5c;border-left:4px solid #d6d6d6}html body blockquote>:first-child{margin-top:0}html body blockquote>:last-child{margin-bottom:0}html body hr{height:4px;margin:32px 0;background-color:#d6d6d6;border:0 none}html body table{margin:10px 0 15px 
0;border-collapse:collapse;border-spacing:0;display:block;width:100%;overflow:auto;word-break:normal;word-break:keep-all}html body table th{font-weight:bold;color:#000}html body table td,html body table th{border:1px solid #d6d6d6;padding:6px 13px}html body dl{padding:0}html body dl dt{padding:0;margin-top:16px;font-size:1em;font-style:italic;font-weight:bold}html body dl dd{padding:0 16px;margin-bottom:16px}html body code{font-family:Menlo,Monaco,Consolas,'Courier New',monospace;font-size:.85em !important;color:#000;background-color:#f0f0f0;border-radius:3px;padding:.2em 0}html body code::before,html body code::after{letter-spacing:-0.2em;content:"\00a0"}html body pre>code{padding:0;margin:0;font-size:.85em !important;word-break:normal;white-space:pre;background:transparent;border:0}html body .highlight{margin-bottom:16px}html body .highlight pre,html body pre{padding:1em;overflow:auto;font-size:.85em !important;line-height:1.45;border:#d6d6d6;border-radius:3px}html body .highlight pre{margin-bottom:0;word-break:normal}html body pre code,html body pre tt{display:inline;max-width:initial;padding:0;margin:0;overflow:initial;line-height:inherit;word-wrap:normal;background-color:transparent;border:0}html body pre code:before,html body pre tt:before,html body pre code:after,html body pre tt:after{content:normal}html body p,html body blockquote,html body ul,html body ol,html body dl,html body pre{margin-top:0;margin-bottom:16px}html body kbd{color:#000;border:1px solid #d6d6d6;border-bottom:2px solid #c7c7c7;padding:2px 4px;background-color:#f0f0f0;border-radius:3px}@media print{html body{background-color:#fff}html body h1,html body h2,html body h3,html body h4,html body h5,html body h6{color:#000;page-break-after:avoid}html body blockquote{color:#5c5c5c}html body pre{page-break-inside:avoid}html body table{display:table}html body img{display:block;max-width:100%;max-height:100%}html body pre,html body 
code{word-wrap:break-word;white-space:pre}}.markdown-preview{width:100%;height:100%;box-sizing:border-box}.markdown-preview .pagebreak,.markdown-preview .newpage{page-break-before:always}.markdown-preview pre.line-numbers{position:relative;padding-left:3.8em;counter-reset:linenumber}.markdown-preview pre.line-numbers>code{position:relative}.markdown-preview pre.line-numbers .line-numbers-rows{position:absolute;pointer-events:none;top:1em;font-size:100%;left:0;width:3em;letter-spacing:-1px;border-right:1px solid #999;-webkit-user-select:none;-moz-user-select:none;-ms-user-select:none;user-select:none}.markdown-preview pre.line-numbers .line-numbers-rows>span{pointer-events:none;display:block;counter-increment:linenumber}.markdown-preview pre.line-numbers .line-numbers-rows>span:before{content:counter(linenumber);color:#999;display:block;padding-right:.8em;text-align:right}.markdown-preview .mathjax-exps .MathJax_Display{text-align:center !important}.markdown-preview:not([for="preview"]) .code-chunk .btn-group{display:none}.markdown-preview:not([for="preview"]) .code-chunk .status{display:none}.markdown-preview:not([for="preview"]) .code-chunk .output-div{margin-bottom:16px}.scrollbar-style::-webkit-scrollbar{width:8px}.scrollbar-style::-webkit-scrollbar-track{border-radius:10px;background-color:transparent}.scrollbar-style::-webkit-scrollbar-thumb{border-radius:5px;background-color:rgba(150,150,150,0.66);border:4px solid rgba(150,150,150,0.66);background-clip:content-box}html body[for="html-export"]:not([data-presentation-mode]){position:relative;width:100%;height:100%;top:0;left:0;margin:0;padding:0;overflow:auto}html body[for="html-export"]:not([data-presentation-mode]) .markdown-preview{position:relative;top:0}@media screen and (min-width:914px){html body[for="html-export"]:not([data-presentation-mode]) .markdown-preview{padding:2em calc(50% - 457px + 2em)}}@media screen and (max-width:914px){html body[for="html-export"]:not([data-presentation-mode]) 
.markdown-preview{padding:2em}}@media screen and (max-width:450px){html body[for="html-export"]:not([data-presentation-mode]) .markdown-preview{font-size:14px !important;padding:1em}}@media print{html body[for="html-export"]:not([data-presentation-mode]) #sidebar-toc-btn{display:none}}html body[for="html-export"]:not([data-presentation-mode]) #sidebar-toc-btn{position:fixed;bottom:8px;left:8px;font-size:28px;cursor:pointer;color:inherit;z-index:99;width:32px;text-align:center;opacity:.4}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] #sidebar-toc-btn{opacity:1}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc{position:fixed;top:0;left:0;width:300px;height:100%;padding:32px 0 48px 0;font-size:14px;box-shadow:0 0 4px rgba(150,150,150,0.33);box-sizing:border-box;overflow:auto;background-color:inherit}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc::-webkit-scrollbar{width:8px}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc::-webkit-scrollbar-track{border-radius:10px;background-color:transparent}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc::-webkit-scrollbar-thumb{border-radius:5px;background-color:rgba(150,150,150,0.66);border:4px solid rgba(150,150,150,0.66);background-clip:content-box}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc a{text-decoration:none}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc ul{padding:0 1.6em;margin-top:.8em}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc li{margin-bottom:.8em}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc ul{list-style-type:none}html 
body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .markdown-preview{left:300px;width:calc(100% -  300px);padding:2em calc(50% - 457px -  150px);margin:0;box-sizing:border-box}@media screen and (max-width:1274px){html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .markdown-preview{padding:2em}}@media screen and (max-width:450px){html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .markdown-preview{width:100%}}html body[for="html-export"]:not([data-presentation-mode]):not([html-show-sidebar-toc]) .markdown-preview{left:50%;transform:translateX(-50%)}html body[for="html-export"]:not([data-presentation-mode]):not([html-show-sidebar-toc]) .md-sidebar-toc{display:none}
/* Please visit the URL below for more information: */
/*   https://shd101wyy.github.io/markdown-preview-enhanced/#/customize-css */
.markdown-preview.markdown-preview h1,
.markdown-preview.markdown-preview h2,
.markdown-preview.markdown-preview h3,
.markdown-preview.markdown-preview h4,
.markdown-preview.markdown-preview h5,
.markdown-preview.markdown-preview h6 {
  font-weight: bolder;
  text-decoration-line: underline;
}

      </style>
    </head>
    <body for="html-export">
      <div class="mume markdown-preview  ">
      <h1 class="mume-header" id="background">Background</h1>

<p>In Chapter 4: Digit Recognition, we&apos;ll add a few new techniques to our image processing toolset by attempting to build a digit recognition pipeline from start to finish. Throughout the exercise, we will get to practice the image preprocessing tricks we&apos;ve picked up from previous chapters:</p>
<ul>
<li>Image manipulations such as resizing, cropping, rotation, color conversion</li>
<li>Blurring and sharpening operations</li>
<li>Thresholding and Edge Detection</li>
<li>Contour approximation</li>
</ul>
<p>New methods and strategies that you&apos;ll be learning include:</p>
<ul>
<li>Drawing operations (rectangles, text) on our image</li>
<li>Region of interest and bounding rectangles</li>
<li>Morphological transformations</li>
<li>The Seven-Segment Display</li>
</ul>
<h2 class="mume-header" id="what-about-deep-learning">What about Deep Learning?</h2>

<p>To be clear, the specialised deep learning libraries that have sprung up in recent years are a lot more robust in their approach. By utilizing machine learning principles (cost functions, gradient descent, etc.), these specialised libraries can handle highly complex object recognition and OCR (optical character recognition) tasks at the cost of brute computing power.</p>
<p>The overarching motivation of this free course, however, is to make clear to beginners what constitutes artificial intelligence, and to illustrate the principal benefits of machine learning. I try to achieve that by demonstrating -- over multiple chapters of this course -- how computer vision was traditionally, or rather &quot;classically&quot;, performed prior to the emergence of deep learning.</p>
<p>By learning the classical approaches to computer vision, the student (you) can appreciate the effort it takes to hand-tune parameters, which adds a new dimension of appreciation for the self-learning methods that we&apos;ll discuss in the near future.</p>
<h2 class="mume-header" id="region-of-interest">Region of Interest</h2>

<p>Do a quick google search on &quot;digit recognition&quot; or &quot;digit classification&quot; and it&apos;s hard to find an introductory deep learning course that <strong>doesn&apos;t use</strong> the famous MNIST (Modified National Institute of Standards and Technology)<sup class="footnote-ref"><a href="#fn1" id="fnref1">[1]</a></sup> database. This is a handwritten digit database that has long been the <em>de facto</em> standard in pretty much any machine learning tutorial:</p>
<p><img src="assets/mnist.png" alt></p>
<p>But I&apos;d argue that for a budding computer vision developer, your learning objectives are better served by taking a different approach.</p>
<p>By choosing real-life images, you are confronted with a few more key challenges that are not present when using a well-curated database such as MNIST. These challenges present new opportunities to learn about key concepts such as <strong>region of interest</strong> and <strong>morphological operations</strong>, which you will come to rely upon greatly in the future.</p>
<p>First, take a look at 4 real-life pictures of security tokens issued by banks and institutional agencies (left-to-right: Bank Central Asia, DBS, OCBC Bank, OneKey for Singapore Government e-services):</p>
<p><img src="assets/securitytokens.png" alt></p>
<p>Notice how noisy these images are: each image is shot against a different background and under different lighting conditions, and each token differs in size, shape, and color.</p>
<p>Your task, as a computer vision developer, is to develop a pipeline where each phase takes you closer to the goal. Roughly speaking, given the above task, we would formulate a pipeline that looks like the following:</p>
<ol>
<li>Preprocessing, noise reduction</li>
<li>Contour approximation</li>
<li>Find region of interest (ROI), that is the area of the LCD display in each of these pictures</li>
<li>Extract ROI for further preprocessing, discarding the rest of the image</li>
<li>Isolate each digit from the ROI</li>
<li>Iteratively classify each digit in the image</li>
<li>Combine the per-digit classification to a final string (&quot;output&quot;)</li>
</ol>
<p>In practice, steps (1) and (2) above are the &quot;application&quot; of the methods you&apos;ve learned in previous chapters of this series. As we&apos;ll soon observe, we will use a combination of blurring operations and edge detection to draw our contours. Among these contours, one will be the LCD display containing the digits to be classified. That is our <strong>Region of Interest</strong>.</p>
<p><img src="assets/croproi.gif" alt></p>
<h3 class="mume-header" id="selecting-region-of-interest">Selecting Region of Interest</h3>

<p>The GIF above demonstrates the code in <code>roi_01.py</code>; essentially it shows the <code>selectROI</code> method in action. You&apos;ll commonly combine the <code>selectROI</code> method with either a slicing operation to crop your region of interest, or a drawing operation to call attention to a specific region of the image.</p>
<pre data-role="codeBlock" data-info="py" class="language-python">x<span class="token punctuation">,</span>y<span class="token punctuation">,</span>w<span class="token punctuation">,</span>h <span class="token operator">=</span> cv2<span class="token punctuation">.</span>selectROI<span class="token punctuation">(</span><span class="token string">&quot;Region of interest&quot;</span><span class="token punctuation">,</span> img<span class="token punctuation">)</span>
cropped <span class="token operator">=</span> img<span class="token punctuation">[</span>y<span class="token punctuation">:</span>y<span class="token operator">+</span>h<span class="token punctuation">,</span> x<span class="token punctuation">:</span>x<span class="token operator">+</span>w<span class="token punctuation">]</span>
<span class="token comment"># draw rectangle </span>
cv2<span class="token punctuation">.</span>rectangle<span class="token punctuation">(</span>img_color<span class="token punctuation">,</span> <span class="token punctuation">(</span>x<span class="token punctuation">,</span>y<span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span>x<span class="token operator">+</span>w<span class="token punctuation">,</span>y<span class="token operator">+</span>h<span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token number">255</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token number">2</span><span class="token punctuation">)</span>
</pre><p>In most cases, it simply wouldn&apos;t be realistic to render an image and manually specify our region of interest. We&apos;ll need this operation to be as close to automatic as possible. But how exactly? That depends greatly on the specific problem set.</p>
<p>In some cases, the obvious strategy would simply be shape recognition, say by counting the number of vertices of each contour. The following code is an example implementation of that:</p>
<pre data-role="codeBlock" data-info="py" class="language-python"><span class="token comment"># cnt = contour</span>
peri <span class="token operator">=</span> cv2<span class="token punctuation">.</span>arcLength<span class="token punctuation">(</span>cnt<span class="token punctuation">,</span> <span class="token boolean">True</span><span class="token punctuation">)</span>
<span class="token comment"># contour approximation</span>
cnt_approx <span class="token operator">=</span> cv2<span class="token punctuation">.</span>approxPolyDP<span class="token punctuation">(</span>cnt<span class="token punctuation">,</span> <span class="token number">0.03</span> <span class="token operator">*</span> peri<span class="token punctuation">,</span> <span class="token boolean">True</span><span class="token punctuation">)</span>
<span class="token keyword">if</span> <span class="token builtin">len</span><span class="token punctuation">(</span>cnt_approx<span class="token punctuation">)</span> <span class="token operator">==</span> <span class="token number">3</span><span class="token punctuation">:</span>
    est_shape <span class="token operator">=</span> <span class="token string">&apos;triangle&apos;</span>
<span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>
<span class="token keyword">elif</span> <span class="token builtin">len</span><span class="token punctuation">(</span>cnt_approx<span class="token punctuation">)</span> <span class="token operator">==</span> <span class="token number">5</span><span class="token punctuation">:</span>
    est_shape <span class="token operator">=</span> <span class="token string">&apos;pentagon&apos;</span>
<span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span>
</pre><p>In other cases, you may employ a strategy that tries to match contours based on Hu moments (which we&apos;ll study in detail in future chapters).</p>
<p>Other methods may involve a saliency map, or a visual attention map, for ROI extraction. These methods create a new representation of the original image where each pixel&apos;s <strong>unique quality</strong> is amplified or emphasized. One example implementation on Wikipedia<sup class="footnote-ref"><a href="#fn2" id="fnref2">[2]</a></sup> demonstrates how straightforward this concept really is:</p>
<p></p><div class="mathjax-exps">$$SALS(I_k) = \sum^{N}_{i=1}|I_k-I_i|$$</div><p></p>
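<p>The formula above translates almost directly into a few lines of <code>numpy</code>. A minimal sketch follows; the histogram lookup table is just an efficiency shortcut so we avoid comparing every pixel pair explicitly, and the toy image is an arbitrary illustration:</p>

```python
import numpy as np

def sals_map(gray):
    """Pixel-wise saliency: SALS(I_k) = sum over all pixels i of |I_k - I_i|."""
    # 256-bin intensity histogram; the saliency of intensity v is
    # sum_u hist[u] * |v - u|, shared by every pixel of intensity v.
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)
    lut = np.abs(levels[:, None] - levels[None, :]) @ hist
    return lut[gray]

# Toy image: a small bright blob on a dark background. The rare bright
# pixels differ from most of the image, so they score as more salient.
img = np.zeros((8, 8), dtype=np.uint8)
img[3:5, 3:5] = 200
sal = sals_map(img)
print(sal[3, 3] > sal[0, 0])  # True: the blob stands out
```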
<p>As you add new tools and strategies to your computer vision toolbox, you will pick up new approaches to ROI extraction. It is an interesting field of research that has been gaining a lot in popularity with the emergence of deep learning.</p>
<p>As for the images of bank security tokens, can you think of an approach that may be a good fit? Our region of interest is the LCD screen at the top of the button pad on each device, and they all seem to be rather consistent in shape and size. Give it some thought and read on to find out.</p>
<h3 class="mume-header" id="arc-length-and-area-size">Arc Length and Area Size</h3>

<p>I&apos;ve hinted at shape and size being a factor, so maybe that would be a good starting point. The good news is that OpenCV makes this incredibly easy through the <code>contourArea()</code> and <code>arcLength()</code> functions.</p>
<p>The following snippet of code, lifted from <code>contourarea_01.py</code>, finds all contours and sort them by area size in descending order before storing the first 10 in <code>cnts</code>:</p>
<pre data-role="codeBlock" data-info="py" class="language-python">cnts<span class="token punctuation">,</span> _ <span class="token operator">=</span> cv2<span class="token punctuation">.</span>findContours<span class="token punctuation">(</span>edged<span class="token punctuation">,</span> cv2<span class="token punctuation">.</span>RETR_EXTERNAL<span class="token punctuation">,</span> cv2<span class="token punctuation">.</span>CHAIN_APPROX_SIMPLE<span class="token punctuation">)</span>
<span class="token comment"># sort contours by contourArea, and get the first 10</span>
cnts <span class="token operator">=</span> <span class="token builtin">sorted</span><span class="token punctuation">(</span>cnts<span class="token punctuation">,</span> key<span class="token operator">=</span>cv2<span class="token punctuation">.</span>contourArea<span class="token punctuation">,</span> reverse<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">)</span><span class="token punctuation">[</span><span class="token punctuation">:</span><span class="token number">10</span><span class="token punctuation">]</span>
</pre><p>We can also obtain the contour area and perimeter iteratively in a for-loop, like the following:</p>
<pre data-role="codeBlock" data-info="py" class="language-python">cnts<span class="token punctuation">,</span> _ <span class="token operator">=</span> cv2<span class="token punctuation">.</span>findContours<span class="token punctuation">(</span>edged<span class="token punctuation">,</span> cv2<span class="token punctuation">.</span>RETR_EXTERNAL<span class="token punctuation">,</span> cv2<span class="token punctuation">.</span>CHAIN_APPROX_SIMPLE<span class="token punctuation">)</span>
<span class="token keyword">for</span> i <span class="token keyword">in</span> <span class="token builtin">range</span><span class="token punctuation">(</span><span class="token builtin">len</span><span class="token punctuation">(</span>cnts<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">:</span>
    area <span class="token operator">=</span> cv2<span class="token punctuation">.</span>contourArea<span class="token punctuation">(</span>cnts<span class="token punctuation">[</span>i<span class="token punctuation">]</span><span class="token punctuation">)</span>
    peri <span class="token operator">=</span> cv2<span class="token punctuation">.</span>arcLength<span class="token punctuation">(</span>cnts<span class="token punctuation">[</span>i<span class="token punctuation">]</span><span class="token punctuation">,</span> closed<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">)</span>
    <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f&apos;Area:</span><span class="token interpolation"><span class="token punctuation">{</span>area<span class="token punctuation">}</span></span><span class="token string">, Perimeter:</span><span class="token interpolation"><span class="token punctuation">{</span>peri<span class="token punctuation">}</span></span><span class="token string">&apos;</span></span><span class="token punctuation">)</span>
</pre><p>In effect, we&apos;re looping through each contour that the <code>findContours()</code> operation found, and computing two values each time, <code>area</code> and <code>peri</code>.</p>
<p>Note that the contour perimeter is also known as the arc length. The second argument <code>closed</code> specifies whether the shape is a closed contour (<code>True</code>) or just a curve (<code>closed=False</code>).</p>
<p>Execute <code>contourarea_01.py</code> and observe how each contour is displayed, from the one with the largest area to the one with the least, for a total of 10 contours. As you run the script on different pictures of bank security tokens, you&apos;ll see that it does a reliable job of finding the contours, sorting them, and returning our LCD display screen as the first in the list. This makes sense, because visually it is apparent that the LCD display occupies the largest area among the other closed shapes in our picture.</p>
<h4 class="mume-header" id="dive-deeper-roi">Dive Deeper: ROI</h4>

<ol>
<li>
<p>Use <code>assets/dbs.jpg</code> instead of <code>assets/ocbc.jpg</code> in <code>contourarea_01.py</code>. Were you able to extract the region of interest (LCD Display) successfully without any changes to the script?</p>
</li>
<li>
<p>Could we have successfully extracted our region of interest had we used <code>arcLength</code> in our strategy?</p>
</li>
<li>
<p>Suppose we only wanted to extract the region of interest and nothing else; which line of code would you change? Reflect the change in the code and execute it to confirm that you have performed this exercise correctly.</p>
</li>
<li>
<p>Suppose we wanted the contours sorted by their respective areas, from smallest to largest; which line of code would you change? Reflect the change in the code and execute it to confirm that you have performed this exercise correctly.</p>
</li>
</ol>
<p>While working through the exercises above, you may find it helpful to also draw text describing the area and perimeter next to each contour. I&apos;ve shown how this can be done in <code>contourarea_02.py</code>; the essential addition we make to the earlier code is the two calls to <code>putText()</code>:</p>
<pre data-role="codeBlock" data-info="py" class="language-python">PURPLE <span class="token operator">=</span> <span class="token punctuation">(</span><span class="token number">75</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">130</span><span class="token punctuation">)</span>
THICKNESS <span class="token operator">=</span> <span class="token number">1</span>
FONT <span class="token operator">=</span> cv2<span class="token punctuation">.</span>FONT_HERSHEY_SIMPLEX
cv2<span class="token punctuation">.</span>putText<span class="token punctuation">(</span>img_color<span class="token punctuation">,</span> <span class="token string">&quot;Area:&quot;</span> <span class="token operator">+</span> <span class="token builtin">str</span><span class="token punctuation">(</span>area<span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span>x<span class="token punctuation">,</span> y <span class="token operator">-</span> <span class="token number">15</span><span class="token punctuation">)</span><span class="token punctuation">,</span> FONT<span class="token punctuation">,</span> <span class="token number">0.4</span><span class="token punctuation">,</span> PURPLE<span class="token punctuation">,</span>THICKNESS<span class="token punctuation">)</span>
cv2<span class="token punctuation">.</span>putText<span class="token punctuation">(</span>img_color<span class="token punctuation">,</span> <span class="token string">&quot;Perimeter:&quot;</span> <span class="token operator">+</span> <span class="token builtin">str</span><span class="token punctuation">(</span>peri<span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span>x<span class="token punctuation">,</span> y <span class="token operator">-</span> <span class="token number">5</span><span class="token punctuation">)</span><span class="token punctuation">,</span> FONT<span class="token punctuation">,</span> <span class="token number">0.4</span><span class="token punctuation">,</span>PURPLE<span class="token punctuation">,</span> THICKNESS<span class="token punctuation">)</span>
</pre><p><img src="assets/textcontour.png" alt></p>
<h3 class="mume-header" id="roi-extraction">ROI extraction</h3>

<p>With these foundations, we are now ready to write a simple utility script that:</p>
<ol>
<li>Finds our region of interest</li>
<li>Crops the ROI into a new image</li>
<li>Saves it into a folder named <code>/inter</code> (intermediary) for the actual digit recognition later</li>
</ol>
<p>Much of what you need to do has already been presented so far, but the core pieces, lifted from <code>roi_02.py</code>, are the following few lines of code:</p>
<pre data-role="codeBlock" data-info="py" class="language-python">img <span class="token operator">=</span> cv2<span class="token punctuation">.</span>imread<span class="token punctuation">(</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">)</span>
blurred <span class="token operator">=</span> cv2<span class="token punctuation">.</span>GaussianBlur<span class="token punctuation">(</span>img<span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token number">7</span><span class="token punctuation">,</span> <span class="token number">7</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">)</span>
edged <span class="token operator">=</span> cv2<span class="token punctuation">.</span>Canny<span class="token punctuation">(</span>blurred<span class="token punctuation">,</span> <span class="token number">130</span><span class="token punctuation">,</span> <span class="token number">150</span><span class="token punctuation">,</span> <span class="token number">255</span><span class="token punctuation">)</span>
cnts<span class="token punctuation">,</span> _ <span class="token operator">=</span> cv2<span class="token punctuation">.</span>findContours<span class="token punctuation">(</span>edged<span class="token punctuation">,</span> cv2<span class="token punctuation">.</span>RETR_EXTERNAL<span class="token punctuation">,</span> cv2<span class="token punctuation">.</span>CHAIN_APPROX_SIMPLE<span class="token punctuation">)</span>
cnts <span class="token operator">=</span> <span class="token builtin">sorted</span><span class="token punctuation">(</span>cnts<span class="token punctuation">,</span> key<span class="token operator">=</span>cv2<span class="token punctuation">.</span>contourArea<span class="token punctuation">,</span> reverse<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">)</span><span class="token punctuation">[</span><span class="token punctuation">:</span><span class="token number">1</span><span class="token punctuation">]</span>

x<span class="token punctuation">,</span> y<span class="token punctuation">,</span> w<span class="token punctuation">,</span> h <span class="token operator">=</span> cv2<span class="token punctuation">.</span>boundingRect<span class="token punctuation">(</span>cnts<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">)</span>
roi <span class="token operator">=</span> img<span class="token punctuation">[</span>y <span class="token punctuation">:</span> y <span class="token operator">+</span> h<span class="token punctuation">,</span> x <span class="token punctuation">:</span> x <span class="token operator">+</span> w<span class="token punctuation">]</span>
cv2<span class="token punctuation">.</span>imwrite<span class="token punctuation">(</span><span class="token string">&quot;roi.png&quot;</span><span class="token punctuation">,</span> roi<span class="token punctuation">)</span>
</pre><p>The <code>roi_02.py</code> utility script uses the <code>argparse</code> library so that the user can specify a file path with the <code>-p</code> (or <code>--path</code>) flag, like so:</p>
<pre data-role="codeBlock" data-info="bash" class="language-bash">python roi_02.py -p assets/ocbc.jpg
<span class="token comment"># equivalent:</span>
python roi_02.py --path assets/ocbc.jpg
</pre><p>If the user does not specify a file path using the <code>-p</code> flag, the default value is <code>assets/ocbc.jpg</code>. If you wish to change this, edit <code>roi_02.py</code> and specify a different value for the <code>default</code> parameter.</p>
<pre data-role="codeBlock" data-info="py" class="language-python">parser <span class="token operator">=</span> argparse<span class="token punctuation">.</span>ArgumentParser<span class="token punctuation">(</span><span class="token punctuation">)</span>
parser<span class="token punctuation">.</span>add_argument<span class="token punctuation">(</span><span class="token string">&quot;-p&quot;</span><span class="token punctuation">,</span> <span class="token string">&quot;--path&quot;</span><span class="token punctuation">,</span> default<span class="token operator">=</span><span class="token string">&quot;assets/ocbc.jpg&quot;</span><span class="token punctuation">)</span>
</pre><p>You should run this exercise using <code>dbs.jpg</code>, <code>ocbc2.jpg</code>, or <code>onekey.jpg</code> at least once. Execute the script and check the <code>inter</code> folder to confirm that the ROI has been saved. When you&apos;re done, you are ready to move on to the next phase of the digit recognition pipeline.</p>
<h2 class="mume-header" id="morphological-transformations">Morphological Transformations</h2>

<p>Once the region of interest is obtained, we have an image that may still contain noise. This is especially the case when our ROI is obtained by means of thresholding methods, since you can expect some &quot;non-features&quot; (noise) to also be included in the resulting image.</p>
<p>To account for these imperfections, we will now perform a series of operations on our image. We&apos;ll learn what they are formally, but let&apos;s begin by seeing what it is that they <em>offer</em> to our image processing pipeline. I&apos;ve included a picture with some random noise, as follows:</p>
<p><img src="assets/0417s.png" alt></p>
<p>The digits &quot;0417&quot; are clearly discernible to the human eye despite the presence of noise. However, consider the perspective of a global thresholding operation: these pixel values are &quot;noise&quot; to us, but a computer has no notion of which pixel values are meaningful and which are not. A threshold value such as the global mean will take all values into account indiscriminately. A contour finding operation will, instead of 4, return thousands of tiny round segments (they may be tiny, but they are completely valid contours).</p>
<p>An image processing pipeline that fails to account for these may produce sub-optimal performance or, very often, completely undesired results.</p>
<p>Enter two of the most fundamental morphological transformations: <strong>erosion</strong> and <strong>dilation</strong>.</p>
<h3 class="mume-header" id="erosion">Erosion</h3>

<p>Erosion &quot;erodes away the boundaries of foreground object&quot;<sup class="footnote-ref"><a href="#fn3" id="fnref3">[3]</a></sup> by sliding a kernel through the image and setting a pixel to 1 <strong>only if all the pixels under the kernel are 1</strong>.</p>
<p>This in effect discards pixels near the boundary and any floating pixels that are not part of a larger blob (which is what the human eye is interested in). Because pixels are eroded, your foreground object will shrink in size.</p>
<h3 class="mume-header" id="dilation">Dilation</h3>

<p>The opposite of erosion, dilation sets a pixel to 1 if <strong>at least one pixel under the kernel is 1</strong>, essentially &quot;growing&quot; the foreground object.</p>
<p>Because of how these operations work, there are a couple of things to note:</p>
<ol>
<li>Morphological transformations are usually performed on binary images. Recall that pixel values in binary images are either a full white (i.e. 1) or black (i.e. 0).</li>
<li>As per convention, we want to keep our foreground in white and the background in black</li>
<li>Because erosion results in a shrinking foreground and dilation results in a growing foreground, these two operations are also commonly used in combination, i.e. erosion followed by dilation, or vice versa</li>
</ol>
<p><img src="assets/morphexample.png" alt></p>
<p>As we read our image in grayscale mode (<code>flags=0</code>), we obtain a white background and a mostly-black foreground. This is illustrated in the subplot titled &quot;Original&quot; above. We begin our preprocessing steps by first binarizing the image (step 1), followed by inverting the colors (step 2) to get a white-on-black image.</p>
<p>An erosion operation is then performed (step 3). This works by creating our kernel (either through <code>numpy</code> or through <code>opencv</code>&apos;s structuring element) and sliding that kernel across our image to remove white noise.</p>
<p>The side-effect is that our foreground object has now shrunk in size as its boundaries are eroded away. We grow it back by applying a dilation (step 4) and finally show the output, as illustrated in the bottom-right pane of the image above.</p>
<pre data-role="codeBlock" data-info="py" class="language-python"><span class="token comment"># read as grayscale</span>
roi <span class="token operator">=</span> cv2<span class="token punctuation">.</span>imread<span class="token punctuation">(</span><span class="token string">&quot;assets/0417s.png&quot;</span><span class="token punctuation">,</span> flags<span class="token operator">=</span><span class="token number">0</span><span class="token punctuation">)</span>
<span class="token comment"># step 1: </span>
_<span class="token punctuation">,</span> thresh <span class="token operator">=</span> cv2<span class="token punctuation">.</span>threshold<span class="token punctuation">(</span>roi<span class="token punctuation">,</span> <span class="token number">170</span><span class="token punctuation">,</span> <span class="token number">255</span><span class="token punctuation">,</span> cv2<span class="token punctuation">.</span>THRESH_BINARY<span class="token punctuation">)</span>
<span class="token comment"># step 2:</span>
inv <span class="token operator">=</span> cv2<span class="token punctuation">.</span>bitwise_not<span class="token punctuation">(</span>thresh<span class="token punctuation">)</span>
<span class="token comment"># step 3 (option 1):</span>
kernel <span class="token operator">=</span> np<span class="token punctuation">.</span>ones<span class="token punctuation">(</span><span class="token punctuation">(</span><span class="token number">5</span><span class="token punctuation">,</span><span class="token number">5</span><span class="token punctuation">)</span><span class="token punctuation">,</span> np<span class="token punctuation">.</span>uint8<span class="token punctuation">)</span>
<span class="token comment"># step 3 (option 2):</span>
kernel <span class="token operator">=</span> cv2<span class="token punctuation">.</span>getStructuringElement<span class="token punctuation">(</span>cv2<span class="token punctuation">.</span>MORPH_ELLIPSE<span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token number">5</span><span class="token punctuation">,</span> <span class="token number">5</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
eroded <span class="token operator">=</span> cv2<span class="token punctuation">.</span>erode<span class="token punctuation">(</span>inv<span class="token punctuation">,</span> kernel<span class="token punctuation">,</span> iterations<span class="token operator">=</span><span class="token number">1</span><span class="token punctuation">)</span>
<span class="token comment"># step 4:</span>
dilated <span class="token operator">=</span> cv2<span class="token punctuation">.</span>dilate<span class="token punctuation">(</span>eroded<span class="token punctuation">,</span> kernel<span class="token punctuation">,</span> iterations<span class="token operator">=</span><span class="token number">1</span><span class="token punctuation">)</span>
cv2<span class="token punctuation">.</span>imshow<span class="token punctuation">(</span><span class="token string">&quot;Transformed&quot;</span><span class="token punctuation">,</span> dilated<span class="token punctuation">)</span>
cv2<span class="token punctuation">.</span>waitKey<span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">)</span>
</pre><p>OpenCV provides three ready-made shapes for our kernel:</p>
<ul>
<li>Rectangular box: <code>MORPH_RECT</code></li>
<li>Cross: <code>MORPH_CROSS</code></li>
<li>Ellipse: <code>MORPH_ELLIPSE</code></li>
</ul>
<p>They are fed as the first argument into <code>cv2.getStructuringElement()</code>, with the second being the kernel size (<code>ksize</code>) itself. The third argument is the <em>anchor point</em>, which defaults to the center.</p>
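<p>To see what the cross shape actually looks like, here is a pure-NumPy sketch (no OpenCV required) that builds the same 5x5 cross that <code>cv2.getStructuringElement(cv2.MORPH_CROSS, (5, 5))</code> returns with the default center anchor:</p>

```python
import numpy as np

# Build a 5x5 cross-shaped kernel by hand; with the default anchor
# (the center), this mirrors cv2.getStructuringElement(cv2.MORPH_CROSS, (5, 5))
ksize = 5
anchor = ksize // 2  # center of the kernel
cross = np.zeros((ksize, ksize), dtype=np.uint8)
cross[anchor, :] = 1  # center row on
cross[:, anchor] = 1  # center column on
print(cross)
```

The rectangular shape is simply <code>np.ones((5, 5), np.uint8)</code>, which is why the two "step 3" options in the snippet above are near-interchangeable.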
<h3 class="mume-header" id="opening-and-closing">Opening and Closing</h3>

<p>Another name for <strong>Erosion followed by Dilation</strong> is Opening. It is useful for removing noise from an image. The reverse of Opening is Closing, where we <strong>perform Dilation followed by Erosion</strong>; it is particularly suited for closing small holes inside foreground objects.</p>
<p>OpenCV includes the more generic <code>morphologyEx</code> method for all other morphological operations beyond Erosion and Dilation. The function takes an image as the first argument, the operation as the second, and finally the kernel. Compare how your code differs between <code>cv2.erode</code> and <code>cv2.dilate</code> and their respective equivalents in <code>cv2.morphologyEx()</code>:</p>
<pre data-role="codeBlock" data-info="py" class="language-python"><span class="token keyword">import</span> cv2
<span class="token keyword">import</span> numpy <span class="token keyword">as</span> np

img <span class="token operator">=</span> cv2<span class="token punctuation">.</span>imread<span class="token punctuation">(</span><span class="token string">&apos;image.png&apos;</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">)</span>
kernel <span class="token operator">=</span> np<span class="token punctuation">.</span>ones<span class="token punctuation">(</span><span class="token punctuation">(</span><span class="token number">5</span><span class="token punctuation">,</span><span class="token number">5</span><span class="token punctuation">)</span><span class="token punctuation">,</span>np<span class="token punctuation">.</span>uint8<span class="token punctuation">)</span>
erosion <span class="token operator">=</span> cv2<span class="token punctuation">.</span>erode<span class="token punctuation">(</span>img<span class="token punctuation">,</span>kernel<span class="token punctuation">,</span>iterations <span class="token operator">=</span> <span class="token number">1</span><span class="token punctuation">)</span>
<span class="token comment"># Equivalent:</span>
<span class="token comment"># cv2.morphologyEx(img, cv2.MORPH_ERODE, kernel,iterations=1)</span>
dilation <span class="token operator">=</span> cv2<span class="token punctuation">.</span>dilate<span class="token punctuation">(</span>img<span class="token punctuation">,</span>kernel<span class="token punctuation">,</span>iterations <span class="token operator">=</span> <span class="token number">1</span><span class="token punctuation">)</span>
<span class="token comment"># Equivalent:</span>
<span class="token comment"># cv2.morphologyEx(img, cv2.MORPH_DILATE, kernel,iterations=1)</span>
opening <span class="token operator">=</span> cv2<span class="token punctuation">.</span>morphologyEx<span class="token punctuation">(</span>img<span class="token punctuation">,</span> cv2<span class="token punctuation">.</span>MORPH_OPEN<span class="token punctuation">,</span> kernel<span class="token punctuation">)</span>
closing <span class="token operator">=</span> cv2<span class="token punctuation">.</span>morphologyEx<span class="token punctuation">(</span>img<span class="token punctuation">,</span> cv2<span class="token punctuation">.</span>MORPH_CLOSE<span class="token punctuation">,</span> kernel<span class="token punctuation">)</span>
</pre><h3 class="mume-header" id="learn-by-building-morphological-transformation">Learn-by-building: Morphological Transformation</h3>

<p>In the <code>homework</code> directory, you&apos;ll find <code>0417h.png</code>. Your job is to apply what you&apos;ve learned in this lesson to clean up the image. Your output should have these qualities:</p>
<ol>
<li>As free of noise as possible (remove the lines, and the red splatted dots across the image)</li>
<li>If you run <code>findContours()</code> on the output, you should have exactly 4 contours</li>
<li>Foreground object in white, background in black</li>
</ol>
<p><img src="homework/0417h.png" alt></p>
<p>You are free to pick your strategy, but a reference solution would look like the following:</p>
<p><img src="assets/0417reference.png" alt></p>
<h2 class="mume-header" id="seven-segment-display">Seven-segment display</h2>

<p>The seven-segment display (known also as &quot;seven-segment indicator&quot;) is a form of electronic display device for displaying decimal numerals<sup class="footnote-ref"><a href="#fn4" id="fnref4">[4]</a></sup> widely used in digital clocks, electronic meters, calculators and banking security tokens.</p>
<p><img src="assets/sevenseg.png" alt></p>
<p>This is relevant because it is the character representation of our digits in each of these security tokens. If we can isolate each digit from each other, we can iteratively predict the &quot;class&quot; of each digit (0 to 9). Specifically, we are going to perform a classification task based on the state of each segment.</p>
<p>To ease our understanding, let&apos;s refer to each segment using the letters A to G:</p>
<p><img src="assets/sevenseg1.png" alt></p>
<p>We can then create a lookup table that matches the collective states to the corresponding class:</p>
<table>
<thead>
<tr>
<th>Class</th>
<th>a</th>
<th>b</th>
<th>c</th>
<th>d</th>
<th>e</th>
<th>f</th>
<th>g</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>3</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>4</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>5</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>6</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>7</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>8</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>9</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>
<p>How would we represent such a lookup table in our Python code, and how would we use it? The obvious answer to the first question is a dictionary. Notice that <code>DIGITSDICT</code> is just a representation of the &quot;binary state&quot; of each segment. The digit &quot;8&quot;, for example, corresponds to all seven segments being activated, or &quot;on&quot; (state of <code>1</code>).</p>
<pre data-role="codeBlock" data-info="py" class="language-python">DIGITSDICT <span class="token operator">=</span> <span class="token punctuation">{</span>
    <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">:</span><span class="token number">0</span><span class="token punctuation">,</span>
    <span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">:</span><span class="token number">1</span><span class="token punctuation">,</span>
    <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">:</span><span class="token number">2</span><span class="token punctuation">,</span>
    <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">:</span><span class="token number">3</span><span class="token punctuation">,</span>
    <span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">:</span><span class="token number">4</span><span class="token punctuation">,</span>
    <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">:</span><span class="token number">5</span><span class="token punctuation">,</span>
    <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">:</span><span class="token number">6</span><span class="token punctuation">,</span>
    <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">:</span><span class="token number">7</span><span class="token punctuation">,</span>
    <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">:</span><span class="token number">8</span><span class="token punctuation">,</span>
    <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">0</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">,</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">:</span><span class="token number">9</span>
<span class="token punctuation">}</span>
</pre><p>Then, for each digit, we look at the pixel values in each of the seven segments; if the majority of pixels are white, we classify that segment as being in an activated state (<code>1</code>), otherwise in a state of <code>0</code>. As we iterate over the 7 segments, we build an array of length 7, each element a binary value (<code>0</code> or <code>1</code>).</p>
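<p>The majority-white test can be sketched on a toy region, assuming a thresholded image where white pixels carry the value 255:</p>

```python
import numpy as np

# Toy 6x10 "segment" patch: 42 of its 60 pixels are white (255)
region = np.zeros((6, 10), dtype=np.uint8)
region.flat[:42] = 255

# If the majority of pixels are white, the segment is ON
state = 1 if np.sum(region == 255) > region.size * 0.5 else 0
print(state)  # -> 1, since 42/60 = 70% of the pixels are white
```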
<p>We would then find the corresponding value in our dictionary using that array. Your code would resemble the following:</p>
<pre data-role="codeBlock" data-info="py" class="language-python"><span class="token comment"># define the rectangle areas corresponding each segment</span>
sevensegs <span class="token operator">=</span> <span class="token punctuation">[</span>
    <span class="token punctuation">(</span><span class="token punctuation">(</span>x0<span class="token punctuation">,</span> y0<span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span>x1<span class="token punctuation">,</span> y1<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    <span class="token punctuation">(</span><span class="token punctuation">(</span>x2<span class="token punctuation">,</span> y2<span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span>x3<span class="token punctuation">,</span> y3<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">,</span>
    <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span> <span class="token comment"># 7 of them</span>
<span class="token punctuation">]</span>

<span class="token comment"># initialize the state to OFF</span>
on <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span> <span class="token operator">*</span> <span class="token number">7</span> 

<span class="token comment"># set each segment to ON / OFF based on majority</span>
<span class="token keyword">for</span> <span class="token punctuation">(</span>i<span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token punctuation">(</span>p1x<span class="token punctuation">,</span> p1y<span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span>p2x<span class="token punctuation">,</span> p2y<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token keyword">in</span> <span class="token builtin">enumerate</span><span class="token punctuation">(</span>sevensegs<span class="token punctuation">)</span><span class="token punctuation">:</span>
    <span class="token comment"># numpy slicing to extract only one region</span>
    region <span class="token operator">=</span> roi<span class="token punctuation">[</span>p1y<span class="token punctuation">:</span>p2y<span class="token punctuation">,</span> p1x<span class="token punctuation">:</span>p2x<span class="token punctuation">]</span>
    <span class="token comment"># if majority pixels are white, set state to ON</span>
    <span class="token keyword">if</span> np<span class="token punctuation">.</span><span class="token builtin">sum</span><span class="token punctuation">(</span>region <span class="token operator">==</span> <span class="token number">255</span><span class="token punctuation">)</span> <span class="token operator">&gt;</span> region<span class="token punctuation">.</span>size <span class="token operator">*</span><span class="token number">0.5</span><span class="token punctuation">:</span>
        on<span class="token punctuation">[</span>i<span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token number">1</span>

<span class="token comment"># lookup on dictionary</span>
digit <span class="token operator">=</span> DIGITSDICT<span class="token punctuation">[</span><span class="token builtin">tuple</span><span class="token punctuation">(</span>on<span class="token punctuation">)</span><span class="token punctuation">]</span> <span class="token comment"># digit is one of 0-9</span>
</pre><p>There are multiple ways to write a for-loop, but it&apos;s important that you are aware of the order in which your for-loop executes. Referring to our seven-segment illustration below, the first iteration is only concerned with the state of &apos;A&apos;, while the second iteration handles the state of &apos;B&apos;, and so on.</p>
<p><img src="assets/sevenseg1.png" alt></p>
<p>Using <code>enumerate</code>, we obtain an additional counter (<code>i</code>) over our iterable (<code>sevensegs</code>); this is convenient for the purpose of setting states. At the first iteration, the first element in our list is conditionally set to 1 if more than half of the pixels in segment &apos;A&apos; are white. A more detailed example of Python&apos;s enumeration is in <code>utils/enumerate.py</code>.</p>
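<p>Putting these pieces together on simulated data: given hypothetical per-segment majority votes for a single digit, the enumerate-and-lookup pattern resolves the state array to a class like so:</p>

```python
# The full lookup table from the lesson, keyed on segment states A-G
DIGITSDICT = {
    (1, 1, 1, 1, 1, 1, 0): 0,
    (0, 1, 1, 0, 0, 0, 0): 1,
    (1, 1, 0, 1, 1, 0, 1): 2,
    (1, 1, 1, 1, 0, 0, 1): 3,
    (0, 1, 1, 0, 0, 1, 1): 4,
    (1, 0, 1, 1, 0, 1, 1): 5,
    (1, 0, 1, 1, 1, 1, 1): 6,
    (1, 1, 1, 0, 0, 1, 0): 7,
    (1, 1, 1, 1, 1, 1, 1): 8,
    (1, 1, 1, 1, 0, 1, 1): 9,
}

# Hypothetical majority votes (True = mostly white) for segments A-G
votes = [False, True, True, False, False, True, True]

on = [0] * 7
for i, is_white in enumerate(votes):
    if is_white:
        on[i] = 1

print(DIGITSDICT[tuple(on)])  # -> 4 (only B, C, F and G are lit)
```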
<h3 class="mume-header" id="practical-strategies">Practical Strategies</h3>

<p>If you pay close attention to the digit &apos;0&apos; in our LCD display, you will notice that the absence of the &apos;G&apos; segment causes a visible and significant gap. If you test your digit recognition script without special consideration for this attribute, you will find it consistently failing on the numbers &quot;0&quot;, &quot;1&quot; and &quot;7&quot;. In fact, you may not even be able to isolate those numbers at all using the <code>findContours</code> operation, because each of them is treated as two disjoint pieces instead of one whole piece.</p>
<p>A reasonable strategy to handle this is the Dilation or Closing (Dilation followed by Erosion) operation that you&apos;ve learned earlier.</p>
<p>Similarly, your ROI may necessitate other pre-processing, and the specific tactical solutions vary greatly depending on the problem set at hand.</p>
<p>As I inspected the bounding boxes we retrieved around the LCD screen, I observed that their digits are often centered around the bottom half of the display. This led me to insert an additional step prior to the morphological transformation in the final code solution. The step uses numpy subsetting to trim away the top 20% of the image, as well as 20% on each side:</p>
<pre data-role="codeBlock" data-info="py" class="language-python">roi <span class="token operator">=</span> cv2<span class="token punctuation">.</span>imread<span class="token punctuation">(</span><span class="token string">&quot;roi.png&quot;</span><span class="token punctuation">,</span> flags<span class="token operator">=</span><span class="token number">0</span><span class="token punctuation">)</span>
RATIO <span class="token operator">=</span> roi<span class="token punctuation">.</span>shape<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span> <span class="token operator">*</span> <span class="token number">0.2</span>
trimmed <span class="token operator">=</span> roi<span class="token punctuation">[</span>
    <span class="token builtin">int</span><span class="token punctuation">(</span>RATIO<span class="token punctuation">)</span> <span class="token punctuation">:</span><span class="token punctuation">,</span> 
    <span class="token builtin">int</span><span class="token punctuation">(</span>RATIO<span class="token punctuation">)</span> <span class="token punctuation">:</span> roi<span class="token punctuation">.</span>shape<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span> <span class="token operator">-</span> <span class="token builtin">int</span><span class="token punctuation">(</span>RATIO<span class="token punctuation">)</span><span class="token punctuation">]</span>
</pre><p>That said, whenever possible, you want to be careful not to hand-tune your solution in a way that is overly specific to the images you have at hand, lest the solution work <strong>only</strong> on those specific images and not others, a phenomenon fondly termed &quot;overfitting&quot; in the machine learning community.</p>
<p>I&apos;ve re-executed the solution code against some sample image sets, once with the trimming in place and once without, before settling on the decision. As you will see later, the trimming improves our accuracy and is a relatively safe strategy, given that every LCD screen, regardless of the issuer (bank), has the same asymmetry: more &quot;blank space&quot; in the top half compared to the bottom half.</p>
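<p>The arithmetic of the trim is easy to verify on a synthetic array standing in for the ROI (the 100x200 shape here is hypothetical):</p>

```python
import numpy as np

# Synthetic 100x200 grayscale ROI stand-in
roi = np.zeros((100, 200), dtype=np.uint8)

RATIO = roi.shape[0] * 0.2  # 20% of the image height = 20 pixels
trimmed = roi[
    int(RATIO):,                           # drop the top 20 rows
    int(RATIO): roi.shape[1] - int(RATIO)  # drop 20 columns on each side
]
print(trimmed.shape)  # (80, 160)
```

Note that the side margins are computed from the image <em>height</em>, matching the snippet above: on a wide ROI this trims proportionally less from the sides than from the top.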
<h4 class="mume-header" id="contour-properties">Contour Properties</h4>

<p>Furthermore, in many cases of digit recognition / digit classification, you will want to predict the class of each digit in an ordered fashion. Suppose the LCD screen contains the digits &quot;40710382&quot;; our algorithm should correctly isolate these digits and classify them iteratively, but do so from the leftmost digit to the rightmost. Failing to account for this may result in your algorithm correctly classifying each digit but producing an unreasonable output such as &quot;1740238&quot;.</p>
<p>There are a few strategies you can employ here. We&apos;ve seen in <code>contourarea_01.py</code> and <code>contourarea_02.py</code> how a contour has attributes that can be retrieved using the <code>contourArea()</code> and <code>arcLength()</code> functions. Inspect the following snippet and it should help jog your memory:</p>
<pre data-role="codeBlock" data-info="py" class="language-python">cnts <span class="token operator">=</span> <span class="token builtin">sorted</span><span class="token punctuation">(</span>cnts<span class="token punctuation">,</span> key<span class="token operator">=</span>cv2<span class="token punctuation">.</span>contourArea<span class="token punctuation">,</span> reverse<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">)</span><span class="token punctuation">[</span><span class="token punctuation">:</span><span class="token number">9</span><span class="token punctuation">]</span>

<span class="token keyword">for</span> i<span class="token punctuation">,</span> cnt <span class="token keyword">in</span> <span class="token builtin">enumerate</span><span class="token punctuation">(</span>cnts<span class="token punctuation">)</span><span class="token punctuation">:</span>
    cv2<span class="token punctuation">.</span>drawContours<span class="token punctuation">(</span>img_color<span class="token punctuation">,</span> cnts<span class="token punctuation">,</span> i<span class="token punctuation">,</span> BCOLOR<span class="token punctuation">,</span> THICKNESS<span class="token punctuation">)</span>
    area <span class="token operator">=</span> cv2<span class="token punctuation">.</span>contourArea<span class="token punctuation">(</span>cnt<span class="token punctuation">)</span>
    peri <span class="token operator">=</span> cv2<span class="token punctuation">.</span>arcLength<span class="token punctuation">(</span>cnt<span class="token punctuation">,</span> closed<span class="token operator">=</span><span class="token boolean">True</span><span class="token punctuation">)</span>
    <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f&quot;Area:</span><span class="token interpolation"><span class="token punctuation">{</span>area<span class="token punctuation">}</span></span><span class="token string">; Perimeter: </span><span class="token interpolation"><span class="token punctuation">{</span>peri<span class="token punctuation">}</span></span><span class="token string">&quot;</span></span><span class="token punctuation">)</span>
</pre><p>Indeed, we&apos;re using contour area as a good indicator to search for our region of interest. Taking this idea a little further, we can place an additional constraint on our search criteria. In the following code, we draw a bounding rectangle and, as an extra layer of precaution, only keep bounding boxes that are taller than 20 pixels (step 1).</p>
<p>Calling <code>boundingRect()</code> on a contour returns 4 values, respectively the x and y coordinate along with the width and height of the contour.</p>
<p>We then use another property of the contour, its top-left coordinate, to determine the logical order of our digits. Specifically, we use the first returned value (<code>cv2.boundingRect(cnt)[0]</code>), since that&apos;s the x value of the top-left coordinate of each region. By sorting against this value, our digits are stored in the Python list in an ordered fashion, determined by their respective coordinate values.</p>
<pre data-role="codeBlock" data-info="py" class="language-python">digits_cnts <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token punctuation">]</span>
cnts<span class="token punctuation">,</span> _ <span class="token operator">=</span> cv2<span class="token punctuation">.</span>findContours<span class="token punctuation">(</span>eroded<span class="token punctuation">,</span> cv2<span class="token punctuation">.</span>RETR_EXTERNAL<span class="token punctuation">,</span> cv2<span class="token punctuation">.</span>CHAIN_APPROX_SIMPLE<span class="token punctuation">)</span>
<span class="token keyword">for</span> cnt <span class="token keyword">in</span> cnts<span class="token punctuation">:</span>
    <span class="token punctuation">(</span>x<span class="token punctuation">,</span> y<span class="token punctuation">,</span> w<span class="token punctuation">,</span> h<span class="token punctuation">)</span> <span class="token operator">=</span> cv2<span class="token punctuation">.</span>boundingRect<span class="token punctuation">(</span>cnt<span class="token punctuation">)</span>
    <span class="token comment"># step 1</span>
    <span class="token keyword">if</span> h <span class="token operator">&gt;</span> <span class="token number">20</span><span class="token punctuation">:</span>
        digits_cnts <span class="token operator">+=</span> <span class="token punctuation">[</span>cnt<span class="token punctuation">]</span>
<span class="token comment"># step 2</span>
sorted_digits <span class="token operator">=</span> <span class="token builtin">sorted</span><span class="token punctuation">(</span>digits_cnts<span class="token punctuation">,</span> key<span class="token operator">=</span><span class="token keyword">lambda</span> cnt<span class="token punctuation">:</span> cv2<span class="token punctuation">.</span>boundingRect<span class="token punctuation">(</span>cnt<span class="token punctuation">)</span><span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">)</span>
</pre><p>When we put these together, we now have a complete pipeline:<br>
<img src="assets/digitrecflow.png" alt></p>
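<p>Step 2 of the sorting snippet can be sketched without OpenCV: given hypothetical <code>(x, y, w, h)</code> bounding boxes in the arbitrary order <code>findContours()</code> might return them, sorting on the x coordinate restores the left-to-right reading order:</p>

```python
# Hypothetical bounding boxes (x, y, w, h), out of reading order
boxes = [(140, 12, 30, 48), (20, 10, 30, 50), (80, 11, 28, 49)]

# Sort on x, the first element, just as the lambda over
# cv2.boundingRect(cnt)[0] does in the snippet above
sorted_boxes = sorted(boxes, key=lambda b: b[0])
print([b[0] for b in sorted_boxes])  # [20, 80, 140]: left to right
```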
<p>The full solution code is in <code>digit_01.py</code> but the essential parts are as follow:</p>
<pre data-role="codeBlock" data-info="py" class="language-python"><span class="token keyword">import</span> cv2
<span class="token keyword">import</span> numpy <span class="token keyword">as</span> np
<span class="token comment"># step 1:</span>
DIGITSDICT <span class="token operator">=</span> <span class="token punctuation">{</span>
    <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token number">0</span><span class="token punctuation">,</span>
    <span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token number">1</span><span class="token punctuation">,</span>
    <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token number">2</span><span class="token punctuation">,</span>
    <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token number">3</span><span class="token punctuation">,</span>
    <span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token number">4</span><span class="token punctuation">,</span>
    <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token number">5</span><span class="token punctuation">,</span>
    <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token number">6</span><span class="token punctuation">,</span>
    <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token number">7</span><span class="token punctuation">,</span>
    <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token number">8</span><span class="token punctuation">,</span>
    <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">:</span> <span class="token number">9</span><span class="token punctuation">,</span>
<span class="token punctuation">}</span>

<span class="token comment"># step 2</span>
roi <span class="token operator">=</span> cv2<span class="token punctuation">.</span>imread<span class="token punctuation">(</span><span class="token string">&quot;inter/ocbc-roi.png&quot;</span><span class="token punctuation">,</span> flags<span class="token operator">=</span><span class="token number">0</span><span class="token punctuation">)</span>

<span class="token comment"># step 3</span>
RATIO <span class="token operator">=</span> roi<span class="token punctuation">.</span>shape<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span> <span class="token operator">*</span> <span class="token number">0.2</span>
roi <span class="token operator">=</span> cv2<span class="token punctuation">.</span>bilateralFilter<span class="token punctuation">(</span>roi<span class="token punctuation">,</span> <span class="token number">5</span><span class="token punctuation">,</span> <span class="token number">30</span><span class="token punctuation">,</span> <span class="token number">60</span><span class="token punctuation">)</span>
trimmed <span class="token operator">=</span> roi<span class="token punctuation">[</span><span class="token builtin">int</span><span class="token punctuation">(</span>RATIO<span class="token punctuation">)</span> <span class="token punctuation">:</span><span class="token punctuation">,</span> <span class="token builtin">int</span><span class="token punctuation">(</span>RATIO<span class="token punctuation">)</span> <span class="token punctuation">:</span> roi<span class="token punctuation">.</span>shape<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span> <span class="token operator">-</span> <span class="token builtin">int</span><span class="token punctuation">(</span>RATIO<span class="token punctuation">)</span><span class="token punctuation">]</span>

<span class="token comment"># step 4</span>
edged <span class="token operator">=</span> cv2<span class="token punctuation">.</span>adaptiveThreshold<span class="token punctuation">(</span>
    trimmed<span class="token punctuation">,</span> <span class="token number">255</span><span class="token punctuation">,</span> cv2<span class="token punctuation">.</span>ADAPTIVE_THRESH_GAUSSIAN_C<span class="token punctuation">,</span> cv2<span class="token punctuation">.</span>THRESH_BINARY_INV<span class="token punctuation">,</span> <span class="token number">5</span><span class="token punctuation">,</span> <span class="token number">5</span>
<span class="token punctuation">)</span>

<span class="token comment"># step 5</span>
kernel <span class="token operator">=</span> cv2<span class="token punctuation">.</span>getStructuringElement<span class="token punctuation">(</span>cv2<span class="token punctuation">.</span>MORPH_RECT<span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token number">2</span><span class="token punctuation">,</span> <span class="token number">5</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
dilated <span class="token operator">=</span> cv2<span class="token punctuation">.</span>dilate<span class="token punctuation">(</span>edged<span class="token punctuation">,</span> kernel<span class="token punctuation">,</span> iterations<span class="token operator">=</span><span class="token number">1</span><span class="token punctuation">)</span>
eroded <span class="token operator">=</span> cv2<span class="token punctuation">.</span>erode<span class="token punctuation">(</span>dilated<span class="token punctuation">,</span> kernel<span class="token punctuation">,</span> iterations<span class="token operator">=</span><span class="token number">1</span><span class="token punctuation">)</span>

<span class="token comment"># step 6</span>
cnts<span class="token punctuation">,</span> _ <span class="token operator">=</span> cv2<span class="token punctuation">.</span>findContours<span class="token punctuation">(</span>eroded<span class="token punctuation">,</span> cv2<span class="token punctuation">.</span>RETR_EXTERNAL<span class="token punctuation">,</span> cv2<span class="token punctuation">.</span>CHAIN_APPROX_SIMPLE<span class="token punctuation">)</span>
digits_cnts <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token punctuation">]</span>
<span class="token keyword">for</span> cnt <span class="token keyword">in</span> cnts<span class="token punctuation">:</span>
    <span class="token punctuation">(</span>x<span class="token punctuation">,</span> y<span class="token punctuation">,</span> w<span class="token punctuation">,</span> h<span class="token punctuation">)</span> <span class="token operator">=</span> cv2<span class="token punctuation">.</span>boundingRect<span class="token punctuation">(</span>cnt<span class="token punctuation">)</span>
    <span class="token keyword">if</span> h <span class="token operator">&gt;</span> <span class="token number">20</span><span class="token punctuation">:</span>
        digits_cnts <span class="token operator">+=</span> <span class="token punctuation">[</span>cnt<span class="token punctuation">]</span>

<span class="token comment"># step 7</span>
sorted_digits <span class="token operator">=</span> <span class="token builtin">sorted</span><span class="token punctuation">(</span>digits_cnts<span class="token punctuation">,</span> key<span class="token operator">=</span><span class="token keyword">lambda</span> cnt<span class="token punctuation">:</span> cv2<span class="token punctuation">.</span>boundingRect<span class="token punctuation">(</span>cnt<span class="token punctuation">)</span><span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">)</span>

<span class="token comment"># step 8</span>
digits <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token punctuation">]</span>
<span class="token keyword">for</span> cnt <span class="token keyword">in</span> sorted_digits<span class="token punctuation">:</span>
    <span class="token comment"># step 8a</span>
    <span class="token punctuation">(</span>x<span class="token punctuation">,</span> y<span class="token punctuation">,</span> w<span class="token punctuation">,</span> h<span class="token punctuation">)</span> <span class="token operator">=</span> cv2<span class="token punctuation">.</span>boundingRect<span class="token punctuation">(</span>cnt<span class="token punctuation">)</span>
    roi <span class="token operator">=</span> eroded<span class="token punctuation">[</span>y <span class="token punctuation">:</span> y <span class="token operator">+</span> h<span class="token punctuation">,</span> x <span class="token punctuation">:</span> x <span class="token operator">+</span> w<span class="token punctuation">]</span>
    qW<span class="token punctuation">,</span> qH <span class="token operator">=</span> <span class="token builtin">int</span><span class="token punctuation">(</span>w <span class="token operator">*</span> <span class="token number">0.25</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token builtin">int</span><span class="token punctuation">(</span>h <span class="token operator">*</span> <span class="token number">0.15</span><span class="token punctuation">)</span>
    fractionH<span class="token punctuation">,</span> halfH<span class="token punctuation">,</span> fractionW <span class="token operator">=</span> <span class="token builtin">int</span><span class="token punctuation">(</span>h <span class="token operator">*</span> <span class="token number">0.05</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token builtin">int</span><span class="token punctuation">(</span>h <span class="token operator">*</span> <span class="token number">0.5</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token builtin">int</span><span class="token punctuation">(</span>w <span class="token operator">*</span> <span class="token number">0.25</span><span class="token punctuation">)</span>

    <span class="token comment"># step 8b</span>
    sevensegs <span class="token operator">=</span> <span class="token punctuation">[</span>
        <span class="token punctuation">(</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span>w<span class="token punctuation">,</span> qH<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">,</span>  <span class="token comment"># a (top bar)</span>
        <span class="token punctuation">(</span><span class="token punctuation">(</span>w <span class="token operator">-</span> qW<span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span>w<span class="token punctuation">,</span> halfH<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">,</span>  <span class="token comment"># b (upper right)</span>
        <span class="token punctuation">(</span><span class="token punctuation">(</span>w <span class="token operator">-</span> qW<span class="token punctuation">,</span> halfH<span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span>w<span class="token punctuation">,</span> h<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">,</span>  <span class="token comment"># c (lower right)</span>
        <span class="token punctuation">(</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> h <span class="token operator">-</span> qH<span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span>w<span class="token punctuation">,</span> h<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">,</span>  <span class="token comment"># d (lower bar)</span>
        <span class="token punctuation">(</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> halfH<span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span>qW<span class="token punctuation">,</span> h<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">,</span>  <span class="token comment"># e (lower left)</span>
        <span class="token punctuation">(</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span>qW<span class="token punctuation">,</span> halfH<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">,</span>  <span class="token comment"># f (upper left)</span>
        <span class="token comment"># ((0, halfH - fractionH), (w, halfH + fractionH)) # center</span>
        <span class="token punctuation">(</span>
            <span class="token punctuation">(</span><span class="token number">0</span> <span class="token operator">+</span> fractionW<span class="token punctuation">,</span> halfH <span class="token operator">-</span> fractionH<span class="token punctuation">)</span><span class="token punctuation">,</span>
            <span class="token punctuation">(</span>w <span class="token operator">-</span> fractionW<span class="token punctuation">,</span> halfH <span class="token operator">+</span> fractionH<span class="token punctuation">)</span><span class="token punctuation">,</span>
        <span class="token punctuation">)</span><span class="token punctuation">,</span>  <span class="token comment"># center</span>
    <span class="token punctuation">]</span>

    <span class="token comment"># step 8c</span>
    on <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span> <span class="token operator">*</span> <span class="token number">7</span>
    <span class="token keyword">for</span> <span class="token punctuation">(</span>i<span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token punctuation">(</span>p1x<span class="token punctuation">,</span> p1y<span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span>p2x<span class="token punctuation">,</span> p2y<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token keyword">in</span> <span class="token builtin">enumerate</span><span class="token punctuation">(</span>sevensegs<span class="token punctuation">)</span><span class="token punctuation">:</span>
        region <span class="token operator">=</span> roi<span class="token punctuation">[</span>p1y<span class="token punctuation">:</span>p2y<span class="token punctuation">,</span> p1x<span class="token punctuation">:</span>p2x<span class="token punctuation">]</span>
        <span class="token keyword">print</span><span class="token punctuation">(</span>
            <span class="token string-interpolation"><span class="token string">f&quot;</span><span class="token interpolation"><span class="token punctuation">{</span>i<span class="token punctuation">}</span></span><span class="token string">: Sum of 1: </span><span class="token interpolation"><span class="token punctuation">{</span>np<span class="token punctuation">.</span><span class="token builtin">sum</span><span class="token punctuation">(</span>region <span class="token operator">==</span> <span class="token number">255</span><span class="token punctuation">)</span><span class="token punctuation">}</span></span><span class="token string">, Sum of 0: </span><span class="token interpolation"><span class="token punctuation">{</span>np<span class="token punctuation">.</span><span class="token builtin">sum</span><span class="token punctuation">(</span>region <span class="token operator">==</span> <span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">}</span></span><span class="token string">, Shape: </span><span class="token interpolation"><span class="token punctuation">{</span>region<span class="token punctuation">.</span>shape<span class="token punctuation">}</span></span><span class="token string">, Size: </span><span class="token interpolation"><span class="token punctuation">{</span>region<span class="token punctuation">.</span>size<span class="token punctuation">}</span></span><span class="token string">&quot;</span></span>
        <span class="token punctuation">)</span>
        <span class="token keyword">if</span> np<span class="token punctuation">.</span><span class="token builtin">sum</span><span class="token punctuation">(</span>region <span class="token operator">==</span> <span class="token number">255</span><span class="token punctuation">)</span> <span class="token operator">&gt;</span> region<span class="token punctuation">.</span>size <span class="token operator">*</span> <span class="token number">0.5</span><span class="token punctuation">:</span>
            on<span class="token punctuation">[</span>i<span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token number">1</span>
        <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f&quot;State of ON: </span><span class="token interpolation"><span class="token punctuation">{</span>on<span class="token punctuation">}</span></span><span class="token string">&quot;</span></span><span class="token punctuation">)</span>
    <span class="token comment"># step 8d</span>
    digit <span class="token operator">=</span> DIGITSDICT<span class="token punctuation">[</span><span class="token builtin">tuple</span><span class="token punctuation">(</span>on<span class="token punctuation">)</span><span class="token punctuation">]</span>
    <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f&quot;Digit is: </span><span class="token interpolation"><span class="token punctuation">{</span>digit<span class="token punctuation">}</span></span><span class="token string">&quot;</span></span><span class="token punctuation">)</span>
    digits <span class="token operator">+=</span> <span class="token punctuation">[</span>digit<span class="token punctuation">]</span>
    <span class="token comment"># step 9</span>
    cv2<span class="token punctuation">.</span>rectangle<span class="token punctuation">(</span>canvas<span class="token punctuation">,</span> <span class="token punctuation">(</span>x<span class="token punctuation">,</span> y<span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span>x <span class="token operator">+</span> w<span class="token punctuation">,</span> y <span class="token operator">+</span> h<span class="token punctuation">)</span><span class="token punctuation">,</span> CYAN<span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">)</span>
    cv2<span class="token punctuation">.</span>putText<span class="token punctuation">(</span>canvas<span class="token punctuation">,</span> <span class="token builtin">str</span><span class="token punctuation">(</span>digit<span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token punctuation">(</span>x <span class="token operator">-</span> <span class="token number">5</span><span class="token punctuation">,</span> y <span class="token operator">+</span> <span class="token number">6</span><span class="token punctuation">)</span><span class="token punctuation">,</span> FONT<span class="token punctuation">,</span> <span class="token number">0.3</span><span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">)</span>
    cv2<span class="token punctuation">.</span>imshow<span class="token punctuation">(</span><span class="token string">&quot;Digit&quot;</span><span class="token punctuation">,</span> canvas<span class="token punctuation">)</span>
    cv2<span class="token punctuation">.</span>waitKey<span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">)</span>
<span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f&quot;Digits on the token are: </span><span class="token interpolation"><span class="token punctuation">{</span>digits<span class="token punctuation">}</span></span><span class="token string">&quot;</span></span><span class="token punctuation">)</span>
</pre><ul>
<li>Step 1: Initialize the lookup dictionary</li>
<li>Step 2: Read our ROI image using OpenCV</li>
<li>Step 3: Noise reduction and trim away asymmetrical white space in our ROI</li>
<li>Step 4: Binarize our image using adaptive thresholding</li>
<li>Step 5: Morphological transformation to remove noise and fill the small holes in our digit</li>
<li>Step 6: Find contours in our image, keeping only those whose bounding box is taller than 20px</li>
<li>Step 7: Sort the contours by the x-coordinate of their bounding boxes (hence, left to right); note that <code>sorted</code> returns a new list rather than sorting in place</li>
<li>Step 8
<ul>
<li>Step 8a: Create rectangle bounding box on each digit, and some convenience units that we later use to slice the seven segments. Notice that these convenience units are not hard-coded values, but are proportional to the Height (<code>h</code>) of our rectangular box</li>
<li>Step 8b: Slice the seven segments; The first segment (&quot;A&quot;) is from point (0,0) to (w, <code>int(h * 0.15)</code>); This segment is <code>w</code> in width and 15% the height of the full digit contour, starting from position (0, 0)</li>
<li>Step 8c: Initialize the state to <code>0</code> for each of the 7 segments, then conditionally set regions with more white than black pixels to <code>1</code></li>
<li>Step 8d: Once all 7 states have been set, perform lookup against the digit dictionary created in step 1; Append the value to the <code>digits</code> list created at the beginning of step 8</li>
</ul>
</li>
<li>Step 9: Draw a rectangle and the predicted digit for each bounding box. Finally, print the <code>digits</code> list.</li>
</ul>
<h1 class="mume-header" id="references">References</h1>

<hr class="footnotes-sep">
<section class="footnotes">
<ol class="footnotes-list">
<li id="fn1" class="footnote-item"><p>LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86, 2278&#x2013;2324 <a href="#fnref1" class="footnote-backref">&#x21A9;&#xFE0E;</a></p>
</li>
<li id="fn2" class="footnote-item"><p>Saliency map, Wikipedia <a href="#fnref2" class="footnote-backref">&#x21A9;&#xFE0E;</a></p>
</li>
<li id="fn3" class="footnote-item"><p>Morphological Transformations, OpenCV Documentation <a href="#fnref3" class="footnote-backref">&#x21A9;&#xFE0E;</a></p>
</li>
<li id="fn4" class="footnote-item"><p>Seven-segment display, Wikipedia <a href="#fnref4" class="footnote-backref">&#x21A9;&#xFE0E;</a></p>
</li>
</ol>
</section>

      </div>
      <div class="md-sidebar-toc"><ul>
<li><a href="#background">Background</a>
<ul>
<li><a href="#what-about-deep-learning">What about Deep Learning?</a></li>
<li><a href="#region-of-interest">Region of Interest</a>
<ul>
<li><a href="#selecting-region-of-interest">Selecting Region of Interest</a></li>
<li><a href="#arc-length-and-area-size">Arc Length and Area Size</a>
<ul>
<li><a href="#dive-deeper-roi">Dive Deeper: ROI</a></li>
</ul>
</li>
<li><a href="#roi-extraction">ROI extraction</a></li>
</ul>
</li>
<li><a href="#morphological-transformations">Morphological Transformations</a>
<ul>
<li><a href="#erosion">Erosion</a></li>
<li><a href="#dilation">Dilation</a></li>
<li><a href="#opening-and-closing">Opening and Closing</a></li>
<li><a href="#learn-by-building-morphological-transformation">Learn-by-building: Morphological Transformation</a></li>
</ul>
</li>
<li><a href="#seven-segment-display">Seven-segment display</a>
<ul>
<li><a href="#practical-strategies">Practical Strategies</a>
<ul>
<li><a href="#contour-properties">Contour Properties</a></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li><a href="#references">References</a></li>
</ul>
</div>
      <a id="sidebar-toc-btn">&#x2261;</a>
    
    
    
    
    
    
    
    
<script>

var sidebarTOCBtn = document.getElementById('sidebar-toc-btn')
sidebarTOCBtn.addEventListener('click', function(event) {
  event.stopPropagation()
  if (document.body.hasAttribute('html-show-sidebar-toc')) {
    document.body.removeAttribute('html-show-sidebar-toc')
  } else {
    document.body.setAttribute('html-show-sidebar-toc', true)
  }
})
</script>
      
  
    </body></html>

================================================
FILE: digitrecognition/digitrec.md
================================================
# Background
In Chapter 4: Digit Recognition, we'll add a few new techniques to our image processing toolset by attempting to build a digit recognition pipeline from start to finish. Throughout the exercise, we will get to practice the image preprocessing tricks we've picked up from previous chapters:
- Image manipulations such as resizing, cropping, rotation, color conversion  
- Blurring and sharpening operations
- Thresholding and Edge Detection
- Contour approximation

New method and strategies that you'll be learning include:
- Drawing operations (rectangles, text) on our image  
- Region of interest and bounding rectangles
- Morphological transformations
- The Seven-Segment Display 

## What about Deep Learning?
To be clear, the specialised deep learning libraries that have sprung up in recent years are a lot more robust in their approach. By utilizing machine learning principles (cost functions, gradient descent, etc.), these specialised libraries can handle highly complex object recognition and OCR (optical character recognition) tasks, at the cost of brute computing power.

The overarching motivation of this free course, however, is to make clear to beginners what constitutes artificial intelligence, and to illustrate the principal benefits of machine learning. I try to achieve that by demonstrating -- over multiple chapters of this course -- how computer vision was traditionally, or rather "classically", performed prior to the emergence of deep learning. 

By learning the classical approaches to computer vision, you can appreciate the effort it takes to hand-tune parameters, which adds a new dimension of appreciation for the self-learning methods we'll discuss in the near future.

## Region of Interest
Do a quick Google search on "digit recognition" or "digit classification" and it's hard to find an introductory deep learning course that **doesn't use** the famous MNIST (Modified National Institute of Standards and Technology)[^1] database. This handwritten digit database has long become the _de facto_ standard in pretty much any machine learning tutorial:

![](assets/mnist.png)

But I'd argue that, for a budding computer vision developer, your learning objectives are better served by taking a different approach. 

By choosing real-life images, you are confronted with a few key challenges that are not present when using a well-curated database such as MNIST. These challenges present new opportunities to learn about key concepts such as **region of interest** and **morphological operations**, which you will come to rely upon greatly in the future. 

First, take a look at 4 real-life pictures of security tokens issued by banks and institutional agencies (left-to-right: Bank Central Asia, DBS, OCBC Bank, OneKey for Singapore Government e-services): 

![](assets/securitytokens.png)

Notice how noisy these images are: each image is shot against a different background and under different lighting conditions, each token has a different size and shape, and each security token comes in different colors. 

Your task, as a computer vision developer, is to develop a pipeline in which each phase takes you closer to the goal. Roughly speaking, given the above task, we would formulate a pipeline that looks like the following:
1. Preprocessing, noise reduction
2. Contour approximation
3. Find region of interest (ROI), that is the area of the LCD display in each of these pictures
4. Extract ROI for further preprocessing, discarding the rest of the image
5. Isolate each digit from the ROI
6. Iteratively classify each digit in the image
7. Combine the per-digit classification to a final string ("output")
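
The pipeline above can be sketched as a skeleton of Python functions. Everything below is a placeholder (the function bodies, slice values, and the always-zero "classifier" are made up purely to show the structure, not the actual implementation we'll build):

```python
import numpy as np

def preprocess(img):
    # placeholder for grayscale conversion, blurring, noise reduction
    return img

def find_roi(img):
    # placeholder for contour search + boundingRect; here a fixed slice
    return img[10:40, 10:90]

def isolate_digits(roi):
    # placeholder for thresholding + findContours; here fixed-width slices
    return [roi[:, x:x + 20] for x in range(0, roi.shape[1], 20)]

def classify(digit_img):
    # placeholder for the seven-segment lookup; always predicts 0
    return 0

def pipeline(img):
    roi = find_roi(preprocess(img))
    digits = [classify(d) for d in isolate_digits(roi)]
    return "".join(str(d) for d in digits)

img = np.zeros((100, 100), dtype=np.uint8)
output = pipeline(img)  # "0000": four placeholder slices, each classified as 0
```

Each stub gets replaced by real operations as we progress through this chapter.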

In practice, steps (1) and (2) above are the "application" of the methods you've learned in previous chapters of this series. As we'll soon observe, we will use a combination of blurring operations and edge detection to draw our contours. Among the contours, one of them will be the LCD display containing the digits to be classified. That is our **Region of Interest**.

![](assets/croproi.gif)

### Selecting Region of Interest
The GIF above demonstrates the code in `roi_01.py`; essentially it shows the `selectROI` method in action. You'll commonly combine the `selectROI` method with either a slicing operation to crop your region of interest, or a drawing operation to call attention to a specific region of the image.

```py
x,y,w,h = cv2.selectROI("Region of interest", img)
cropped = img[y:y+h, x:x+w]
# draw rectangle 
cv2.rectangle(img_color, (x,y), (x+w,y+h), (255,0,0), 2)
```

In most cases, it simply wouldn't be realistic to render an image and manually specify our region of interest. We'll need this operation to be as close to automatic as possible. But how exactly? That depends greatly on the specific problem set. 

In some cases, the obvious choice of strategy would simply be shape recognition, say by counting the number of vertices on each contour. The following code is an example implementation of that:

```py
# cnt = contour
peri = cv2.arcLength(cnt, True)
# contour approximation
cnt_approx = cv2.approxPolyDP(cnt, 0.03 * peri, True)
if len(cnt_approx) == 3:
    est_shape = 'triangle'
...
elif len(cnt_approx) == 5:
    est_shape = 'pentagon'
...
```

In other cases, you may employ a strategy that tries to match contours based on Hu moments (which we'll study in detail in future chapters). 

Other methods may involve a saliency map, or a visual attention map, for ROI extraction. These methods create a new representation of the original image where each pixel's **unique quality** is amplified or emphasized. One example implementation on Wikipedia[^2] demonstrates how straightforward this concept really is:

$$SALS(I_k) = \sum^{N}_{i=1}|I_k-I_i|$$
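
The formula above can be implemented directly. In the sketch below (the function name and the histogram shortcut are mine, not from the article), note that the sum only depends on a pixel's grey level, so it can be precomputed once per level instead of once per pixel:

```python
import numpy as np

def sals(img):
    """Per-pixel saliency: sum of absolute differences to every other pixel."""
    hist = np.bincount(img.ravel(), minlength=256)  # pixel count per grey level
    levels = np.arange(256)
    # dist[v] = sum over all pixels i of |v - I_i|, computed via the histogram
    dist = np.abs(levels[:, None] - levels[None, :]) @ hist
    return dist[img]

img = np.array([[0, 0, 255],
                [0, 0, 255],
                [0, 0, 255]], dtype=np.uint8)
sal = sals(img)
# the minority (255) pixels stand out with twice the saliency of the majority
```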

As you add new tools and strategies to your computer vision toolbox, you will pick up new approaches to ROI extraction. It is an interesting field of research that has been gaining popularity with the emergence of deep learning.

As for the images of bank security tokens, can you think of an approach that may be a good fit? Our region of interest is the LCD screen above the button pad on each device, and the screens all seem to be rather consistent in shape and size. Give it some thought, then read on to find out.

### Arc Length and Area Size
I've hinted at shape and size being factors, so maybe that would be a good starting point. The good news is that OpenCV makes this incredibly easy through the `contourArea()` and `arcLength()` functions. 

The following snippet of code, lifted from `contourarea_01.py`, finds all contours and sort them by area size in descending order before storing the first 10 in `cnts`:
```py
cnts, _ = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# sort contours by contourArea, and get the first 10
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:10]
```

We can also obtain the contour area and perimeter iteratively in a for-loop, like the following:
```py
cnts, _ = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for i in range(len(cnts)):
    area = cv2.contourArea(cnts[i])
    peri = cv2.arcLength(cnts[i], closed=True)
    print(f'Area:{area}, Perimeter:{peri}')
```

In effect, we're looping through each contour that the `findContours()` operation found, and computing two values each time: `area` and `peri`. 

Note that the contour perimeter is also known as the arc length. The second argument, `closed`, specifies whether the shape is a closed contour (`True`) or just a curve (`closed=False`). 

Execute `contourarea_01.py` and observe how each contour is displayed, from the one with the largest area to the one with the smallest, for a total of 10 contours. As you run the script on different pictures of bank security tokens, you'll see that it does a reliable job of finding the contours, sorting them, and returning our LCD display screen as the first in the list. This makes sense, because visually it is apparent that the LCD display occupies the largest area among the closed shapes in our picture.

#### Dive Deeper: ROI
1. Use `assets/dbs.jpg` instead of `assets/ocbc.jpg` in `contourarea_01.py`. Were you able to extract the region of interest (LCD Display) successfully without any changes to the script?

2. Could we have successfully extracted our region of interest had we used `arcLength` in our strategy?

3. Suppose we only wanted to extract the region of interest and discard the rest. Which line of code would you change? Reflect the change in the code and execute it to confirm that you have performed this exercise correctly. 

4. Suppose we wanted the contours sorted according to their respective areas, from the smallest to the largest. Which line of code would you change? Reflect the change in the code and execute it to confirm that you have performed this exercise correctly.

While working through the exercises above, you may find it helpful to also draw text describing the area size and perimeter next to each contour. I've shown how this can be done in `contourarea_02.py`, but the essential addition we make to the earlier code is the two calls to `putText()`:

```py
PURPLE = (75, 0, 130)
THICKNESS = 1
FONT = cv2.FONT_HERSHEY_SIMPLEX
cv2.putText(img_color, "Area:" + str(area), (x, y - 15), FONT, 0.4, PURPLE, THICKNESS)
cv2.putText(img_color, "Perimeter:" + str(peri), (x, y - 5), FONT, 0.4, PURPLE, THICKNESS)
```

![](assets/textcontour.png)

### ROI extraction
With these foundations, we are now ready to write a simple utility script that:
1. Finds our region of interest
2. Crops the ROI into a new image
3. Saves it into a folder named `/inter` (intermediary) for the actual digit recognition later

Much of what you need to do has already been presented so far, but the core pieces, lifted from `roi_02.py`, are the following few lines of code:

```py
img = cv2.imread(...)
blurred = cv2.GaussianBlur(img, (7, 7), 0)
edged = cv2.Canny(blurred, 130, 150, 255)
cnts, _ = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:1]

x, y, w, h = cv2.boundingRect(cnts[0])
roi = img[y : y + h, x : x + w]
cv2.imwrite("roi.png", roi)
```

The `roi_02.py` utility script uses the `argparse` library so the user can specify a file path with the `-p` (or `--path`) flag, like so:
```bash
python roi_02.py -p assets/ocbc.jpg
# equivalent:
python roi_02.py --path assets/ocbc.jpg
```

If the user does not specify a file path using the `-p` flag, the default value is `assets/ocbc.jpg`. If you wish to change this, edit `roi_02.py` and specify a different value for the `default` parameter.

```py
parser = argparse.ArgumentParser()
parser.add_argument("-p", "--path", default="assets/ocbc.jpg")
```

You should run this exercise using `dbs.jpg`, `ocbc2.jpg`, or `onekey.jpg` at least once. Execute the script and check the `inter` folder to confirm that the ROI has been saved. When you're done, you are ready to move on to the next phase of the digit recognition pipeline. 

## Morphological Transformations
Once the region of interest is obtained, we have an image that may still contain noise. This is especially the case when our ROI is obtained by means of thresholding methods, since you can expect some "non-features" (noise) to also be included in the resulting image. 

To account for these imperfections, we will now perform a series of operations on our image. We'll learn what they are formally, but let's begin by seeing what it is they _offer_ to our image processing pipeline. I've included a picture with some random noise, as follows:

![](assets/0417s.png)

The digits "0417" are clearly discernible to the human eye despite the presence of noise. However, consider the perspective of a global thresholding operation: these pixel values are "noise" to us, but a computer has no such notion of which pixel values are meaningful and which are not. A threshold value such as the global mean will take all values into account indiscriminately, and a contour-finding operation will, instead of 4 contours, return thousands of tiny round segments (they may be tiny, but they are completely valid contours). 

An image processing pipeline that fails to account for these may deliver sub-optimal performance or, very often, completely undesired results. 
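
A tiny synthetic demonstration of the point (the image, noise level and pixel values below are all made up): bright specks sail straight through a global mean threshold and come out as "foreground":

```python
import numpy as np

rng = np.random.default_rng(0)
clean = np.zeros((50, 50), dtype=np.uint8)
clean[20:30, 10:40] = 255                 # a single bright "stroke"
noisy = clean.copy()
speckle = rng.random(clean.shape) < 0.05  # 5% of pixels become specks
noisy[speckle] = 200

# global mean threshold: every speck is brighter than the mean, so it survives
binary = (noisy > noisy.mean()).astype(np.uint8)
```

A contour finder run on `binary` would report every surviving speck as a separate (tiny, but perfectly valid) contour.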

Enter two of the most fundamental morphological transformations: **erosion** and **dilation**. 

### Erosion
Erosion "erodes away the boundaries of foreground object"[^3] by sliding a kernel through the image and setting a pixel to 1 **only if all the pixels under the kernel are 1**.

This in effect discards pixels near the boundary, along with any floating pixels that are not part of a larger blob (the blob being what the human eye is interested in). Because pixels are eroded away, your foreground object will shrink in size.

### Dilation
The opposite of erosion, dilation sets a pixel to 1 if **at least one pixel under the kernel is 1**, essentially "growing" the foreground object. 
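
To make the two definitions concrete, here is a pure-NumPy sketch (for intuition only; in practice you'd call `cv2.erode` and `cv2.dilate`) operating on a tiny binary array with a square kernel:

```python
import numpy as np

def erode(img, k=3):
    # a pixel stays 1 only if every pixel under the k x k kernel is 1
    pad = k // 2
    padded = np.pad(img, pad, constant_values=0)
    out = np.zeros_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = padded[y:y + k, x:x + k].min()
    return out

def dilate(img, k=3):
    # a pixel becomes 1 if at least one pixel under the kernel is 1
    pad = k // 2
    padded = np.pad(img, pad, constant_values=0)
    out = np.zeros_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = padded[y:y + k, x:x + k].max()
    return out

binary = np.zeros((7, 7), dtype=np.uint8)
binary[2:5, 2:5] = 1   # a 3x3 foreground blob
binary[0, 6] = 1       # a lone speck of noise
# erosion kills the speck (and shrinks the blob); dilation grows the blob back
cleaned = dilate(erode(binary))
```

After the two passes, the speck is gone while the blob is restored to its original footprint.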

Because of how these operations work, there are a couple of things to note:
1. Morphological transformations are usually performed on binary images. Recall that pixel values in binary images are either full white (i.e. 1) or black (i.e. 0). 
2. By convention, we want to keep our foreground in white and our background in black.  
3. Because erosion shrinks the foreground and dilation grows it, these two operations are commonly used in combination, i.e. erosion followed by dilation, or vice versa.

![](assets/morphexample.png)

The full code solution is in `morphological_02.py`.

As we read our image in grayscale mode (`flags=0`), we obtain a white background and a mostly-black foreground. This is illustrated in the subplot titled "Original" above. We begin our preprocessing by first binarizing the image (step 1), followed by inverting the colors (step 2) to get a white-on-black image. 

An erosion operation is then performed (step 3). This works by creating our kernel (either through `numpy` or through `opencv`'s structuring element) and sliding that kernel across our image to remove white noise. 

The side effect is that our foreground object has now shrunk in size as its boundaries are eroded away. We grow it back by applying a dilation (step 4) and finally show the output, as illustrated in the bottom-right pane of the image above.

```py
# read as grayscale
roi = cv2.imread("assets/0417s.png", flags=0)
# step 1: 
_, thresh = cv2.threshold(roi, 170, 255, cv2.THRESH_BINARY)
# step 2:
inv = cv2.bitwise_not(thresh)
# step 3 (option 1):
kernel = np.ones((5,5), np.uint8)
# step 3 (option 2):
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
eroded = cv2.erode(inv, kernel, iterations=1)
# step 4:
dilated = cv2.dilate(eroded, kernel, iterations=1)
cv2.imshow("Transformed", dilated)
cv2.waitKey(0)
```

OpenCV provides three shapes for our kernel:
- Rectangular box: `MORPH_RECT`
- Cross: `MORPH_CROSS`
- Ellipse: `MORPH_ELLIPSE`

They are fed as the first argument into `cv2.getStructuringElement()`, with the second argument being the kernel size (`ksize`) itself. The third argument is the _anchor point_, which defaults to the center.
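
For intuition, the first two shapes are easy to reproduce in plain NumPy. The functions below are my own sketch of what `cv2.getStructuringElement` returns for `MORPH_RECT` and `MORPH_CROSS` (the ellipse is more involved, so it's omitted here):

```python
import numpy as np

def rect_kernel(ksize):
    # MORPH_RECT is simply a box of ones; note ksize is (width, height)
    w, h = ksize
    return np.ones((h, w), dtype=np.uint8)

def cross_kernel(ksize):
    # MORPH_CROSS keeps ones only along the anchor row and anchor column
    w, h = ksize
    k = np.zeros((h, w), dtype=np.uint8)
    k[h // 2, :] = 1
    k[:, w // 2] = 1
    return k
```

Printing `cross_kernel((5, 5))` renders the familiar plus-sign shape.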

### Opening and Closing
**Erosion followed by Dilation** is also known as Opening, and is useful for removing noise from an image. The reverse of Opening is Closing, where we **perform Dilation followed by Erosion**; it is particularly suited for closing small holes inside foreground objects.

OpenCV includes the more generic `morphologyEx` method for morphological operations beyond Erosion and Dilation. The function takes an image as the first argument, the operation type as the second, and finally the kernel. Compare how your code will differ between `cv2.erode` and `cv2.dilate`, and their respective equivalents in `cv2.morphologyEx()`:

```py
import cv2
import numpy as np

img = cv2.imread('image.png', 0)
kernel = np.ones((5, 5), np.uint8)
erosion = cv2.erode(img, kernel, iterations=1)
# Equivalent:
# cv2.morphologyEx(img, cv2.MORPH_ERODE, kernel, iterations=1)
dilation = cv2.dilate(img, kernel, iterations=1)
# Equivalent:
# cv2.morphologyEx(img, cv2.MORPH_DILATE, kernel, iterations=1)
opening = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
closing = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
```

### Learn-by-building: Morphological Transformation
In the `homework` directory, you'll find `0417h.png`. Your job is to apply what you've learned in this lesson to clean up the image. Your output should have these qualities:
1. As free of noise as possible (remove the lines, and the red splattered dots across the image)
2. If you run `findContours()` on the output, you should have exactly 4 contours
3. Foreground object in white, background in black

![](homework/0417h.png)

You are free to pick your strategy, but a reference solution would look like the following:

![](assets/0417reference.png)

## Seven-segment display
The seven-segment display (also known as a "seven-segment indicator") is a form of electronic display device for displaying decimal numerals[^4], widely used in digital clocks, electronic meters, calculators and banking security tokens.

![](assets/sevenseg.png)

This is relevant because it is the character representation of our digits on each of these security tokens. If we can isolate the digits from one another, we can iteratively predict the "class" of each digit (0 to 9). Specifically, we are going to perform a classification task based on the state of each segment. 

To ease our understanding, let's refer to each segment using the letters A to G:

![](assets/sevenseg1.png)

We can then create a lookup table that matches the collective states to the corresponding class:

| Class 	| a 	| b 	| c 	| d 	| e 	| f 	| g 	|
|-------	|---	|---	|---	|---	|---	|---	|---	|
| 0 	| 1 	| 1 	| 1 	| 1 	| 1 	| 1 	| 0 	|
| 1 	| 0 	| 1 	| 1 	| 0 	| 0 	| 0 	| 0 	|
| 2 	| 1 	| 1 	| 0 	| 1 	| 1 	| 0 	| 1 	|
| 3 	| 1 	| 1 	| 1 	| 1 	| 0 	| 0 	| 1 	|
| 4 	| 0 	| 1 	| 1 	| 0 	| 0 	| 1 	| 1 	|
| 5 	| 1 	| 0 	| 1 	| 1 	| 0 	| 1 	| 1 	|
| 6 	| 1 	| 0 	| 1 	| 1 	| 1 	| 1 	| 1 	|
| 7 	| 1 	| 1 	| 1 	| 0 	| 0 	| 1 	| 0 	|
| 8 	| 1 	| 1 	| 1 	| 1 	| 1 	| 1 	| 1 	|
| 9 	| 1 	| 1 	| 1 	| 1 	| 0 	| 1 	| 1 	|


How would we represent such a lookup table in our Python code, and how would we use it? The obvious answer to the first question is a dictionary. Notice that `DIGITSDICT` is just a representation of the "binary state" of each segment. The digit "8", for example, corresponds to all seven segments being activated, or "on" (state of `1`). 

```py
DIGITSDICT = {
    (1,1,1,1,1,1,0):0,
    (0,1,1,0,0,0,0):1,
    (1,1,0,1,1,0,1):2,
    (1,1,1,1,0,0,1):3,
    (0,1,1,0,0,1,1):4,
    (1,0,1,1,0,1,1):5,
    (1,0,1,1,1,1,1):6,
    (1,1,1,0,0,1,0):7,
    (1,1,1,1,1,1,1):8,
    (1,1,1,1,0,1,1):9
}
```

Then, for each digit, we look at the pixel values in each of the seven segments; if the majority of pixels are white, we classify that segment as being in an activated state (`1`), otherwise in a state of `0`. As we iterate over the 7 segments, we build an array of length 7, each element a binary value (`0` or `1`). 

We would then find the corresponding value in our dictionary using that array. Your code would resemble the following:

```py
# define the rectangle areas corresponding each segment
sevensegs = [
    ((x0, y0), (x1, y1)),
    ((x2, y2), (x3, y3)),
    ... # 7 of them
]

# initialize the state to OFF
on = [0] * 7 

# set each segment to ON / OFF based on majority
for (i, ((p1x, p1y), (p2x, p2y))) in enumerate(sevensegs):
    # numpy slicing to extract only one region
    region = roi[p1y:p2y, p1x:p2x]
    # if majority pixels are white, set state to ON
    if np.sum(region == 255) > region.size *0.5:
        on[i] = 1

# lookup on dictionary
digit = DIGITSDICT[tuple(on)] # digit is one of 0-9
```

There are multiple ways to write a for-loop, but it's important that you are aware of the order in which your for-loop is executing. Referring to our seven-segment illustration below, the first iteration is only concerned with the state of 'A', while the second iteration handles the state of 'B', and so on. 

![](assets/sevenseg1.png)

Using `enumerate`, we obtain an additional counter (`i`) over our iterable (`sevensegs`); this is convenient for the purpose of setting states. At the first iteration, the first element in our list is conditionally set to 1 if more than half of the pixels in segment 'A' are white. A more detailed example of Python's `enumerate` is in `utils/enumerate.py`.
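
The same state-setting pattern can be run in isolation with made-up regions standing in for the real segment slices (here the first "segment" is all white, the rest all black):

```python
import numpy as np

# stand-ins for the seven segment slices: 'A' all white, the rest all black
regions = [np.full((10, 10), 255, dtype=np.uint8)] + \
          [np.zeros((10, 10), dtype=np.uint8) for _ in range(6)]

on = [0] * 7
for i, region in enumerate(regions):
    # i pairs each region with its position in the state list
    if np.sum(region == 255) > region.size * 0.5:
        on[i] = 1
# on is now [1, 0, 0, 0, 0, 0, 0]: only segment 'A' is activated
```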

### Practical Strategies
If you pay close attention to the digit '0' in our LCD display, you will notice that the absence of the 'G' segment causes a pretty visible and significant gap. When you test your digit recognition script without special consideration to this attribute, you will find it consistently failing on the digits "0", "1" and "7". In fact, you may not even be able to isolate the aforementioned digits using the `findContours` operation, because they are treated as two disjointed pieces instead of a whole. 

A reasonable strategy to handle this is the Dilation or Closing (Dilation followed by Erosion) operation that you've learned earlier. 

Similarly, your ROI may necessitate other pre-processing, and the specific tactical solutions vary greatly depending on the problem set at hand. 

As I inspected the bounding boxes we retrieved around the LCD screens, I observed that these bounding boxes often have their digits centered around the bottom half of the display. This led me to insert an additional step prior to the morphological transformation in the final code solution. The step uses numpy subsetting to trim away the top 20%, as well as 20% on each side, of the image:

```py
roi = cv2.imread("roi.png", flags=0)
RATIO = roi.shape[0] * 0.2
trimmed = roi[
    int(RATIO) :, 
    int(RATIO) : roi.shape[1] - int(RATIO)]
```

That said, whenever possible, you want to be cautious not to hand-tune your solution in a way that is overly specific to the images you have at hand, lest you risk the solution **only** working on those specific images and not others, a phenomenon fondly termed "overfitting" in the machine learning community.

I re-executed the solution code against some sample image sets, once with the "trimming" in place and once without, before settling on the decision. As you will see later, the trimming improves our accuracy and is a relatively safe strategy, given that every LCD screen, regardless of the issuer (bank), has the same asymmetry: more "blank space" in the top half compared to the bottom half. 

#### Contour Properties
Furthermore, in many cases of digit recognition / digit classification you will want to predict the class of each digit in an ordered fashion. Suppose the LCD screen contains the digits "40710382": our algorithm should correctly isolate these digits and classify them iteratively, but do so from the leftmost digit to the rightmost. Failing to account for this may result in your algorithm correctly classifying each digit, yet producing an unreasonable output such as "1740238". 
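
The ordering problem itself reduces to sorting by the x coordinate of each digit's bounding box. With made-up `(x, y, w, h)` tuples standing in for `cv2.boundingRect` outputs:

```python
# hypothetical bounding boxes (x, y, w, h), discovered in arbitrary order
boxes = [(120, 40, 18, 30), (20, 42, 16, 29), (70, 41, 17, 30)]

# sort by the x coordinate of the top-left corner: leftmost digit first
left_to_right = sorted(boxes, key=lambda b: b[0])
```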

There are a few strategies you can employ here. We've seen in `contourarea_01.py` and `contourarea_02.py` how contours have attributes that can be retrieved using the `contourArea()` and `arcLength()` functions. Inspect the following snippet and it should help jog your memory:

```py
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:9]

for i, cnt in enumerate(cnts):
    cv2.drawContours(img_color, cnts, i, BCOLOR, THICKNESS)
    area = cv2.contourArea(cnt)
    peri = cv2.arcLength(cnt, closed=True)
    print(f"Area:{area}; Perimeter: {peri}")
```

Indeed, we're using contour area as a good indicator in the search for our region of interest. When we take this idea a little further, we can place an additional constraint on our search criteria. In the following code, we draw a bounding rectangle and, as an extra layer of precaution, only keep bounding boxes that are taller than 20 pixels (step 1).

Calling `boundingRect()` on a contour returns 4 values: the x and y coordinates, along with the width and height of the contour. 

We then use another property of the contour, its top-left coordinate, to determine the logical order of our digits. Specifically, we use the first returned value (`cv2.boundingRect(cnt)[0]`), since that's the x value of the top-left coordinate of each region. By sorting against this value, our digits are stored in the Python list in an ordered fashion, determined by their respective coordinate values. 

```py
digits_cnts = []
cnts, _ = cv2.findContours(eroded, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for cnt in cnts:
    (x, y, w, h) = cv2.boundingRect(cnt)
    # step 1
    if h > 20:
        digits_cnts += [cnt]
# step 2
sorted_digits = sorted(digits_cnts, key=lambda cnt: cv2.boundingRect(cnt)[0])
```

When we put these together, we now have a complete pipeline:  
![](assets/digitrecflow.png)

The full solution code is in `digit_01.py` but the essential parts are as follow:

```py
import cv2
import numpy as np
# step 1:
DIGITSDICT = {
    (1, 1, 1, 1, 1, 1, 0): 0,
    (0, 1, 1, 0, 0, 0, 0): 1,
    (1, 1, 0, 1, 1, 0, 1): 2,
    (1, 1, 1, 1, 0, 0, 1): 3,
    (0, 1, 1, 0, 0, 1, 1): 4,
    (1, 0, 1, 1, 0, 1, 1): 5,
    (1, 0, 1, 1, 1, 1, 1): 6,
    (1, 1, 1, 0, 0, 1, 0): 7,
    (1, 1, 1, 1, 1, 1, 1): 8,
    (1, 1, 1, 1, 0, 1, 1): 9,
}

# step 2
roi = cv2.imread("inter/ocbc-roi.png", flags=0)

# step 3
RATIO = roi.shape[0] * 0.2
roi = cv2.bilateralFilter(roi, 5, 30, 60)
trimmed = roi[int(RATIO) :, int(RATIO) : roi.shape[1] - int(RATIO)]

# step 4
edged = cv2.adaptiveThreshold(
    trimmed, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 5, 5
)

# step 5
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 5))
dilated = cv2.dilate(edged, kernel, iterations=1)
eroded = cv2.erode(dilated, kernel, iterations=1)

# step 6
cnts, _ = cv2.findContours(eroded, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
digits_cnts = []
for cnt in cnts:
    (x, y, w, h) = cv2.boundingRect(cnt)
    if h > 20:
        digits_cnts += [cnt]

# step 7
sorted_digits = sorted(digits_cnts, key=lambda cnt: cv2.boundingRect(cnt)[0])

# step 8
digits = []
for cnt in sorted_digits:
    # step 8a
    (x, y, w, h) = cv2.boundingRect(cnt)
    roi = eroded[y : y + h, x : x + w]
    qW, qH = int(w * 0.25), int(h * 0.15)
    fractionH, halfH, fractionW = int(h * 0.05), int(h * 0.5), int(w * 0.25)

    # step 8b
    sevensegs = [
        ((0, 0), (w, qH)),  # a (top bar)
        ((w - qW, 0), (w, halfH)),  # b (upper right)
        ((w - qW, halfH), (w, h)),  # c (lower right)
        ((0, h - qH), (w, h)),  # d (lower bar)
        ((0, halfH), (qW, h)),  # e (lower left)
        ((0, 0), (qW, halfH)),  # f (upper left)
        # ((0, halfH - fractionH), (w, halfH + fractionH)) # center
        (
            (0 + fractionW, halfH - fractionH),
            (w - fractionW, halfH + fractionH),
        ),  # center
    ]

    # step 8c
    on = [0] * 7
    for (i, ((p1x, p1y), (p2x, p2y))) in enumerate(sevensegs):
        region = roi[p1y:p2y, p1x:p2x]
        print(
            f"{i}: Sum of 1: {np.sum(region == 255)}, Sum of 0: {np.sum(region == 0)}, Shape: {region.shape}, Size: {region.size}"
        )
        if np.sum(region == 255) > region.size * 0.5:
            on[i] = 1
        print(f"State of ON: {on}")
    # step 8d
    digit = DIGITSDICT[tuple(on)]
    print(f"Digit is: {digit}")
    digits += [digit]
    # step 9
    cv2.rectangle(canvas, (x, y), (x + w, y + h), CYAN, 1)
    cv2.putText(canvas, str(digit), (x - 5, y + 6), FONT, 0.3, (0, 0, 0), 1)
    cv2.imshow("Digit", canvas)
    cv2.waitKey(0)
print(f"Digits on the token are: {digits}")
```

- Step 1: Initialize the lookup dictionary
- Step 2: Read our ROI image using OpenCV
- Step 3: Noise reduction and trim away asymmetrical white space in our ROI
- Step 4: Binarize our image using adaptive thresholding
- Step 5: Morphological transformation to remove noise and fill the small holes in our digit
- Step 6: Find contours in our image with a height greater than 20px
- Step 7: Sort the contours in-place, using the x value of their coordinates (hence, left to right)
- Step 8
    - Step 8a: Create rectangle bounding box on each digit, and some convenience units that we later use to slice the seven segments. Notice that these convenience units are not hard-coded values, but are proportional to the Height (`h`) of our rectangular box
    - Step 8b: Slice the seven segments; the first segment ("A") spans from point (0, 0) to (w, `int(h * 0.15)`), i.e. it is `w` in width and 15% the height of the full digit contour
    - Step 8c: Initialize the state to `0` for each of the 7 segments, then conditionally set regions with more white than black pixels to `1`
    - Step 8d: Once all 7 states have been set, perform a lookup against the digit dictionary created in step 1; append the value to the `digits` list created at the beginning of step 8
- Step 9: Draw rectangle and add predicted text for each bounding box. Finally, use a print statement to print the `digits` list. 


# References
[^1]: LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86, 2278–2324
[^2]: Saliency map, Wikipedia
[^3]: Morphological Transformations, OpenCV Documentation
[^4]: Seven-segment display, Wikipedia
[^5]: Seven-segment display character representations, Wikipedia





================================================
FILE: digitrecognition/morphological_01.py
================================================
import cv2
import matplotlib.pyplot as plt

roi = cv2.imread("inter/ocbc-roi.png", flags=0)
blurred = cv2.bilateralFilter(roi, 5, 30, 60)
edged = cv2.adaptiveThreshold(
    blurred, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 5, 5
)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 5))
dilated = cv2.dilate(edged, kernel, iterations=1)

plt.subplot(2, 2, 1), plt.imshow(roi, cmap="gray")
plt.title("Original"), plt.xticks([]), plt.yticks([])
plt.subplot(2, 2, 2), plt.imshow(blurred, cmap="gray")
plt.title("Blurred"), plt.xticks([]), plt.yticks([])
plt.subplot(2, 2, 3), plt.imshow(edged, cmap="gray")
plt.title("Edged"), plt.xticks([]), plt.yticks([])
plt.subplot(2, 2, 4), plt.imshow(dilated, cmap="gray")
plt.title("Dilated"), plt.xticks([]), plt.yticks([])
plt.show()



================================================
FILE: digitrecognition/morphological_02.py
================================================
import cv2
import matplotlib.pyplot as plt

roi = cv2.imread("assets/0417s.png", flags=0)
cv2.imshow("Original", roi)
cv2.waitKey(0)

_, thresh = cv2.threshold(roi, 170, 255, cv2.THRESH_BINARY)
# thresh = cv2.adaptiveThreshold(dilated, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 5, 5)
cv2.imshow("Threshold", thresh)
cv2.waitKey(0)

inv = cv2.bitwise_not(thresh)
cv2.imshow("Inverted", inv)
cv2.waitKey(0)


kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (4, 4))
eroded = cv2.erode(inv, kernel, iterations=1)

cv2.imshow("Eroded", eroded)
cv2.waitKey(0)

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (6, 6))
dilated = cv2.dilate(eroded, kernel, iterations=1)

cv2.imshow("Dilated", dilated)
cv2.waitKey(0)


plt.subplot(2, 2, 1), plt.imshow(roi, cmap="gray")
plt.title("Original"), plt.xticks([]), plt.yticks([])

plt.subplot(2, 2, 2), plt.imshow(thresh, cmap="gray")
plt.title("Thresholded"), plt.xticks([]), plt.yticks([])

plt.subplot(2, 2, 3), plt.imshow(inv, cmap="gray")
plt.title("Inverted"), plt.xticks([]), plt.yticks([])

plt.subplot(2, 2, 4), plt.imshow(dilated, cmap="gray")
plt.title("Transformed"), plt.xticks([]), plt.yticks([])

plt.show()



================================================
FILE: digitrecognition/roi_01.py
================================================
import cv2
BCOLOR = (75, 0, 130)
THICKNESS = 4

img_color = cv2.imread("assets/ocbc.jpg")
img_color = cv2.resize(img_color, None, None, fx=0.5, fy=0.5)
img = cv2.cvtColor(img_color, cv2.COLOR_BGR2GRAY)

x,y,w,h = cv2.selectROI("Region of interest", img)
print(x,y,w,h)

cropped = img[y:y+h, x:x+w]
cv2.imshow("Cropped", cropped)
cv2.waitKey(0)

cv2.rectangle(img_color, (x,y), (x+w,y+h), (255,0,0), 2)
cv2.imshow("Original Image", img_color)
cv2.waitKey(0)


================================================
FILE: digitrecognition/roi_02.py
================================================
import cv2
import argparse
import re

parser = argparse.ArgumentParser()
parser.add_argument("-p", "--path", default="assets/ocbc.jpg")
args = vars(parser.parse_args())

# test: dbs.jpg | ocbc.jpg
img_color = cv2.imread(args["path"])
img_color = cv2.resize(img_color, None, None, fx=0.5, fy=0.5)
img = cv2.cvtColor(img_color, cv2.COLOR_BGR2GRAY)

blurred = cv2.GaussianBlur(img, (7, 7), 0)
blurred = cv2.bilateralFilter(blurred, 5, sigmaColor=50, sigmaSpace=50)
edged = cv2.Canny(blurred, 130, 150, 255)

cv2.imshow("Outline of device", edged)
cv2.waitKey(0)

cnts, _ = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# sort contours by area, and get the largest
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:1]

cv2.drawContours(img_color, cnts, 0, (75, 0, 130), 4)
cv2.imshow("Target Contour", img_color)
cv2.waitKey(0)

x, y, w, h = cv2.boundingRect(cnts[0])
roi = img[y : y + h, x : x + w]
cv2.imshow("ROI", roi)

img_name = re.search(r"(?<=/)(.*)(?=\.jpg)", args["path"]).group(1)

cv2.imwrite(f"inter/{img_name}-roi.png", roi)
cv2.waitKey(0)


================================================
FILE: digitrecognition/utils/enumerate.py
================================================
digits = ['a', 'b', 'c', 'd']

contracts = {
    # salesperson: contract value, duration
    'adam':(500, 2),
    'brian':(300, 1.5),
    'canny':(1000, 4)
}

# for i in range(len(digits)):
#     print(i, digits[i])
# better written as:
for i, d in enumerate(digits):
    print(i, d)

print('---')
print(dict(enumerate(digits)))

for i, c in enumerate(contracts):
    print(i, c)

for i, v in enumerate(contracts.values()):
    print(i, v)

d = {i+1:(k,f'${v1} for {v2} years') for i, (k,(v1, v2)) in enumerate(contracts.items())}
print(d)
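The comprehension above can be simplified a little with `enumerate`'s `start` parameter, which shifts the index for us instead of the manual `i+1` (an illustrative variant, not part of the original script):

```python
contracts = {
    # salesperson: (contract value, duration)
    'adam': (500, 2),
    'brian': (300, 1.5),
    'canny': (1000, 4),
}

# enumerate(..., start=1) begins counting at 1, removing the i+1 offset
d = {i: (k, f'${v1} for {v2} years')
     for i, (k, (v1, v2)) in enumerate(contracts.items(), start=1)}
print(d)
```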

================================================
FILE: edgedetect/adaptivethresholding_01.py
================================================
import cv2
import matplotlib.pyplot as plt
import numpy as np

img = cv2.imread("assets/sudoku.jpg", flags=0)
_, img_threshold = cv2.threshold(img, 50, 255, cv2.THRESH_BINARY)

img = cv2.medianBlur(img, 5)

mean_adaptive = cv2.adaptiveThreshold(
    img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 2
)
gaussian_adaptive = cv2.adaptiveThreshold(
    img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2
)

plt.subplot(2, 2, 1), plt.imshow(img, cmap="gray")
plt.title("Original"), plt.xticks([]), plt.yticks([])
plt.subplot(2, 2, 2), plt.imshow(img_threshold, cmap="gray")
plt.title("Binary Threshold (global:50)"), plt.xticks([]), plt.yticks([])
plt.subplot(2, 2, 3), plt.imshow(mean_adaptive, cmap="gray")
plt.title("Mean Adaptive"), plt.xticks([]), plt.yticks([])
plt.subplot(2, 2, 4), plt.imshow(gaussian_adaptive, cmap="gray")
plt.title("Gaussian Adaptive"), plt.xticks([]), plt.yticks([])
plt.show()
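For intuition, the mean-adaptive rule can be reproduced in plain numpy: each pixel is set to white when it exceeds the mean of its blockSize x blockSize neighborhood minus the constant C. This is a simplified sketch with edge-replicated padding and made-up values, not OpenCV's actual implementation:

```python
import numpy as np

def mean_adaptive_threshold(img, block_size=3, C=2):
    # pad by replicating edges so every pixel has a full neighborhood
    pad = block_size // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros_like(img, dtype=np.uint8)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            local_mean = padded[y:y + block_size, x:x + block_size].mean()
            # THRESH_BINARY rule against a per-pixel threshold of mean - C
            out[y, x] = 255 if img[y, x] > local_mean - C else 0
    return out
```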



================================================
FILE: edgedetect/canny_01.py
================================================
import numpy as np
import cv2
import matplotlib.pyplot as plt

img = cv2.imread("assets/castello.png", flags=0)
img = cv2.medianBlur(img, 9)
img = cv2.GaussianBlur(img, (9, 9), 0)

def sobel(img, k):
    # k is the Sobel aperture (ksize); pass it through instead of ignoring it
    gradient_x = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=k)
    gradient_y = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=k)
    gradient_x = cv2.convertScaleAbs(gradient_x)
    gradient_y = cv2.convertScaleAbs(gradient_y)

    return cv2.addWeighted(gradient_x, 0.5, gradient_y, 0.5, 0)

sobel = sobel(img, 3)
canny = cv2.Canny(img, 50, 180)


plt.subplot(1, 2, 1)
plt.imshow(sobel, cmap="gray")
plt.title("Sobel Edge Detector"), plt.xticks([]), plt.yticks([])

plt.subplot(1, 2, 2)
plt.imshow(canny, cmap="gray")
plt.title("Canny Edge Detector"), plt.xticks([]), plt.yticks([])
plt.show()
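The two thresholds passed to cv2.Canny above (50 and 180) feed its hysteresis step: pixels above the high threshold are definite edges, pixels below the low threshold are discarded, and in-between pixels survive only if connected to a definite edge. A minimal 1D sketch of that rule in plain numpy (illustrative only; OpenCV works in 2D with 8-connectivity):

```python
import numpy as np

def hysteresis_1d(mag, low, high):
    mag = np.asarray(mag)
    strong = mag >= high             # definitely an edge
    weak = (mag >= low) & ~strong    # an edge only if linked to a strong pixel
    keep = strong.copy()
    changed = True
    while changed:                   # propagate until no weak pixel is promoted
        changed = False
        for i in np.flatnonzero(weak & ~keep):
            left = i > 0 and keep[i - 1]
            right = i < len(mag) - 1 and keep[i + 1]
            if left or right:
                keep[i] = True
                changed = True
    return keep
```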

================================================
FILE: edgedetect/contour_01.py
================================================
import cv2
import numpy as np


image = cv2.imread("assets/pens.png")
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("Grayscale", image)
cv2.waitKey(0)

image = cv2.GaussianBlur(image, (3, 3), 0)
cv2.imshow("After Smoothing", image)
cv2.waitKey(0)


def sobel(image):
    # run with col.png for best effect
    # cv2.Sobel last 2 argument -> order of derivatives in x and y direction respectively
    sobelX = cv2.Sobel(image, cv2.CV_64F, 1, 0)  # find vertical edges
    sobelY = cv2.Sobel(image, cv2.CV_64F, 0, 1)  # find horizontal edges along y-axis

    gradient_x = np.uint8(np.absolute(sobelX))
    gradient_y = np.uint8(np.absolute(sobelY))

    sobelCombined = cv2.bitwise_or(gradient_x, gradient_y)
    cv2.imshow("Sobel Combined", sobelCombined)
    cv2.waitKey(0)
    return sobelCombined


def counting_penguins(sobel, image):
    sobeled = sobel(image)
    _, edged = cv2.threshold(sobeled, 20, 255, cv2.THRESH_BINARY)
    cv2.imshow("(Edged)", edged)
    cv2.waitKey(0)
    cnts, _ = cv2.findContours(
        # does this need to be changed?
        edged,
        cv2.RETR_EXTERNAL,
        cv2.CHAIN_APPROX_SIMPLE,
    )

    canvas = np.ones(image.shape)
    cv2.drawContours(canvas, cnts, -1, (0, 255, 255), 1)
    cv2.imshow("Contour", canvas)
    cv2.waitKey(0)

    print(f"Found {len(cnts)} penguins")


if __name__ == "__main__":
    counting_penguins(sobel, image)


================================================
FILE: edgedetect/contourapprox.py
================================================
import cv2
import numpy as np


image = cv2.imread("homework/equal.png", flags=0)
cv2.imshow("Original", image)
cv2.waitKey(0)


def edge(image):
    _, edged = cv2.threshold(image, 220, 255, cv2.THRESH_BINARY_INV)
    cv2.imshow("(Edged)", edged)
    cv2.waitKey(0)
    cnts, _ = cv2.findContours(
        # does this need to be changed?
        edged,
        cv2.RETR_EXTERNAL,
        cv2.CHAIN_APPROX_SIMPLE,
    )
    print(f"Cnts Simple Shape (1): {cnts[0].shape}")
    print(f"Cnts Simple Shape (2): {cnts[1].shape}")
    cnts2, _ = cv2.findContours(
        # does this need to be changed?
        edged,
        cv2.RETR_EXTERNAL,
        cv2.CHAIN_APPROX_NONE,
    )
    print(f"Cnts NoApprox Shape:{cnts2[0].shape}")
    print(cnts)
    canvas = np.ones(image.shape)
    cv2.drawContours(canvas, cnts, -1, (0, 255, 255), 1)
    cv2.imshow("Contour", canvas)
    cv2.waitKey(0)
    print(f"Found {len(cnts)} shapes!")


if __name__ == "__main__":
    edge(image)


================================================
FILE: edgedetect/edgedetect.html
================================================
<!DOCTYPE html><html><head>
      <title>edgedetect</title>
      <meta charset="utf-8">
      <meta name="viewport" content="width=device-width, initial-scale=1.0">
      
      
        <script type="text/x-mathjax-config">
          MathJax.Hub.Config({"extensions":["tex2jax.js"],"jax":["input/TeX","output/HTML-CSS"],"messageStyle":"none","tex2jax":{"processEnvironments":false,"processEscapes":true,"inlineMath":[["$","$"],["\\(","\\)"]],"displayMath":[["$$","$$"],["\\[","\\]"]]},"TeX":{"extensions":["AMSmath.js","AMSsymbols.js","noErrors.js","noUndefined.js"]},"HTML-CSS":{"availableFonts":["TeX"]}});
        </script>
        <script type="text/javascript" async src="file:////Users/samuel/.vscode/extensions/shd101wyy.markdown-preview-enhanced-0.5.0/node_modules/@shd101wyy/mume/dependencies/mathjax/MathJax.js" charset="UTF-8"></script>
        
      
      

      
      
      
      
      
      
      

      <style>
      /**
 * prism.js Github theme based on GitHub's theme.
 * @author Sam Clarke
 */
code[class*="language-"],
pre[class*="language-"] {
  color: #333;
  background: none;
  font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace;
  text-align: left;
  white-space: pre;
  word-spacing: normal;
  word-break: normal;
  word-wrap: normal;
  line-height: 1.4;

  -moz-tab-size: 8;
  -o-tab-size: 8;
  tab-size: 8;

  -webkit-hyphens: none;
  -moz-hyphens: none;
  -ms-hyphens: none;
  hyphens: none;
}

/* Code blocks */
pre[class*="language-"] {
  padding: .8em;
  overflow: auto;
  /* border: 1px solid #ddd; */
  border-radius: 3px;
  /* background: #fff; */
  background: #f5f5f5;
}

/* Inline code */
:not(pre) > code[class*="language-"] {
  padding: .1em;
  border-radius: .3em;
  white-space: normal;
  background: #f5f5f5;
}

.token.comment,
.token.blockquote {
  color: #969896;
}

.token.cdata {
  color: #183691;
}

.token.doctype,
.token.punctuation,
.token.variable,
.token.macro.property {
  color: #333;
}

.token.operator,
.token.important,
.token.keyword,
.token.rule,
.token.builtin {
  color: #a71d5d;
}

.token.string,
.token.url,
.token.regex,
.token.attr-value {
  color: #183691;
}

.token.property,
.token.number,
.token.boolean,
.token.entity,
.token.atrule,
.token.constant,
.token.symbol,
.token.command,
.token.code {
  color: #0086b3;
}

.token.tag,
.token.selector,
.token.prolog {
  color: #63a35c;
}

.token.function,
.token.namespace,
.token.pseudo-element,
.token.class,
.token.class-name,
.token.pseudo-class,
.token.id,
.token.url-reference .token.variable,
.token.attr-name {
  color: #795da3;
}

.token.entity {
  cursor: help;
}

.token.title,
.token.title .token.punctuation {
  font-weight: bold;
  color: #1d3e81;
}

.token.list {
  color: #ed6a43;
}

.token.inserted {
  background-color: #eaffea;
  color: #55a532;
}

.token.deleted {
  background-color: #ffecec;
  color: #bd2c00;
}

.token.bold {
  font-weight: bold;
}

.token.italic {
  font-style: italic;
}


/* JSON */
.language-json .token.property {
  color: #183691;
}

.language-markup .token.tag .token.punctuation {
  color: #333;
}

/* CSS */
code.language-css,
.language-css .token.function {
  color: #0086b3;
}

/* YAML */
.language-yaml .token.atrule {
  color: #63a35c;
}

code.language-yaml {
  color: #183691;
}

/* Ruby */
.language-ruby .token.function {
  color: #333;
}

/* Markdown */
.language-markdown .token.url {
  color: #795da3;
}

/* Makefile */
.language-makefile .token.symbol {
  color: #795da3;
}

.language-makefile .token.variable {
  color: #183691;
}

.language-makefile .token.builtin {
  color: #0086b3;
}

/* Bash */
.language-bash .token.keyword {
  color: #0086b3;
}

/* highlight */
pre[data-line] {
  position: relative;
  padding: 1em 0 1em 3em;
}
pre[data-line] .line-highlight-wrapper {
  position: absolute;
  top: 0;
  left: 0;
  background-color: transparent;
  display: block;
  width: 100%;
}

pre[data-line] .line-highlight {
  position: absolute;
  left: 0;
  right: 0;
  padding: inherit 0;
  margin-top: 1em;
  background: hsla(24, 20%, 50%,.08);
  background: linear-gradient(to right, hsla(24, 20%, 50%,.1) 70%, hsla(24, 20%, 50%,0));
  pointer-events: none;
  line-height: inherit;
  white-space: pre;
}

pre[data-line] .line-highlight:before, 
pre[data-line] .line-highlight[data-end]:after {
  content: attr(data-start);
  position: absolute;
  top: .4em;
  left: .6em;
  min-width: 1em;
  padding: 0 .5em;
  background-color: hsla(24, 20%, 50%,.4);
  color: hsl(24, 20%, 95%);
  font: bold 65%/1.5 sans-serif;
  text-align: center;
  vertical-align: .3em;
  border-radius: 999px;
  text-shadow: none;
  box-shadow: 0 1px white;
}

pre[data-line] .line-highlight[data-end]:after {
  content: attr(data-end);
  top: auto;
  bottom: .4em;
}html body{font-family:"Helvetica Neue",Helvetica,"Segoe UI",Arial,freesans,sans-serif;font-size:16px;line-height:1.6;color:#333;background-color:#fff;overflow:initial;box-sizing:border-box;word-wrap:break-word}html body>:first-child{margin-top:0}html body h1,html body h2,html body h3,html body h4,html body h5,html body h6{line-height:1.2;margin-top:1em;margin-bottom:16px;color:#000}html body h1{font-size:2.25em;font-weight:300;padding-bottom:.3em}html body h2{font-size:1.75em;font-weight:400;padding-bottom:.3em}html body h3{font-size:1.5em;font-weight:500}html body h4{font-size:1.25em;font-weight:600}html body h5{font-size:1.1em;font-weight:600}html body h6{font-size:1em;font-weight:600}html body h1,html body h2,html body h3,html body h4,html body h5{font-weight:600}html body h5{font-size:1em}html body h6{color:#5c5c5c}html body strong{color:#000}html body del{color:#5c5c5c}html body a:not([href]){color:inherit;text-decoration:none}html body a{color:#08c;text-decoration:none}html body a:hover{color:#00a3f5;text-decoration:none}html body img{max-width:100%}html body>p{margin-top:0;margin-bottom:16px;word-wrap:break-word}html body>ul,html body>ol{margin-bottom:16px}html body ul,html body ol{padding-left:2em}html body ul.no-list,html body ol.no-list{padding:0;list-style-type:none}html body ul ul,html body ul ol,html body ol ol,html body ol ul{margin-top:0;margin-bottom:0}html body li{margin-bottom:0}html body li.task-list-item{list-style:none}html body li>p{margin-top:0;margin-bottom:0}html body .task-list-item-checkbox{margin:0 .2em .25em -1.8em;vertical-align:middle}html body .task-list-item-checkbox:hover{cursor:pointer}html body blockquote{margin:16px 0;font-size:inherit;padding:0 15px;color:#5c5c5c;border-left:4px solid #d6d6d6}html body blockquote>:first-child{margin-top:0}html body blockquote>:last-child{margin-bottom:0}html body hr{height:4px;margin:32px 0;background-color:#d6d6d6;border:0 none}html body table{margin:10px 0 15px 
0;border-collapse:collapse;border-spacing:0;display:block;width:100%;overflow:auto;word-break:normal;word-break:keep-all}html body table th{font-weight:bold;color:#000}html body table td,html body table th{border:1px solid #d6d6d6;padding:6px 13px}html body dl{padding:0}html body dl dt{padding:0;margin-top:16px;font-size:1em;font-style:italic;font-weight:bold}html body dl dd{padding:0 16px;margin-bottom:16px}html body code{font-family:Menlo,Monaco,Consolas,'Courier New',monospace;font-size:.85em !important;color:#000;background-color:#f0f0f0;border-radius:3px;padding:.2em 0}html body code::before,html body code::after{letter-spacing:-0.2em;content:"\00a0"}html body pre>code{padding:0;margin:0;font-size:.85em !important;word-break:normal;white-space:pre;background:transparent;border:0}html body .highlight{margin-bottom:16px}html body .highlight pre,html body pre{padding:1em;overflow:auto;font-size:.85em !important;line-height:1.45;border:#d6d6d6;border-radius:3px}html body .highlight pre{margin-bottom:0;word-break:normal}html body pre code,html body pre tt{display:inline;max-width:initial;padding:0;margin:0;overflow:initial;line-height:inherit;word-wrap:normal;background-color:transparent;border:0}html body pre code:before,html body pre tt:before,html body pre code:after,html body pre tt:after{content:normal}html body p,html body blockquote,html body ul,html body ol,html body dl,html body pre{margin-top:0;margin-bottom:16px}html body kbd{color:#000;border:1px solid #d6d6d6;border-bottom:2px solid #c7c7c7;padding:2px 4px;background-color:#f0f0f0;border-radius:3px}@media print{html body{background-color:#fff}html body h1,html body h2,html body h3,html body h4,html body h5,html body h6{color:#000;page-break-after:avoid}html body blockquote{color:#5c5c5c}html body pre{page-break-inside:avoid}html body table{display:table}html body img{display:block;max-width:100%;max-height:100%}html body pre,html body 
code{word-wrap:break-word;white-space:pre}}.markdown-preview{width:100%;height:100%;box-sizing:border-box}.markdown-preview .pagebreak,.markdown-preview .newpage{page-break-before:always}.markdown-preview pre.line-numbers{position:relative;padding-left:3.8em;counter-reset:linenumber}.markdown-preview pre.line-numbers>code{position:relative}.markdown-preview pre.line-numbers .line-numbers-rows{position:absolute;pointer-events:none;top:1em;font-size:100%;left:0;width:3em;letter-spacing:-1px;border-right:1px solid #999;-webkit-user-select:none;-moz-user-select:none;-ms-user-select:none;user-select:none}.markdown-preview pre.line-numbers .line-numbers-rows>span{pointer-events:none;display:block;counter-increment:linenumber}.markdown-preview pre.line-numbers .line-numbers-rows>span:before{content:counter(linenumber);color:#999;display:block;padding-right:.8em;text-align:right}.markdown-preview .mathjax-exps .MathJax_Display{text-align:center !important}.markdown-preview:not([for="preview"]) .code-chunk .btn-group{display:none}.markdown-preview:not([for="preview"]) .code-chunk .status{display:none}.markdown-preview:not([for="preview"]) .code-chunk .output-div{margin-bottom:16px}.scrollbar-style::-webkit-scrollbar{width:8px}.scrollbar-style::-webkit-scrollbar-track{border-radius:10px;background-color:transparent}.scrollbar-style::-webkit-scrollbar-thumb{border-radius:5px;background-color:rgba(150,150,150,0.66);border:4px solid rgba(150,150,150,0.66);background-clip:content-box}html body[for="html-export"]:not([data-presentation-mode]){position:relative;width:100%;height:100%;top:0;left:0;margin:0;padding:0;overflow:auto}html body[for="html-export"]:not([data-presentation-mode]) .markdown-preview{position:relative;top:0}@media screen and (min-width:914px){html body[for="html-export"]:not([data-presentation-mode]) .markdown-preview{padding:2em calc(50% - 457px + 2em)}}@media screen and (max-width:914px){html body[for="html-export"]:not([data-presentation-mode]) 
.markdown-preview{padding:2em}}@media screen and (max-width:450px){html body[for="html-export"]:not([data-presentation-mode]) .markdown-preview{font-size:14px !important;padding:1em}}@media print{html body[for="html-export"]:not([data-presentation-mode]) #sidebar-toc-btn{display:none}}html body[for="html-export"]:not([data-presentation-mode]) #sidebar-toc-btn{position:fixed;bottom:8px;left:8px;font-size:28px;cursor:pointer;color:inherit;z-index:99;width:32px;text-align:center;opacity:.4}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] #sidebar-toc-btn{opacity:1}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc{position:fixed;top:0;left:0;width:300px;height:100%;padding:32px 0 48px 0;font-size:14px;box-shadow:0 0 4px rgba(150,150,150,0.33);box-sizing:border-box;overflow:auto;background-color:inherit}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc::-webkit-scrollbar{width:8px}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc::-webkit-scrollbar-track{border-radius:10px;background-color:transparent}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc::-webkit-scrollbar-thumb{border-radius:5px;background-color:rgba(150,150,150,0.66);border:4px solid rgba(150,150,150,0.66);background-clip:content-box}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc a{text-decoration:none}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc ul{padding:0 1.6em;margin-top:.8em}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc li{margin-bottom:.8em}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc ul{list-style-type:none}html 
body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .markdown-preview{left:300px;width:calc(100% -  300px);padding:2em calc(50% - 457px -  150px);margin:0;box-sizing:border-box}@media screen and (max-width:1274px){html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .markdown-preview{padding:2em}}@media screen and (max-width:450px){html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .markdown-preview{width:100%}}html body[for="html-export"]:not([data-presentation-mode]):not([html-show-sidebar-toc]) .markdown-preview{left:50%;transform:translateX(-50%)}html body[for="html-export"]:not([data-presentation-mode]):not([html-show-sidebar-toc]) .md-sidebar-toc{display:none}
/* Please visit the URL below for more information: */
/*   https://shd101wyy.github.io/markdown-preview-enhanced/#/customize-css */
.markdown-preview.markdown-preview h1,
.markdown-preview.markdown-preview h2,
.markdown-preview.markdown-preview h3,
.markdown-preview.markdown-preview h4,
.markdown-preview.markdown-preview h5,
.markdown-preview.markdown-preview h6 {
  font-weight: bolder;
  text-decoration-line: underline;
}

      </style>
    </head>
    <body for="html-export">
      <div class="mume markdown-preview  ">
      <h1 class="mume-header" id="definition">Definition</h1>

<p>An edge can be defined as a boundary between regions in an image<sup class="footnote-ref"><a href="#fn1" id="fnref1">[1]</a></sup>. The edge detection techniques we&apos;ll learn in this course build upon what we&apos;ve learned from our lessons on kernel convolution. Edge detection is the process of using kernels to reduce the information in our data, preserving only the necessary structural properties of the image<sup class="footnote-ref"><a href="#fn1" id="fnref1:1">[1:1]</a></sup>.</p>
<h1 class="mume-header" id="gradient-based-edge-detection">Gradient-based Edge Detection</h1>

<p>The gradient points in the direction of the most rapid increase in intensity. When we apply a gradient-based edge detection method, we are searching for the maxima and minima in the first derivative of the image.</p>
<p>When we apply our convolution onto the image, we are looking for regions in the image where there&apos;s a sharp change in intensity or color. Arguably the most common edge detection method using this approach is the Sobel Operator.</p>
<h2 class="mume-header" id="sobel-operator">Sobel Operator</h2>

<p>The <code>Sobel</code> operator applies a filtering operation to produce an image output where the edge is emphasized. It convolves our original image using two 3x3 kernels to capture approximations of the derivatives in both the horizontal and vertical directions.</p>
<p>The x-direction and y-direction kernels would be:</p>
<p></p><div class="mathjax-exps">$$G_x = \begin{bmatrix} 1 &amp; 0 &amp; -1 \\ 2 &amp; 0 &amp; -2 \\ 1 &amp; 0 &amp; -1  \end{bmatrix}  G_y = \begin{bmatrix} 1 &amp; 2 &amp; 1 \\ 0 &amp; 0 &amp; 0 \\ -1 &amp; -2 &amp; -1  \end{bmatrix}$$</div><p></p>
<p>Each kernel is applied separately to obtain the gradient component in each orientation, <span class="mathjax-exps">$G_x$</span> and <span class="mathjax-exps">$G_y$</span>. Expressed as a formula, the gradient magnitude is:<br>
</p><div class="mathjax-exps">$$|G| = \sqrt{G^2_x + G^2_y}$$</div><p></p>
<p>Where the direction <span class="mathjax-exps">$\theta$</span> of the gradient is calculated as follows:<br>
</p><div class="mathjax-exps">$$\theta(x,y)=tan^{-1}(\frac{G_y}{G_x})$$</div><p></p>
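In code, both quantities are one-liners in numpy; np.arctan2 is preferable to literally computing tan<sup>-1</sup>(G<sub>y</sub>/G<sub>x</sub>) because it copes with G<sub>x</sub> = 0 without dividing by zero. The gradient values below are made up purely for illustration:

```python
import numpy as np

# hypothetical gradient components at three pixels
gx = np.array([3.0, 0.0, -4.0])
gy = np.array([4.0, 2.0, 3.0])

magnitude = np.hypot(gx, gy)            # |G| = sqrt(Gx^2 + Gy^2)
theta = np.degrees(np.arctan2(gy, gx))  # gradient direction in degrees
```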
<p>If the two formulas above confuse you, read on as we unpack these ideas one at a time.</p>
<h3 class="mume-header" id="intuition-discrete-derivative">Intuition: Discrete Derivative</h3>

<p>In computer vision literature, you&apos;ll often hear about &quot;taking the derivative&quot;, and this may serve as a source of confusion for beginning practitioners since &quot;derivatives&quot; are often thought of in the context of a continuous function. An image is a 2D matrix of discrete values, so how do we wrap our heads around the idea of finding a derivative?</p>
<p>But why do we even bother with derivatives when this course is supposed to be about edge detection in images?</p>
<p><img src="assets/derivatives.png" alt></p>
<p>Among the many ways to answer the question, my favorite is that an image is really just a function. When we treat an image as a function, the utility of taking derivatives becomes a little more obvious. In the image below, suppose you want to count the number of windows in this area of Venezia Sestiere Cannaregio; your program can look for large derivatives, since there are sharp changes in pixel intensity from the windows to the surrounding wall:</p>
<p><img src="assets/surface.png" alt></p>
<p>The code to generate the surface plot above is in <code>img2surface.py</code>.</p>
<p>Going back to our x-direction kernel in the Sobel Operator:<br>
this kernel has all zeros in its middle column, which is quite easy to intuit about. Essentially, for each pixel in our image, we want to compute its derivative in the x-direction by approximating a formula that you may have come across in your calculus class:</p>
<p></p><div class="mathjax-exps">$$f&apos;(x) = \lim_{h\to0}\frac{f(x+h)-f(x)}{h}$$</div><p></p>
<p>This approximation is also called &apos;forward difference&apos;, because we&apos;re taking a value of <span class="mathjax-exps">$x$</span>, and computing the difference in <span class="mathjax-exps">$f(x)$</span> as we increment it by a small amount forward, denoted as <span class="mathjax-exps">$h$</span>.</p>
<p>And as it turns out, using the &apos;central difference&apos; to compute the derivative of our discrete signal can deliver better results<sup class="footnote-ref"><a href="#fn2" id="fnref2">[2]</a></sup>:</p>
<p></p><div class="mathjax-exps">$$f&apos;(x) = \lim_{h\to0}\frac{f(x+0.5h)-f(x-0.5h)}{h}$$</div><p></p>
<p>To make this more concrete, we can plug the formula into an actual array of pixels:</p>
<p></p><div class="mathjax-exps">$$[0, 255, 65, \underline{180}, 255, 255, 255]$$</div><p></p>
<p>When we set <span class="mathjax-exps">$h=2$</span> at the center pixel (the underlined value, 180), we have the following:</p>
<p></p><div class="mathjax-exps">$$\begin{aligned} f&apos;(x) &amp; = \lim_{h\to0}\frac{f(x+0.5h)-f(x-0.5h)}{h}\\ &amp; = \frac{f(x+1)-f(x-1)}{2} \\ &amp; = \frac{255-65}{2} \\  &amp; = 95 \end{aligned}$$</div><p></p>
<p>Notice that a large part of the calculation we just performed is equivalent to a 1D convolution operation using a <span class="mathjax-exps">$\begin{bmatrix} -1 &amp; 0 &amp;  1 \end{bmatrix}$</span> kernel.</p>
<p>When the same 1x3 kernel <span class="mathjax-exps">$\begin{bmatrix} -1 &amp; 0 &amp;  1 \end{bmatrix}$</span> is applied on the right-most part of the image, where it&apos;s just white space ([..., 255, 255, 255]), the kernel evaluates to 0. In other words, our derivative filter returns no response where it can&apos;t detect a sharp change in pixel intensity.</p>
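The worked example above is easy to check numerically. A minimal numpy sketch (note that np.convolve flips its kernel, so we pre-reverse it to perform plain correlation with [-1, 0, 1], then divide by 2 for the central difference):

```python
import numpy as np

pixels = np.array([0, 255, 65, 180, 255, 255, 255], dtype=float)
kernel = np.array([-1, 0, 1])

# np.convolve flips the kernel, so reversing it first gives plain
# correlation with [-1, 0, 1]; dividing by 2 gives the central difference
response = np.convolve(pixels, kernel[::-1], mode="valid") / 2
print(response)  # the value centred on 180 is 95; the flat white run gives 0
```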
<p>As a reminder, the x-direction kernel in our Sobel Operator is the following:<br>
</p><div class="mathjax-exps">$$G_x = \begin{bmatrix} 1 &amp; 0 &amp; -1 \\ 2 &amp; 0 &amp; -2 \\ 1 &amp; 0 &amp; -1  \end{bmatrix}$$</div><p></p>
<p>This takes our 1x3 kernel and, instead of convolving one row of pixels at a time, extends it to convolve over a 3x3 neighborhood at a time using a weighted average approach.</p>
<h3 class="mume-header" id="code-illustrations-sobel-operator">Code Illustrations: Sobel Operator</h3>

<p>The two kernels (one for horizontal and another for vertical edge detection) can be constructed, respectively, like the following:</p>
<pre data-role="codeBlock" data-info="py" class="language-python">sobel_x <span class="token operator">=</span> np<span class="token punctuation">.</span>array<span class="token punctuation">(</span><span class="token punctuation">[</span><span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
                    <span class="token punctuation">[</span><span class="token number">2</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token operator">-</span><span class="token number">2</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
                    <span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">]</span><span class="token punctuation">)</span>

sobel_y <span class="token operator">=</span> np<span class="token punctuation">.</span>array<span class="token punctuation">(</span><span class="token punctuation">[</span><span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">2</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
                    <span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
                    <span class="token punctuation">[</span><span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">,</span> <span class="token operator">-</span><span class="token number">2</span><span class="token punctuation">,</span> <span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">]</span><span class="token punctuation">)</span>
</pre><p>You may have guessed that, given its role in digital image processing, <code>opencv</code> includes a method that performs our Sobel Operator for us, and thankfully it does. Here&apos;s an example of using the <code>cv2.Sobel(src, ddepth, dx, dy, dst=None, ksize)</code> method:</p>
<pre data-role="codeBlock" data-info="py" class="language-python">gradient_x <span class="token operator">=</span> cv2<span class="token punctuation">.</span>Sobel<span class="token punctuation">(</span>img<span class="token punctuation">,</span> cv2<span class="token punctuation">.</span>CV_64F<span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> ksize<span class="token operator">=</span><span class="token number">3</span><span class="token punctuation">)</span>
gradient_y <span class="token operator">=</span> cv2<span class="token punctuation">.</span>Sobel<span class="token punctuation">(</span>img<span class="token punctuation">,</span> cv2<span class="token punctuation">.</span>CV_64F<span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> ksize<span class="token operator">=</span><span class="token number">3</span><span class="token punctuation">)</span>
<span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f&quot;Range: </span><span class="token interpolation"><span class="token punctuation">{</span>np<span class="token punctuation">.</span><span class="token builtin">min</span><span class="token punctuation">(</span>gradient_x<span class="token punctuation">)</span><span class="token punctuation">}</span></span><span class="token string"> | </span><span class="token interpolation"><span class="token punctuation">{</span>np<span class="token punctuation">.</span><span class="token builtin">max</span><span class="token punctuation">(</span>gradient_x<span class="token punctuation">)</span><span class="token punctuation">}</span></span><span class="token string">&quot;</span></span><span class="token punctuation">)</span>
<span class="token comment"># Range: -177.0 | 204.0</span>

gradient_x <span class="token operator">=</span> np<span class="token punctuation">.</span>uint8<span class="token punctuation">(</span>np<span class="token punctuation">.</span>absolute<span class="token punctuation">(</span>gradient_x<span class="token punctuation">)</span><span class="token punctuation">)</span>
gradient_y <span class="token operator">=</span> np<span class="token punctuation">.</span>uint8<span class="token punctuation">(</span>np<span class="token punctuation">.</span>absolute<span class="token punctuation">(</span>gradient_y<span class="token punctuation">)</span><span class="token punctuation">)</span>
<span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f&quot;Range uint8: </span><span class="token interpolation"><span class="token punctuation">{</span>np<span class="token punctuation">.</span><span class="token builtin">min</span><span class="token punctuation">(</span>gradient_x<span class="token punctuation">)</span><span class="token punctuation">}</span></span><span class="token string"> | </span><span class="token interpolation"><span class="token punctuation">{</span>np<span class="token punctuation">.</span><span class="token builtin">max</span><span class="token punctuation">(</span>gradient_x<span class="token punctuation">)</span><span class="token punctuation">}</span></span><span class="token string">&quot;</span></span><span class="token punctuation">)</span>
<span class="token comment"># Range uint8: 0 | 204</span>

cv2<span class="token punctuation">.</span>imshow<span class="token punctuation">(</span><span class="token string">&quot;Gradient X&quot;</span><span class="token punctuation">,</span> gradient_x<span class="token punctuation">)</span>
cv2<span class="token punctuation">.</span>imshow<span class="token punctuation">(</span><span class="token string">&quot;Gradient Y&quot;</span><span class="token punctuation">,</span> gradient_y<span class="token punctuation">)</span>
</pre><p><img src="assets/sudokudemo.png" alt></p>
<p>The code above, extracted from <code>sobel_01.py</code>, reinforces a couple of ideas that we&apos;ve been working on. It shows that:</p>
<ul>
<li>the <span class="mathjax-exps">$G_x$</span> and <span class="mathjax-exps">$G_y$</span>, gradients of the image, are computed separately through the convolution of two different Sobel kernels</li>
<li><span class="mathjax-exps">$G_x$</span> and <span class="mathjax-exps">$G_y$</span> respond to changes in pixel values along the x-direction and y-direction respectively, as visualized in the illustration above</li>
<li>convolution using the two Sobel filters may, and often will, produce a value outside the range of 0 to 255. Given the presence of [-1, -2, -1] on one side of our kernel, mathematically this may lead to an output value as low as -1020. To store the values from these convolutions we use a 64-bit floating point (<code>cv2.CV_64F</code>). OpenCV suggests that we &quot;keep the output datatype to some higher form such as <code>cv2.CV_64F</code>, take its absolute value and then convert back to <code>cv2.CV_8U</code>&quot;<sup class="footnote-ref"><a href="#fn3" id="fnref3">[3]</a></sup></li>
</ul>
<p>While the code above certainly works, OpenCV also provides a method that takes absolute values, scales, and converts the result to 8-bit in a single call. <code>cv2.convertScaleAbs(src, dst, alpha=1, beta=0)</code> performs the following:<br>
</p><div class="mathjax-exps">$$dst(I) = cast&lt;uchar&gt;(|src(I) * \alpha + \beta|)$$</div><p></p>
<pre data-role="codeBlock" data-info="py" class="language-python">gradient_x <span class="token operator">=</span> cv2<span class="token punctuation">.</span>Sobel<span class="token punctuation">(</span>img<span class="token punctuation">,</span> cv2<span class="token punctuation">.</span>CV_64F<span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> ksize<span class="token operator">=</span><span class="token number">3</span><span class="token punctuation">)</span>
gradient_y <span class="token operator">=</span> cv2<span class="token punctuation">.</span>Sobel<span class="token punctuation">(</span>img<span class="token punctuation">,</span> cv2<span class="token punctuation">.</span>CV_64F<span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">,</span> ksize<span class="token operator">=</span><span class="token number">3</span><span class="token punctuation">)</span>

gradient_x <span class="token operator">=</span> cv2<span class="token punctuation">.</span>convertScaleAbs<span class="token punctuation">(</span>gradient_x<span class="token punctuation">)</span>
gradient_y <span class="token operator">=</span> cv2<span class="token punctuation">.</span>convertScaleAbs<span class="token punctuation">(</span>gradient_y<span class="token punctuation">)</span>
<span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string-interpolation"><span class="token string">f&quot;Range: </span><span class="token interpolation"><span class="token punctuation">{</span>np<span class="token punctuation">.</span><span class="token builtin">min</span><span class="token punctuation">(</span>gradient_x<span class="token punctuation">)</span><span class="token punctuation">}</span></span><span class="token string"> | </span><span class="token interpolation"><span class="token punctuation">{</span>np<span class="token punctuation">.</span><span class="token builtin">max</span><span class="token punctuation">(</span>gradient_x<span class="token punctuation">)</span><span class="token punctuation">}</span></span><span class="token string">&quot;</span></span><span class="token punctuation">)</span>
</pre><h3 class="mume-header" id="dive-deeper-gradient-orientation-magnitude">Dive Deeper: Gradient Orientation &amp; Magnitude</h3>

<p>At the beginning of this course I said that images are really just 2D functions, before showing you the intricacies of our Sobel kernels. We saw the clever design of both the x- and y-direction kernels, which borrow from the concept of &quot;taking the derivative&quot; you often see in calculus textbooks.</p>
<p>But on a really basic level, these kernels only return the x and y edge responses. These are <strong>not the image gradient</strong>, merely arithmetic values produced by the convolution process. To get to the final form (where the edges in our image are emphasized) we still need to compute the gradient direction and magnitude at each point in our image.</p>
<p>This brings us back to our original formula. Recall that the x-direction and y-direction kernels are:</p>
<p></p><div class="mathjax-exps">$$G_x = \begin{bmatrix} 1 &amp; 0 &amp; -1 \\ 2 &amp; 0 &amp; -2 \\ 1 &amp; 0 &amp; -1  \end{bmatrix}  G_y = \begin{bmatrix} 1 &amp; 2 &amp; 1 \\ 0 &amp; 0 &amp; 0 \\ -1 &amp; -2 &amp; -1  \end{bmatrix}$$</div><p></p>
<p>We understand that each kernel is applied separately to obtain the gradient component in each orientation, <span class="mathjax-exps">$G_x$</span> and <span class="mathjax-exps">$G_y$</span>. What is the significance of this? Well, as it turns out, if we know the change in the x-direction and the corresponding change in the y-direction, then we can use the Pythagorean theorem to approximate the &quot;length of the slope&quot;, a concept many of you are familiar with.</p>
<p>Expressed as a formula, the gradient magnitude is hence:<br>
</p><div class="mathjax-exps">$$|G| = \sqrt{G^2_x + G^2_y}$$</div><p></p>
<p>Along with the well-known Pythagorean theorem, some of you may also have some familiarity with the three trigonometric functions. In particular, the tangent function tells us that in a right triangle, the <strong>tangent of an angle is the length of the opposite side divided by the length of the adjacent side</strong>.</p>
<p>This leads us to the following expression:<br>
</p><div class="mathjax-exps">$$\tan(\theta_{(x,y)})=\frac{G_y}{G_x}$$</div><p></p>
<p>Rewriting the expression above, we arrive at the formula for the gradient&apos;s direction:<br>
</p><div class="mathjax-exps">$$\theta_{(x,y)}=\tan^{-1}\left(\frac{G_y}{G_x}\right)$$</div><p></p>
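<p>Putting both formulas into code (a small sketch with made-up gradient values; in practice <span class="mathjax-exps">$G_x$</span> and <span class="mathjax-exps">$G_y$</span> come from <code>cv2.Sobel</code>):</p>

```python
import numpy as np

# Hypothetical edge responses at two pixels: a strong edge and a flat region
gx = np.array([3.0, 0.0])
gy = np.array([4.0, 0.0])

magnitude = np.sqrt(gx ** 2 + gy ** 2)  # |G| = sqrt(Gx^2 + Gy^2)
# arctan2 handles the Gx = 0 case that a naive Gy / Gx division would not
direction = np.degrees(np.arctan2(gy, gx))

print(magnitude)  # [5. 0.]
```

<p>The first pixel is a 3-4-5 right triangle, hence a magnitude of 5 with a direction of roughly 53 degrees; the flat region fires no response at all.</p>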
<p><img src="assets/2dfuncs.png" alt></p>
<p>This whole idea is also illustrated in code, and the script is provided to you:</p>
<ul>
<li><code>gradient.py</code> generates the vector field in the picture above (right)</li>
<li><code>img2surface.py</code>, run on the penguin image in the <code>assets</code> folder, generates the surface plot</li>
</ul>
<p>Succinctly, suppose the two 3x3 kernels do not fire a response (for example, when no edges are detected in the white background of our penguin): both <span class="mathjax-exps">$G_x$</span> and <span class="mathjax-exps">$G_y$</span> will be 0, which leads to a gradient magnitude of 0. You can compute these by hand, let OpenCV&apos;s implementation handle it for you, or use <code>numpy</code> as illustrated in <code>gradient.py</code>:</p>
<pre data-role="codeBlock" data-info="py" class="language-python">dY<span class="token punctuation">,</span> dX <span class="token operator">=</span> np<span class="token punctuation">.</span>gradient<span class="token punctuation">(</span>img<span class="token punctuation">)</span>
</pre><h1 class="mume-header" id="image-segmentation">Image Segmentation</h1>

<p>Image segmentation is the process of decomposing an image into parts for further analysis. This has many applications:</p>
<ul>
<li>Background subtraction in human motion analysis</li>
<li>Multi-object classification</li>
<li>Find region of interest for OCR (optical character recognition)</li>
<li>Count pedestrians from a streamed video source</li>
<li>Isolating vehicle registration plates (license plates) and vehicle models from a busy highway scene</li>
</ul>
<p>Image segmentation techniques in the current literature can be classified into<sup class="footnote-ref"><a href="#fn4" id="fnref4">[4]</a></sup>:</p>
<ul>
<li>Intensity-based segmentation</li>
<li>Edge-based segmentation</li>
<li>Region-based semantic segmentation</li>
</ul>
<p>It&apos;s important to note, however, that the rise in popularity of deep learning frameworks and techniques has ushered in a proliferation of new methods for what was once a highly difficult task.</p>