Full Code of clovaai/CLEval for AI

master 513623c4a583 cached

19 files

91.0 KB

24.1k tokens

120 symbols

1 requests

Download .txt

Repository: clovaai/CLEval
Branch: master
Commit: 513623c4a583
Files: 19
Total size: 91.0 KB

Directory structure:
gitextract_aw6ps_j3/

├── .github/
│   ├── pull_request_template.md
│   └── workflows/
│       └── ci.yml
├── .gitignore
├── LICENSE
├── NOTICE
├── README.md
├── cleval/
│   ├── __init__.py
│   ├── arg_parser.py
│   ├── box_types.py
│   ├── data.py
│   ├── eval_functions.py
│   ├── main.py
│   ├── torchmetric.py
│   ├── utils.py
│   └── validation.py
├── pyproject.toml
├── setup.py
└── tests/
    ├── __init__.py
    └── test_scores.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/pull_request_template.md
================================================
# Description

- Related issues:
  - #

# Changes in this PR

# How has this been tested?

# Checklist
- [ ] This PR follows the coding-style of this project
- [ ] I have tested these changes
- [ ] I have commented hard-to-understand codes


================================================
FILE: .github/workflows/ci.yml
================================================
name: CI

on: pull_request

jobs:
  black:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout
      uses: actions/checkout@v2
    - name: Setup python
      uses: actions/setup-python@v2
      with:
        python-version: 3.9
    - name: Upgrade pip
      run: pip install --upgrade pip
    - name: Install black
      run: pip install --upgrade black==23.1.0
    - name: Run black
      run: black --check .

  isort:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout
      uses: actions/checkout@v2
    - name: Setup python
      uses: actions/setup-python@v2
      with:
        python-version: 3.9
    - name: Upgrade pip
      run: pip install --upgrade pip
    - name: Install isort
      run: pip install --upgrade isort==5.12.0
    - name: Run isort
      working-directory: ./cleval
      run: isort --profile black --check .

  pytest:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout
      uses: actions/checkout@v2
    - name: Setup python
      uses: actions/setup-python@v2
      with:
        python-version: 3.9
    - name: Upgrade pip
      run: pip install --upgrade pip && pip install -U setuptools wheel
    - name: Update apt
      run: sudo apt update
    - name: Install pre-requirements
      run: sudo apt install -y libyajl2 libyajl-dev libleveldb-dev libgl1-mesa-glx libglib2.0-0
    - name: Install cleval
      run: pip install six && pip install --force-reinstall --no-cache-dir cleval opencv-python-headless
    - name: Install pytest
      run: pip install --upgrade pytest
    - name: Run pytest
      run: pytest


================================================
FILE: .gitignore
================================================
__pycache__/
.vscode
.DS_Store
.idea
output/
.pytest_cache
.mypy_cache
build/
dist/
*.egg-info/

venv
debug*
tmp*
profile.txt


================================================
FILE: LICENSE
================================================
Copyright (c) 2020-present NAVER Corp.

 Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

 The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.


================================================
FILE: NOTICE
================================================
CLEval
Copyright (c) 2020-present NAVER Corp.

This project contains subcomponents with separate copyright notices and license terms. 
Your use of the source code for these subcomponents is subject to the terms and conditions of the following licenses.

=====

=====

CLEval solves the drawbacks of previous detection and end-to-end metrics such as IoU and DetEval. 
This code is based on ICDAR15 official evaluation code from https://rrc.cvc.uab.es/.

=====

jquery/jquery
http://jquery.com/


Copyright JS Foundation and other contributors, https://js.foundation/

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

=====

jquery/jquery-ui
https://github.com/jquery/jquery-ui


Copyright jQuery Foundation and other contributors, https://jquery.org/

This software consists of voluntary contributions made by many
individuals. For exact contribution history, see the revision history
available at https://github.com/jquery/jquery-ui

The following license applies to all parts of this software except as
documented below:

====

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

====

Copyright and related rights for sample code are waived via CC0. Sample
code is defined as all source code contained within the demos directory.

CC0: http://creativecommons.org/publicdomain/zero/1.0/

====

All files located in the node_modules and external directories are
externally maintained libraries used by this software which have their
own licenses; we recommend you read them, as their terms may differ from
the terms above.

=====

malsup/form
https://github.com/malsup/form


Copyright 2006-2013 (c) M. Alsup

All versions, present and past, of the jQuery Form plugin are dual licensed under the MIT and GPL licenses:

MIT
GPL
You may use either license. The MIT License is recommended for most projects because it is simple and easy to understand and it places almost no restrictions on what you can do with the plugin.

If the GPL suits your project better you are also free to use the plugin under that license.

You don't have to do anything special to choose one license or the other and you don't have to notify anyone which license you are using. You are free to use the jQuery Form Plugin in commercial projects as long as the copyright header is left intact.

-----

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


=====

cs-chan/Total-Text-Dataset
https://github.com/cs-chan/Total-Text-Dataset

Copyright (c) 2018, Chee Seng Chan
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name of Total-Text nor the names of its
  contributors may be used to endorse or promote products derived from
  this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

=====

ICDAR 2013, ICDAR 2015 ground-truth annotation. (gt/gt_IC13.zip, gt/gt_IC15.zip)
https://rrc.cvc.uab.es/?ch=2&com=tasks, https://rrc.cvc.uab.es/?ch=4&com=tasks

The "Incidental Scene Text(ICDAR2015)" dataset and corresponding annotations are licensed under
a Creative Commons Attribution 4.0 License(https://creativecommons.org/licenses/by/4.0/).

=====

================================================
FILE: README.md
================================================
# CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks

Official implementation of CLEval | [paper](https://arxiv.org/abs/2006.06244)

## Overview
We propose a Character-Level Evaluation metric (CLEval). To perform fine-grained assessment of the results, *instance matching* process handles granularity difference and *scoring process* conducts character-level evaluation. Please refer to the paper for more details. This code is based on [ICDAR15 official evaluation code](http://rrc.cvc.uab.es/).

### 2023.10.16 Huge Update
- **Much More Faster Version** of CLEval has been Uploaded!!
- Support CLI 
- Support torchmetric
- Support scale-wise evaluation


### Simplified Method Description
![Explanation](resources/screenshots/explanation.gif)

## Supported annotation types
* **LTRB**(xmin, ymin, xmax, ymax)
* **QUAD**(x1, y1, x2, y2, x3, y3, x4, y4)
* **POLY**(x1, y1, x2, y2, ..., x_2n, y_2n)

## Supported datasets
* ICDAR 2013 Focused Scene Text [Link](https://rrc.cvc.uab.es/?ch=2)
* ICDAR 2015 Incidental Scene Text [Link](https://rrc.cvc.uab.es/?ch=4)
* TotalText [Link](https://github.com/cs-chan/Total-Text-Dataset)
* Any other datasets that have a similar format with the datasets mentioned above

## Installation

### Build from pip
download from Clova OCR pypi
```bash
$ pip install cleval
```

or build with url
```bash
$ pip install git+https://github.com/clovaai/CLEval.git --user
```

### Build from source

```bash
$ git clone https://github.com/clovaai/CLEval.git
$ cd cleval
$ python setup.py install --user
```

## How to use
You can replace `cleval` with `PYTHONPATH=$PWD python cleval/main.py` for evaluation using source.
```bash
$ PYTHONPATH=$PWD python cleval/main.py -g=gt/gt_IC13.zip -s=[result.zip] --BOX_TYPE=LTRB 
```

### Detection evaluation (CLI)
```bash
$ cleval -g=gt/gt_IC13.zip -s=[result.zip] --BOX_TYPE=LTRB          # IC13
$ cleval -g=gt/gt_IC15.zip -s=[result.zip]                          # IC15
$ cleval -g=gt/gt_TotalText.zip -s=[result.zip] --BOX_TYPE=POLY     # TotalText
```
* Notes
  * The default value of ```BOX_TYPE``` is set to ```QUAD```. It can be explicitly set to ```--BOX_TYPE=QUAD``` when running evaluation on IC15 dataset.
  * Add ```--TANSCRIPTION``` option if the result file contains transcription.
  * Add ```--CONFIDENCES``` option if the result file contains confidence.

### End-to-end evaluation (CLI)
```bash
$ cleval -g=gt/gt_IC13.zip -s=[result.zip] --E2E --BOX_TYPE=LTRB        # IC13
$ cleval -g=gt/gt_IC15.zip -s=[result.zip] --E2E                        # IC15
$ cleval -g=gt/gt_TotalText.zip -s=[result.zip] --E2E --BOX_TYPE=POLY   # TotalText
```
* Notes
  * Adding ```--E2E``` also automatically adds ```--TANSCRIPTION``` option. Make sure that the transcriptions are included in the result file.  
  * Add ```--CONFIDENCES``` option if the result file contains confidence.

### TorchMetric
```python
from cleval import CLEvalMetric
metric = CLEvalMetric()

for gt, det in zip(gts, dets):
    # your fancy algorithm
    # ...
    # gt_quads = ...
    # det_quads = ...
    # ...
    _ = metric(det_quads, gt_quads, det_letters, gt_letters, gt_is_dcs)

metric_out = metric.compute()
metric.reset()
```

### Profiling
```bash
$ cleval -g=resources/test_data/gt/gt_eval_doc_v1_kr_single.zip -s=resources/test_data/pred/res_eval_doc_v1_kr_single.zip --E2E -v --DEBUG --PPROFILE > profile.txt
$ PYTHONPATH=$PWD python cleval/main.py -g resources/test_data/gt/dummy_dataset_val.json -s resources/test_data/pred/dummy_dataset_val.json --SCALE_WISE --DOMAIN_WISE --ORIENTATION --E2E --ORIENTATION -v --PROFILE --DEBUG > profile.txt
```

### Paramters for evaluation script
| name | type | default | description |
| ---- | ---- | ------- | ---- |
| -g | ```string``` | | path to ground truth zip file |
| -s | ```string``` | | path to result zip file |
| -o | ```string``` | | path to save per-sample result file 'results.zip' |

| name | type | default | description |
| ---- | ---- | ------- | ---- |
| --BOX_TYPE | ```string``` | ```QUAD``` | annotation type of box (LTRB, QUAD, POLY) |
| --TRANSCRIPTION | ```boolean``` | ```False``` | set True if result file has transcription |
| --CONFIDENCES | ```boolean``` | ```False``` | set True if result file has confidence |
| --E2E | ```boolean``` | ```False``` | to measure end-to-end evaluation (if not, detection evalution only) |
| --CASE_SENSITIVE | ```boolean``` | ```True``` | set True to evaluate case-sensitively. (only used in end-to-end evaluation) |
* Note : Please refer to ```arg_parser.py``` file for additional parameters and default settings used internally.

* Note : For scalewise evaluation, we measure the ratio of the shorter length (text height) of the text-box to the longer length of the image. 
Through this, evaluation for each ratio can be performed. To adjust the scales, please use SCALE_BINS argument.

## Citation
```
@article{baek2020cleval,
  title={CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks},
  author={Youngmin Baek, Daehyun Nam, Sungrae Park, Junyeop Lee, Seung Shin, Jeonghun Baek, Chae Young Lee and Hwalsuk Lee},
  journal={arXiv preprint arXiv:2006.06244},
  year={2020}
}
```

## Contact us
CLEval has been proposed to make fair evaluation in the OCR community, so we want to hear from many researchers. We welcome any feedbacks to our metric, and appreciate pull requests if you have any comments or improvements.

## License
```
Copyright (c) 2020-present NAVER Corp.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
```

### Contribute
Please use pre-commit which uses Black and Isort.
```
$ pip install pre-commit
$ pre-commit install
```

##### Step By Step
1. Write an issue.
2. Match code style (black, isort)
3. Wirte test code.
4. Delete branch after Squash&Merge.

Required Approve: 1

## Code Maintainer
- Donghyun Kim (artit.anthony@gmail.com)


================================================
FILE: cleval/__init__.py
================================================
from .torchmetric import CLEvalMetric

__version__ = ["0.1.1"]

__all__ = ["CLEvalMetric"]


================================================
FILE: cleval/arg_parser.py
================================================
import argparse
import os

from cleval.utils import cpu_count


def str2bool(v):
    if isinstance(v, bool):
        return v
    if v.lower() in ("yes", "true", "t", "y", "1"):
        return True
    elif v.lower() in ("no", "false", "f", "n", "0"):
        return False
    else:
        raise argparse.ArgumentTypeError("Boolean value expected.")


def get_params():
    parser = argparse.ArgumentParser(description="test global argument parser")

    # script parameters
    parser.add_argument("-g", "--GT_PATHS", nargs="+", help="Path of the Ground Truth files.")
    parser.add_argument("-s", "--SUBMIT_PATHS", nargs="+", help="Path of your method's results file.")

    # webserver parameters
    parser.add_argument(
        "-o",
        "--OUTPUT_PATH",
        default="output/",
        help="Path to a directory where to copy the file that" " contains per-sample results.",
    )
    parser.add_argument("--DUMP_SAMPLE_RESULT", action="store_true")
    parser.add_argument("-p", "--PORT", default=8080, help="port number to show")

    # result format related parameters
    parser.add_argument("--BOX_TYPE", default="QUAD", choices=["LTRB", "QUAD", "POLY"])
    parser.add_argument("--TRANSCRIPTION", action="store_true")
    parser.add_argument("--CONFIDENCES", action="store_true")
    parser.add_argument("--CRLF", action="store_true")

    # end-to-end related parameters
    parser.add_argument("--E2E", action="store_true")
    parser.add_argument("--CASE_SENSITIVE", default=True, type=str2bool)
    parser.add_argument("--RECOG_SCORE", default=True, type=str2bool)

    # evaluation related parameters
    parser.add_argument("--AREA_PRECISION_CONSTRAINT", type=float, default=0.3)
    parser.add_argument("--RECALL_GRANULARITY_PENALTY_WEIGHT", type=float, default=1.0)
    parser.add_argument("--PRECISION_GRANULARITY_PENALTY_WEIGHT", type=float, default=1.0)
    parser.add_argument("--VERTICAL_ASPECT_RATIO_THRESH", default=0.5)

    # orientation evaluation
    parser.add_argument("--ORIENTATION", action="store_true")

    # scale-wise evaluation  (
    parser.add_argument("--SCALE_WISE", action="store_true")  # scale-wise evaluation
    parser.add_argument("--SCALE_BINS", default=(0.0, 0.005, 0.01, 0.015, 0.02, 0.025, 0.1, 0.5, 1.0))

    # other parameters
    parser.add_argument("-t", "--NUM_WORKERS", default=-1, type=int, help="number of threads to use")
    parser.add_argument(
        "-v",
        "--VERBOSE",
        default=False,
        action="store_true",
        help="print evaluation progress or not",
    )
    parser.add_argument("--DEBUG", action="store_true")
    parser.add_argument("--PROFILE", action="store_true")

    args = parser.parse_args()
    assert len(args.GT_PATHS) == len(args.SUBMIT_PATHS) == 1

    if args.NUM_WORKERS == -1:
        args.NUM_WORKERS = cpu_count()

    # We suppose there always exist transcription information on end-to-end evaluation
    if args.E2E:
        args.TRANSCRIPTION = True

    os.makedirs(args.OUTPUT_PATH, exist_ok=True)
    return args


if __name__ == "__main__":
    from pprint import pprint

    args = get_params()
    pprint(args)


================================================
FILE: cleval/box_types.py
================================================
import abc
import math

import cv2
import numpy as np
import Polygon as polygon3
from shapely.geometry import Point
from shapely.geometry import Polygon as shapely_poly

MAX_FIDUCIAL_POINTS = 50


def get_midpoints(p1, p2):
    return (p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2


def point_distance(p1, p2):
    distx = math.fabs(p1[0] - p2[0])
    disty = math.fabs(p1[1] - p2[1])
    return math.sqrt(distx * distx + disty * disty)


class Box(metaclass=abc.ABCMeta):
    def __init__(
        self,
        points,
        confidence,
        transcription,
        orientation=None,
        is_dc=None,
    ):
        self.points = points
        self.confidence = confidence
        self.transcription = transcription
        self.orientation = orientation
        self.is_dc = transcription == "###" if is_dc is None else is_dc

    @abc.abstractmethod
    def __and__(self, other) -> float:
        """Returns intersection between two objects"""
        pass

    @abc.abstractmethod
    def subtract(self, other):
        """polygon subtraction"""
        pass

    @abc.abstractmethod
    def center(self):
        pass

    @abc.abstractmethod
    def center_distance(self, other):
        """center distance between each box"""

    @abc.abstractmethod
    def diagonal_length(self) -> float:
        """Returns diagonal length for box-level"""
        pass

    @abc.abstractmethod
    def is_inside(self, x, y) -> bool:
        """Returns point (x, y) is inside polygon."""
        pass

    @abc.abstractmethod
    def make_polygon_obj(self):
        # TODO: docstring 좀 더 자세히 적기
        """Make polygon object to calculate for future"""
        pass

    @abc.abstractmethod
    def pseudo_character_center(self, *args) -> list:
        """get character level boxes for TedEval pseudo center"""
        pass


class QUAD(Box):
    """Points should be x1,y1,...,x4,y4 (8 points) format"""

    def __init__(
        self,
        points,
        confidence=0.0,
        transcription="",
        orientation=None,
        is_dc=None,
        scale=None,
    ):
        super().__init__(points, confidence, transcription, orientation, is_dc)
        self.polygon = self.make_polygon_obj()
        self.scale = scale
        if self.is_dc:
            self.transcription = "#" * self.pseudo_transcription_length()

    def __and__(self, other) -> float:
        """Get intersection between two area"""
        poly_intersect = self.polygon & other.polygon
        if len(poly_intersect) == 0:
            return 0.0
        return poly_intersect.area()

    def subtract(self, other):
        self.polygon = self.polygon - other.polygon

    def center(self):
        return self.polygon.center()

    def center_distance(self, other):
        return point_distance(self.center(), other.center())

    def area(self):
        return self.polygon.area()

    def __or__(self, other):
        return self.polygon.area() + other.polygon.area() - (self & other)

    def make_polygon_obj(self):
        point_matrix = np.empty((4, 2), np.int32)
        point_matrix[0][0] = int(self.points[0])
        point_matrix[0][1] = int(self.points[1])
        point_matrix[1][0] = int(self.points[2])
        point_matrix[1][1] = int(self.points[3])
        point_matrix[2][0] = int(self.points[4])
        point_matrix[2][1] = int(self.points[5])
        point_matrix[3][0] = int(self.points[6])
        point_matrix[3][1] = int(self.points[7])
        return polygon3.Polygon(point_matrix)

    def aspect_ratio(self):
        top_side = point_distance((self.points[0], self.points[1]), (self.points[2], self.points[3]))
        right_side = point_distance((self.points[2], self.points[3]), (self.points[4], self.points[5]))
        bottom_side = point_distance((self.points[4], self.points[5]), (self.points[6], self.points[7]))
        left_side = point_distance((self.points[6], self.points[7]), (self.points[0], self.points[1]))
        avg_hor = (top_side + bottom_side) / 2
        avg_ver = (right_side + left_side) / 2

        return (avg_hor + 1e-5) / (avg_ver + 1e-5)

    def pseudo_transcription_length(self):
        return min(round(0.5 + (max(self.aspect_ratio(), 1 / self.aspect_ratio()))), 10)

    def pseudo_character_center(self, vertical_aspect_ratio_threshold):
        chars = list()
        length = len(self.transcription)
        aspect_ratio = self.aspect_ratio()

        if length == 0:
            return chars

        if aspect_ratio >= vertical_aspect_ratio_threshold:
            left_top = self.points[0], self.points[1]
            right_top = self.points[2], self.points[3]
            right_bottom = self.points[4], self.points[5]
            left_bottom = self.points[6], self.points[7]
        else:
            left_top = self.points[6], self.points[7]
            right_top = self.points[0], self.points[1]
            right_bottom = self.points[2], self.points[3]
            left_bottom = self.points[4], self.points[5]

        p1 = get_midpoints(left_top, left_bottom)
        p2 = get_midpoints(right_top, right_bottom)

        unit_x = (p2[0] - p1[0]) / length
        unit_y = (p2[1] - p1[1]) / length

        for i in range(length):
            x = p1[0] + unit_x / 2 + unit_x * i
            y = p1[1] + unit_y / 2 + unit_y * i
            chars.append((x, y))
        return chars

    def diagonal_length(self) -> float:
        left_top = self.points[0], self.points[1]
        right_top = self.points[2], self.points[3]
        right_bottom = self.points[4], self.points[5]
        left_bottom = self.points[6], self.points[7]
        diag1 = point_distance(left_top, right_bottom)
        diag2 = point_distance(right_top, left_bottom)
        return (diag1 + diag2) / 2

    def is_inside(self, x, y) -> bool:
        return self.polygon.isInside(x, y)


class POLY(Box):
    """Points should be x1,y1,...,xn,yn (2*n points) format"""

    def __init__(self, points, confidence=0.0, transcription="", orientation=None, is_dc=None):
        super().__init__(points, confidence, transcription, orientation, is_dc)
        self.num_points = len(self.points) // 2
        self.polygon = self.make_polygon_obj()
        self._aspect_ratio = self.make_aspect_ratio()
        if self.is_dc:
            self.transcription = "#" * self.pseudo_transcription_length()

    def __and__(self, other):
        """Get intersection between two area"""
        poly_intersect = self.polygon.intersection(other.polygon)
        return poly_intersect.area

    def subtract(self, other):
        """get substraction"""
        self.polygon = self.polygon.difference(self.polygon.intersection(other.polygon))

    def __or__(self, other):
        return 1.0

    def area(self):
        return self.polygon.area

    def center(self):
        return self.polygon.centroid.coords[0]

    def center_distance(self, other):
        try:
            return point_distance(self.center(), other.center())
        except:
            return 0.0001

    def diagonal_length(self):
        left_top = self.points[0], self.points[1]
        right_top = self.points[self.num_points - 2], self.points[self.num_points - 1]
        right_bottom = self.points[self.num_points], self.points[self.num_points + 1]
        left_bottom = (
            self.points[self.num_points * 2 - 2],
            self.points[self.num_points * 2 - 1],
        )

        diag1 = point_distance(left_top, right_bottom)
        diag2 = point_distance(right_top, left_bottom)

        return (diag1 + diag2) / 2

    def is_inside(self, x, y) -> bool:
        return self.polygon.contains(Point(x, y))

    def check_corner_points_are_continuous(self, lt, rt, rb, lb):
        counter = 0
        while lt != rt:
            lt = (lt + 1) % self.num_points
            counter += 1

        while rb != lb:
            rb = (rb + 1) % self.num_points
            counter += 1

        return True

    def get_four_max_distance_from_center(self):
        center_x, center_y = self.center()
        distance_from_center = list()
        point_x = self.points[0::2]
        point_y = self.points[1::2]

        for px, py in zip(point_x, point_y):
            distance_from_center.append(point_distance((center_x, center_y), (px, py)))

        distance_idx_max_order = np.argsort(distance_from_center)[::-1]
        return distance_idx_max_order[:4]

    def make_polygon_obj(self):
        point_x = self.points[0::2]
        point_y = self.points[1::2]
        # In TotalText dataset, there are under 4 points annotation for Polygon shape.
        # so, we have to deal with it

        # if points are given 3, fill last quad points with left bottom coordinates
        if len(point_x) == len(point_y) == 3:
            point_x.append(point_x[0])
            point_y.append(point_y[2])
            self.points.append(point_x[0])
            self.points.append(point_y[2])
            self.num_points = len(self.points) // 2

        # if points are given 2, copy value 2 times
        elif len(point_x) == len(point_y) == 2:
            point_x *= 2
            point_y *= 2
            self.points.append(point_x[1])
            self.points.append(point_y[0])
            self.points.append(point_x[0])
            self.points.append(point_y[1])
            self.num_points = len(self.points) // 2

        # if points are given 1, copy value 4 times
        elif len(point_x) == len(point_y) == 1:
            point_x *= 4
            point_y *= 4
            for _ in range(3):
                self.points.append(point_x[0])
                self.points.append(point_x[0])
            self.num_points = len(self.points) // 2
        return shapely_poly(np.stack([point_x, point_y], axis=1)).buffer(0)

    def aspect_ratio(self):
        return self._aspect_ratio

    def pseudo_transcription_length(self):
        return min(round(0.5 + (max(self._aspect_ratio, 1 / self._aspect_ratio))), 10)

    def make_aspect_ratio(self):
        np.array(np.reshape(self.points, [-1, 2]))
        rect = cv2.minAreaRect(np.array(np.reshape(self.points, [-1, 2]), dtype=np.float32))
        width = rect[1][0]
        height = rect[1][1]

        width += 1e-6
        height += 1e-6

        return min(10, height / width) + 1e5

    def pseudo_character_center(self):
        chars = list()
        length = len(self.transcription)

        # Prepare polygon line estimation with interpolation
        point_x = self.points[0::2]
        point_y = self.points[1::2]
        points_x_top = point_x[: self.num_points // 2]
        points_x_bottom = point_x[self.num_points // 2 :]
        points_y_top = point_y[: self.num_points // 2]
        points_y_bottom = point_y[self.num_points // 2 :]

        # reverse bottom point order from left to right
        points_x_bottom = points_x_bottom[::-1]
        points_y_bottom = points_y_bottom[::-1]

        num_interpolation_section = (self.num_points // 2) - 1
        num_points_to_interpolate = length

        new_point_x_top, new_point_x_bottom = list(), list()
        new_point_y_top, new_point_y_bottom = list(), list()

        for sec_idx in range(num_interpolation_section):
            start_x_top, end_x_top = points_x_top[sec_idx], points_x_top[sec_idx + 1]
            start_y_top, end_y_top = points_y_top[sec_idx], points_y_top[sec_idx + 1]
            start_x_bottom, end_x_bottom = (
                points_x_bottom[sec_idx],
                points_x_bottom[sec_idx + 1],
            )
            start_y_bottom, end_y_bottom = (
                points_y_bottom[sec_idx],
                points_y_bottom[sec_idx + 1],
            )

            diff_x_top = (end_x_top - start_x_top) / num_points_to_interpolate
            diff_y_top = (end_y_top - start_y_top) / num_points_to_interpolate
            diff_x_bottom = (end_x_bottom - start_x_bottom) / num_points_to_interpolate
            diff_y_bottom = (end_y_bottom - start_y_bottom) / num_points_to_interpolate

            new_point_x_top.append(start_x_top)
            new_point_x_bottom.append(start_x_bottom)
            new_point_y_top.append(start_y_top)
            new_point_y_bottom.append(start_y_bottom)

            for num_pt in range(1, num_points_to_interpolate):
                new_point_x_top.append(int(start_x_top + diff_x_top * num_pt))
                new_point_x_bottom.append(int(start_x_bottom + diff_x_bottom * num_pt))
                new_point_y_top.append(int(start_y_top + diff_y_top * num_pt))
                new_point_y_bottom.append(int(start_y_bottom + diff_y_bottom * num_pt))
        new_point_x_top.append(points_x_top[-1])
        new_point_y_top.append(points_y_top[-1])
        new_point_x_bottom.append(points_x_bottom[-1])
        new_point_y_bottom.append(points_y_bottom[-1])

        len_section_for_single_char = (len(new_point_x_top) - 1) / len(self.transcription)

        for c in range(len(self.transcription)):
            center_x = (
                new_point_x_top[int(c * len_section_for_single_char)]
                + new_point_x_top[int((c + 1) * len_section_for_single_char)]
                + new_point_x_bottom[int(c * len_section_for_single_char)]
                + new_point_x_bottom[int((c + 1) * len_section_for_single_char)]
            ) / 4

            center_y = (
                new_point_y_top[int(c * len_section_for_single_char)]
                + new_point_y_top[int((c + 1) * len_section_for_single_char)]
                + new_point_y_bottom[int(c * len_section_for_single_char)]
                + new_point_y_bottom[int((c + 1) * len_section_for_single_char)]
            ) / 4

            chars.append((center_x, center_y))
        return chars


================================================
FILE: cleval/data.py
================================================
from dataclasses import dataclass, field
from typing import Dict, List, Union

from cleval.utils import harmonic_mean


class MatchReleation:
    ONE_TO_ONE = "one-to-one"
    MANY_TO_ONE = "many-to-one"
    ONE_TO_MANY = "one-to-many"


@dataclass
class CoreStats:
    recall: float = 0.0
    precision: float = 0.0
    hmean: float = 0.0

    num_char_gt: int = 0  # TotalNum for Recall
    num_char_det: int = 0  # TotalNum for Precisiion
    gran_score_recall: float = 0.0
    num_char_tp_recall: int = 0
    gran_score_precision: float = 0.0
    num_char_tp_precision: int = 0

    num_char_fp: int = 0  # false positive


@dataclass
class MatchResult:
    gt_ids: List[int]
    det_ids: List[int]
    match_relation: str  # from MatchRelation

    det: CoreStats = field(default_factory=CoreStats)
    e2e: CoreStats = field(default_factory=CoreStats)


@dataclass
class Point:
    x: int
    y: int


@dataclass
class GTBoxResult:
    id: int
    points: List[Point]
    pccs: List[Point]
    orientation: Union[None, str]
    letters: str
    is_dc: bool


@dataclass
class DetBoxResult:
    id: int
    points: List[Point]
    orientation: Union[None, str]
    letters: str


@dataclass
class Stats:
    det: CoreStats = field(default_factory=CoreStats)
    e2e: CoreStats = field(default_factory=CoreStats)

    # split-merge cases
    num_splitted: int = 0
    num_merged: int = 0
    num_char_overlapped: int = 0

    # orientation evaluation
    ori_acc: float = 0.0
    num_ori_total: int = 0
    num_ori_correct: int = 0


@dataclass
class SampleResult:
    matches: List[MatchResult]
    gts: List[GTBoxResult]
    preds: List[DetBoxResult]
    stats: Stats = field(default_factory=Stats)
    image_id: Union[int, None] = None


@dataclass
class GlobalResult:
    """Object that holds each record of all samples."""

    dataset_inform: Dict = field(default_factory=dict)
    sample_results: List[SampleResult] = field(default_factory=list)
    stats: Stats = field(default_factory=Stats)


def accumulate_result(
    global_res: GlobalResult,
    sample_res: SampleResult,
    is_e2e: bool,
    dump_sample_res: bool = False,
):
    if dump_sample_res:
        global_res.sample_results.append(sample_res)
    accumulate_stats(global_res.stats, sample_res.stats, is_e2e)


def accumulate_stats(stats1: Stats, stats2: Stats, is_e2e: bool):
    """Accumulate core stats exclude ori_acc."""
    stats1.num_splitted += stats2.num_splitted
    stats1.num_merged += stats2.num_merged
    stats1.num_char_overlapped += stats2.num_char_overlapped
    stats1.num_ori_total += stats2.num_ori_total
    stats1.num_ori_correct += stats2.num_ori_correct

    accumulate_core_stats(stats1.det, stats2.det)
    if is_e2e:
        accumulate_core_stats(stats1.e2e, stats2.e2e)


def accumulate_core_stats(stats1: CoreStats, stats2: CoreStats):
    """Accumulate core stats exclude recall, precision, and hmean."""
    stats1.num_char_gt += stats2.num_char_gt
    stats1.num_char_det += stats2.num_char_det
    stats1.gran_score_recall += stats2.gran_score_recall
    stats1.num_char_tp_recall += stats2.num_char_tp_recall
    stats1.gran_score_precision += stats2.gran_score_precision
    stats1.num_char_tp_precision += stats2.num_char_tp_precision
    stats1.num_char_fp += stats2.num_char_fp


def calculate_global_rph(res: GlobalResult, is_e2e: bool):
    calculate_rph(res.stats.det)
    if is_e2e:
        calculate_rph(res.stats.e2e)


def calculate_rph(stats: CoreStats):
    total_gt = stats.num_char_gt
    total_det = stats.num_char_det
    tp_gt = stats.num_char_tp_recall
    gran_gt = stats.gran_score_recall
    tp_det = stats.num_char_tp_precision
    gran_det = stats.gran_score_precision

    # Sample Score : Character correct length - Granularity Penalty
    recall = 0.0 if total_gt == 0 else max(0.0, tp_gt - gran_gt) / total_gt
    precision = 0.0 if total_det == 0 else max(0.0, tp_det - gran_det) / total_det
    hmean = harmonic_mean(recall, precision)
    stats.recall = recall
    stats.precision = precision
    stats.hmean = hmean


================================================
FILE: cleval/eval_functions.py
================================================
from dataclasses import dataclass
from typing import List

import numpy as np
from numba import njit
from numpy.typing import NDArray

from cleval.data import (
    DetBoxResult,
    GTBoxResult,
    MatchReleation,
    MatchResult,
    Point,
    SampleResult,
)
from cleval.utils import harmonic_mean, lcs


@dataclass
class EvalMaterial:
    """EvalMaterial Dataclass
    These are used for calculating eval results.
    """

    gt_pcc_points: List[List]  # [gt_idx][pcc_idx] nested list which has variable length
    pcc_mat_list: List[NDArray]  # list of pcc_mat which has (len_det, len_pcc) shape.
    pcc_mat_sum: NDArray[np.int16]  # (len_gt, len_det)
    ap_mat: NDArray[np.float32]  # (len_gt, len_det)
    ap_mat_binary: NDArray[bool]  # (len_gt, len_det)
    ap_constraint: float
    gt_valid_indices: set
    det_valid_indices: set
    len_gt: int
    len_det: int


def evaluation(args, gt_boxes, det_boxes, scale_range=(0.0, 1.0)):
    """main evaluation function

    Notes:
        Abbreviations for variable names.
         - ap: area precision (not average precision)
         - thresh: threshold
         - pcc: pseudo char center
         - mat: matrix
         - res: result
         - dc: don't care
         - fp: false positive
         - tran: transcription
    """
    # prepare gt, det
    gt_dc_indices, gt_pcc_points = prepare_gt(
        gt_boxes, args.CASE_SENSITIVE, args.VERTICAL_ASPECT_RATIO_THRESH, scale_range
    )
    prepare_det(det_boxes, args.CASE_SENSITIVE)
    len_gt = len(gt_boxes)
    len_det = len(det_boxes)

    # calc area_precision
    ap_constraint = args.AREA_PRECISION_CONSTRAINT
    ap_mat, ap_mat_binary = calc_area_precision(gt_boxes, det_boxes, ap_constraint)

    # calc pcc inclusion
    pcc_mat_list, pcc_mat_sum = calc_pcc_inclusion(det_boxes, gt_pcc_points)

    # prepare valid indices
    det_dc_indices = get_det_dc_indices(gt_dc_indices, pcc_mat_sum, ap_mat, ap_mat_binary, ap_constraint, len_det)
    gt_valid_indices = set(range(len_gt)) - gt_dc_indices
    det_valid_indices = set(range(len_det)) - det_dc_indices

    # construct eval material
    eval_material = EvalMaterial(
        gt_pcc_points,
        pcc_mat_list,
        pcc_mat_sum,
        ap_mat,
        ap_mat_binary,
        ap_constraint,
        gt_valid_indices,
        det_valid_indices,
        len_gt,
        len_det,
    )

    # Matching process
    match_mat, match_results = calc_match_matrix(eval_material)

    # Prepare sample_result
    gt_results, det_results = get_box_results(gt_boxes, gt_pcc_points, det_boxes)
    sample_res = SampleResult(match_results, gt_results, det_results)

    # Evaluation Process
    eval_det(args, sample_res, gt_boxes, det_boxes, eval_material, match_mat)

    if args.E2E:
        eval_e2e(args, sample_res, gt_boxes, det_boxes, eval_material, match_mat)

    if args.ORIENTATION:
        eval_orientation(sample_res, gt_boxes, det_boxes, gt_valid_indices, match_mat)

    return sample_res


def prepare_gt(gt_boxes, is_case_sensitive, vertical_aspect_ratio_thresh, scale_range):
    """prepare ground-truth boxes in evaluation format."""
    gt_dc_indices = set()  # fast check via using set (hash-table)
    gt_pcc_points = []
    for gt_idx, gt_box in enumerate(gt_boxes):
        if not is_case_sensitive:
            gt_box.transcription = gt_box.transcription.upper()

        if gt_box.is_dc or (gt_box.scale is not None and not scale_range[0] <= gt_box.scale <= scale_range[1]):
            gt_dc_indices.add(gt_idx)
        gt_pcc_point = gt_box.pseudo_character_center(vertical_aspect_ratio_thresh)
        gt_pcc_points.append(gt_pcc_point)

    # subtract overlapping gt area from don't care boxes
    # Area(Don't care) - Area(Ground Truth):
    for dc_idx in gt_dc_indices:
        for idx in range(len(gt_boxes)):
            if idx in gt_dc_indices:
                continue
            if gt_boxes[idx] & gt_boxes[dc_idx] > 0:
                # TODO: Consider PCC exclusion for area overlapped with don't care.
                gt_boxes[dc_idx].subtract(gt_boxes[idx])
    return gt_dc_indices, gt_pcc_points


def prepare_det(det_boxes, is_case_sensitive):
    """prepare detection results in evaluation format."""
    for det_idx, det_box in enumerate(det_boxes):
        if not is_case_sensitive:
            det_box.transcription = det_box.transcription.upper()


def calc_area_precision(gt_boxes, det_boxes, ap_constraint):
    """calculate area precision between each GTbox and DETbox
    Args:
        gt_boxes(List[Box]): list of gt boxes
        det_boxes(List[Box]): list of det boxes
        ap_constraint(float): area precision contstraint

    Returns:
        ap_mat(NDArray[float32]): area precision matrix
        ap_mat_binary(NDArray[bool]): boolean mat that area precision >= ap_constraint

    """
    ap_mat = np.zeros([len(gt_boxes), len(det_boxes)], dtype=np.float32)

    for gt_idx, gt_box in enumerate(gt_boxes):
        for det_idx, det_box in enumerate(det_boxes):
            intersected_area = gt_box & det_box
            det_area = det_box.area()
            if det_area > 0.0:
                ap_mat[gt_idx, det_idx] = intersected_area / det_area
    ap_mat_binary = ap_mat >= ap_constraint
    return ap_mat, ap_mat_binary


def calc_pcc_inclusion(det_boxes, gt_pcc_points):
    """fill PCC counting matrix by iterating each GTbox and DETbox"""
    len_gt = len(gt_pcc_points)
    len_det = len(det_boxes)
    pcc_mat_list = []
    pcc_mat_sum = np.zeros((len_gt, len_det), dtype=np.int16)

    for gt_idx, gt_word_pccs in enumerate(gt_pcc_points):
        len_pcc = len(gt_word_pccs)
        pcc_mat = np.zeros((len_det, len_pcc), dtype=bool)

        for det_idx, det_box in enumerate(det_boxes):
            for pcc_idx, pcc_point in enumerate(gt_word_pccs):
                if det_box.is_inside(pcc_point[0], pcc_point[1]):
                    pcc_mat[det_idx, pcc_idx] = True
                    pcc_mat_sum[gt_idx, det_idx] += 1

        pcc_mat_list.append(pcc_mat)
    return pcc_mat_list, pcc_mat_sum


def get_det_dc_indices(gt_dc_indices, pcc_mat_sum, ap_mat, ap_mat_binary, ap_constraint, len_det):
    """Filter detection Don't care boxes"""
    det_dc_indices = set()
    if len(gt_dc_indices) > 0:
        for det_idx in range(len_det):
            ap_sum = 0
            for gt_idx in gt_dc_indices:
                if ap_mat_binary[gt_idx, det_idx]:
                    det_dc_indices.add(det_idx)
                    break
                if pcc_mat_sum[gt_idx, det_idx] > 0:
                    ap_sum += ap_mat[gt_idx, det_idx]
            if ap_sum >= ap_constraint:
                det_dc_indices.add(det_idx)
    return det_dc_indices


def calc_match_matrix(eval_material):
    """Calculate match matrix with PCC counting matrix information."""
    em = eval_material
    match_results = []
    match_mat = np.zeros([em.len_gt, em.len_det], dtype=bool)

    # one-to-one match
    for gt_idx in em.gt_valid_indices:
        for det_idx in em.det_valid_indices:
            is_matched = one_to_one_match(em.pcc_mat_sum, gt_idx, det_idx, em.ap_mat_binary, em.len_gt, em.len_det)
            if is_matched:
                match_result = MatchResult(
                    gt_ids=[gt_idx],
                    det_ids=[det_idx],
                    match_relation=MatchReleation.ONE_TO_ONE,
                )
                match_results.append(match_result)

    # one-to-many match
    for gt_idx in em.gt_valid_indices:
        det_valid_indices_np = np.array(list(em.det_valid_indices), dtype=np.int16)
        is_matched, matched_det_indices = one_to_many_match(
            em.pcc_mat_sum, gt_idx, em.ap_mat_binary, det_valid_indices_np
        )
        if is_matched:
            match_result = MatchResult(
                gt_ids=[gt_idx],
                det_ids=matched_det_indices,
                match_relation=MatchReleation.ONE_TO_MANY,
            )
            match_results.append(match_result)

    # many-to-one match
    for det_idx in em.det_valid_indices:
        gt_valid_indices_np = np.array(list(em.gt_valid_indices), dtype=np.int16)
        is_matched, matched_gt_indices = many_to_one_match(
            em.pcc_mat_sum, det_idx, em.ap_mat, em.ap_constraint, gt_valid_indices_np
        )
        if is_matched:
            match_result = MatchResult(
                gt_ids=matched_gt_indices,
                det_ids=[det_idx],
                match_relation=MatchReleation.MANY_TO_ONE,
            )
            match_results.append(match_result)

    for match_result in match_results:
        match_mat[match_result.gt_ids, match_result.det_ids] = True

    # clear pcc count flag for not matched pairs
    for gt_idx in range(em.len_gt):
        for det_idx in range(em.len_det):
            if match_mat[gt_idx, det_idx]:
                continue
            for pcc_idx in range(len(em.gt_pcc_points[gt_idx])):
                em.pcc_mat_sum[gt_idx, det_idx] -= em.pcc_mat_list[gt_idx][det_idx, pcc_idx]
                em.pcc_mat_list[gt_idx][det_idx, pcc_idx] = 0
    return match_mat, match_results


@njit
def one_to_one_match(pcc_mat_sum, gt_idx, det_idx, ap_mat_binary, len_gt, len_det):
    """One-to-One match condition"""
    match_counter = 0
    for i in range(len_det):
        if ap_mat_binary[gt_idx, i] and pcc_mat_sum[gt_idx, i] > 0:
            match_counter += 1
            if match_counter >= 2:
                break
    if match_counter != 1:
        return False

    match_counter = 0
    for i in range(len_gt):
        if ap_mat_binary[i, det_idx] and pcc_mat_sum[i, det_idx] > 0:
            match_counter += 1
            if match_counter >= 2:
                break
    if match_counter != 1:
        return False

    if ap_mat_binary[gt_idx, det_idx] and pcc_mat_sum[gt_idx, det_idx] > 0:
        return True
    return False


@njit
def one_to_many_match(pcc_mat_sum, gt_idx, ap_mat_binary, det_valid_indices):
    """One-to-Many match condition"""
    many_sum = 0
    matched_det_indices = []
    for det_idx in det_valid_indices:
        if ap_mat_binary[gt_idx, det_idx] and pcc_mat_sum[gt_idx, det_idx] > 0:
            many_sum += pcc_mat_sum[gt_idx, det_idx]
            matched_det_indices.append(det_idx)

    if many_sum > 0 and len(matched_det_indices) >= 2:
        return True, matched_det_indices
    else:
        return False, matched_det_indices


@njit
def many_to_one_match(pcc_mat_sum, det_idx, ap_mat, ap_constraint, gt_valid_indices):
    """Many-to-One match condition"""
    many_sum = 0
    matched_gt_indices = []
    for gt_idx in gt_valid_indices:
        if pcc_mat_sum[gt_idx, det_idx] > 0:
            many_sum += ap_mat[gt_idx, det_idx]
            matched_gt_indices.append(gt_idx)
    if many_sum >= ap_constraint and len(matched_gt_indices) >= 2:
        return True, matched_gt_indices
    else:
        return False, matched_gt_indices


def get_box_results(gt_boxes, gt_pcc_points, det_boxes):
    gt_results = []
    for gt_idx, gt_box in enumerate(gt_boxes):
        gt = GTBoxResult(
            id=gt_idx,
            points=__points_to_result(gt_box.points),
            pccs=__pccs_to_result(gt_pcc_points[gt_idx]),
            orientation=gt_box.orientation,
            letters=gt_box.transcription,
            is_dc=gt_box.is_dc,
        )
        gt_results.append(gt)

    det_results = []
    for det_idx, det_box in enumerate(det_boxes):
        det = DetBoxResult(
            id=det_idx,
            points=__points_to_result(det_box.points),
            orientation=det_box.orientation,
            letters=det_box.transcription,
        )
        det_results.append(det)

    return gt_results, det_results


def __points_to_result(points):
    points = np.array(points, dtype=np.int16).reshape(-1, 2)
    new_points = [Point(int(round(pt[0])), int(round(pt[1]))) for pt in points]
    return new_points


def __pccs_to_result(pcc_points):
    return [Point(int(round(pt[0])), int(round(pt[1]))) for pt in pcc_points]


def eval_det(args, sample_res, gt_boxes, det_boxes, eval_material, match_mat):
    stats = sample_res.stats
    em = eval_material

    # res_mat has +2 size for granuarity penalty and summation of matrix
    res_mat = np.zeros([em.len_gt + 2, em.len_det + 2], dtype=np.float32)

    match_mat_gts_sum = match_mat.sum(axis=0)
    match_mat_dets_sum = match_mat.sum(axis=1)
    pcc_checked = [np.zeros(len(pccs), dtype=bool) for pccs in em.gt_pcc_points]

    # Precision score
    for det_idx in em.det_valid_indices:
        if match_mat_gts_sum[det_idx] > 0:
            matched_gt_indices = np.where(match_mat[:, det_idx])[0]
            if len(matched_gt_indices) > 1:
                stats.num_merged += 1

            for gt_idx in matched_gt_indices:
                pcc_indices = np.where(em.pcc_mat_list[gt_idx][det_idx])[0]
                for pcc_idx in pcc_indices:
                    if not pcc_checked[gt_idx][pcc_idx]:
                        pcc_checked[gt_idx][pcc_idx] = True
                        res_mat[-2, det_idx] += 1  # for total score
                        res_mat[gt_idx, det_idx] += 1
                    else:
                        stats.num_char_overlapped += 1
            gran_weight = args.PRECISION_GRANULARITY_PENALTY_WEIGHT
            res_mat[-1, det_idx] = get_gran_score(len(matched_gt_indices), gran_weight)

    # Recall score
    for gt_idx in em.gt_valid_indices:
        found_gt_chars = 0
        if match_mat_dets_sum[gt_idx] > 0:
            matched_det_indices = np.where(match_mat[gt_idx] > 0)[0]
            if len(matched_det_indices) > 1:
                stats.num_splitted += 1

            found_gt_chars = np.sum(pcc_checked[gt_idx])
            gran_weight = args.RECALL_GRANULARITY_PENALTY_WEIGHT
            res_mat[gt_idx, -1] = get_gran_score(len(matched_det_indices), gran_weight)
        res_mat[gt_idx, -2] = found_gt_chars

    # Calculate precision / recall
    num_char_gt, num_char_det = get_num_total_char(gt_boxes, em.pcc_mat_sum, em.gt_valid_indices, em.det_valid_indices)
    num_char_fp = get_num_fp_char(det_boxes, em.det_valid_indices, match_mat_gts_sum)
    num_char_det += num_char_fp
    extract_stats(sample_res.stats.det, num_char_fp, num_char_gt, num_char_det, res_mat)

    # Calculate match-wise eval out
    if args.DUMP_SAMPLE_RESULT:
        for match_res in sample_res.matches:
            gt_ids = match_res.gt_ids
            det_ids = match_res.det_ids
            num_char_gt, num_char_det = get_num_total_char(gt_boxes, em.pcc_mat_sum, gt_ids, det_ids)
            num_char_fp = get_num_fp_char(det_boxes, det_ids, match_mat_gts_sum)
            num_char_det += num_char_fp
            extract_stats(match_res.det, num_char_fp, num_char_gt, num_char_det, res_mat)


def get_num_total_char(gt_boxes, pcc_mat_sum, gt_valid_indices, det_valid_indices):
    """get TotalNum for detection evaluation."""
    num_char_gt = 0
    num_char_det = 0
    for gt_idx, gt_box in enumerate(gt_boxes):
        if gt_idx in gt_valid_indices:
            num_char_gt += len(gt_box.transcription)
        num_char_det += np.sum(pcc_mat_sum[gt_idx][list(det_valid_indices)])
    return num_char_gt, num_char_det


def get_num_fp_char(det_boxes, det_valid_indices, match_mat_gts_sum):
    """get FalsePositive for detection evaluation."""
    fp_char_counts = 0
    for det_idx in det_valid_indices:
        # no match with any GTs && not matched with don't care
        if match_mat_gts_sum[det_idx] == 0:
            fp_char_count = round(0.5 + 1 / (1e-5 + det_boxes[det_idx].aspect_ratio()))
            fp_char_counts += min(fp_char_count, 10)
    return fp_char_counts


def eval_e2e(args, sample_res, gt_boxes, det_boxes, eval_material, match_mat):
    gt_trans = [box.transcription for box in gt_boxes]
    det_trans = [box.transcription for box in det_boxes]
    gt_trans_not_found = [box.transcription for box in gt_boxes]
    det_trans_not_found = [box.transcription for box in det_boxes]

    em = eval_material
    stats = sample_res.stats

    # +2 size for granuarity penalty and summation of matrix
    res_mat = np.zeros([em.len_gt + 2, em.len_det + 2], dtype=np.float32)

    match_mat_gts_sum = match_mat.sum(axis=0)
    match_mat_dets_sum = match_mat.sum(axis=1)

    # Recall score
    for gt_idx in em.gt_valid_indices:
        if match_mat_dets_sum[gt_idx] > 0:
            matched_det_indices = np.where(match_mat[gt_idx])[0]
            sorted_det_indices = sort_detbox_order_by_pcc(
                gt_idx, matched_det_indices, em.gt_pcc_points, em.pcc_mat_list
            )
            corrected_num_chars = lcs_elimination(
                gt_trans,
                gt_trans_not_found,
                det_trans_not_found,
                gt_idx,
                sorted_det_indices,
            )
            res_mat[gt_idx, -2] = corrected_num_chars
            gran_weight = args.RECALL_GRANULARITY_PENALTY_WEIGHT
            res_mat[gt_idx, -1] = get_gran_score(len(matched_det_indices), gran_weight)

    # Precision score
    for det_idx in em.det_valid_indices:
        if match_mat_gts_sum[det_idx] > 0:
            matched_gt_indices = np.where(match_mat[:, det_idx])[0]
            gran_weight = args.PRECISION_GRANULARITY_PENALTY_WEIGHT
            res_mat[-1, det_idx] = get_gran_score(len(matched_gt_indices), gran_weight)
        res_mat[-2, det_idx] = len(det_trans[det_idx]) - len(det_trans_not_found[det_idx])

    num_char_det = sum([len(det_trans[i]) for i in em.det_valid_indices])
    num_char_fp = num_char_det - np.sum(res_mat[-2])
    extract_stats(stats.e2e, num_char_fp, stats.det.num_char_gt, num_char_det, res_mat)

    if args.DUMP_SAMPLE_RESULT:
        for match_res in sample_res.matches:
            det_ids = match_res.det_ids
            num_char_det = sum([len(det_trans[i]) for i in det_ids])
            num_char_fp = num_char_det - np.sum(res_mat[-2][det_ids])
            num_char_gt = match_res.det.num_char_gt
            extract_stats(match_res.e2e, num_char_fp, num_char_gt, num_char_det, res_mat)


def sort_detbox_order_by_pcc(gt_idx, matched_det_indices, gt_pcc_points, pcc_mat_list):
    """sort detected box order by pcc information."""
    unordered = matched_det_indices.tolist()  # deepcopy
    ordered_indices = []

    char_len = len(gt_pcc_points[gt_idx])
    for pcc_idx in range(char_len):
        if len(unordered) == 1:
            break

        for det_idx in unordered:
            if pcc_mat_list[gt_idx][det_idx, pcc_idx]:
                ordered_indices.append(det_idx)
                unordered.remove(det_idx)
                break

    ordered_indices.append(unordered[0])
    return ordered_indices


def lcs_elimination(gt_trans, gt_trans_not_found, det_trans_not_found, gt_idx, sorted_det_indices):
    """longest common sequence elimination by sorted detection boxes"""
    target_string = "".join(det_trans_not_found[i] for i in sorted_det_indices)
    lcs_length, lcs_string = lcs(gt_trans[gt_idx], target_string)

    for char in lcs_string:
        gt_trans_not_found[gt_idx] = gt_trans_not_found[gt_idx].replace(char, "", 1)

        for det_idx in sorted_det_indices:
            det_tran = det_trans_not_found[det_idx]
            if not det_tran.find(char) < 0:
                det_trans_not_found[det_idx] = det_tran.replace(char, "", 1)
                break
    return lcs_length


def eval_orientation(sample_res, gt_boxes, det_boxes, gt_valid_indices, match_mat):
    gt_query = [box.orientation for box in gt_boxes]
    det_query = [box.orientation for box in det_boxes]

    match_mat_dets_sum = match_mat.sum(axis=1)
    counter = 0
    num_ori_correct = 0
    stats = sample_res.stats

    for gt_idx in gt_valid_indices:
        if match_mat_dets_sum[gt_idx] > 0:
            matched_det_indices = np.where(match_mat[gt_idx])[0]
            counter += 1
            count_size = 0 if len(matched_det_indices) else 1 / len(matched_det_indices)
            for det_idx in matched_det_indices:
                if gt_query[gt_idx] == det_query[det_idx]:
                    num_ori_correct += count_size
    if counter != 0:
        stats.num_ori_total = counter
        stats.num_ori_correct = num_ori_correct
        stats.ori_acc = num_ori_correct / counter


def extract_stats(core_stats, num_char_fp, num_char_gt, num_char_det, res_mat):
    core_stats.num_char_fp = int(num_char_fp)
    core_stats.num_char_gt = total_gt = int(num_char_gt)
    core_stats.num_char_det = total_det = int(num_char_det)
    core_stats.num_char_tp_recall = tp_gt = int(np.sum(res_mat[-2]))
    core_stats.gran_score_recall = gran_gt = float(np.sum(res_mat[:, -1]))
    core_stats.num_char_tp_precision = tp_det = int(np.sum(res_mat[-2]))
    core_stats.gran_score_precision = gran_det = float(np.sum(res_mat[-1]))

    # Sample Score : Character correct length - Granularity Penalty
    recall = 0.0 if total_gt == 0 else max(0.0, tp_gt - gran_gt) / total_gt
    precision = 0.0 if total_det == 0 else max(0.0, tp_det - gran_det) / total_det
    hmean = harmonic_mean(recall, precision)
    core_stats.recall = recall
    core_stats.precision = precision
    core_stats.hmean = hmean


@njit
def get_gran_score(num_splitted, penalty_weight):
    """get granularity penalty given number of how many splitted"""
    return max(num_splitted - 1, 0) * penalty_weight


================================================
FILE: cleval/main.py
================================================
import os
import re
import time
from concurrent.futures import ProcessPoolExecutor, as_completed
from dataclasses import asdict
from pprint import pprint

from tqdm import tqdm

from cleval.arg_parser import get_params
from cleval.box_types import POLY, QUAD, Box
from cleval.data import GlobalResult, accumulate_result, calculate_global_rph
from cleval.eval_functions import evaluation
from cleval.utils import (
    convert_ltrb2quad,
    decode_utf8,
    dump_json,
    load_zip_file,
    ltrb_regex_match,
    quad_regex_match,
)
from cleval.validation import (
    validate_data,
    validate_min_max_bounds,
    validate_point_inside_bounds,
)


def main():
    """Also used by cli"""
    start_t = time.perf_counter()
    args = get_params()

    if args.PROFILE:
        assert args.DEBUG, "DEBUG mode should be turned on for PPROFILE."
        import pprofile

        prof = pprofile.Profile()
        with prof():
            res_dict = cleval(args)
        prof.print_stats()
    else:
        res_dict = cleval(args)
    end_t = time.perf_counter()
    print(f"CLEval total duration...{end_t - start_t}s")
    pprint(res_dict)


def cleval(args):
    """This process validates a method, evaluates it.
    If it succeeds, generates a ZIP file with a JSON entry for each sample.
    """
    validate_data(args.GT_PATHS[0], args.SUBMIT_PATHS[0], args.CRLF)

    global_res = GlobalResult()
    gt_zipfile = args.GT_PATHS[0]
    submit_zipfile = args.SUBMIT_PATHS[0]
    gt_files, det_files, file_indices = get_file_paths(gt_zipfile, submit_zipfile)

    with tqdm(total=len(gt_files), disable=not args.VERBOSE) as pbar:
        pbar.set_description("Integrating results...")
        if args.DEBUG or args.NUM_WORKERS <= 1:
            for gt_file, det_file, file_idx in zip(gt_files, det_files, file_indices):
                sample_res = eval_single(args, gt_file, det_file, file_idx)
                accumulate_result(global_res, sample_res, args.E2E, args.DUMP_SAMPLE_RESULT)
                pbar.update(1)
        else:
            futures = []
            executor = ProcessPoolExecutor(max_workers=args.NUM_WORKERS)
            for gt_file, det_file, file_idx in zip(gt_files, det_files, file_indices):
                future = executor.submit(eval_single, args, gt_file, det_file, file_idx)
                futures.append(future)

            for future in as_completed(futures):
                sample_res = future.result()
                accumulate_result(global_res, sample_res, args.E2E, args.DUMP_SAMPLE_RESULT)
                pbar.update(1)

            executor.shutdown()

    # Calculate global recall, precision, hmean after accumulate all sample-results.
    calculate_global_rph(global_res, args.E2E)

    res_dict = {"all": asdict(global_res.stats)}
    dump_path = os.path.join(args.OUTPUT_PATH, "results.json")
    dump_json(dump_path, res_dict)

    if args.DUMP_SAMPLE_RESULT:
        dump_path = os.path.join(args.OUTPUT_PATH, f"sample_wise.json")
        dump_json(dump_path, asdict(global_res))

    if args.VERBOSE:
        pprint("Calculated!")
        pprint(res_dict)

    return res_dict


def get_file_paths(gt_zipfile, submit_zipfile):
    gt_zipfile_loaded = load_zip_file(gt_zipfile)
    submission_zipfile_loaded = load_zip_file(submit_zipfile)
    gt_files, det_files, file_indices = [], [], []

    for file_idx in gt_zipfile_loaded:
        gt_file = decode_utf8(gt_zipfile_loaded[file_idx])

        if file_idx in submission_zipfile_loaded:
            det_file = decode_utf8(submission_zipfile_loaded[file_idx])
            if det_file is None:
                det_file = ""
        else:
            det_file = ""

        gt_files.append(gt_file)
        det_files.append(det_file)
        file_indices.append(file_idx)
    return gt_files, det_files, file_indices


def eval_single(args, gt_file, det_file, file_id):
    gt_boxes = parse_single_file(gt_file, args.CRLF, True, False, box_type=args.BOX_TYPE)
    det_boxes = parse_single_file(
        det_file,
        args.CRLF,
        args.TRANSCRIPTION,
        args.CONFIDENCES,
        box_type=args.BOX_TYPE,
    )
    sample_res = evaluation(args, gt_boxes, det_boxes)
    sample_res.img_id = file_id
    return sample_res


def parse_single_file(
    content,
    has_crlf=True,
    with_transcription=False,
    with_confidence=False,
    img_width=0,
    img_height=0,
    sort_by_confidences=True,
    box_type="QUAD",
):
    """Returns all points, confindences and transcriptions of a file in lists.

    valid line formats:
        xmin,ymin,xmax,ymax,[confidence],[transcription]
        x1,y1,x2,y2,x3,y3,x4,y4,[confidence],[transcription]
    """
    result_boxes = []
    lines = content.split("\r\n" if has_crlf else "\n")
    for line in lines:
        line = line.replace("\r", "").replace("\n", "")
        if line != "":
            result_box = parse_values_from_single_line(
                line,
                with_transcription,
                with_confidence,
                img_width,
                img_height,
                box_type=box_type,
            )
            result_boxes.append(result_box)

    if with_confidence and len(result_boxes) and sort_by_confidences:
        result_boxes.sort(key=lambda x: x.confidence, reverse=True)

    return result_boxes


def parse_values_from_single_line(
    line,
    with_transcription=False,
    with_confidence=False,
    img_width=0,
    img_height=0,
    box_type="QUAD",
) -> Box:
    """
    Validate the format of the line.
    If the line is not valid an ValueError will be raised.
    If maxWidth and maxHeight are specified, all points must be inside the image bounds.
    Posible values are:
    LTRB=True: xmin,ymin,xmax,ymax[,confidence][,transcription]
    LTRB=False: x1,y1,x2,y2,x3,y3,x4,y4[,confidence][,transcription]
    LTRB="POLY": x1,y1,x2,y2,x3,y3,x4,y4[,confidence][,transcription]

    box_type:
        - LTRB: add description
        - QUAD: add description
        - POLY: add description

    Returns values from a textline. Points , [Confidences], [Transcriptions]
    """
    confidence = 0.0
    transcription = ""

    if box_type == "LTRB":
        box_type = QUAD
        num_points = 4
        m = ltrb_regex_match(line, with_transcription, with_confidence)
        xmin = int(m.group(1))
        ymin = int(m.group(2))
        xmax = int(m.group(3))
        ymax = int(m.group(4))

        validate_min_max_bounds(lower_val=xmin, upper_val=xmax)
        validate_min_max_bounds(lower_val=ymin, upper_val=ymax)

        points = [float(m.group(i)) for i in range(1, (num_points + 1))]
        points = convert_ltrb2quad(points)

        if img_width > 0 and img_height > 0:
            validate_point_inside_bounds(xmin, ymin, img_width, img_height)
            validate_point_inside_bounds(xmax, ymax, img_width, img_height)

    elif box_type == "QUAD":
        box_type = QUAD

        num_points = 8
        m = quad_regex_match(line, with_transcription, with_confidence)
        points = [float(m.group(i)) for i in range(1, (num_points + 1))]

        # validate_clockwise_points(points)
        if img_width > 0 and img_height > 0:
            validate_point_inside_bounds(points[0], points[1], img_width, img_height)
            validate_point_inside_bounds(points[2], points[3], img_width, img_height)
            validate_point_inside_bounds(points[4], points[5], img_width, img_height)
            validate_point_inside_bounds(points[6], points[7], img_width, img_height)

    elif box_type == "POLY":
        # TODO: TotalText GT보고 정하기
        # TODO: 이렇게 리턴하는 건 굉장히 위험
        splitted_line = line.split(",")
        tmp_transcription = list()

        if with_transcription:
            tmp_transcription.append(splitted_line.pop())
            while not len("".join(tmp_transcription)):
                tmp_transcription.append(splitted_line.pop())

        if with_confidence:
            if len(splitted_line) % 2 != 0:
                confidence = float(splitted_line.pop())
                points = [float(x) for x in splitted_line]
            else:
                backward_idx = len(splitted_line) - 1
                while backward_idx > 0:
                    if splitted_line[backward_idx].isdigit() and len(splitted_line) % 2 != 0:
                        break
                    tmp_transcription.append(splitted_line.pop())
                    backward_idx -= 1
                confidence = float(splitted_line.pop())
                points = [float(x) for x in splitted_line]
        else:
            if len(splitted_line) % 2 == 0:
                points = [float(x) for x in splitted_line]
            else:
                backward_idx = len(splitted_line) - 1
                while backward_idx > 0:
                    if splitted_line[backward_idx].isdigit():
                        break
                    tmp_transcription.append(splitted_line.pop())
                    backward_idx -= 1
                points = [float(x) for x in splitted_line]

        transcription = ",".join(tmp_transcription)
        return POLY(points, confidence=confidence, transcription=transcription)
    else:
        raise RuntimeError(f"Something is wrong with configuration. Box Type: [{box_type}]")

    # QUAD or LTRB format
    if with_confidence:
        try:
            confidence = float(m.group(num_points + 1))
        except ValueError:
            raise ValueError("Confidence value must be a float")

    if with_transcription:
        pos_transcription = num_points + (2 if with_confidence else 1)
        transcription = m.group(pos_transcription)
        m2 = re.match(r"^\s*\"(.*)\"\s*$", transcription)

        # Transcription with double quotes
        # We extract the value and replace escaped characters
        if m2 is not None:
            transcription = m2.group(1).replace("\\\\", "\\").replace('\\"', '"')

    result_box = box_type(points, confidence=confidence, transcription=transcription)
    return result_box


def parse_jylee_annot(quad, transcription, box_type):
    assert box_type == "QUAD"
    points = [
        quad["x1"],
        quad["y1"],
        quad["x2"],
        quad["y2"],
        quad["x3"],
        quad["y3"],
        quad["x4"],
        quad["y4"],
    ]
    result_box = QUAD(points, confidence=0.0, transcription=transcription)
    return result_box


def parse_clova_ocr(quad, transcription, box_type):
    assert box_type == "QUAD"
    result_box = QUAD(quad, confidence=0.0, transcription=transcription)
    return result_box


if __name__ == "__main__":
    main()


================================================
FILE: cleval/torchmetric.py
================================================
"""
TODO: Support scalewise eval
TODO: Support orientation accuracy
"""

import cv2
import numpy as np
import torch
from torchmetrics import Metric

from cleval.box_types import QUAD
from cleval.data import SampleResult
from cleval.eval_functions import evaluation


class Options:
    def __init__(
        self,
        case_sensitive,
        recall_gran_penalty,
        precision_gran_penalty,
        vertical_aspect_ratio_thresh,
        ap_constraint,
    ):
        self.DUMP_SAMPLE_RESULT = False
        self.E2E = (True,)  # change in runtime. See update function.
        self.ORIENTATION = False
        self.CASE_SENSITIVE = case_sensitive
        self.RECALL_GRANULARITY_PENALTY_WEIGHT = recall_gran_penalty
        self.PRECISION_GRANULARITY_PENALTY_WEIGHT = precision_gran_penalty
        self.VERTICAL_ASPECT_RATIO_THRESH = vertical_aspect_ratio_thresh
        self.AREA_PRECISION_CONSTRAINT = ap_constraint


class CLEvalMetric(Metric):
    full_state_update: bool = False

    def __init__(
        self,
        dist_sync_on_step=False,
        case_sensitive=True,
        recall_gran_penalty=1.0,
        precision_gran_penalty=1.0,
        vertical_aspect_ratio_thresh=0.5,
        ap_constraint=0.3,
        scale_wise=False,
        scale_bins=(0.0, 0.005, 0.01, 0.015, 0.02, 0.025, 0.1, 0.5, 1.0),
        scale_range=(0.0, 1.0),
    ):
        super().__init__(dist_sync_on_step=dist_sync_on_step)
        self.options = Options(
            case_sensitive,
            recall_gran_penalty,
            precision_gran_penalty,
            vertical_aspect_ratio_thresh,
            ap_constraint,
        )
        self.scale_range = scale_range

        self.scalewise_metric = {}
        if scale_wise:
            bin_ranges = [scale_bins[i : i + 2] for i in range(len(scale_bins) - 1)]
            for bin_range in bin_ranges:
                self.scalewise_metric[bin_range] = CLEvalMetric(
                    dist_sync_on_step=dist_sync_on_step,
                    case_sensitive=case_sensitive,
                    recall_gran_penalty=recall_gran_penalty,
                    precision_gran_penalty=precision_gran_penalty,
                    vertical_aspect_ratio_thresh=vertical_aspect_ratio_thresh,
                    ap_constraint=ap_constraint,
                    scale_wise=False,
                    scale_range=bin_range,
                )

        # Detection
        self.add_state("det_num_char_gt", torch.tensor(0, dtype=torch.int32), dist_reduce_fx="sum")
        self.add_state("det_num_char_det", torch.tensor(0, dtype=torch.int32), dist_reduce_fx="sum")
        self.add_state(
            "det_gran_score_recall",
            torch.tensor(0, dtype=torch.float32),
            dist_reduce_fx="sum",
        )
        self.add_state(
            "det_num_char_tp_recall",
            torch.tensor(0, dtype=torch.int32),
            dist_reduce_fx="sum",
        )
        self.add_state(
            "det_gran_score_precision",
            torch.tensor(0, dtype=torch.float32),
            dist_reduce_fx="sum",
        )
        self.add_state(
            "det_num_char_tp_precision",
            torch.tensor(0, dtype=torch.int32),
            dist_reduce_fx="sum",
        )

        self.add_state("det_num_char_fp", torch.tensor(0, dtype=torch.int32), dist_reduce_fx="sum")

        # E2E
        self.add_state("e2e_num_char_gt", torch.tensor(0, dtype=torch.int32), dist_reduce_fx="sum")
        self.add_state("e2e_num_char_det", torch.tensor(0, dtype=torch.int32), dist_reduce_fx="sum")
        self.add_state(
            "e2e_gran_score_recall",
            torch.tensor(0, dtype=torch.float32),
            dist_reduce_fx="sum",
        )
        self.add_state(
            "e2e_num_char_tp_recall",
            torch.tensor(0, dtype=torch.int32),
            dist_reduce_fx="sum",
        )
        self.add_state(
            "e2e_gran_score_precision",
            torch.tensor(0, dtype=torch.float32),
            dist_reduce_fx="sum",
        )
        self.add_state(
            "e2e_num_char_tp_precision",
            torch.tensor(0, dtype=torch.int32),
            dist_reduce_fx="sum",
        )

        self.add_state("e2e_num_char_fp", torch.tensor(0, dtype=torch.int32), dist_reduce_fx="sum")

        # split-merge cases
        self.add_state("num_splitted", torch.tensor(0, dtype=torch.int32), dist_reduce_fx="sum")
        self.add_state("num_merged", torch.tensor(0, dtype=torch.int32), dist_reduce_fx="sum")
        self.add_state(
            "num_char_overlapped",
            torch.tensor(0, dtype=torch.int32),
            dist_reduce_fx="sum",
        )

    def to(self, *args, **kwargs):
        super().to(*args, **kwargs)
        for key, metric in self.scalewise_metric.items():
            self.scalewise_metric[key] = metric.to(*args, **kwargs)
        return self

    def update(
        self,
        det_quads,
        gt_quads,
        det_letters=None,
        gt_letters=None,
        gt_is_dcs=None,
        img_longer_length=None,
    ):
        """
        Args:
            det_quads (NDArray[float32]): (N, 8) detected quads
            gt_quads (NDArray[float32]): (N, 8) target quads
            det_letters (List[str]): detected letters
            gt_letters (List[str]): target letters
            gt_is_dcs (List[bool]): is dc gt quad?
            img_longer_length (int): longer length of images
        """
        gt_inps = self.__make_eval_input(gt_quads, gt_letters, gt_is_dcs, img_longer_length)
        det_inps = self.__make_eval_input(det_quads, det_letters)
        self.options.E2E = False if gt_letters is None and det_letters is None else True
        sample_res = evaluation(self.options, gt_inps, det_inps, scale_range=self.scale_range)
        self.__accumulate(sample_res)

        for metric in self.scalewise_metric.values():
            if img_longer_length is None:
                raise ValueError("[img_longer_length] argument should be " "given for scalewise evaluation.")
            metric(
                det_quads,
                gt_quads,
                det_letters,
                gt_letters,
                gt_is_dcs,
                img_longer_length,
            )

    def __make_eval_input(self, quads, letters, is_dcs=None, img_longer_length=None):
        eval_inps = []
        for i in range(len(quads)):
            box_scale = None
            if img_longer_length is not None:
                box_scale = self.__check_box_scale(quads[i], img_longer_length)

            eval_inp = QUAD(
                quads[i],
                confidence=0.0,
                transcription=None if letters is None else letters[i],
                is_dc=None if is_dcs is None else is_dcs[i],
                scale=box_scale,
            )
            eval_inps.append(eval_inp)
        return eval_inps

    @staticmethod
    def __check_box_scale(quad, img_longer_length):
        """The method calculates box scale
        Box scale is defined using the equation: char-height / image-longer size
        The size of a box is defined w.r.t image size, allowing us to judge how sensitive
        the model is to the box scale.
        """
        rect = cv2.minAreaRect(quad.reshape(4, 2))
        quad = cv2.boxPoints(rect)
        quad = np.around(quad)
        box_w = np.linalg.norm(quad[1] - quad[0]) + np.linalg.norm(quad[3] - quad[2])
        box_h = np.linalg.norm(quad[2] - quad[1]) + np.linalg.norm(quad[0] - quad[3])
        box_scale = min(box_w, box_h) / 2 / img_longer_length
        return box_scale

    def __accumulate(self, sample_res: SampleResult):
        self.num_splitted += sample_res.stats.num_splitted
        self.num_merged += sample_res.stats.num_merged
        self.num_char_overlapped += sample_res.stats.num_char_overlapped

        self.det_num_char_gt += sample_res.stats.det.num_char_gt
        self.det_num_char_det += sample_res.stats.det.num_char_det
        self.det_gran_score_recall += sample_res.stats.det.gran_score_recall
        self.det_num_char_tp_recall += sample_res.stats.det.num_char_tp_recall
        self.det_gran_score_precision += sample_res.stats.det.gran_score_precision
        self.det_num_char_tp_precision += sample_res.stats.det.num_char_tp_precision
        self.det_num_char_fp += sample_res.stats.det.num_char_fp

        self.e2e_num_char_gt += sample_res.stats.e2e.num_char_gt
        self.e2e_num_char_det += sample_res.stats.e2e.num_char_det
        self.e2e_gran_score_recall += sample_res.stats.e2e.gran_score_recall
        self.e2e_num_char_tp_recall += sample_res.stats.e2e.num_char_tp_recall
        self.e2e_gran_score_precision += sample_res.stats.e2e.gran_score_precision
        self.e2e_num_char_tp_precision += sample_res.stats.e2e.num_char_tp_precision
        self.e2e_num_char_fp += sample_res.stats.e2e.num_char_fp

    def compute(self):
        det_r, det_p, det_h = self.__calculate_rph(
            self.det_num_char_gt,
            self.det_num_char_det,
            self.det_gran_score_recall,
            self.det_num_char_tp_recall,
            self.det_gran_score_precision,
            self.det_num_char_tp_precision,
        )
        e2e_r, e2e_p, e2e_h = self.__calculate_rph(
            self.e2e_num_char_gt,
            self.e2e_num_char_det,
            self.e2e_gran_score_recall,
            self.e2e_num_char_tp_recall,
            self.e2e_gran_score_precision,
            self.e2e_num_char_tp_precision,
        )
        return_dict = {
            "det_r": det_r,
            "det_p": det_p,
            "det_h": det_h,
            "e2e_r": e2e_r,
            "e2e_p": e2e_p,
            "e2e_h": e2e_h,
            "num_splitted": self.num_splitted,
            "num_merged": self.num_merged,
            "num_char_overlapped": self.num_char_overlapped,
            "scale_wise": {},
        }

        for scale_bin, metric in self.scalewise_metric.items():
            return_dict["scale_wise"][scale_bin] = metric.compute()

        return return_dict

    def reset(self):
        super().reset()
        for metric in self.scalewise_metric.values():
            metric.reset()

    def __calculate_rph(
        self,
        num_char_gt,
        num_char_det,
        gran_score_recall,
        num_char_tp_recall,
        gran_score_precision,
        num_char_tp_precision,
    ):
        total_gt = num_char_gt
        total_det = num_char_det
        gran_gt = gran_score_recall
        tp_gt = num_char_tp_recall
        gran_det = gran_score_precision
        tp_det = num_char_tp_precision

        # Sample Score : Character correct length - Granularity Penalty
        recall = 0.0 if total_gt == 0 else max(0.0, tp_gt - gran_gt) / total_gt
        precision = 0.0 if total_det == 0 else max(0.0, tp_det - gran_det) / total_det
        hmean = self.harmonic_mean(recall, precision)
        return recall, precision, hmean

    def harmonic_mean(self, score1, score2):
        """get harmonic mean value"""
        if score1 + score2 == 0:
            return torch.tensor(0, dtype=torch.float32, device=self.device)
        else:
            return (2 * score1 * score2) / (score1 + score2)


================================================
FILE: cleval/utils.py
================================================
import codecs
import json
import re
import subprocess
import zipfile

from numba import njit


def load_zip_file(file):
    """
    Returns an array with the contents (filtered by fileNameRegExp) of a ZIP file.
    all_entries validates that all entries in the ZIP file pass the fileNameRegExp
    """
    archive = zipfile.ZipFile(file, mode="r", allowZip64=True)

    pairs = dict()
    for name in archive.namelist():
        key_name = (
            name.replace("gt_", "").replace("res_", "").replace(".txt", "").replace(".json", "").replace(".jpg", "")
        )
        pairs[key_name] = archive.read(name)
    return pairs


def decode_utf8(raw):
    """
    Returns a Unicode object
    """
    raw = codecs.decode(raw, "utf-8", "replace")

    # extracts BOM if exists
    raw = raw.encode("utf8")
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw.replace(codecs.BOM_UTF8, b"", 1)
    return raw.decode("utf-8")


def dump_json(json_file_path, json_data):
    with open(json_file_path, "w", encoding="utf-8") as f:
        json.dump(json_data, f)


def read_json(json_file_path):
    with open(json_file_path, "r", encoding="utf-8") as f:
        json_data = json.load(f)
    return json_data


def convert_ltrb2quad(points):
    """Convert point format from LTRB to QUAD"""
    new_points = [
        points[0],
        points[1],
        points[2],
        points[1],
        points[2],
        points[3],
        points[0],
        points[3],
    ]
    return new_points


def ltrb_regex_match(line, with_transcription, with_confidence):
    if with_transcription and with_confidence:
        m = re.match(
            r"^\s*(-?[0-9]+)\s*"
            r",\s*(-?[0-9]+)\s*"
            r",\s*([0-9]+)\s*"
            r",\s*([0-9]+)\s*"
            r",\s*([0-1].?[0-9]*)\s*,(.*)$",
            line,
        )
        if m is None:
            raise ValueError("Format incorrect. " "Should be: xmin,ymin,xmax,ymax,confidence,transcription")
    elif with_confidence:
        m = re.match(
            r"^\s*(-?[0-9]+)\s*," r"\s*(-?[0-9]+)\s*," r"\s*([0-9]+)\s*," r"\s*([0-9]+)\s*," r"\s*([0-1].?[0-9]*)\s*$",
            line,
        )
        if m is None:
            raise ValueError("Format incorrect. Should be: xmin,ymin,xmax,ymax,confidence")
    elif with_transcription:
        m = re.match(
            r"^\s*(-?[0-9]+)\s*," r"\s*(-?[0-9]+)\s*," r"\s*([0-9]+)\s*," r"\s*([0-9]+)\s*,(.*)$",
            line,
        )
        if m is None:
            raise ValueError("Format incorrect. Should be: xmin,ymin,xmax,ymax,transcription")
    else:
        m = re.match(
            r"^\s*(-?[0-9]+)\s*," r"\s*(-?[0-9]+)\s*," r"\s*([0-9]+)\s*," r"\s*([0-9]+)\s*,?\s*$",
            line,
        )
        if m is None:
            raise ValueError("Format incorrect. Should be: xmin,ymin,xmax,ymax")
    return m


def quad_regex_match(line, with_transcription, with_confidence):
    if with_transcription and with_confidence:
        m = re.match(
            r"^\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*([0-1].?[0-9]*)\s*,(.*)$",
            line,
        )
        if m is None:
            raise ValueError("Format incorrect. " "Should be: x1,y1,x2,y2,x3,y3,x4,y4,confidence,transcription")
    elif with_confidence:
        m = re.match(
            r"^\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*([0-1].?[0-9]*)\s*$",
            line,
        )
        if m is None:
            raise ValueError("Format incorrect. Should be: x1,y1,x2,y2,x3,y3,x4,y4,confidence")
    elif with_transcription:
        m = re.match(
            r"^\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,(.*)$",
            line,
        )
        if m is None:
            raise ValueError("Format incorrect. Should be: x1,y1,x2,y2,x3,y3,x4,y4,transcription")
    else:
        if line[-1] == ",":
            line = line[:-1]
        m = re.match(
            r"^\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*,"
            r"\s*(-?[0-9]+)\s*$",
            line,
        )
        if m is None:
            raise ValueError("Format incorrect. Should be: x1,y1,x2,y2,x3,y3,x4,y4")
    return m


@njit
def lcs(s1, s2):
    """Longeset Common Sequence between s1 & s2"""
    # https://stackoverflow.com/questions/48651891/longest-common-subsequence-in-python
    if len(s1) == 0 or len(s2) == 0:
        return 0, ""
    matrix = [["" for _ in range(len(s2))] for _ in range(len(s1))]
    for i in range(len(s1)):
        for j in range(len(s2)):
            if s1[i] == s2[j]:
                if i == 0 or j == 0:
                    matrix[i][j] = s1[i]
                else:
                    matrix[i][j] = matrix[i - 1][j - 1] + s1[i]
            else:
                if len(matrix[i - 1][j]) > len(matrix[i][j - 1]):
                    matrix[i][j] = matrix[i - 1][j]
                else:
                    matrix[i][j] = matrix[i][j - 1]
    cs = matrix[-1][-1]
    return len(cs), cs


@njit
def harmonic_mean(score1, score2):
    """get harmonic mean value"""
    if score1 + score2 == 0:
        return 0
    else:
        return (2 * score1 * score2) / (score1 + score2)


def cpu_count():
    """Get number of cpu
    os.cpu_count() has a problem with docker container.
    For example, we have 72 cpus. os.cpu_count() always return 72
    even if we allocate only 4 cpus for container.
    """
    return int(subprocess.check_output("nproc").decode().strip())


================================================
FILE: cleval/validation.py
================================================
from cleval.utils import decode_utf8, load_zip_file


def validate_data(gt_file, submit_file, has_crlf):
    gt = load_zip_file(gt_file)
    subm = load_zip_file(submit_file)

    # Validate format of GroundTruth
    for k in gt:
        validate_lines_in_file(k, gt[k], has_crlf)

    # Validate format of results
    for k in subm:
        if k not in gt:
            raise ValueError("The sample %s not present in GT" % k)
        validate_lines_in_file(k, subm[k], has_crlf)


def validate_lines_in_file(file_name, file_contents, has_crlf=True):
    """This function validates that all lines of the file.
    Execute line validation function for each line.
    """
    utf8file = decode_utf8(file_contents)
    if utf8file is None:
        raise ValueError("The file %s is not UTF-8" % file_name)

    lines = utf8file.split("\r\n" if has_crlf else "\n")
    for line in lines:
        _ = line.replace("\r", "").replace("\n", "")


def validate_point_inside_bounds(x, y, img_width, img_height):
    if x < 0 or x > img_width:
        raise ValueError("X value (%s) not valid. Image dimensions: (%s,%s)" % (x, img_width, img_height))
    if y < 0 or y > img_height:
        raise ValueError("Y value (%s)  not valid. Image dimensions: (%s,%s)" % (y, img_width, img_height))


def validate_min_max_bounds(lower_val, upper_val):
    if lower_val > upper_val:
        raise ValueError(f"Value {lower_val} should be smaller than value {upper_val}.")


def validate_clockwise_points(points):
    """
    Validates that the points are in clockwise order.
    """

    if len(points) != 8:
        raise ValueError("Points list not valid." + str(len(points)))

    point = [
        [int(points[0]), int(points[1])],
        [int(points[2]), int(points[3])],
        [int(points[4]), int(points[5])],
        [int(points[6]), int(points[7])],
    ]
    edge = [
        (point[1][0] - point[0][0]) * (point[1][1] + point[0][1]),
        (point[2][0] - point[1][0]) * (point[2][1] + point[1][1]),
        (point[3][0] - point[2][0]) * (point[3][1] + point[2][1]),
        (point[0][0] - point[3][0]) * (point[0][1] + point[3][1]),
    ]

    summatory = edge[0] + edge[1] + edge[2] + edge[3]
    if summatory > 0:
        raise ValueError(
            "Points are not clockwise. " "The coordinates of bounding quads have to be given in clockwise order."
        )


================================================
FILE: pyproject.toml
================================================
[tool.isort]
profile = "black"

[tool.black]
line-length = 120
target-version = ['py38']
include = '\.pyi?$'

[tool.pytest.ini_options]
addopts = "-s"


================================================
FILE: setup.py
================================================
import setuptools

with open("README.md", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="cleval",
    version="0.1.1",
    author="dong.hyun",
    author_email="dong.hyun@navercorp.com",
    description="cleval",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://oss.navercorp.com/CLOVA-AI-OCR/cleval",
    packages=setuptools.find_packages(),
    install_requires=[
        "bottle",
        "requests",
        "Pillow",
        "Polygon3",
        "Shapely",
        "tqdm",
        "pprofile",
        "numba>=0.58.0",
        "six",
        "torchmetrics>=1.2.0",
        "numpy",
    ],
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    entry_points={
        "console_scripts": [
            "cleval = cleval.main:main",
        ],
    },
    python_requires=">=3.7",
)


================================================
FILE: tests/__init__.py
================================================


================================================
FILE: tests/test_scores.py
================================================
import sys

import pytest


def test_output_score():
    sys.argv[1:] = [
        "-g=resources/test_data/gt/gt_eval_doc_v1_kr.zip",
        "-s=resources/test_data/pred/res_eval_doc_v1_kr.zip",
        "--E2E",
        "--DUMP_SAMPLE_RESULT",
        "--DEBUG",
    ]
    from cleval.arg_parser import get_params
    from cleval.main import cleval

    args = get_params()
    result = cleval(args)
    det_hmean = 0.977989360950786
    e2e_hmean = 0.9165773847119407
    pred_det_hmean = result["all"]["det"]["hmean"]
    pred_e2e_hmean = result["all"]["e2e"]["hmean"]
    assert pred_det_hmean == pytest.approx(det_hmean), pred_det_hmean
    assert pred_e2e_hmean == pytest.approx(e2e_hmean), pred_e2e_hmean


def test_output_score_torchmetric():
    sys.argv[1:] = [
        "-g=resources/test_data/gt/gt_eval_doc_v1_kr.zip",
        "-s=resources/test_data/pred/res_eval_doc_v1_kr.zip",
        "--E2E",
        "--DUMP_SAMPLE_RESULT",
        "--DEBUG",
    ]

    import numpy as np

    from cleval import CLEvalMetric
    from cleval.arg_parser import get_params
    from cleval.main import get_file_paths, parse_single_file

    args = get_params()
    gt_zipfile = args.GT_PATHS[0]
    submit_zipfile = args.SUBMIT_PATHS[0]
    gt_files, det_files, file_indices = get_file_paths(gt_zipfile, submit_zipfile)
    metric = CLEvalMetric()

    for gt_file, det_file, file_idx in zip(gt_files, det_files, file_indices):
        gt_boxes = parse_single_file(gt_file, args.CRLF, True, False, box_type=args.BOX_TYPE)
        det_boxes = parse_single_file(
            det_file,
            args.CRLF,
            args.TRANSCRIPTION,
            args.CONFIDENCES,
            box_type=args.BOX_TYPE,
        )
        gt_quads = np.array([gt_box.points for gt_box in gt_boxes])
        gt_letters = [gt_box.transcription for gt_box in gt_boxes]
        gt_is_dcs = [gt_box.is_dc for gt_box in gt_boxes]
        det_quads = np.array([det_box.points for det_box in det_boxes])
        det_letters = [det_box.transcription for det_box in det_boxes]
        _ = metric(det_quads, gt_quads, det_letters, gt_letters, gt_is_dcs)

    metric_out = metric.compute()
    metric.reset()

    det_hmean = 0.977989360950786
    e2e_hmean = 0.9165773847119407
    assert metric_out["det_h"].item() == pytest.approx(det_hmean), metric_out["det_h"]
    assert metric_out["e2e_h"].item() == pytest.approx(e2e_hmean), metric_out["e2e_h"]

Download .txt

gitextract_aw6ps_j3/

├── .github/
│   ├── pull_request_template.md
│   └── workflows/
│       └── ci.yml
├── .gitignore
├── LICENSE
├── NOTICE
├── README.md
├── cleval/
│   ├── __init__.py
│   ├── arg_parser.py
│   ├── box_types.py
│   ├── data.py
│   ├── eval_functions.py
│   ├── main.py
│   ├── torchmetric.py
│   ├── utils.py
│   └── validation.py
├── pyproject.toml
├── setup.py
└── tests/
    ├── __init__.py
    └── test_scores.py

Download .txt

SYMBOL INDEX (120 symbols across 9 files)

FILE: cleval/arg_parser.py
  function str2bool (line 7) | def str2bool(v):
  function get_params (line 18) | def get_params():

FILE: cleval/box_types.py
  function get_midpoints (line 13) | def get_midpoints(p1, p2):
  function point_distance (line 17) | def point_distance(p1, p2):
  class Box (line 23) | class Box(metaclass=abc.ABCMeta):
    method __init__ (line 24) | def __init__(
    method __and__ (line 39) | def __and__(self, other) -> float:
    method subtract (line 44) | def subtract(self, other):
    method center (line 49) | def center(self):
    method center_distance (line 53) | def center_distance(self, other):
    method diagonal_length (line 57) | def diagonal_length(self) -> float:
    method is_inside (line 62) | def is_inside(self, x, y) -> bool:
    method make_polygon_obj (line 67) | def make_polygon_obj(self):
    method pseudo_character_center (line 73) | def pseudo_character_center(self, *args) -> list:
  class QUAD (line 78) | class QUAD(Box):
    method __init__ (line 81) | def __init__(
    method __and__ (line 96) | def __and__(self, other) -> float:
    method subtract (line 103) | def subtract(self, other):
    method center (line 106) | def center(self):
    method center_distance (line 109) | def center_distance(self, other):
    method area (line 112) | def area(self):
    method __or__ (line 115) | def __or__(self, other):
    method make_polygon_obj (line 118) | def make_polygon_obj(self):
    method aspect_ratio (line 130) | def aspect_ratio(self):
    method pseudo_transcription_length (line 140) | def pseudo_transcription_length(self):
    method pseudo_character_center (line 143) | def pseudo_character_center(self, vertical_aspect_ratio_threshold):
    method diagonal_length (line 174) | def diagonal_length(self) -> float:
    method is_inside (line 183) | def is_inside(self, x, y) -> bool:
  class POLY (line 187) | class POLY(Box):
    method __init__ (line 190) | def __init__(self, points, confidence=0.0, transcription="", orientati...
    method __and__ (line 198) | def __and__(self, other):
    method subtract (line 203) | def subtract(self, other):
    method __or__ (line 207) | def __or__(self, other):
    method area (line 210) | def area(self):
    method center (line 213) | def center(self):
    method center_distance (line 216) | def center_distance(self, other):
    method diagonal_length (line 222) | def diagonal_length(self):
    method is_inside (line 236) | def is_inside(self, x, y) -> bool:
    method check_corner_points_are_continuous (line 239) | def check_corner_points_are_continuous(self, lt, rt, rb, lb):
    method get_four_max_distance_from_center (line 251) | def get_four_max_distance_from_center(self):
    method make_polygon_obj (line 263) | def make_polygon_obj(self):
    method aspect_ratio (line 297) | def aspect_ratio(self):
    method pseudo_transcription_length (line 300) | def pseudo_transcription_length(self):
    method make_aspect_ratio (line 303) | def make_aspect_ratio(self):
    method pseudo_character_center (line 314) | def pseudo_character_center(self):

FILE: cleval/data.py
  class MatchReleation (line 7) | class MatchReleation:
  class CoreStats (line 14) | class CoreStats:
  class MatchResult (line 30) | class MatchResult:
  class Point (line 40) | class Point:
  class GTBoxResult (line 46) | class GTBoxResult:
  class DetBoxResult (line 56) | class DetBoxResult:
  class Stats (line 64) | class Stats:
  class SampleResult (line 80) | class SampleResult:
  class GlobalResult (line 89) | class GlobalResult:
  function accumulate_result (line 97) | def accumulate_result(
  function accumulate_stats (line 108) | def accumulate_stats(stats1: Stats, stats2: Stats, is_e2e: bool):
  function accumulate_core_stats (line 121) | def accumulate_core_stats(stats1: CoreStats, stats2: CoreStats):
  function calculate_global_rph (line 132) | def calculate_global_rph(res: GlobalResult, is_e2e: bool):
  function calculate_rph (line 138) | def calculate_rph(stats: CoreStats):

FILE: cleval/eval_functions.py
  class EvalMaterial (line 20) | class EvalMaterial:
  function evaluation (line 37) | def evaluation(args, gt_boxes, det_boxes, scale_range=(0.0, 1.0)):
  function prepare_gt (line 104) | def prepare_gt(gt_boxes, is_case_sensitive, vertical_aspect_ratio_thresh...
  function prepare_det (line 129) | def prepare_det(det_boxes, is_case_sensitive):
  function calc_area_precision (line 136) | def calc_area_precision(gt_boxes, det_boxes, ap_constraint):
  function calc_pcc_inclusion (line 160) | def calc_pcc_inclusion(det_boxes, gt_pcc_points):
  function get_det_dc_indices (line 181) | def get_det_dc_indices(gt_dc_indices, pcc_mat_sum, ap_mat, ap_mat_binary...
  function calc_match_matrix (line 198) | def calc_match_matrix(eval_material):
  function one_to_one_match (line 259) | def one_to_one_match(pcc_mat_sum, gt_idx, det_idx, ap_mat_binary, len_gt...
  function one_to_many_match (line 285) | def one_to_many_match(pcc_mat_sum, gt_idx, ap_mat_binary, det_valid_indi...
  function many_to_one_match (line 301) | def many_to_one_match(pcc_mat_sum, det_idx, ap_mat, ap_constraint, gt_va...
  function get_box_results (line 315) | def get_box_results(gt_boxes, gt_pcc_points, det_boxes):
  function __points_to_result (line 341) | def __points_to_result(points):
  function __pccs_to_result (line 347) | def __pccs_to_result(pcc_points):
  function eval_det (line 351) | def eval_det(args, sample_res, gt_boxes, det_boxes, eval_material, match...
  function get_num_total_char (line 411) | def get_num_total_char(gt_boxes, pcc_mat_sum, gt_valid_indices, det_vali...
  function get_num_fp_char (line 422) | def get_num_fp_char(det_boxes, det_valid_indices, match_mat_gts_sum):
  function eval_e2e (line 433) | def eval_e2e(args, sample_res, gt_boxes, det_boxes, eval_material, match...
  function sort_detbox_order_by_pcc (line 487) | def sort_detbox_order_by_pcc(gt_idx, matched_det_indices, gt_pcc_points,...
  function lcs_elimination (line 507) | def lcs_elimination(gt_trans, gt_trans_not_found, det_trans_not_found, g...
  function eval_orientation (line 523) | def eval_orientation(sample_res, gt_boxes, det_boxes, gt_valid_indices, ...
  function extract_stats (line 546) | def extract_stats(core_stats, num_char_fp, num_char_gt, num_char_det, re...
  function get_gran_score (line 565) | def get_gran_score(num_splitted, penalty_weight):

FILE: cleval/main.py
  function main (line 29) | def main():
  function cleval (line 49) | def cleval(args):
  function get_file_paths (line 99) | def get_file_paths(gt_zipfile, submit_zipfile):
  function eval_single (line 120) | def eval_single(args, gt_file, det_file, file_id):
  function parse_single_file (line 134) | def parse_single_file(
  function parse_values_from_single_line (line 171) | def parse_values_from_single_line(
  function parse_jylee_annot (line 293) | def parse_jylee_annot(quad, transcription, box_type):
  function parse_clova_ocr (line 309) | def parse_clova_ocr(quad, transcription, box_type):

FILE: cleval/torchmetric.py
  class Options (line 16) | class Options:
    method __init__ (line 17) | def __init__(
  class CLEvalMetric (line 35) | class CLEvalMetric(Metric):
    method __init__ (line 38) | def __init__(
    method to (line 136) | def to(self, *args, **kwargs):
    method update (line 142) | def update(
    method __make_eval_input (line 178) | def __make_eval_input(self, quads, letters, is_dcs=None, img_longer_le...
    method __check_box_scale (line 196) | def __check_box_scale(quad, img_longer_length):
    method __accumulate (line 210) | def __accumulate(self, sample_res: SampleResult):
    method compute (line 231) | def compute(self):
    method reset (line 266) | def reset(self):
    method __calculate_rph (line 271) | def __calculate_rph(
    method harmonic_mean (line 293) | def harmonic_mean(self, score1, score2):

FILE: cleval/utils.py
  function load_zip_file (line 10) | def load_zip_file(file):
  function decode_utf8 (line 26) | def decode_utf8(raw):
  function dump_json (line 39) | def dump_json(json_file_path, json_data):
  function read_json (line 44) | def read_json(json_file_path):
  function convert_ltrb2quad (line 50) | def convert_ltrb2quad(points):
  function ltrb_regex_match (line 65) | def ltrb_regex_match(line, with_transcription, with_confidence):
  function quad_regex_match (line 101) | def quad_regex_match(line, with_transcription, with_confidence):
  function lcs (line 166) | def lcs(s1, s2):
  function harmonic_mean (line 189) | def harmonic_mean(score1, score2):
  function cpu_count (line 197) | def cpu_count():

FILE: cleval/validation.py
  function validate_data (line 4) | def validate_data(gt_file, submit_file, has_crlf):
  function validate_lines_in_file (line 19) | def validate_lines_in_file(file_name, file_contents, has_crlf=True):
  function validate_point_inside_bounds (line 32) | def validate_point_inside_bounds(x, y, img_width, img_height):
  function validate_min_max_bounds (line 39) | def validate_min_max_bounds(lower_val, upper_val):
  function validate_clockwise_points (line 44) | def validate_clockwise_points(points):

FILE: tests/test_scores.py
  function test_output_score (line 6) | def test_output_score():
  function test_output_score_torchmetric (line 27) | def test_output_score_torchmetric():

Download .json

Condensed preview — 19 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (98K chars).

[
  {
    "path": ".github/pull_request_template.md",
    "chars": 240,
    "preview": "# Description\n\n- Related issues:\n  - #\n\n# Changes in this PR\n\n# How has this been tested?\n\n# Checklist\n- [ ] This PR fol"
  },
  {
    "path": ".github/workflows/ci.yml",
    "chars": 1575,
    "preview": "name: CI\n\non: pull_request\n\njobs:\n  black:\n    runs-on: ubuntu-latest\n    steps:\n    - name: Checkout\n      uses: action"
  },
  {
    "path": ".gitignore",
    "chars": 126,
    "preview": "__pycache__/\n.vscode\n.DS_Store\n.idea\noutput/\n.pytest_cache\n.mypy_cache\nbuild/\ndist/\n*.egg-info/\n\nvenv\ndebug*\ntmp*\nprofil"
  },
  {
    "path": "LICENSE",
    "chars": 1067,
    "preview": "Copyright (c) 2020-present NAVER Corp.\n\n Permission is hereby granted, free of charge, to any person obtaining a copy\nof"
  },
  {
    "path": "NOTICE",
    "chars": 7172,
    "preview": "CLEval\nCopyright (c) 2020-present NAVER Corp.\n\nThis project contains subcomponents with separate copyright notices and l"
  },
  {
    "path": "README.md",
    "chars": 6854,
    "preview": "# CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks\n\nOfficial implementation of CLEval | [pape"
  },
  {
    "path": "cleval/__init__.py",
    "chars": 91,
    "preview": "from .torchmetric import CLEvalMetric\n\n__version__ = [\"0.1.1\"]\n\n__all__ = [\"CLEvalMetric\"]\n"
  },
  {
    "path": "cleval/arg_parser.py",
    "chars": 3146,
    "preview": "import argparse\nimport os\n\nfrom cleval.utils import cpu_count\n\n\ndef str2bool(v):\n    if isinstance(v, bool):\n        ret"
  },
  {
    "path": "cleval/box_types.py",
    "chars": 13652,
    "preview": "import abc\nimport math\n\nimport cv2\nimport numpy as np\nimport Polygon as polygon3\nfrom shapely.geometry import Point\nfrom"
  },
  {
    "path": "cleval/data.py",
    "chars": 4062,
    "preview": "from dataclasses import dataclass, field\nfrom typing import Dict, List, Union\n\nfrom cleval.utils import harmonic_mean\n\n\n"
  },
  {
    "path": "cleval/eval_functions.py",
    "chars": 21344,
    "preview": "from dataclasses import dataclass\nfrom typing import List\n\nimport numpy as np\nfrom numba import njit\nfrom numpy.typing i"
  },
  {
    "path": "cleval/main.py",
    "chars": 10584,
    "preview": "import os\nimport re\nimport time\nfrom concurrent.futures import ProcessPoolExecutor, as_completed\nfrom dataclasses import"
  },
  {
    "path": "cleval/torchmetric.py",
    "chars": 11166,
    "preview": "\"\"\"\nTODO: Support scalewise eval\nTODO: Support orientation accuracy\n\"\"\"\n\nimport cv2\nimport numpy as np\nimport torch\nfrom"
  },
  {
    "path": "cleval/utils.py",
    "chars": 6205,
    "preview": "import codecs\nimport json\nimport re\nimport subprocess\nimport zipfile\n\nfrom numba import njit\n\n\ndef load_zip_file(file):\n"
  },
  {
    "path": "cleval/validation.py",
    "chars": 2360,
    "preview": "from cleval.utils import decode_utf8, load_zip_file\n\n\ndef validate_data(gt_file, submit_file, has_crlf):\n    gt = load_z"
  },
  {
    "path": "pyproject.toml",
    "chars": 151,
    "preview": "[tool.isort]\nprofile = \"black\"\n\n[tool.black]\nline-length = 120\ntarget-version = ['py38']\ninclude = '\\.pyi?$'\n\n[tool.pyte"
  },
  {
    "path": "setup.py",
    "chars": 983,
    "preview": "import setuptools\n\nwith open(\"README.md\", \"r\") as fh:\n    long_description = fh.read()\n\nsetuptools.setup(\n    name=\"clev"
  },
  {
    "path": "tests/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "tests/test_scores.py",
    "chars": 2422,
    "preview": "import sys\n\nimport pytest\n\n\ndef test_output_score():\n    sys.argv[1:] = [\n        \"-g=resources/test_data/gt/gt_eval_doc"
  }
]

About this extraction

This page contains the full source code of the clovaai/CLEval GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 19 files (91.0 KB), approximately 24.1k tokens, and a symbol index with 120 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Extract another repo