Full Code of mekarpeles/captcha-decoder for AI

Repository: mekarpeles/captcha-decoder
Branch: master
Commit: 912dac971f83
Files: 14
Total size: 41.8 KB

Directory structure:
gitextract_0h0kn1hl/

├── .gitignore
├── .travis.yml
├── AUTHORS
├── CHANGES
├── LICENSE
├── MANIFEST.in
├── README.md
├── decaptcha/
│   ├── __init__.py
│   ├── cli.py
│   └── decoder.py
├── setup.cfg
├── setup.py
└── tests/
    ├── __init__.py
    └── test_captcha.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
*~
decaptcha/output/*

*.py[co]

# Packages
*.egg
*.egg-info
dist
build
eggs
parts
bin
var
sdist
develop-eggs
.installed.cfg

# Installer logs
pip-log.txt

# Unit test / coverage reports
.coverage
.tox

#Translations
*.mo

#Mr Developer
.mr.developer.cfg

# PyCharm
.idea/

================================================
FILE: .travis.yml
================================================
language: python
python:
  - "2.7"
  - "3.4"
matrix:
  allow_failures:
    - python: "2.7"
install:
  - pip install flake8
  - pip install pep8
  - pip install -q -e .
script:
  - nosetests
  - if [[ $TRAVIS_PYTHON_VERSION == '2.7' ]]; then flake8 decaptcha; fi
cache: apt
sudo: false

================================================
FILE: AUTHORS
================================================
Mek <michael.karpeles@gmail.com> @mekarpeles (maintainer)
Ben Boyter <bboyte01@gmail.com> @boyter (original author)
Nate Urwin <nateurwin@gmail.com> @nateurwin
Abel Molina




================================================
FILE: CHANGES
================================================
v0.0.1, Tue Jul 28 17:43:00 2015 -- Initial Release.


================================================
FILE: LICENSE
================================================
THE WORK (AS DEFINED BELOW) IS PROVIDED UNDER THE TERMS OF THIS
CREATIVE COMMONS PUBLIC LICENSE ("CCPL" OR "LICENSE"). THE WORK IS
PROTECTED BY COPYRIGHT AND/OR OTHER APPLICABLE LAW. ANY USE OF THE
WORK OTHER THAN AS AUTHORIZED UNDER THIS LICENSE OR COPYRIGHT LAW IS
PROHIBITED.

BY EXERCISING ANY RIGHTS TO THE WORK PROVIDED HERE, YOU ACCEPT AND
AGREE TO BE BOUND BY THE TERMS OF THIS LICENSE. TO THE EXTENT THIS
LICENSE MAY BE CONSIDERED TO BE A CONTRACT, THE LICENSOR GRANTS YOU
THE RIGHTS CONTAINED HERE IN CONSIDERATION OF YOUR ACCEPTANCE OF SUCH
TERMS AND CONDITIONS.

1. Definitions

"Adaptation" means a work based upon the Work, or upon the Work and
other pre-existing works, such as a translation, adaptation,
derivative work, arrangement of music or other alterations of a
literary or artistic work, or phonogram or performance and includes
cinematographic adaptations or any other form in which the Work may be
recast, transformed, or adapted including in any form recognizably
derived from the original, except that a work that constitutes a
Collection will not be considered an Adaptation for the purpose of
this License. For the avoidance of doubt, where the Work is a musical
work, performance or phonogram, the synchronization of the Work in
timed-relation with a moving image ("synching") will be considered an
Adaptation for the purpose of this License.  "Collection" means a
collection of literary or artistic works, such as encyclopedias and
anthologies, or performances, phonograms or broadcasts, or other works
or subject matter other than works listed in Section 1(f) below,
which, by reason of the selection and arrangement of their contents,
constitute intellectual creations, in which the Work is included in
its entirety in unmodified form along with one or more other
contributions, each constituting separate and independent works in
themselves, which together are assembled into a collective whole. A
work that constitutes a Collection will not be considered an
Adaptation (as defined below) for the purposes of this License.
"Creative Commons Compatible License" means a license that is listed
at http://creativecommons.org/compatiblelicenses that has been
approved by Creative Commons as being essentially equivalent to this
License, including, at a minimum, because that license: (i) contains
terms that have the same purpose, meaning and effect as the License
Elements of this License; and, (ii) explicitly permits the relicensing
of adaptations of works made available under that license under this
License or a Creative Commons jurisdiction license with the same
License Elements as this License.  "Distribute" means to make
available to the public the original and copies of the Work or
Adaptation, as appropriate, through sale or other transfer of
ownership.  "License Elements" means the following high-level license
attributes as selected by Licensor and indicated in the title of this
License: Attribution, ShareAlike.  "Licensor" means the individual,
individuals, entity or entities that offer(s) the Work under the terms
of this License.  "Original Author" means, in the case of a literary
or artistic work, the individual, individuals, entity or entities who
created the Work or if no individual or entity can be identified, the
publisher; and in addition (i) in the case of a performance the
actors, singers, musicians, dancers, and other persons who act, sing,
deliver, declaim, play in, interpret or otherwise perform literary or
artistic works or expressions of folklore; (ii) in the case of a
phonogram the producer being the person or legal entity who first
fixes the sounds of a performance or other sounds; and, (iii) in the
case of broadcasts, the organization that transmits the broadcast.
"Work" means the literary and/or artistic work offered under the terms
of this License including without limitation any production in the
literary, scientific and artistic domain, whatever may be the mode or
form of its expression including digital form, such as a book,
pamphlet and other writing; a lecture, address, sermon or other work
of the same nature; a dramatic or dramatico-musical work; a
choreographic work or entertainment in dumb show; a musical
composition with or without words; a cinematographic work to which are
assimilated works expressed by a process analogous to cinematography;
a work of drawing, painting, architecture, sculpture, engraving or
lithography; a photographic work to which are assimilated works
expressed by a process analogous to photography; a work of applied
art; an illustration, map, plan, sketch or three-dimensional work
relative to geography, topography, architecture or science; a
performance; a broadcast; a phonogram; a compilation of data to the
extent it is protected as a copyrightable work; or a work performed by
a variety or circus performer to the extent it is not otherwise
considered a literary or artistic work.  "You" means an individual or
entity exercising rights under this License who has not previously
violated the terms of this License with respect to the Work, or who
has received express permission from the Licensor to exercise rights
under this License despite a previous violation.  "Publicly Perform"
means to perform public recitations of the Work and to communicate to
the public those public recitations, by any means or process,
including by wire or wireless means or public digital performances; to
make available to the public Works in such a way that members of the
public may access these Works from a place and at a place individually
chosen by them; to perform the Work to the public by any means or
process and the communication to the public of the performances of the
Work, including by public digital performance; to broadcast and
rebroadcast the Work by any means including signs, sounds or images.
"Reproduce" means to make copies of the Work by any means including
without limitation by sound or visual recordings and the right of
fixation and reproducing fixations of the Work, including storage of a
protected performance or phonogram in digital form or other electronic
medium.

2. Fair Dealing Rights. Nothing in this License is intended
to reduce, limit, or restrict any uses free from copyright or rights
arising from limitations or exceptions that are provided for in
connection with the copyright protection under copyright law or other
applicable laws.

3. License Grant. Subject to the terms and conditions of this License,
Licensor hereby grants You a worldwide, royalty-free, non-exclusive,
perpetual (for the duration of the applicable copyright) license to
exercise the rights in the Work as stated below:

to Reproduce the Work, to incorporate the Work into one or more
Collections, and to Reproduce the Work as incorporated in the
Collections; to create and Reproduce Adaptations provided that any
such Adaptation, including any translation in any medium, takes
reasonable steps to clearly label, demarcate or otherwise identify
that changes were made to the original Work. For example, a
translation could be marked "The original work was translated from
English to Spanish," or a modification could indicate "The original
work has been modified."; to Distribute and Publicly Perform the Work
including as incorporated in Collections; and, to Distribute and
Publicly Perform Adaptations.  For the avoidance of doubt:

Non-waivable Compulsory License Schemes. In those jurisdictions in
which the right to collect royalties through any statutory or
compulsory licensing scheme cannot be waived, the Licensor reserves
the exclusive right to collect such royalties for any exercise by You
of the rights granted under this License; Waivable Compulsory License
Schemes. In those jurisdictions in which the right to collect
royalties through any statutory or compulsory licensing scheme can be
waived, the Licensor waives the exclusive right to collect such
royalties for any exercise by You of the rights granted under this
License; and, Voluntary License Schemes. The Licensor waives the right
to collect royalties, whether individually or, in the event that the
Licensor is a member of a collecting society that administers
voluntary licensing schemes, via that society, from any exercise by
You of the rights granted under this License.  The above rights may be
exercised in all media and formats whether now known or hereafter
devised. The above rights include the right to make such modifications
as are technically necessary to exercise the rights in other media and
formats. Subject to Section 8(f), all rights not expressly granted by
Licensor are hereby reserved.

4. Restrictions. The license granted in Section 3 above is expressly
made subject to and limited by the following restrictions:

You may Distribute or Publicly Perform the Work only under the terms
of this License. You must include a copy of, or the Uniform Resource
Identifier (URI) for, this License with every copy of the Work You
Distribute or Publicly Perform. You may not offer or impose any terms
on the Work that restrict the terms of this License or the ability of
the recipient of the Work to exercise the rights granted to that
recipient under the terms of the License. You may not sublicense the
Work. You must keep intact all notices that refer to this License and
to the disclaimer of warranties with every copy of the Work You
Distribute or Publicly Perform. When You Distribute or Publicly
Perform the Work, You may not impose any effective technological
measures on the Work that restrict the ability of a recipient of the
Work from You to exercise the rights granted to that recipient under
the terms of the License. This Section 4(a) applies to the Work as
incorporated in a Collection, but this does not require the Collection
apart from the Work itself to be made subject to the terms of this
License. If You create a Collection, upon notice from any Licensor You
must, to the extent practicable, remove from the Collection any credit
as required by Section 4(c), as requested. If You create an
Adaptation, upon notice from any Licensor You must, to the extent
practicable, remove from the Adaptation any credit as required by
Section 4(c), as requested.  You may Distribute or Publicly Perform an
Adaptation only under the terms of: (i) this License; (ii) a later
version of this License with the same License Elements as this
License; (iii) a Creative Commons jurisdiction license (either this or
a later license version) that contains the same License Elements as
this License (e.g., Attribution-ShareAlike 3.0 US)); (iv) a Creative
Commons Compatible License. If you license the Adaptation under one of
the licenses mentioned in (iv), you must comply with the terms of that
license. If you license the Adaptation under the terms of any of the
licenses mentioned in (i), (ii) or (iii) (the "Applicable License"),
you must comply with the terms of the Applicable License generally and
the following provisions: (I) You must include a copy of, or the URI
for, the Applicable License with every copy of each Adaptation You
Distribute or Publicly Perform; (II) You may not offer or impose any
terms on the Adaptation that restrict the terms of the Applicable
License or the ability of the recipient of the Adaptation to exercise
the rights granted to that recipient under the terms of the Applicable
License; (III) You must keep intact all notices that refer to the
Applicable License and to the disclaimer of warranties with every copy
of the Work as included in the Adaptation You Distribute or Publicly
Perform; (IV) when You Distribute or Publicly Perform the Adaptation,
You may not impose any effective technological measures on the
Adaptation that restrict the ability of a recipient of the Adaptation
from You to exercise the rights granted to that recipient under the
terms of the Applicable License. This Section 4(b) applies to the
Adaptation as incorporated in a Collection, but this does not require
the Collection apart from the Adaptation itself to be made subject to
the terms of the Applicable License.  If You Distribute, or Publicly
Perform the Work or any Adaptations or Collections, You must, unless a
request has been made pursuant to Section 4(a), keep intact all
copyright notices for the Work and provide, reasonable to the medium
or means You are utilizing: (i) the name of the Original Author (or
pseudonym, if applicable) if supplied, and/or if the Original Author
and/or Licensor designate another party or parties (e.g., a sponsor
institute, publishing entity, journal) for attribution ("Attribution
Parties") in Licensor's copyright notice, terms of service or by other
reasonable means, the name of such party or parties; (ii) the title of
the Work if supplied; (iii) to the extent reasonably practicable, the
URI, if any, that Licensor specifies to be associated with the Work,
unless such URI does not refer to the copyright notice or licensing
information for the Work; and (iv) , consistent with Ssection 3(b), in
the case of an Adaptation, a credit identifying the use of the Work in
the Adaptation (e.g., "French translation of the Work by Original
Author," or "Screenplay based on original Work by Original
Author"). The credit required by this Section 4(c) may be implemented
in any reasonable manner; provided, however, that in the case of a
Adaptation or Collection, at a minimum such credit will appear, if a
credit for all contributing authors of the Adaptation or Collection
appears, then as part of these credits and in a manner at least as
prominent as the credits for the other contributing authors. For the
avoidance of doubt, You may only use the credit required by this
Section for the purpose of attribution in the manner set out above
and, by exercising Your rights under this License, You may not
implicitly or explicitly assert or imply any connection with,
sponsorship or endorsement by the Original Author, Licensor and/or
Attribution Parties, as appropriate, of You or Your use of the Work,
without the separate, express prior written permission of the Original
Author, Licensor and/or Attribution Parties.  Except as otherwise
agreed in writing by the Licensor or as may be otherwise permitted by
applicable law, if You Reproduce, Distribute or Publicly Perform the
Work either by itself or as part of any Adaptations or Collections,
You must not distort, mutilate, modify or take other derogatory action
in relation to the Work which would be prejudicial to the Original
Author's honor or reputation. Licensor agrees that in those
jurisdictions (e.g. Japan), in which any exercise of the right granted
in Section 3(b) of this License (the right to make Adaptations) would
be deemed to be a distortion, mutilation, modification or other
derogatory action prejudicial to the Original Author's honor and
reputation, the Licensor will waive or not assert, as appropriate,
this Section, to the fullest extent permitted by the applicable
national law, to enable You to reasonably exercise Your right under
Section 3(b) of this License (right to make Adaptations) but not
otherwise.

5. Representations, Warranties and Disclaimer

UNLESS OTHERWISE MUTUALLY AGREED TO BY THE PARTIES IN WRITING,
LICENSOR OFFERS THE WORK AS-IS AND MAKES NO REPRESENTATIONS OR
WARRANTIES OF ANY KIND CONCERNING THE WORK, EXPRESS, IMPLIED,
STATUTORY OR OTHERWISE, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF
TITLE, MERCHANTIBILITY, FITNESS FOR A PARTICULAR PURPOSE,
NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, ACCURACY,
OR THE PRESENCE OF ABSENCE OF ERRORS, WHETHER OR NOT
DISCOVERABLE. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OF IMPLIED
WARRANTIES, SO SUCH EXCLUSION MAY NOT APPLY TO YOU.

6. Limitation on Liability. EXCEPT TO THE EXTENT REQUIRED BY
APPLICABLE LAW, IN NO EVENT WILL LICENSOR BE LIABLE TO YOU ON ANY
LEGAL THEORY FOR ANY SPECIAL, INCIDENTAL, CONSEQUENTIAL, PUNITIVE OR
EXEMPLARY DAMAGES ARISING OUT OF THIS LICENSE OR THE USE OF THE WORK,
EVEN IF LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

7. Termination

This License and the rights granted hereunder will terminate
automatically upon any breach by You of the terms of this
License. Individuals or entities who have received Adaptations or
Collections from You under this License, however, will not have their
licenses terminated provided such individuals or entities remain in
full compliance with those licenses. Sections 1, 2, 5, 6, 7, and 8
will survive any termination of this License.  Subject to the above
terms and conditions, the license granted here is perpetual (for the
duration of the applicable copyright in the Work). Notwithstanding the
above, Licensor reserves the right to release the Work under different
license terms or to stop distributing the Work at any time; provided,
however that any such election will not serve to withdraw this License
(or any other license that has been, or is required to be, granted
under the terms of this License), and this License will continue in
full force and effect unless terminated as stated above.

8. Miscellaneous

Each time You Distribute or Publicly Perform the Work or a Collection,
the Licensor offers to the recipient a license to the Work on the same
terms and conditions as the license granted to You under this License.
Each time You Distribute or Publicly Perform an Adaptation, Licensor
offers to the recipient a license to the original Work on the same
terms and conditions as the license granted to You under this License.
If any provision of this License is invalid or unenforceable under
applicable law, it shall not affect the validity or enforceability of
the remainder of the terms of this License, and without further action
by the parties to this agreement, such provision shall be reformed to
the minimum extent necessary to make such provision valid and
enforceable.  No term or provision of this License shall be deemed
waived and no breach consented to unless such waiver or consent shall
be in writing and signed by the party to be charged with such waiver
or consent.  This License constitutes the entire agreement between the
parties with respect to the Work licensed here. There are no
understandings, agreements or representations with respect to the Work
not specified here. Licensor shall not be bound by any additional
provisions that may appear in any communication from You. This License
may not be modified without the mutual written agreement of the
Licensor and You.  The rights granted under, and the subject matter
referenced, in this License were drafted utilizing the terminology of
the Berne Convention for the Protection of Literary and Artistic Works
(as amended on September 28, 1979), the Rome Convention of 1961, the
WIPO Copyright Treaty of 1996, the WIPO Performances and Phonograms
Treaty of 1996 and the Universal Copyright Convention (as revised on
July 24, 1971). These rights and subject matter take effect in the
relevant jurisdiction in which the License terms are sought to be
enforced according to the corresponding provisions of the
implementation of those treaty provisions in the applicable national
law. If the standard suite of rights granted under applicable
copyright law includes additional rights not granted under this
License, such additional rights are deemed to be included in the
License; this License is not intended to restrict the license of any
rights under applicable law.

Creative Commons Notice

Creative Commons is not a party to this License, and makes no warranty
whatsoever in connection with the Work. Creative Commons will not be
liable to You or any party on any legal theory for any damages
whatsoever, including without limitation any general, special,
incidental or consequential damages arising in connection to this
license. Notwithstanding the foregoing two (2) sentences, if Creative
Commons has expressly identified itself as the Licensor hereunder, it
shall have all rights and obligations of Licensor.

Except for the limited purpose of indicating to the public that the
Work is licensed under the CCPL, Creative Commons does not authorize
the use by either party of the trademark "Creative Commons" or any
related trademark or logo of Creative Commons without the prior
written consent of Creative Commons. Any permitted use will be in
compliance with Creative Commons' then-current trademark usage
guidelines, as may be published on its website or otherwise made
available upon request from time to time. For the avoidance of doubt,
this trademark restriction does not form part of the License.

Creative Commons may be contacted at http://creativecommons.org/.

================================================
FILE: MANIFEST.in
================================================
recursive-include decaptcha/iconset *
recursive-include examples *


================================================
FILE: README.md
================================================
captcha-decoder
===============

![Build Status](https://travis-ci.org/mekarpeles/captcha-decoder.png)

This module takes a captcha (image) as input, attempts to partition it into discrete segments, each (it hopes) containing a single symbol, and then runs a basic vector space search to determine the similarity of each symbol against known characters (whose reference images are included). The objective of this project is to (a) make @boyter's code more accessible and (b) illustrate, in a readable way, the fundamentals of captcha cracking. Its primary goal is clarity; it makes no claims or attempts at efficiency, accuracy, or practicality.
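
As a rough illustration of the vector space comparison described above, the sketch below scores two equally sized glyphs by the cosine of the angle between their flattened pixel vectors. It is a minimal, self-contained approximation that uses plain lists rather than PIL images; `cosine_similarity` and the sample glyphs are hypothetical names for illustration, not part of the decaptcha API.

```python
from math import sqrt


def cosine_similarity(v1, v2):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = sqrt(sum(a * a for a in v1)) * sqrt(sum(b * b for b in v2))
    # An all-zero vector has no direction, so treat it as unrelated.
    return dot / norm if norm else 0.0


# Hypothetical 3x3 glyphs flattened row by row (1 = ink, 0 = background).
glyph_l = [1, 0, 0,
           1, 0, 0,
           1, 1, 1]
glyph_i = [0, 1, 0,
           0, 1, 0,
           0, 1, 0]

print(cosine_similarity(glyph_l, glyph_l))  # identical glyphs score close to 1
print(cosine_similarity(glyph_l, glyph_i))  # one shared pixel scores much lower
```

A score near 1 means the two glyphs overlap almost exactly, and a score near 0 means they share few or no inked pixels.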

This work is a derivation of an original work by @boyter
<bboyte01@gmail.com>, http://www.boyter.org/decoding-captchas/ (see the
original tutorial at
https://web.archive.org/web/20121012023114/http://www.wausita.com/captcha/).

## Installation

On Ubuntu, libjpeg-dev and libpng-dev may be required system packages for the Python Pillow (PIL) library:

    sudo apt-get install libjpeg-dev
    sudo apt-get install libpng-dev
    
Next, fetch and build the decaptcha library

    pip install git+https://github.com/mekarpeles/captcha-decoder.git

## Usage

The decaptcha library comes with a command line utility called `decaptcha`. Running the command with `-h` shows a list of options. The `<img>` argument can be a file path or a URL:

    usage: decaptcha [-h] [-v] [-l LIMIT] [-c CHANNELS] [-t THRESHOLD] [--min MIN]
                     [--max MAX] [-o TOLERANCE]
                     [<img>]
    
    Python captcha cracking utility
    
    positional arguments:
      <img>                 Enter the filesystem path or url of a captcha image
    
    optional arguments:
      -h, --help            show this help message and exit
      -v                    Displays the decaptcha version
      -l LIMIT, --limit LIMIT
                            Package url
      -c CHANNELS, --channels CHANNELS
                            The number of prominent color channels to keep
      -t THRESHOLD, --threshold THRESHOLD
                            Accuracy threshold for matching decimal [0-1]
      --min MIN             Filter out colors darker than this [0-256]
      --max MAX             Filter out colors lighter than this [0-256]
      -o TOLERANCE, --tolerance TOLERANCE
                            Pixel tolerance for character segmentation. Higher is
                            more lenient/greedy, lower is strict.
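
Character segmentation can be pictured as a left-to-right column scan that groups adjacent columns containing ink. The sketch below works on a plain list of pixel rows instead of a PIL image. How decaptcha applies the `--tolerance` value internally is not shown here, so the `min_ink` parameter and the `column_regions` name are assumptions made for illustration only.

```python
WHITE = 255


def column_regions(pixels, min_ink=1):
    """Return (start, end) column ranges that hold at least `min_ink`
    non-white pixels per column; `end` is exclusive.

    `pixels` is a list of rows, each a list of grayscale values.
    """
    if not pixels:
        return []
    width = len(pixels[0])
    found = []
    start = None
    for col in range(width):
        ink = sum(1 for row in pixels if row[col] != WHITE)
        if ink >= min_ink:
            if start is None:
                start = col  # a new region opens here
        elif start is not None:
            found.append((start, col))
            start = None
    if start is not None:  # a glyph touching the right edge
        found.append((start, width))
    return found


# Hypothetical 3-row image: two glyphs separated by one white column.
image = [
    [0, 0, 255, 255, 0],
    [0, 255, 255, 0, 0],
    [255, 255, 255, 255, 0],
]
print(column_regions(image))             # [(0, 2), (3, 5)]
print(column_regions(image, min_ink=2))  # [(0, 1), (4, 5)]
```

Raising `min_ink` makes the scan stricter: thin columns no longer count as ink, so regions shrink or split.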

## Example

    $ decaptcha http://www.mondor.org/img/capex.jpg  --min 0 --max 20 --limit 5 --channels 5 --tolerance 7
    
    Character 0 of 7:
            t (confidence of 0.839150063096)
            e (confidence of 0.827405543276)
    Character 1 of 7:
            0 (confidence of 0.834057656228)
            l (confidence of 0.771064160322)
    Character 2 of 7:
            t (confidence of 0.309437274354)
            e (confidence of 0.303227199152)
    Character 3 of 7:
    Character 4 of 7:
            t (confidence of 0.267644230239)
            7 (confidence of 0.266067912114)
    Character 5 of 7:
            0 (confidence of 0.834057656228)
            l (confidence of 0.789422830806)
    Character 6 of 7:
            t (confidence of 0.835510535512)
            e (confidence of 0.835221298415)
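
To turn ranked guesses like these into a single string, one option is to keep the highest-confidence symbol for each character slot. The sketch below assumes the decoder yields one list of `(confidence, symbol)` tuples per character, which is how the CLI's `prettyprint` unpacks them; `best_guess` and the `?` placeholder are hypothetical additions, not part of the library.

```python
def best_guess(guesses, placeholder="?"):
    """Join the highest-confidence symbol of each character slot."""
    chars = []
    for candidates in guesses:
        if candidates:
            # Tuples compare by their first item, the confidence.
            _, symbol = max(candidates)
            chars.append(symbol)
        else:
            # No candidate was found for this segment.
            chars.append(placeholder)
    return "".join(chars)


# Confidences copied (rounded) from the example output above.
guesses = [
    [(0.839, "t"), (0.827, "e")],
    [(0.834, "0"), (0.771, "l")],
    [(0.309, "t"), (0.303, "e")],
    [],
    [(0.268, "t"), (0.266, "7")],
    [(0.834, "0"), (0.789, "l")],
    [(0.836, "t"), (0.835, "e")],
]
print(best_guess(guesses))  # t0t?t0t
```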


## Further Reading

The following implementations and techniques are recommended as more practical and accurate alternatives to this project:

1. http://www.codeproject.com/Articles/106583/Handwriting-Recognition-Revisited-Kernel-Support-V


================================================
FILE: decaptcha/__init__.py
================================================
# -*- coding: utf-8 -*-

"""
   decaptcha
   ~~~~~~~~~

   Basic Captcha Cracker
"""

__version__ = '0.0.1'
__author__ = [
    'Mek <michael.karpeles@gmail.com>'
]
__license__ = 'see LICENSE (creative commons)'
__contributors__ = 'see AUTHORS'
__title__ = 'Python captcha cracking utility'

import sys
from .decoder import Captcha  # NOQA
from .decoder import trim, channel, monochrome, regions, decode  # NOQA
from .cli import main

if __name__ == '__main__':
    sys.exit(main())


================================================
FILE: decaptcha/cli.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import argparse
from . import Captcha, __title__, __version__


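def threshold(x):
    """argparse type: parses `x` as a float and checks it lies in [0, 1]."""
    x = float(x)
    # `1 > x < 0` only rejected negative values; check both bounds
    if not 0 <= x <= 1:
        raise argparse.ArgumentTypeError(
            "Threshold must be a value between 0 and 1.")
    return x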


def argparser():
    """Creates a command line ArgumentParser for decaptcha."""
    parser = argparse.ArgumentParser(description=__title__)
    parser.add_argument('-v', help='Displays the decaptcha version',
                        action='version', version='%s v%s'
                        % (__title__, __version__))
    parser.add_argument('captcha', nargs='?', metavar='<img>',
                        help='Enter the filesystem path or url '
                        'of a captcha image')
    parser.add_argument('-l', '--limit', dest='limit', help='Package url',
                        type=int, default=3)
    parser.add_argument('-c', '--channels', dest='channels',
                        help='The number of prominent color channels to keep',
                        type=int, default=3)
    parser.add_argument('-t', '--threshold', dest='threshold',
                        help='Accuracy threshold for matching decimal [0-1]',
                        type=threshold, default=0)
    parser.add_argument('--min', dest='min', type=int, default=0,
                        help='Filter out colors darker than this [0-256]')
    parser.add_argument('--max', dest='max', type=int, default=230,
                        help='Filter out colors lighter than this [0-256]')
    parser.add_argument('-o', '--tolerance', dest='tolerance', type=int,
                        default=3, help='Pixel tolerance for character '
                        'segmentation. Higher is more lenient/greedy, '
                        'lower is strict.')
    return parser


def prettyprint(guesses):
    regions = len(guesses)
    for i, guess in enumerate(guesses):
        print('Character %s of %s:' % (i + 1, regions))
        for result in guess:
            confidence, symbol = result
            print('\t%s (%s confidence)' % (symbol, confidence))


def main():
    parser = argparser()
    args = parser.parse_args()

    if not args.captcha:
        raise ValueError('No captcha input image provided')

    prettyprint(Captcha(args.captcha).decode(channels=args.channels,
                                             limit=args.limit,
                                             threshold=args.threshold,
                                             tolerance=args.tolerance,
                                             _min=args.min,
                                             _max=args.max))


if __name__ == "__main__":
    main()


================================================
FILE: decaptcha/decoder.py
================================================
# -*- coding: utf-8 -*-

"""
    decoder.py
    ~~~~~~~~~~

    This module takes captcha images as input and partitions them into
    n new images, 1 image per character found within the captcha.

    Original Code (http://tinyurl.com/puq6alb) by bboyte01@gmail.com
    https://web.archive.org/web/20121012023114/http://www.wausita.com/captcha/
    http://www.wausita.com/captcha/

    :copyright: (c) 2012 by Mek
    :license: see LICENSE for more details.
"""

import os
import string
from operator import itemgetter
from math import sqrt
from PIL import Image, ImageChops
from io import BytesIO

try:
    from urllib.request import urlopen
except ImportError:
    from urllib import urlopen

SYMBOLS = list(string.ascii_lowercase + string.digits)
ICONS_PATH = os.path.abspath(
    os.path.join(os.path.dirname(__file__), 'iconset'))
IMAGESET = []
WHITE = 255


def imageset():
    """Loads icons of various characters"""
    imageset = []
    for symbol in SYMBOLS:
        for imfile in os.listdir(os.path.join(ICONS_PATH, symbol)):
            path = os.path.join(ICONS_PATH, symbol, imfile)
            imageset.append({symbol: Image.open(path)})
    return imageset


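def trim(im, color=WHITE):
    """Trims image to remove excess border of `color` (default: white)"""
    # use the requested `color`, not a hard-coded WHITE, as the background
    bg = Image.new(im.mode, im.size, color)
    diff = ImageChops.difference(im, bg)
    # (diff + diff) / 2.0 - 100: differences of 100 or less clip to 0
    diff = ImageChops.add(diff, diff, 2.0, -100)
    return im.crop(diff.getbbox())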


def channel(im, *colors, **kwargs):
    """Composes a new image with the same dimensions as `im` but
    draws only pixels of the specified color channels on a `bg`
    colored background (default: white).
    """
    bg = kwargs.get('bg', WHITE)
    sample = Image.new('P', im.size, bg)
    width, height = im.size
    for col in range(width):
        for row in range(height):
            pixel = im.getpixel((col, row))
            if pixel in colors:
                sample.putpixel((col, row), pixel)
    return sample


def monochrome(im, threshold=255):
    """Converts all colors in a gif image which are less than
    `threshold` to black and all other colors to white"""
    return im.point(lambda x: 0 if x < threshold else 255, '1')


def regions(im, threshold=1):
    """Iterates over the columns of an image from left-to-right and
    composes an ordered list of (start, end) column ranges referring
    to discrete, contiguous columns which contain at least `threshold`
    non-white pixels. `end` is the first column after the range.
    """
    regions = []
    start = None
    width, height = im.size
    for col in range(width):
        # if column contains at least `threshold` non-white pixels
        if sum([im.getpixel((col, row)) != WHITE
                for row in range(height)]) >= threshold:
            start = col if start is None else start  # column 0 is valid
        elif start is not None:
            regions.append((start, col))
            start = None  # reset start
    return regions


def similarity(im1, im2, equalize=False):
    """Takes in two images, vectorizes them into concordance
    dictionaries and returns their cosine similarity: a number from
    0 to 1 indicating how related they are. 0 means no relation and
    1 indicates they are the same.

    params:
        equalize - scale the taller image to the size of the other
    """
    def scale(im1, im2):
        """Scales the image with the greater height to match the one
        with the smaller height
        """
        if im1.size[1] > im2.size[1]:
            return im1.resize(im2.size, Image.LANCZOS), im2
        elif im1.size[1] < im2.size[1]:
            return im1, im2.resize(im1.size, Image.LANCZOS)
        return im1, im2

    def vectorize(im):
        """im.getdata returns the contents of an image as a flattened
        sequence object containing pixel values.
        """
        d1 = {}
        for count, i in enumerate(im.getdata()):
            d1[count] = i
        return d1

    def magnitude(concordance):
        return sqrt(sum(count ** 2 for word, count in concordance.items()))

    c1, c2 = [vectorize(im) for im in
              (scale(im1, im2) if equalize else (im1, im2))]
    topvalue = 0
    for word, count in c1.items():
        if word in c2:
            topvalue += count * c2[word]
    return topvalue / (magnitude(c1) * magnitude(c2))


class Captcha(object):

    def __init__(self, imgpath):
        self.imgpath = imgpath

    @property
    def im(self):
        """Fetches captcha's image from disk or url"""
        try:
            im = Image.open(self.imgpath)
        except (IOError, OSError):  # not a readable local file; try as url
            im = Image.open(BytesIO(urlopen(self.imgpath).read()))
        return self.gif(im)

    @property
    def histogram(self):
        with self.im as im:
            return im.histogram()

    def decode(self, channels=3, limit=3, threshold=0, tolerance=3,
               _min=0, _max=245):
        """Attempts to decode a captcha by:

        - Finding the `n` most prominent colors in the image
        - Sampling the captcha into a monochrome image composed only
          of the pixels which have one of those prominent colors
        - Segmenting the sample into regions of contiguous columns
          containing non-white pixels (which hopefully equate to
          individual alphanumeric characters), and finally
        - Guessing which character appears in each segment

        XXX Prettier output and organizing of results required
        """
        colors = [color for color, _ in self
                  .prominant_colors(n=channels, _min=_min, _max=_max)]
        sample = monochrome(self.channel(*colors))
        return [self.guess_character(segment, limit=limit, threshold=threshold)
                for segment in self.segments(sample, tolerance=tolerance)]

    def prominant_colors(self, n=5, _min=0, _max=256):
        """Calculates the n most prominent colors of an image as an
        ordered list of (color, frequency) tuples.

        params:
            n - limit the number of colors to `n`
            _min - exclude any colors below this number (filter
                   out dark colors, like black/0)
            _max - exclude this color and any above it (filter out
                   light colors, like white/255)

        XXX consider sorted(im.getcolors(n), reverse=True)
        """
        hist = self.histogram[_min:_max]
        return sorted(enumerate(hist, _min),  # keep true color indices
                      key=itemgetter(1), reverse=True)[:n]

    def channel(self, *colors, **kwargs):
        """Composes an image with the same dimensions as `im` but
        draws only pixels of the specified colors on a `bg` colored
        background.
        """
        with self.im as im:
            return channel(im, *colors, **kwargs)

    @staticmethod
    def gif(im):
        """Converts captcha to palette mode, as used by GIF (makes
        things easier since it has at most 256 colors)
        """
        return im if im.mode == 'P' else im.convert('P')

    @classmethod
    def segments(cls, im, tolerance=3, crop=True):
        """Discovers character regions and returns a segment for each"""
        return [cls.segment(im, region, crop=crop) for
                region in regions(im, threshold=tolerance)]

    @classmethod
    def segment(cls, im, region, crop=True):
        """Returns a cropped image segment (hopefully of an
        alphanumeric character) within the range of the region
        """
        start, end = 0, 1
        segment = im.crop((region[start], 0, region[end], im.size[1]))
        return trim(segment) if crop else segment

    @staticmethod
    def guess_character(im, threshold=0, limit=None):
        """Guess alphanumeric character in image using Basic Vector
        Space Search algorithm.

        http://la2600.org/talks/files/20040102/Vector_Space_Search_Engine_Theory.pdf
        """
        global IMAGESET  # lazy-ish style iconset loading
        if not IMAGESET:
            IMAGESET = imageset()

        guesses = []
        for icon in IMAGESET:
            for symbol, im2 in icon.items():
                guess = similarity(im, im2, equalize=True)
                if guess >= threshold:
                    guesses.append((guess, symbol))
        return sorted(guesses, reverse=True)[:limit]


def decode(captcha, channels=1, limit=3, threshold=0, tolerance=3,
           _min=0, _max=256):
    """Backward-compatible function for decoding a captcha"""
    return Captcha(captcha).decode(
        channels=channels,
        limit=limit,
        threshold=threshold,
        tolerance=tolerance,
        _min=_min, _max=_max)


================================================
FILE: setup.cfg
================================================
[bdist_wheel]
universal=1


================================================
FILE: setup.py
================================================
# -*- coding: utf-8 -*-

"""
    decaptcha
    ~~~~~~~~~

"""

import codecs
import os
import re
from setuptools import setup

here = os.path.abspath(os.path.dirname(__file__))


def read(*parts):
    """Taken from pypa pip setup.py:
    intentionally *not* adding an encoding option to open, See:
    https://github.com/pypa/virtualenv/issues/201#issuecomment-3145690
    """
    return codecs.open(os.path.join(here, *parts), 'r').read()


def find_version(*file_paths):
    version_file = read(*file_paths)
    version_match = re.search(r"^__version__ = ['\"]([^'\"]*)['\"]",
                              version_file, re.M)
    if version_match:
        return version_match.group(1)
    raise RuntimeError("Unable to find version string.")


setup(
    name='decaptcha',
    version=find_version("decaptcha", "__init__.py"),
    description='Python captcha cracking utility',
    long_description=read('README.md'),
    url='http://github.com/mekarpeles/captcha-decoder',
    author='mek',
    author_email='michael.karpeles@gmail.com',
    packages=[
        'decaptcha',
        ],
    platforms='any',
    license='LICENSE',
    classifiers=[
        'Development Status :: 2 - Pre-Alpha',
        "Intended Audience :: Developers",
        "Programming Language :: Python :: 2.7",
        "Programming Language :: Python :: 3"
        ],
    install_requires=[
        'Pillow >= 2.9.0'
        ],
    entry_points={
        'console_scripts': ['decaptcha=decaptcha.cli:main'],
        },
    extras_require={
        ':python_version=="2.7"': ['argparse']
        },
    include_package_data=True,
    package_data={'': ['iconset/**/*.gif']},
)


================================================
FILE: tests/__init__.py
================================================


================================================
FILE: tests/test_captcha.py
================================================
# -*- coding: utf-8 -*-

"""
    tests
    ~~~~~
    Test cases for the decaptcha package

    :copyright: (c) 2012 by Mek
    :license: see LICENSE for more details.
"""

import os.path
import unittest
from PIL import Image
import decaptcha

TEST_IMG_DIR = os.path.abspath(
    os.path.join(
        os.path.dirname(os.path.realpath(__file__)),
        os.pardir, 'tests', 'images'))
TEST_CAPTCHA_IMG = os.path.join(TEST_IMG_DIR, 'captcha.gif')
TEST_CHANNEL_IMG = os.path.join(TEST_IMG_DIR, 'channel.gif')
TEST_SEGMENT_IMG = lambda i: os.path.join(TEST_IMG_DIR, 'segment%s.gif' % i)
EXPECTED_HISTOGRAM = [
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0,
    0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 1, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 0,
    2, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 1, 2, 0, 0, 0, 1, 2, 0, 1, 0, 0, 1, 0, 2, 0, 0, 1, 0, 0, 2, 0, 0, 0,
    0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 3, 1, 3, 3, 0, 0, 0, 0, 0, 0, 1, 0, 3,
    2, 132, 1, 1, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 15, 0, 1, 0, 1, 0, 0,
    8, 1, 0, 0, 0, 0, 1, 6, 0, 2, 0, 0, 0, 0, 18, 1, 1, 1, 1, 1, 2, 365,
    115, 0, 1, 0, 0, 0, 135, 186, 0, 0, 1, 0, 0, 0, 116, 3, 0, 0, 0, 0, 0,
    21, 1, 1, 0, 0, 0, 2, 10, 2, 0, 0, 0, 0, 2, 10, 0, 0, 0, 0, 1, 0, 625
    ]
EXPECTED_DOMINANT_COLORS = [
    (255, 625), (212, 365), (220, 186), (219, 135), (169, 132), (227, 116),
    (213, 115), (234, 21), (205, 18), (184, 15)
    ]
EXPECTED_REGIONS = [
    (6, 14), (15, 25), (27, 35), (37, 46), (48, 56), (57, 67)
    ]
EXPECTED_OUTPUT = (1.0, '7')


class CaptchaDecoderTest(unittest.TestCase):

    def setUp(self):
        self.captcha = decaptcha.Captcha(TEST_CAPTCHA_IMG)

    def test_histogram(self):
        self.assertTrue(self.captcha.histogram == EXPECTED_HISTOGRAM,
                        "Captcha histogram different from expected")

    def test_prominant_colors(self):
        self.assertTrue(self.captcha.prominant_colors(n=10) ==
                        EXPECTED_DOMINANT_COLORS,
                        "Captcha's dominant colors differ from expected")

    def test_channels(self):
        channel = decaptcha.monochrome(self.captcha.channel(220, 227))
        print(channel.histogram())
        self.assertTrue(channel.histogram() ==
                        Image.open(TEST_CHANNEL_IMG).histogram(),
                        "Channel sample did not match expected image")

    def test_regions(self):
        sample = decaptcha.monochrome(self.captcha.channel(220, 227))
        regions = decaptcha.regions(sample, threshold=1)
        self.assertTrue(regions == EXPECTED_REGIONS,
                        "Expected regions %s, instead got %s"
                        % (EXPECTED_REGIONS, regions))

    def test_segmentation(self):
        """Note, in the test cases of the original publication
        (http://tinyurl.com/phvggox), output segments #3 and #5 (which
        are both the number 9) are mistakenly swapped.
        """
        sample = decaptcha.monochrome(self.captcha.channel(220, 227))
        segments = self.captcha.segments(sample, crop=False, tolerance=1)
        for i, segment in enumerate(segments):
            EXPECTED_SEGMENT = Image.open(TEST_SEGMENT_IMG(i+1))
            self.assertTrue(segment.histogram() ==
                            EXPECTED_SEGMENT.histogram(),
                            "Segment #%s is wrong." % (i+1))

    def test_guess_character(self):
        sample = decaptcha.monochrome(self.captcha.channel(220, 227))
        regions = decaptcha.regions(sample)
        segment = self.captcha.segment(sample, regions[0], crop=False)
        exp_seg = Image.open(TEST_SEGMENT_IMG(1))
        self.assertTrue(segment.histogram() == exp_seg.histogram(),
                        "Segment is wrong")
        predictions = self.captcha.guess_character(segment)
        self.assertTrue(predictions[0] == EXPECTED_OUTPUT,
                        "Expected %s, got %s" %
                        (EXPECTED_OUTPUT, predictions[0]))

    def test_decoder(self):
        self.captcha.decode()
SYMBOL INDEX (33 symbols across 4 files)

FILE: decaptcha/cli.py
  function threshold (line 8) | def threshold(x):
  function argparser (line 16) | def argparser():
  function prettyprint (line 44) | def prettyprint(guesses):
  function main (line 53) | def main():

FILE: decaptcha/decoder.py
  function imageset (line 37) | def imageset():
  function trim (line 47) | def trim(im, color=WHITE):
  function channel (line 55) | def channel(im, *colors, **kwargs):
  function monochrome (line 71) | def monochrome(im, threshold=255):
  function regions (line 77) | def regions(im, threshold=1):
  function similarity (line 97) | def similarity(im1, im2, equalize=False):
  class Captcha (line 137) | class Captcha(object):
    method __init__ (line 139) | def __init__(self, imgpath):
    method im (line 143) | def im(self):
    method histogram (line 152) | def histogram(self):
    method decode (line 156) | def decode(self, channels=3, limit=3, threshold=0, tolerance=3,
    method prominant_colors (line 176) | def prominant_colors(self, n=5, _min=0, _max=256):
    method channel (line 193) | def channel(self, *colors, **kwargs):
    method gif (line 202) | def gif(im):
    method segments (line 209) | def segments(cls, im, tolerance=3, crop=True):
    method segment (line 215) | def segment(cls, im, region, crop=True):
    method guess_character (line 224) | def guess_character(im, threshold=0, limit=None):
  function decode (line 243) | def decode(captcha, channels=1, limit=3, threshold=0, tolerance=3,
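
The `similarity` function listed above divides a dot product of pixel values by the product of the two vector magnitudes, which is cosine similarity. A minimal sketch of that idea on plain lists follows, without Pillow. The name `cosine_similarity`, the equal-length check and the zero-vector guard are assumptions of this sketch and are not part of decoder.py.

```python
from math import sqrt


def cosine_similarity(pixels_a, pixels_b):
    """Cosine similarity of two equal-length sequences of pixel values.

    Hypothetical helper for illustration; decoder.py works on
    dictionaries built from `im.getdata()` instead of plain lists.
    """
    if len(pixels_a) != len(pixels_b):
        raise ValueError("pixel sequences must have the same length")
    dot = sum(a * b for a, b in zip(pixels_a, pixels_b))
    norm_a = sqrt(sum(a * a for a in pixels_a))
    norm_b = sqrt(sum(b * b for b in pixels_b))
    if norm_a == 0 or norm_b == 0:
        # an all-zero (all-black) image has no direction to compare
        return 0.0
    return dot / (norm_a * norm_b)


glyph = [255, 0, 0, 255]
inverse = [0, 255, 255, 0]
print(cosine_similarity(glyph, glyph))    # very close to 1: same image
print(cosine_similarity(glyph, inverse))  # no white pixels in common
```

Because white is 255 and black is 0, this score rewards shared white pixels rather than shared ink, which is worth keeping in mind when reading the guesses.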

FILE: setup.py
  function read (line 17) | def read(*parts):
  function find_version (line 25) | def find_version(*file_paths):

FILE: tests/test_captcha.py
  class CaptchaDecoderTest (line 47) | class CaptchaDecoderTest(unittest.TestCase):
    method setUp (line 49) | def setUp(self):
    method test_histogram (line 52) | def test_histogram(self):
    method test_prominant_colors (line 56) | def test_prominant_colors(self):
    method test_channels (line 61) | def test_channels(self):
    method test_regions (line 68) | def test_regions(self):
    method test_segmentation (line 75) | def test_segmentation(self):
    method test_guess_character (line 88) | def test_guess_character(self):
    method test_decoder (line 100) | def test_decoder(self):
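
The column scan checked by `test_regions` can be illustrated on a small grid of pixel values instead of an image. This is only a sketch under stated assumptions: `column_regions` and the list-of-rows input are hypothetical, and closing a region that reaches the right edge is a choice made here rather than something taken from the repository.

```python
WHITE = 255


def column_regions(grid, threshold=1):
    """Return (start, end) column ranges holding non-white pixels.

    `grid` is a list of equal-length rows of pixel values and `end`
    is the first column after the range. Hypothetical helper, not
    part of the decaptcha package.
    """
    width = len(grid[0]) if grid else 0
    found = []
    start = None
    for col in range(width):
        inked = sum(1 for row in grid if row[col] != WHITE)
        if inked >= threshold:
            if start is None:
                start = col  # a new region opens at this column
        elif start is not None:
            found.append((start, col))
            start = None
    if start is not None:
        # assumption of this sketch: close a region at the right edge
        found.append((start, width))
    return found


grid = [
    [255, 0, 0, 255, 255, 0],
    [255, 255, 0, 255, 255, 0],
]
print(column_regions(grid))               # [(1, 3), (5, 6)]
print(column_regions(grid, threshold=2))  # [(2, 3), (5, 6)]
```

Raising `threshold` drops columns with only a stray pixel, which is the role the `tolerance` argument plays when `Captcha.segments` calls `regions`.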
