Repository: aboSamoor/pycld2
Branch: master
Commit: bc9d269f603d
Files: 123
Total size: 100.6 MB

Directory structure:
gitextract_ord_uu64/

├── .github/
│   └── workflows/
│       └── pythonpackage.yaml
├── .gitignore
├── .travis.yml
├── LICENSE
├── MANIFEST.in
├── Makefile
├── README.md
├── bindings/
│   ├── README
│   ├── encodings.cc
│   ├── gen_enc.py
│   ├── gen_test.py
│   ├── pycldmodule.cc
│   ├── test.py
│   └── test_shuffle.py
├── cld2/
│   ├── LICENSE
│   ├── docs/
│   │   ├── CLD2UnitTestFullOutput.html
│   │   ├── CLD2UnitTestOutput.html
│   │   ├── CLD2UnitTestOutputVerbose.html
│   │   ├── a_little_french_test_input.html
│   │   ├── evaluate_cld1_small_20110406.txt
│   │   ├── evaluate_cld2_large_20130720.txt
│   │   ├── evaluate_cld2_large_20140122.txt
│   │   ├── evaluate_cld2_small_20130715.txt
│   │   ├── evaluate_cld2_small_20140122.txt
│   │   ├── test_version.html
│   │   └── test_version.txt
│   ├── internal/
│   │   ├── cld2_do_score.cc
│   │   ├── cld2_dynamic_compat.h
│   │   ├── cld2_dynamic_data.cc
│   │   ├── cld2_dynamic_data.h
│   │   ├── cld2_dynamic_data_extractor.cc
│   │   ├── cld2_dynamic_data_extractor.h
│   │   ├── cld2_dynamic_data_loader.cc
│   │   ├── cld2_dynamic_data_loader.h
│   │   ├── cld2_dynamic_data_tool.cc
│   │   ├── cld2_generated_cjk_compatible.cc
│   │   ├── cld2_generated_deltaocta0122.cc
│   │   ├── cld2_generated_deltaocta0527.cc
│   │   ├── cld2_generated_deltaoctachrome.cc
│   │   ├── cld2_generated_deltaoctachrome0122.cc
│   │   ├── cld2_generated_deltaoctachrome0614.cc
│   │   ├── cld2_generated_distinctocta0122.cc
│   │   ├── cld2_generated_distinctocta0527.cc
│   │   ├── cld2_generated_distinctoctachrome.cc
│   │   ├── cld2_generated_distinctoctachrome0122.cc
│   │   ├── cld2_generated_distinctoctachrome0604.cc
│   │   ├── cld2_generated_octa2_dummy.cc
│   │   ├── cld2_generated_quad0122.cc
│   │   ├── cld2_generated_quad0720.cc
│   │   ├── cld2_generated_quadchrome0122_16.cc
│   │   ├── cld2_generated_quadchrome0122_19.cc
│   │   ├── cld2_generated_quadchrome0122_2.cc
│   │   ├── cld2_generated_quadchrome0715.cc
│   │   ├── cld2_generated_quadchrome_16.cc
│   │   ├── cld2_generated_quadchrome_2.cc
│   │   ├── cld2_unittest.cc
│   │   ├── cld2_unittest_full.cc
│   │   ├── cld2tablesummary.h
│   │   ├── cld_generated_cjk_delta_bi_32.cc
│   │   ├── cld_generated_cjk_delta_bi_4.cc
│   │   ├── cld_generated_cjk_uni_prop_80.cc
│   │   ├── cld_generated_score_quad_octa_0122.cc
│   │   ├── cld_generated_score_quad_octa_0122_2.cc
│   │   ├── cld_generated_score_quad_octa_1024_256.cc
│   │   ├── cld_generated_score_quad_octa_2.cc
│   │   ├── cldutil.cc
│   │   ├── cldutil.h
│   │   ├── cldutil_offline.cc
│   │   ├── cldutil_offline.h
│   │   ├── cldutil_shared.cc
│   │   ├── cldutil_shared.h
│   │   ├── clean.sh
│   │   ├── compact_lang_det.cc
│   │   ├── compact_lang_det_hint_code.cc
│   │   ├── compact_lang_det_hint_code.h
│   │   ├── compact_lang_det_impl.cc
│   │   ├── compact_lang_det_impl.h
│   │   ├── compact_lang_det_test.cc
│   │   ├── compile.sh
│   │   ├── compile_and_test_all.sh
│   │   ├── compile_dynamic.sh
│   │   ├── compile_full.sh
│   │   ├── compile_libs.sh
│   │   ├── debug.cc
│   │   ├── debug.h
│   │   ├── debug_empty.cc
│   │   ├── fixunicodevalue.cc
│   │   ├── fixunicodevalue.h
│   │   ├── generated_distinct_bi_0.cc
│   │   ├── generated_entities.cc
│   │   ├── generated_language.cc
│   │   ├── generated_language.h
│   │   ├── generated_ulscript.cc
│   │   ├── generated_ulscript.h
│   │   ├── getonescriptspan.cc
│   │   ├── getonescriptspan.h
│   │   ├── integral_types.h
│   │   ├── lang_script.cc
│   │   ├── lang_script.h
│   │   ├── langspan.h
│   │   ├── offsetmap.cc
│   │   ├── offsetmap.h
│   │   ├── port.h
│   │   ├── scoreonescriptspan.cc
│   │   ├── scoreonescriptspan.h
│   │   ├── scoreutf8text.cc
│   │   ├── stringpiece.h
│   │   ├── tote.cc
│   │   ├── tote.h
│   │   ├── unittest_data.h
│   │   ├── utf8acceptinterchange.h
│   │   ├── utf8prop_lettermarkscriptnum.h
│   │   ├── utf8repl_lettermarklower.h
│   │   ├── utf8scannot_lettermarkspecial.h
│   │   ├── utf8statetable.cc
│   │   └── utf8statetable.h
│   └── public/
│       ├── compact_lang_det.h
│       └── encodings.h
├── pycld2/
│   └── __init__.py
├── requirements.txt
├── setup.cfg
├── setup.py
└── test_pycld2.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/workflows/pythonpackage.yaml
================================================
name: Python Package

on:
  push:
    branches:
      - "**"
    tags:
      - "v*.*.*"

permissions:
  id-token: write  # Enables OIDC
  contents: read

jobs:
  tests:
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-22.04, macos-latest, windows-latest]
        python-version: ['3.8', '3.9', '3.10', '3.11', '3.12']

    name: Test ${{ matrix.os }} Python ${{ matrix.python-version }}
    runs-on: ${{ matrix.os }}

    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip setuptools wheel
      - name: Build module
        run: |
          python setup.py build_ext -i
      - name: Run tests
        run: |
          python test_pycld2.py

  build-wheels:
    needs: [tests]

    strategy:
      matrix:
        os: [ubuntu-22.04, macos-latest, windows-latest]
        arch: [auto64]
        include:
          - os: ubuntu-22.04
            arch: aarch64

    name: Build wheels for ${{ matrix.os }}
    runs-on: ${{ matrix.os }}

    steps:
      - uses: actions/checkout@v4

      - name: Set up QEMU
        if: matrix.arch == 'aarch64'
        uses: docker/setup-qemu-action@v3

      - uses: actions/setup-python@v5
        with:
          python-version: '3.8'

      - name: Install dependencies
        run: |
          python -m pip install -U pip setuptools wheel cibuildwheel

      - name: Build wheels
        run: |
          python -m cibuildwheel --output-dir wheelhouse
        env:
          CIBW_PRERELEASE_PYTHONS: "false"
          CIBW_SKIP: "pp*"

      - uses: actions/upload-artifact@v4
        with:
          name: wheels-${{ matrix.os }}-${{ matrix.arch }}
          path: wheelhouse/*.whl

  build-sdist:
    needs: [tests]

    name: Build sdist
    runs-on: ubuntu-22.04

    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install dependencies
        run: |
          python -m pip install -U pip setuptools wheel build

      - name: Build sdist
        run: |
          python -m build --sdist

      - uses: actions/upload-artifact@v4
        with:
          name: sdist
          path: dist/*.tar.gz

  upload-pypi:
    needs: [build-wheels, build-sdist]
    runs-on: ubuntu-22.04
    if: startsWith(github.ref, 'refs/tags/v')

    steps:
      - uses: actions/download-artifact@v4
        with:
          pattern: '*'
          path: dist
          merge-multiple: true

      - name: Publish package to PyPI (OIDC)
        uses: pypa/gh-action-pypi-publish@release/v1

================================================
FILE: .gitignore
================================================
*.py[cod]

# C extensions
*.so

# Packages
*.egg
*.egg-info
dist
build
eggs
parts
bin
var
sdist
develop-eggs
.installed.cfg
lib
lib64

# Installer logs
pip-log.txt

# Unit test / coverage reports
.coverage
.tox
nosetests.xml
htmlcov

# Translations
*.mo

# Mr Developer
.mr.developer.cfg
.project
.pydevproject

# Complexity
output/*.html
output/*/index.html

# Sphinx
docs/_build

================================================
FILE: .travis.yml
================================================
# Config file for automatic testing at travis-ci.org

language: python

python:
  - "3.7"
  - "3.6"
  - "3.5"
  - "3.4"
  - "2.7"
#  - "pypy"

# command to install dependencies, e.g. pip install -r requirements.txt --use-mirrors
install:
  - pip install -r requirements.txt
  - python setup.py install

# command to run tests, e.g. python setup.py test
script:
  - python setup.py build_ext --inplace
  - nosetests


================================================
FILE: LICENSE
================================================

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright 2019 Michael McCandless, Rami Al-Rfou, Brad Solomon, Google

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: MANIFEST.in
================================================
include cld2/internal/*.cc
include cld2/internal/*.h
include cld2/public/*.h
include bindings/*
include README*
include requirements.txt
include Makefile
include test_pycld2.py


================================================
FILE: Makefile
================================================
.PHONY: clean build dist dist-test

clean:
	rm -rf build/
	rm -rf dist/
	rm -rf pycld2.egg-info/
	rm -rf pycld2/__pycache__/
	rm -f pycld2/_pycld2*.so

build:
	python setup.py build
	python setup.py install

dist:
	make clean
	python setup.py sdist
	twine upload dist/*

dist-test:
	make clean
	python setup.py sdist
	twine upload --repository-url https://test.pypi.org/legacy/ dist/*


================================================
FILE: README.md
================================================
# PYCLD2 - Python Bindings to CLD2

Python bindings for the Compact Language Detector 2 (CLD2).

[![Downloads](https://img.shields.io/pypi/dm/pycld2.svg)](https://pypi.python.org/pypi/pycld2)
[![Latest version](https://img.shields.io/pypi/v/pycld2.svg)](https://pypi.python.org/pypi/pycld2)
[![Supported Python versions](https://img.shields.io/pypi/pyversions/pycld2.svg)](https://pypi.python.org/pypi/pycld2)
[![Development Status](https://img.shields.io/pypi/status/pycld2.svg)](https://pypi.python.org/pypi/pycld2)
[![Download format](https://img.shields.io/pypi/format/pycld2.svg)](https://pypi.python.org/pypi/pycld2)
[![Build status](https://travis-ci.org/aboSamoor/pycld2.png?branch=master)](https://travis-ci.org/aboSamoor/pycld2)

This package contains forks of:

- The [`cld2` C++ library](https://github.com/CLD2Owners/cld2), developed by Dick Sites
- The [`chromium-compact-language-detector` C++ extension module](https://github.com/mikemccand/chromium-compact-language-detector),
  originally created by Mike McCandless, which has been modified post-fork.
  These bindings, among other changes, make the support of over 165 languages
  the default.

The goal of this project is to consolidate the upstream library with its bindings, so the user can `pip install` one package instead of two.

The LICENSE is the same as Chromium's LICENSE and is included in the
LICENSE file for reference.

## Installing

```bash
$ python -m pip install -U pycld2
```

## Example

```python
import pycld2 as cld2

isReliable, textBytesFound, details = cld2.detect(
    "а неправильный формат идентификатора дн назад"
)

print(isReliable)
# True
details[0]
# ('RUSSIAN', 'ru', 98, 404.0)

fr_en_Latn = """\
France is the largest country in Western Europe and the third-largest in Europe as a whole.
A accès aux chiens et aux frontaux qui lui ont été il peut consulter et modifier ses collections
et exporter Cet article concerne le pays européen aujourd’hui appelé République française.
Pour d’autres usages du nom France, Pour une aide rapide et effective, veuiller trouver votre aide
dans le menu ci-dessus.
Motoring events began soon after the construction of the first successful gasoline-fueled automobiles.
The quick brown fox jumped over the lazy dog."""

isReliable, textBytesFound, details, vectors = cld2.detect(
    fr_en_Latn, returnVectors=True
)
print(vectors)
# ((0, 94, 'ENGLISH', 'en'), (94, 329, 'FRENCH', 'fr'), (423, 139, 'ENGLISH', 'en'))
```
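
The entries in `details` above are 4-tuples. The following is a minimal sketch of picking the strongest entry from such a tuple. It assumes the fields are `(languageName, languageCode, percent, score)` and that placeholder entries use the code `"un"`; both are assumptions read off the sample output, and `Detection` and `top_detection` are hypothetical names that are not part of `pycld2`.

```python
from typing import NamedTuple, Optional


class Detection(NamedTuple):
    # Field names are assumptions about the 4-tuple layout shown above.
    name: str
    code: str
    percent: int
    score: float


def top_detection(details) -> Optional[Detection]:
    """Return the entry with the highest percent, skipping unknown entries."""
    # Assumption: placeholder entries carry the language code "un".
    known = [Detection(*d) for d in details if d[1] != "un"]
    return max(known, key=lambda d: d.percent, default=None)


# Hand-written sample modeled on the output above; not fresh detect() output.
sample_details = (
    ("RUSSIAN", "ru", 98, 404.0),
    ("Unknown", "un", 0, 0.0),
    ("Unknown", "un", 0, 0.0),
)

best = top_detection(sample_details)
print(best.name, best.code, best.percent)
```

Returning `None` for an empty or all-unknown result leaves it to the caller to decide how to handle text that could not be identified.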

## API

This package exports one function, `detect()`. See `help(detect)` for the full docstring.

The first parameter (`utf8Bytes`) is the text for which you want to detect language.

`utf8Bytes` may be either:

- `str` (example: `"¼ cup of flour"`)
- `bytes` that have been encoded using UTF-8 (example: `"¼ cup of flour".encode("utf-8")`)

Passing bytes that are *not* valid UTF-8 raises `pycld2.error`. For example,
`b"\xbc cup of flour"` (which is `"¼ cup of flour".encode("latin-1")`) will raise.
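
A caller that receives bytes of unknown origin may want to check them before calling `detect()`. The following is a minimal sketch using only the standard library; `is_valid_utf8` is a hypothetical helper, not part of `pycld2`. Python's decoder and CLD2's own validity check may not agree on every input, so treat this as a pre-filter rather than a guarantee.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


good = "¼ cup of flour".encode("utf-8")
bad = "¼ cup of flour".encode("latin-1")  # b"\xbc cup of flour"

print(is_valid_utf8(good))  # True
print(is_valid_utf8(bad))   # False
```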

All other parameters are optional:

| Parameter | Type/Default | Use |
| --------- | ------------ | --- |
| `utf8Bytes` | `str` or `bytes`\* | The text to detect language for. |
| `isPlainText` | `bool`, default `False` | If `False`, then the input is HTML and CLD will skip HTML tags, expand HTML entities, detect HTML `<lang ...>` tags, etc. |
| `hintTopLevelDomain` | `str` | E.g., `'id'` boosts Indonesian. |
| `hintLanguage` | `str` | E.g., `'ITALIAN'` or `'it'` boosts Italian; see `pycld2.LANGUAGES` for all known languages. |
| `hintLanguageHTTPHeaders` | `str` | E.g., `'mi,en'` boosts Maori and English. |
| `hintEncoding` | `str` | E.g., `'SJS'` boosts Japanese; see `pycld2.ENCODINGS` for all known encodings. |
| `returnVectors` | `bool`, default `False` | If `True`, then the vectors indicating which language was detected in which byte range are returned in addition to details. The vectors are a sequence of `(bytesOffset, bytesLength, languageName, languageCode)`, in order. `bytesOffset` is the start of the vector and `bytesLength` is its length, both in bytes. Note that this adds CPU cost (approximately a 2x performance hit). |
| `debugScoreAsQuads` | `bool`, default `False` | Normally, several languages are detected solely by their Unicode script. Combined with appropriate lookup tables, this flag forces them instead to be detected via quadgrams. This can be a useful refinement when looking for meaningful text in these languages, instead of just character sets. The default tables do not support this use. |
| `debugHTML` | `bool`, default `False` | For each detection call, write an HTML file to stderr, showing the text chunks and their detected languages. See `cld2/docs/InterpretingCLD2UnitTestOutput.pdf` to interpret this output. |
| `debugCR` | `bool`, default `False` | In that HTML file, force a new line for each chunk. |
| `debugVerbose` | `bool`, default `False` | In that HTML file, show every lookup entry. |
| `debugQuiet` | `bool`, default `False` | In that HTML file, suppress most of the output detail. |
| `debugEcho` | `bool`, default `False` | Echo every input buffer to stderr. |
| `bestEffort` | `bool`, default `False` | If `True`, then allow low-quality results for short text, rather than forcing the result to `"UNKNOWN_LANGUAGE"`. This may be of use for those desiring approximate results on short input text, but there is no claim that these results are very good. |

<sup>\*If `bytes`, must be UTF-8 encoded bytes.</sup>
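
Because the vectors described in the table use byte offsets, slicing a `str` directly with them goes wrong as soon as the text contains non-ASCII characters. The following is a minimal sketch of mapping vectors back to text by slicing the UTF-8 bytes instead. `slice_by_vectors` is a hypothetical helper, not part of `pycld2`, and the sample vectors are hand-written for illustration rather than real `detect()` output.

```python
def slice_by_vectors(text, vectors):
    """Yield (languageCode, chunk) pairs for each detected byte range."""
    raw = text.encode("utf-8")  # offsets are in bytes, not characters
    for offset, length, _name, code in vectors:
        chunk = raw[offset:offset + length]
        # errors="replace" guards against a range that splits a character
        yield code, chunk.decode("utf-8", errors="replace")


# Hand-written vectors for illustration only; not real detect() output.
sample = "Hello world. Ça va très bien."
sample_vectors = ((0, 13, "ENGLISH", "en"), (13, 18, "FRENCH", "fr"))

for code, chunk in slice_by_vectors(sample, sample_vectors):
    print(code, repr(chunk))
```

Here the sample string is 29 characters long but 31 bytes once encoded, which is why the second vector's length is 18 rather than 16.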

## Constants

This package exports these global constants:

| Constant | Description |
| -------- | ----------- |
| `pycld2.ENCODINGS` | list of the encoding names CLD recognizes (if you provide `hintEncoding`, it must be one of these names). |
| `pycld2.LANGUAGES` | list of languages and their codes (if you provide `hintLanguage`, it must be one of these names or codes). |
| `pycld2.EXTERNAL_LANGUAGES` | list of external languages and their codes. |
| `pycld2.DETECTED_LANGUAGES` | list of all detectable languages. |

## What About CLD3?

Python bindings for [`CLD3`](https://github.com/google/cld3) are available as [`gcld3`](https://pypi.org/project/gcld3/).


================================================
FILE: bindings/README
================================================
Dick Sites (and others) at Google graciously provided a new version
2.0 of the compact language detector, here:

  https://code.google.com/p/cld2/

and I (lucene@mikemccandless.com) created the Python bindings and
ported the C++ test case to test.py.

This has been tested on Ubuntu 14.04, with both Python 2.7.6 and
3.4.0.

Updated Nov 11 2014 to the latest CLD2 release, adding new bestEffort
flag (to force a guess even when confidence is low), and cutting over
to the CLD2 methods that confirm incoming UTF-8 is valid.

To build:

  * First check out cld2, cd internal, and run compile_libs.sh.  This will
    create both libcld2.so (small tables, detects 83 languages) and
    libcld2_full.so (large tables, detects 163 languages).  Install
    those libraries somewhere on your LD_LIBRARY_PATH, for example
    copy them into /usr/lib.

  * Edit both setup.py and setup_full.py: change CLD2_PATH to point to
    where you checked out the CLD2 sources.

  * python setup.py build

  * python setup_full.py build

Note that all Python sources work with both python 2.x and 3.x so if
you want to install for python3.x just repeat the above steps using
python3 (or whatever python command runs python 3.x in your
environment).

To test both the small and full language tables:

  * python test.py

The test produces a lot of output, due to the test cases testing the
debug flags; this is normal.  As long as it says OK in the end then
the tests passed.

To install:

  * python setup.py install (as root)

  * python setup_full.py install (as root)

For documentation run:

  * python -c "import cld2; help(cld2.detect)"

NOTE: gen_test.py and gen_enc.py were used as temporary helpers during
development and are not needed for building.

NOTE: you must pass only valid UTF-8 bytes to the detect function,
otherwise you can hit a segmentation fault or get incorrect results.


================================================
FILE: bindings/encodings.cc
================================================
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//

#include <stdio.h>
#include <string.h>  // Windows compat (vs. strings.h)
#include <ctype.h>

#include "compact_lang_det.h"
#include "encodings.h"

struct cld_encoding {
  const char *name;
  CLD2::Encoding encoding;
};

extern const cld_encoding cld_encoding_info[] = {
  {"ISO_8859_1", CLD2::ISO_8859_1},
  {"ISO_8859_2", CLD2::ISO_8859_2},
  {"ISO_8859_3", CLD2::ISO_8859_3},
  {"ISO_8859_4", CLD2::ISO_8859_4},
  {"ISO_8859_5", CLD2::ISO_8859_5},
  {"ISO_8859_6", CLD2::ISO_8859_6},
  {"ISO_8859_7", CLD2::ISO_8859_7},
  {"ISO_8859_8", CLD2::ISO_8859_8},
  {"ISO_8859_9", CLD2::ISO_8859_9},
  {"ISO_8859_10", CLD2::ISO_8859_10},
  {"JAPANESE_EUC_JP", CLD2::JAPANESE_EUC_JP},
  {"JAPANESE_SHIFT_JIS", CLD2::JAPANESE_SHIFT_JIS},
  {"JAPANESE_JIS", CLD2::JAPANESE_JIS},
  {"CHINESE_BIG5", CLD2::CHINESE_BIG5},
  {"CHINESE_GB", CLD2::CHINESE_GB},
  {"CHINESE_EUC_CN", CLD2::CHINESE_EUC_CN},
  {"KOREAN_EUC_KR", CLD2::KOREAN_EUC_KR},
  {"UNICODE_UNUSED", CLD2::UNICODE_UNUSED},
  {"CHINESE_EUC_DEC", CLD2::CHINESE_EUC_DEC},
  {"CHINESE_CNS", CLD2::CHINESE_CNS},
  {"CHINESE_BIG5_CP950", CLD2::CHINESE_BIG5_CP950},
  {"JAPANESE_CP932", CLD2::JAPANESE_CP932},
  {"UTF8", CLD2::UTF8},
  {"UNKNOWN_ENCODING", CLD2::UNKNOWN_ENCODING},
  {"ASCII_7BIT", CLD2::ASCII_7BIT},
  {"RUSSIAN_KOI8_R", CLD2::RUSSIAN_KOI8_R},
  {"RUSSIAN_CP1251", CLD2::RUSSIAN_CP1251},
  {"MSFT_CP1252", CLD2::MSFT_CP1252},
  {"RUSSIAN_KOI8_RU", CLD2::RUSSIAN_KOI8_RU},
  {"MSFT_CP1250", CLD2::MSFT_CP1250},
  {"ISO_8859_15", CLD2::ISO_8859_15},
  {"MSFT_CP1254", CLD2::MSFT_CP1254},
  {"MSFT_CP1257", CLD2::MSFT_CP1257},
  {"ISO_8859_11", CLD2::ISO_8859_11},
  {"MSFT_CP874", CLD2::MSFT_CP874},
  {"MSFT_CP1256", CLD2::MSFT_CP1256},
  {"MSFT_CP1255", CLD2::MSFT_CP1255},
  {"ISO_8859_8_I", CLD2::ISO_8859_8_I},
  {"HEBREW_VISUAL", CLD2::HEBREW_VISUAL},
  {"CZECH_CP852", CLD2::CZECH_CP852},
  {"CZECH_CSN_369103", CLD2::CZECH_CSN_369103},
  {"MSFT_CP1253", CLD2::MSFT_CP1253},
  {"RUSSIAN_CP866", CLD2::RUSSIAN_CP866},
  {"ISO_8859_13", CLD2::ISO_8859_13},
  {"ISO_2022_KR", CLD2::ISO_2022_KR},
  {"GBK", CLD2::GBK},
  {"GB18030", CLD2::GB18030},
  {"BIG5_HKSCS", CLD2::BIG5_HKSCS},
  {"ISO_2022_CN", CLD2::ISO_2022_CN},
  {"TSCII", CLD2::TSCII},
  {"TAMIL_MONO", CLD2::TAMIL_MONO},
  {"TAMIL_BI", CLD2::TAMIL_BI},
  {"JAGRAN", CLD2::JAGRAN},
  {"MACINTOSH_ROMAN", CLD2::MACINTOSH_ROMAN},
  {"UTF7", CLD2::UTF7},
  {"BHASKAR", CLD2::BHASKAR},
  {"HTCHANAKYA", CLD2::HTCHANAKYA},
  {"UTF16BE", CLD2::UTF16BE},
  {"UTF16LE", CLD2::UTF16LE},
  {"UTF32BE", CLD2::UTF32BE},
  {"UTF32LE", CLD2::UTF32LE},
  {"BINARYENC", CLD2::BINARYENC},
  {"HZ_GB_2312", CLD2::HZ_GB_2312},
  {"UTF8UTF8", CLD2::UTF8UTF8},
  {"TAM_ELANGO", CLD2::TAM_ELANGO},
  {"TAM_LTTMBARANI", CLD2::TAM_LTTMBARANI},
  {"TAM_SHREE", CLD2::TAM_SHREE},
  {"TAM_TBOOMIS", CLD2::TAM_TBOOMIS},
  {"TAM_TMNEWS", CLD2::TAM_TMNEWS},
  {"TAM_WEBTAMIL", CLD2::TAM_WEBTAMIL},
  {"KDDI_SHIFT_JIS", CLD2::KDDI_SHIFT_JIS},
  {"DOCOMO_SHIFT_JIS", CLD2::DOCOMO_SHIFT_JIS},
  {"SOFTBANK_SHIFT_JIS", CLD2::SOFTBANK_SHIFT_JIS},
  {"KDDI_ISO_2022_JP", CLD2::KDDI_ISO_2022_JP},
  {"SOFTBANK_ISO_2022_JP", CLD2::SOFTBANK_ISO_2022_JP},
};

// Re-written (simplified) from BSD strings.h
inline int
strcasecmp(const char *s1, const char *s2)
{
  for (;;) {
    int c1 = tolower(*((unsigned char *) s1++));
    int c2 = tolower(*((unsigned char *) s2++));
    if ((c1 != c2) || (c1 == '\0')) {
      return c1 - c2;
    }
  }
  return 0;
}

CLD2::Encoding EncodingFromName(const char *name) {
  for (int i = 0; i < CLD2::NUM_ENCODINGS; i++) {
    if (!strcasecmp(cld_encoding_info[i].name, name)) {
      return cld_encoding_info[i].encoding;
    }
  }

  return CLD2::UNKNOWN_ENCODING;
}
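
The lookup above is a case-insensitive linear scan that falls back to `UNKNOWN_ENCODING`. A minimal Python sketch of the same idea follows. `ENCODING_NAMES` is only an illustrative subset of the table, and `encoding_from_name` is a hypothetical helper, not part of the bindings.

```python
# Illustrative subset of the names in cld_encoding_info; the real table is longer.
ENCODING_NAMES = ("ISO_8859_1", "UTF8", "UNKNOWN_ENCODING", "ASCII_7BIT", "GB18030")
UNKNOWN = "UNKNOWN_ENCODING"


def encoding_from_name(name):
    """Return the table entry that matches name ignoring case, else UNKNOWN."""
    wanted = name.lower()
    for candidate in ENCODING_NAMES:
        if candidate.lower() == wanted:
            return candidate
    # No entry matched: same fallback as the C++ function.
    return UNKNOWN


print(encoding_from_name("utf8"))
print(encoding_from_name("no-such-encoding"))
```

A miss and an explicit request for `UNKNOWN_ENCODING` return the same value, so a caller cannot tell them apart from the return value alone.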


================================================
FILE: bindings/gen_enc.py
================================================
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Generates encodings.cc from ../../public/encodings.h

import re

with open('../../public/encodings.h') as f:
  s = f.read()


# Matches enum entries of the form "NAME = 123,"
r = re.compile(r'\s*(.*?)\s+=\s*(\d+),')
l = []
for line in s.split('\n'):
  line = line.strip()
  m = r.match(line)
  if m is not None:
    l.append((m.group(1), int(m.group(2))))

print()
print('''struct cld_encoding {
  const char *name;
  CLD2::Encoding encoding;
};''')

print('const cld_encoding cld_encoding_info[] = {')

for k, v in l:
  print('  {"%s", CLD2::%s},' % (k, k))
print('};')
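
To see what the script's regular expression picks up, here is a small self-contained sketch that runs the same pattern over an in-memory sample instead of the real header. `SAMPLE_HEADER` is a made-up excerpt in the style of an enum body, with placeholder numeric values. `parse_entries` and `render_table` are hypothetical helper names.

```python
import re

# Hypothetical excerpt in the style of an enum body; values are placeholders.
SAMPLE_HEADER = """
enum Encoding {
  ISO_8859_1 = 0,  // trailing comments are ignored by match()
  ISO_8859_2 = 1,
  UTF8 = 22,
};
"""

# Same pattern as the script: "NAME = 123,"
ENTRY = re.compile(r'\s*(.*?)\s+=\s*(\d+),')


def parse_entries(text):
    """Collect (name, value) pairs from lines that look like enum entries."""
    entries = []
    for line in text.split('\n'):
        m = ENTRY.match(line.strip())
        if m is not None:
            entries.append((m.group(1), int(m.group(2))))
    return entries


def render_table(entries):
    """Render the C++ initializer rows the way the script prints them."""
    rows = ['  {"%s", CLD2::%s},' % (name, name) for name, _ in entries]
    return '\n'.join(['const cld_encoding cld_encoding_info[] = {'] + rows + ['};'])


print(render_table(parse_entries(SAMPLE_HEADER)))
```

The pattern keys only on the `NAME = number,` shape, so any line of that form in the input would be emitted as a table row.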



================================================
FILE: bindings/gen_test.py
================================================
# coding=utf-8

#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Generates test.py entries from cld2_unittest.cc (or cld2_unittest_full.cc when
# doFull is set) and unittest_data.h, both under CLD_PATH/internal

import re

# NOTE: this generates just a starting point; I had to fix up a few
# tests by hand for trivial diffs like Korean vs KOREAN

CLD_PATH = '../cld2'

r = re.compile(r'const char\* (.*?)\s+=\s+"(.*?)";')
testData = {}
with open('%s/internal/unittest_data.h' % CLD_PATH) as f:
  for line in f:
    if '#else' in line:
      break
    m = r.search(line)
    if m is not None:
      testData[m.group(1).strip()] = m.group(2)

# Carried over from internal/cld2_unittest.cc:

testData['kTeststr_en'] = 'confiscation of goods is assigned as the penalty part most of the courts consist of members and when it is necessary to bring public cases before a jury of members two courts combine for the purpose the most important cases of all are brought jurors or'

#testData['kTeststr_ks'] = 'नेपाल एसिया मंज अख मुलुक राजधानी काठ माडौं नेपाल अधिराज्य पेरेग्वाय दक्षिण अमेरिका महाद्वीपे मध् यक्षेत्रे एक देश अस् ति फणीश्वर नाथ रेणु फिजी छु दक्षिण प्रशान् त महासागर मंज अख देश बहामास छु केरेबियन मंज अख मुलुख राजधानी नसौ सम् बद्घ विषय बुरुंडी अफ्रीका महाद्वीपे मध् यक्षेत्रे देश अस् ति सम् बद्घ विषय'

# Manually extracted from internal/unittest_data.h:
#testData['kTeststr_fr_en_Latn'] = 'A acc\xC3\xA8s aux chiens et aux frontaux qui lui ont \xC3\xA9t\xC3\xA9 il peut consulter et modifier ses collections et exporter This article is about the country. France is the largest country in Western Europe and the third-largest in Europe as a whole. Cet article concerne le pays europ\xC3\xA9\x65n aujourd\xE2\x80\x99hui appel\xC3\xA9 R\xC3\xA9publique fran\xC3\xA7\x61ise. Pour d\xE2\x80\x99\x61utres usages du nom France, Motoring events began soon after the construction of the first successful gasoline-fueled automobiles. The quick brown fox jumped over the lazy dog'
testData['kTeststr_fr_en_Latn'] = "France is the largest country in Western Europe and the third-largest in Europe as a whole. " + \
                                  "A accès aux chiens et aux frontaux qui lui ont été il peut consulter et modifier ses collections et exporter " + \
                                  "Cet article concerne le pays européen aujourd’hui appelé République française. Pour d’autres usages du nom France, " + \
                                  "Pour une aide rapide et effective, veuiller trouver votre aide dans le menu ci-dessus. " + \
                                  "Motoring events began soon after the construction of the first successful gasoline-fueled automobiles. The quick brown fox jumped over the lazy"

r = re.compile(r'\{(.*?), (.*?)\},')

count = 0

doFull = True

if doFull:
  f = open('%s/internal/cld2_unittest_full.cc' % CLD_PATH)
else:
  f = open('%s/internal/cld2_unittest.cc' % CLD_PATH)

small = set()
if doFull:
  import test
  for lang, data in test.testData:
    small.add((lang, data))

reComment = re.compile(r'/\*.*?\*/')

langs = set()
for line in f.readlines():
  m = r.search(line)
  if m is not None and not line.strip().startswith('//'):
    lang = m.group(1)
    if lang == 'UNKNOWN_LANGUAGE':
      break

    lang = reComment.sub('', lang).strip()

    if lang == 'CHINESE':
      lang = 'Chinese'
    elif lang == 'CHINESE_T':
      lang = 'ChineseT'
    elif lang == 'JAPANESE':
      lang = 'Japanese'
    elif lang == 'KOREAN':
      lang = 'Korean'

    testDataVar = m.group(2).strip()
    s = testData[testDataVar].replace('\'', '\\\'')
    if not doFull or (lang, s) not in small:
      print('  (\'%s\', \'%s\'),' % (lang, s))
      count += 1

    langs.add(lang)
    
    if False:
      print('')
      print('  def test_%s(self):' % testDataVar[9:])
      print('    self.runOne(\'%s\', \'%s\')' % (lang, testData[testDataVar].replace('\'', '\\\'')))

print('%d langs, %d cases' % (len(langs), count))
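
The generator combines two passes: one collects `const char*` test strings, the other walks `{LANGUAGE, kTeststr_xx},` rows and stops at `UNKNOWN_LANGUAGE`. The sketch below reproduces that flow on in-memory samples, under the assumption that the inputs look like the hypothetical lines shown. None of the sample lines are copied from the CLD2 sources, and `build_cases` is an illustrative name.

```python
import re

DATA_RE = re.compile(r'const char\* (.*?)\s+=\s+"(.*?)";')
CASE_RE = re.compile(r'\{(.*?), (.*?)\},')
COMMENT_RE = re.compile(r'/\*.*?\*/')

# Hypothetical lines in the style of unittest_data.h and cld2_unittest.cc.
DATA_LINES = [
    'const char* kTeststr_en = "it\'s a short English sample";',
    'const char* kTeststr_fr = "un court exemple";',
]
CASE_LINES = [
    '  {ENGLISH, kTeststr_en},',
    '  // {GERMAN, kTeststr_de},',
    '  {/* comment */ FRENCH, kTeststr_fr},',
    '  {UNKNOWN_LANGUAGE, NULL},',
    '  {SPANISH, kTeststr_es},',
]


def build_cases(data_lines, case_lines):
    """Return the generated "(lang, text)," rows as strings."""
    test_data = {}
    for line in data_lines:
        m = DATA_RE.search(line)
        if m is not None:
            test_data[m.group(1).strip()] = m.group(2)

    cases = []
    for line in case_lines:
        m = CASE_RE.search(line)
        if m is None or line.strip().startswith('//'):
            continue  # not a table row, or a commented-out row
        lang = m.group(1)
        if lang == 'UNKNOWN_LANGUAGE':
            break  # sentinel row: everything after it is ignored
        lang = COMMENT_RE.sub('', lang).strip()
        # Escape single quotes so the text can sit inside a '...' literal.
        text = test_data[m.group(2).strip()].replace("'", "\\'")
        cases.append("  ('%s', '%s')," % (lang, text))
    return cases


for row in build_cases(DATA_LINES, CASE_LINES):
    print(row)
```

The commented-out row is skipped, the inline `/* ... */` comment is stripped from the language name, and the row after the sentinel is never looked up.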



================================================
FILE: bindings/pycldmodule.cc
================================================
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//

#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <string.h> // Windows compat (vs. strings.h)

#if PY_MAJOR_VERSION >= 3
#define IS_PY3K
#endif

// From ../cld2/public/
#include "compact_lang_det.h"
#include "encodings.h"

// From ../cld2/internal/
#include "lang_script.h"

// The version of the Python bindings, which gets set to _pycld2.__version__.
// For a version of CLD2 itself, see CLD2::DetectLanguageVersion().
#define PYCLD2_VERSION "0.42"

// Implementation is in ./encodings.cc
CLD2::Encoding EncodingFromName(const char *name);

struct cld_encoding {
  const char *name;
  CLD2::Encoding encoding;
};

extern const cld_encoding cld_encoding_info[];
namespace CLD2 {
  extern const int kNameToLanguageSize;
  extern const CharIntPair kNameToLanguage[];
}

struct PYCLDState {
  PyObject *error;
};

#ifdef IS_PY3K
#define GETSTATE(m) ((struct PYCLDState*)PyModule_GetState(m))
#else
#define GETSTATE(m) (&_state)
static struct PYCLDState _state;
#endif

static PyObject *
detect(PyObject *self, PyObject *args, PyObject *kwArgs)
{
  const char *bytes = NULL;
  Py_ssize_t numBytes = 0;
  PyObject *inputBytes;

  CLD2::CLDHints cldHints;
  cldHints.tld_hint = 0;
  cldHints.content_language_hint = 0;

  int isPlainText = 0;
  const char *hintLanguage = 0;
  const char *hintEncoding = 0;

  int returnVectors = 0;

  int flagScoreAsQuads = 0;
  int flagHTML = 0;
  int flagCR = 0;
  int flagVerbose = 0;
  int flagQuiet = 0;
  int flagEcho = 0;
  int flagBestEffort = 0;

  static const char *kwList[] = {
    "utf8Bytes", "isPlainText", "hintTopLevelDomain", "hintLanguage",
    "hintLanguageHTTPHeaders", "hintEncoding", "returnVectors",
    "debugScoreAsQuads", "debugHTML", "debugCR", "debugVerbose",
    "debugQuiet", "debugEcho", "bestEffort", NULL
  };

  if (!PyArg_ParseTupleAndKeywords(args,
                                   kwArgs,
                                   "O|izzzziiiiiiii",
                                   (char **) kwList,
                                   &inputBytes,
                                   &isPlainText,
                                   &cldHints.tld_hint,
                                   &hintLanguage,
                                   &cldHints.content_language_hint,
                                   &hintEncoding,
                                   &returnVectors,
                                   &flagScoreAsQuads,
                                   &flagHTML,
                                   &flagCR,
                                   &flagVerbose,
                                   &flagQuiet,
                                   &flagEcho,
                                   &flagBestEffort)) {
    return NULL;
  }

  // Support both text (Unicode) and bytes objects.
  if (PyUnicode_Check(inputBytes)) {
      bytes = PyUnicode_AsUTF8AndSize(inputBytes, &numBytes);
      if (bytes == NULL)
          return NULL;
  } else if (PyBytes_Check(inputBytes)) {
      if (PyBytes_AsStringAndSize(inputBytes, (char **) &bytes, &numBytes) == -1)
          return NULL;
  } else {
      PyErr_SetString(PyExc_TypeError, "utf8Bytes must be str or bytes");
      return NULL;
  }

  int flags = 0;
  if (flagScoreAsQuads != 0) {
    flags |= CLD2::kCLDFlagScoreAsQuads;
  }
  if (flagHTML != 0) {
    flags |= CLD2::kCLDFlagHtml;
  }
  if (flagCR != 0) {
    flags |= CLD2::kCLDFlagCr;
  }
  if (flagVerbose != 0) {
    flags |= CLD2::kCLDFlagVerbose;
  }
  if (flagQuiet != 0) {
    flags |= CLD2::kCLDFlagQuiet;
  }
  if (flagEcho != 0) {
    flags |= CLD2::kCLDFlagEcho;
  }
  if (flagBestEffort != 0) {
    flags |= CLD2::kCLDFlagBestEffort;
  }

  PyObject *CLDError = GETSTATE(self)->error;

  if (hintLanguage == 0) {
    cldHints.language_hint = CLD2::UNKNOWN_LANGUAGE;
  }
  else {
    cldHints.language_hint = CLD2::GetLanguageFromName(hintLanguage);
    if (cldHints.language_hint == CLD2::UNKNOWN_LANGUAGE) {
      PyErr_Format(CLDError,
                   "Unrecognized language hint '%s' not in cld.LANGUAGES",
                   hintLanguage);
      return NULL;
    }
  }

  if (hintEncoding == 0) {
    cldHints.encoding_hint = CLD2::UNKNOWN_ENCODING;
  }
  else {
    cldHints.encoding_hint = EncodingFromName(hintEncoding);
    if (cldHints.encoding_hint == CLD2::UNKNOWN_ENCODING) {
      PyErr_Format(CLDError,
                   "Unrecognized encoding hint '%s' not in cld.ENCODINGS",
                   hintEncoding);
      return NULL;
    }
  }

  bool isReliable;
  CLD2::Language language3[3];
  int percent3[3];
  double normalized_score3[3];
  int textBytesFound;
  int validPrefixBytes;
  CLD2::ResultChunkVector resultChunkVector;

  Py_BEGIN_ALLOW_THREADS
  CLD2::ExtDetectLanguageSummaryCheckUTF8(bytes,
                                          numBytes,
                                          isPlainText != 0,
                                          &cldHints,
                                          flags,
                                          language3,
                                          percent3,
                                          normalized_score3,
                                          returnVectors != 0 ? &resultChunkVector : 0,
                                          &textBytesFound,
                                          &isReliable,
                                          &validPrefixBytes);
  Py_END_ALLOW_THREADS

  if (validPrefixBytes < numBytes) {
    // numBytes is a Py_ssize_t, so it needs %zd rather than %d.
    PyErr_Format(CLDError,
                 "input contains invalid UTF-8 around byte %d (of %zd)",
                 validPrefixBytes,
                 numBytes);
    return NULL;
  }

  PyObject *details = PyTuple_New(3);
  for (Py_ssize_t idx = 0; idx < 3; idx++) {
    CLD2::Language lang = language3[idx];
    // Steals ref
    PyTuple_SET_ITEM(details,
                     idx,
                     Py_BuildValue("(ssif)",
                                   CLD2::LanguageName(lang),
                                   CLD2::LanguageCode(lang),
                                   percent3[idx],
                                   normalized_score3[idx]));
  }

  PyObject *result;

  if (returnVectors != 0) {
    PyObject *resultChunks = PyTuple_New(resultChunkVector.size());
    for (Py_ssize_t i = 0; i < resultChunkVector.size(); i++) {
      CLD2::ResultChunk chunk = resultChunkVector.at(i);
      CLD2::Language lang = static_cast<CLD2::Language>(chunk.lang1);
      // Steals ref
      PyTuple_SET_ITEM(resultChunks,
                       i,
                       Py_BuildValue("(iiss)",
                                     chunk.offset,
                                     chunk.bytes,
                                     CLD2::LanguageName(lang),
                                     CLD2::LanguageCode(lang)));
    }
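    // "N" hands our reference to resultChunks over to the result tuple;
    // "O" would add a second reference that is never released.
    result = Py_BuildValue("(OiON)",
                           isReliable ? Py_True : Py_False,
                           textBytesFound,
                           details,
                           resultChunks);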
  }
  else {
    result = Py_BuildValue("(OiO)",
                           isReliable ? Py_True : Py_False,
                           textBytesFound,
                           details);
  }

  Py_DECREF(details);
  return result;
}

PyDoc_STRVAR(detect_doc,
"Detect language from str or UTF-8 encoded bytes.\n\
\n\
Arguments:\n\
\n\
    utf8Bytes: str or UTF-8 encoded bytes\n\
        The text to detect.  If this is not valid UTF-8, then a pycld2.error is\n\
        raised.\n\
\n\
    isPlainText: bool, default False\n\
        If False, then the input is HTML and CLD will skip HTML tags,\n\
        expand HTML entities, detect HTML <lang ...> tags, etc.\n\
\n\
    hintTopLevelDomain: str\n\
        E.g., 'id' boosts Indonesian.\n\
\n\
    hintLanguage: str\n\
        E.g., 'ITALIAN' or 'it' boosts Italian; see cld.LANGUAGES for all\n\
        known languages.\n\
\n\
    hintLanguageHTTPHeaders: str\n\
        E.g., 'mi,en' boosts Maori and English.\n\
\n\
    hintEncoding: str\n\
        E.g., 'SJS' boosts Japanese; see cld.ENCODINGS for all known\n\
        encodings.\n\
\n\
    returnVectors:  bool, default False\n\
        If True, then the vectors indicating which language was detected in\n\
        which byte range are returned in addition to details.  The vectors are\n\
        a sequence of (bytesOffset, bytesLength, languageName, languageCode),\n\
        in order. bytesOffset is the start of the vector, bytesLength is the\n\
        length of the vector.  Note that there is some added CPU cost if this\n\
        is True.  (Approx. 2x performance hit.)\n\
\n\
    debugScoreAsQuads: bool, default False\n\
        Normally, several languages are detected solely by their Unicode\n\
        script.  Combined with appropriate lookup tables, this flag forces\n\
        them instead to be detected via quadgrams. This can be a useful\n\
        refinement when looking for meaningful text in these languages,\n\
        instead of just character sets. The default tables do not support\n\
        this use.\n\
\n\
    debugHTML: bool, default False\n\
        For each detection call, write an HTML file to stderr, showing the\n\
        text chunks and their detected languages.\n\
        See docs/InterpretingCLD2UnitTestOutput.pdf to interpret this output.\n\
\n\
    debugCR: bool, default False\n\
        In that HTML file, force a new line for each chunk.\n\
\n\
    debugVerbose: bool, default False\n\
        In that HTML file, show every lookup entry.\n\
\n\
    debugQuiet: bool, default False\n\
        In that HTML file, suppress most of the output detail.\n\
\n\
    debugEcho: bool, default False\n\
        Echo every input buffer to stderr.\n\
\n\
    bestEffort: bool, default False\n\
        If True, then allow low-quality results for short text, rather than\n\
        forcing the result to UNKNOWN_LANGUAGE.  This may be of use for\n\
        those desiring approximate results on short input text, but there\n\
        is no claim that these results are very good.\n\
\n\
  Returns: tuple\n\
\n\
    If returnVectors is False:\n\
\n\
        (isReliable, textBytesFound, details)\n\
\n\
    If returnVectors is True:\n\
\n\
        (isReliable, textBytesFound, details, vectors)\n\
\n\
    Where:\n\
\n\
    isReliable: bool\n\
        True if the detection is high confidence.\n\
\n\
    textBytesFound: int\n\
        Total number of bytes of text detected.\n\
\n\
    details: tuple\n\
        Tuple of up to three detected languages, where each is a\n\
        tuple (languageName, languageCode, percent, score).  percent is\n\
        what percentage of the original text was detected as this language\n\
        and score is the confidence score for that language.\n\
\n\
    vectors: tuple\n\
        Vectors indicating which language was detected in which byte range.\n\
");

static PyMethodDef CLDMethods[] = {
  {"detect", (PyCFunction)detect, METH_VARARGS|METH_KEYWORDS, detect_doc},
  {NULL, NULL, 0, NULL}  // Sentinel
};

#ifdef IS_PY3K

static int cld_traverse(PyObject *m, visitproc visit, void *arg) {
  Py_VISIT(GETSTATE(m)->error);
  return 0;
}

static int cld_clear(PyObject *m) {
  Py_CLEAR(GETSTATE(m)->error);
  return 0;
}

static struct PyModuleDef moduledef = {
  PyModuleDef_HEAD_INIT,                    // m_base
  "cld",                                    // m_name
  NULL,                                     // m_doc
  sizeof(struct PYCLDState),                // m_size
  CLDMethods,                               // m_methods
  NULL,                                     // m_slots
  cld_traverse,                             // m_traverse
  cld_clear,                                // m_clear
  NULL                                      // m_free
};

#define INITERROR return NULL

// In Python 3, the initialization function must be named PyInit_name(),
// where 'name' is the name of the module, hence this module will be named
// _pycld2.  The stdlib does the same thing, such as PyInit__heapq for _heapq.
PyMODINIT_FUNC
PyInit__pycld2(void)

#else  // IS_PY3K

#define INITERROR return

PyMODINIT_FUNC
init_pycld2()
#endif
{
#ifdef IS_PY3K
  PyObject *m = PyModule_Create(&moduledef);
#else
  PyObject* m = Py_InitModule("_pycld2", CLDMethods);
#endif

  if (m == NULL) {
    INITERROR;
  }

  struct PYCLDState *st = GETSTATE(m);

  // Python name for the exception is 'pycld2.error'
  st->error = PyErr_NewException("pycld2.error", NULL, NULL);
  if (st->error == NULL) {
    Py_DECREF(m);
    INITERROR;
  }

  // Set module-global ENCODINGS tuple
  PyObject* pyEncs = PyTuple_New(CLD2::NUM_ENCODINGS - 1);
  // Steals ref:
  PyModule_AddObject(m, "ENCODINGS", pyEncs);
  unsigned int upto = 0;
  for (Py_ssize_t encIDX = 0; encIDX < CLD2::NUM_ENCODINGS; encIDX++) {
    if (static_cast<CLD2::Encoding>(encIDX) != CLD2::UNKNOWN_ENCODING) {
      if (upto == PyTuple_Size(pyEncs)) {
        PyErr_SetString(st->error, "failed to initialize cld.ENCODINGS");
        INITERROR;
      }
      PyTuple_SET_ITEM(pyEncs,
                       upto++,
                       PyUnicode_FromString(cld_encoding_info[encIDX].name));
    }
  }

  if (upto != PyTuple_Size(pyEncs)) {
    PyErr_SetString(st->error, "failed to initialize cld.ENCODINGS");
    INITERROR;
  }

  // Set module-global LANGUAGES tuple
  PyObject* pyLangs = PyTuple_New(CLD2::kNameToLanguageSize - 1);
  // Steals ref:
  PyModule_AddObject(m, "LANGUAGES", pyLangs);
  upto = 0;
  for (Py_ssize_t i = 0; i < CLD2::kNameToLanguageSize; i++) {
    const char *name = CLD2::kNameToLanguage[i].s;
    if (strcmp(name, "Unknown")) {
      if (upto == PyTuple_Size(pyLangs)) {
        PyErr_SetString(st->error, "failed to initialize cld.LANGUAGES");
        INITERROR;
      }
      CLD2::Language lang = CLD2::GetLanguageFromName(name);
      if (lang == CLD2::UNKNOWN_LANGUAGE) {
        PyErr_SetString(st->error, "failed to initialize cld.LANGUAGES");
        INITERROR;
      }
      // Steals ref
      PyTuple_SET_ITEM(pyLangs,
                       upto++,
                       Py_BuildValue("(zz)",
                                     LanguageName(lang),
                                     LanguageCode(lang)));
    }
  }

  if (upto != PyTuple_Size(pyLangs)) {
    PyErr_SetString(st->error, "failed to initialize cld.LANGUAGES");
    INITERROR;
  }

// VERSION is the C lib version, such as 'V2.0 - 20140204'
#ifdef IS_PY3K
  // Steals ref
  PyModule_AddObject(m,
                     "VERSION",
                     PyUnicode_FromString(CLD2::DetectLanguageVersion()));
#else
  // Steals ref
  PyModule_AddObject(m,
                     "VERSION",
                     PyString_FromString(CLD2::DetectLanguageVersion()));

#endif
  PyModule_AddStringConstant(m, "__version__", PYCLD2_VERSION);

  // Set module-global DETECTED_LANGUAGES tuple
  upto = 0;
  PyObject* detLangs = PyTuple_New(165);
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("ABKHAZIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("AFAR"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("AFRIKAANS"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("AKAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("ALBANIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("AMHARIC"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("ARABIC"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("ARMENIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("ASSAMESE"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("AYMARA"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("AZERBAIJANI"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("BASHKIR"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("BASQUE"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("BELARUSIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("BENGALI"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("BIHARI"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("BISLAMA"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("BOSNIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("BRETON"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("BULGARIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("BURMESE"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("CATALAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("CEBUANO"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("CHEROKEE"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("CORSICAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("CROATIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("CZECH"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("Chinese"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("ChineseT"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("DANISH"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("DHIVEHI"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("DUTCH"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("DZONGKHA"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("ENGLISH"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("ESPERANTO"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("ESTONIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("FAROESE"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("FIJIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("FINNISH"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("FRENCH"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("FRISIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("GALICIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("GANDA"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("GEORGIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("GERMAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("GREEK"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("GREENLANDIC"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("GUARANI"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("GUJARATI"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("HAITIAN_CREOLE"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("HAUSA"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("HAWAIIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("HEBREW"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("HINDI"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("HMONG"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("HUNGARIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("ICELANDIC"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("IGBO"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("INDONESIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("INTERLINGUA"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("INTERLINGUE"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("INUKTITUT"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("INUPIAK"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("IRISH"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("ITALIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("JAVANESE"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("Japanese"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("KANNADA"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("KASHMIRI"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("KAZAKH"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("KHASI"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("KHMER"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("KINYARWANDA"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("KURDISH"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("KYRGYZ"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("Korean"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("LAOTHIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("LATIN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("LATVIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("LIMBU"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("LINGALA"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("LITHUANIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("LUXEMBOURGISH"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("MACEDONIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("MALAGASY"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("MALAY"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("MALAYALAM"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("MALTESE"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("MANX"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("MAORI"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("MARATHI"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("MAURITIAN_CREOLE"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("MONGOLIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("NAURU"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("NDEBELE"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("NEPALI"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("NORWEGIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("NORWEGIAN_N"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("NYANJA"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("OCCITAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("ORIYA"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("OROMO"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("PASHTO"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("PEDI"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("PERSIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("POLISH"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("PORTUGUESE"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("PUNJABI"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("QUECHUA"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("RHAETO_ROMANCE"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("ROMANIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("RUNDI"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("RUSSIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("SAMOAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("SANGO"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("SANSKRIT"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("SCOTS"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("SCOTS_GAELIC"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("SERBIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("SESELWA"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("SESOTHO"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("SHONA"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("SINDHI"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("SINHALESE"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("SISWANT"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("SLOVAK"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("SLOVENIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("SOMALI"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("SPANISH"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("SUNDANESE"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("SWAHILI"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("SWEDISH"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("SYRIAC"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("TAGALOG"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("TAJIK"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("TAMIL"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("TATAR"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("TELUGU"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("THAI"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("TIBETAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("TIGRINYA"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("TONGA"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("TSONGA"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("TSWANA"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("TURKISH"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("TURKMEN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("UIGHUR"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("UKRAINIAN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("URDU"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("UZBEK"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("VENDA"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("VIETNAMESE"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("VOLAPUK"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("WARAY_PHILIPPINES"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("WELSH"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("WOLOF"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("XHOSA"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("X_Buginese"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("X_Gothic"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("X_KLINGON"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("X_PIG_LATIN"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("YIDDISH"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("YORUBA"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("ZHUANG"));
  PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("ZULU"));

  // Steals ref:
  PyModule_AddObject(m, "DETECTED_LANGUAGES", detLangs);

  if (upto != PyTuple_Size(detLangs)) {
    PyErr_SetString(st->error, "failed to initialize cld.DETECTED_LANGUAGES");
    INITERROR;
  }

  // Steals ref:
  PyModule_AddObject(m, "error", st->error);
#ifdef IS_PY3K
  return m;
#endif
}


================================================
FILE: bindings/test.py
================================================
# coding=utf-8

#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

import os
import sys
import stat
import unittest
import traceback

# Get just the major.minor version of the currently running Python,
# e.g. 3.2.3 -> 3.2:
version = sys.version.split()[0]
version = version[:version.rfind('.')]
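
# Illustrative alternative, not used by this script: a minimal sketch that
# builds the same major.minor string from a version tuple such as
# sys.version_info instead of slicing sys.version. The helper name
# majorMinor is hypothetical.
def majorMinor(versionInfo):
  # Only the first two fields (major, minor) are used.
  return '%d.%d' % tuple(versionInfo[:2])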

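# Find the directory under build/ that holds the compiled .so for this
# Python version (names chosen to avoid shadowing the builtins dir and file):
moduleDir = None
for dirPath, subDirs, files in os.walk('build'):
  if dirPath.endswith(version) and '/lib' in dirPath:
    if any(name.endswith('.so') for name in files):
      moduleDir = dirPath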
    
if moduleDir is None:
  raise RuntimeError('could not find a built cld2 extension module (.so) under build/; be sure to first run "python setup.py build"')

sys.path.insert(0, moduleDir)

import cld2full
import cld2

VERBOSE = False

fr_en_Latn = 'France is the largest country in Western Europe and the third-largest in Europe as a whole. A accès aux chiens et aux frontaux qui lui ont été il peut consulter et modifier ses collections et exporter Cet article concerne le pays européen aujourd’hui appelé République française. Pour d’autres usages du nom France, Pour une aide rapide et effective, veuiller trouver votre aide dans le menu ci-dessus. Motoring events began soon after the construction of the first successful gasoline-fueled automobiles. The quick brown fox jumped over the lazy dog'

testData = (
  ('ENGLISH', 'confiscation of goods is assigned as the penalty part most of the courts consist of members and when it is necessary to bring public cases before a jury of members two courts combine for the purpose the most important cases of all are brought jurors or'),
  ('ARMENIAN', ' ա յ եվ նա հիացած աչքերով նայում է հինգհարկանի շենքի տարօրինակ փոքրիկ քառակուսի պատուհաններին դեռ մենք շատ ենք հետամնաց ասում է նա այսպես է'),
  ('CHEROKEE', 'ᎠᎢᏍᎩ ᎠᏟᎶᏍᏗ ᏥᏄᏍᏛᎩ ᎦᎫᏍᏛᏅᎯ ᎾᎥᎢ'),
  ('DHIVEHI', ' ހިންދީ ބަހުން ވާހަކަ ދައްކާއިރު ދެވަނަ ބަހެއްގެ ގޮތުގައާއި އެނޫން ގޮތްގޮތުން ހިންދީ ބަހުން ވާހަކަ ދައްކާ މީހުންގެ އަދަދު މިލިއަނަށް'),
  ('GEORGIAN', ' ა ბირთვიდან მიღებული ელემენტი მენდელეევის პერიოდულ სიტემაში გადაინაცვლებს ორი უჯრით'),
  ('GREEK', ' ή αρνητική αναζήτηση λέξης κλειδιού καταστήστε τις μεμονωμένες λέξεις κλειδιά περισσότερο στοχοθετημένες με τη μετατροπή τους σε'),
  ('GUJARATI', ' આના પરિણામ પ્રમાણસર ફોન્ટ અવતરણ ચિન્હવાળા પાઠને છુપાવો બધા સમૂહો શોધાયા હાલનો જ સંદેશ વિષયની'),
  ('INUKTITUT', 'ᐃᑯᒪᒻᒪᑦ ᕿᓈᖏᓐᓇᓲᖑᒻᒪᑦ ᑎᑎᖅᑕᓕᒫᖅᓃᕕᑦ ᑎᑦᕆᐊᑐᓐᖏᑦᑕᑎᑦ ᑎᑎᖅᑕᑉᐱᑦ ᓯᕗᓂᖓᓂ ᑎᑎᖅᖃᖅ ᑎᑎᕆᐊᑐᓐᖏᑕᐃᑦ ᕿᓂᓲᖑᔪᒍᑦ ᑎᑎᖅᑕᓕᒫᖅᓃᕕᑦ'),
  ('KANNADA', ' ಂಠಯ್ಯನವರು ತುಮಕೂರು ಜಿಲ್ಲೆಯ ಚಿಕ್ಕನಾಯಕನಹಳ್ಳಿ ತಾಲ್ಲೂಕಿನ ತೀರ್ಥಪುರ ವೆಂಬ ಸಾಧಾರಣ ಹಳ್ಳಿಯ ಶ್ಯಾನುಭೋಗರ'),
  ('KHMER', ' ក ខ គ ឃ ង ច ឆ ជ ឈ ញ ដ ឋ ឌ ឍ ណ ត ថ ទ ធ ន ប ផ ព ភ ម យ រ ល វ ស ហ ឡ អ ឥ ឦ ឧ ឪ ឫ ឬ ឯ ឱ ទាំងអស់'),
  ('LAOTHIAN', ' ກຫາທົ່ວທັງເວັບ ແລະໃນເວັບໄຮ້ສາຍ ທຳອິດໃຫ້ທຳການຊອກຫາກ່ອນ ຈາກນັ້ນ ໃຫ້ກົດປຸ່ມເມນູ ໃນໜ້າຜົນໄດ້'),
  ('LIMBU', 'ᤁᤡᤖᤠᤳ ᤕᤠᤰᤌᤢᤱ ᤆᤢᤶᤗᤢᤱᤖᤧ ᤛᤥᤎᤢᤱᤃᤧᤴ ᤀᤡᤔᤠᤴᤛᤡᤱ ᤆᤧᤶᤈᤱᤗᤧ ᤁᤢᤔᤡᤱᤅᤥ ᤏᤠᤈᤡᤖᤡ ᤋᤱᤒᤣ ᥈᥆᥆᥉ ᤒᤠ ᤈᤏᤘᤖᤡ ᤗᤠᤏᤢᤀᤠᤱ ᤁ᤹ᤏᤠ ᤋᤱᤒᤣ ᤁᤠᤰ ᤏᤠ᤺ᤳᤋᤢ ᤕᤢᤖᤢᤒᤠ ᤀᤡᤔᤠᤴᤛᤡᤱ ᤋᤱᤃᤡᤵᤛᤡᤱ ᤌᤡᤶᤒᤣᤴ ᤂᤠᤃᤴ ᤛᤡᤛᤣ᤺ᤰᤗᤠ ᥇᥍ ᤂᤧᤴ ᤀᤡᤛᤡᤰ ᥇ ᤈᤏᤘᤖᤡ ᥈᥆᥆᥊ ᤀᤥ ᤏᤠᤛᤢᤵ ᤆᤥ᤺ᤰᤔᤠ ᤌᤡᤶᤒᤣ ᤋᤱᤃᤠᤶᤛᤡᤱᤗ ᤐᤳᤐᤠ ᤀᤡᤱᤄᤱ ᤘᤠ᤹'),
  ('MALAYALAM', ' ം അങ്ങനെ ഞങ്ങള് അവരുടെ മുമ്പില് നിന്നു ഔടും ഉടനെ നിങ്ങള് പതിയിരിപ്പില് നിന്നു എഴുന്നേറ്റു'),
  ('ORIYA', 'ଅକ୍ଟୋବର ଡିସେମ୍ବର'),
  ('PUNJABI', ' ਂ ਦਿਨਾਂ ਵਿਚ ਭਾਈ ਸਾਹਿਬ ਦੀ ਬੁੱਚੜ ਗੋਬਿੰਦ ਰਾਮ ਨਾਲ ਅੜਫਸ ਚੱਲ ਰਹੀ ਸੀ ਗੋਬਿੰਦ ਰਾਮ ਨੇ ਭਾਈ ਸਾਹਿਬ ਦੀਆਂ ਭੈਣਾ'),
  ('SINHALESE', ' අනුරාධ මිහිඳුකුල නමින් සකුරා ට ලිපියක් තැපෑලෙන් එවා තිබුණා කි ් රස්ටි ෂෙල්ටන් ප ් රනාන්දු ද'),
  ('SYRIAC', 'ܐܕܪܝܣ ܓܛܘ ܫܘܪܝܐ ܡܢ ܦܪܢܣܐ ܡܢ ܐܣܦܢܝܐ ܚܐܪܘܬܐ ܒܐܕܪ ܒܢܝܣܢ ܫܛܝܚܘܬܐ ܟܠܢܝܐ ܡܝ̈ܐ ܒܥܠܡܐ'),
  ('TAGALOG', ' ᜋᜇ᜔ ᜐᜓᜎᜆ᜔ ᜃ ᜈᜅ᜔ ᜊᜌ᜔ᜊᜌᜒᜈ᜔ ᜂᜉᜅ᜔᜔ ᜋᜐᜈᜌ᜔ ᜎᜅ᜔ ᜁᜐ ᜉᜅ᜔ ᜀᜃ᜔ᜎᜆ᜔ ᜆᜓᜅ᜔ᜃᜓᜎ᜔ ᜐ ᜊᜌ᜔ᜊᜌᜒᜈ᜔ ᜐ ᜆᜒᜅᜒᜈ᜔ ᜃᜓ'),
  ('TAMIL', ' அங்கு ராஜேந்திர சோழனால் கட்டப்பட்ட பிரம்மாண்டமான சிவன் கோவில் ஒன்றும் உள்ளது தொகு'),
  ('TELUGU', ' ఁ దనర జయించిన తత్వ మరసి చూడఁ దాన యగును రాజయోగి యిట్లు తేజరిల్లుచు నుండు విశ్వదాభిరామ వినర వేమ'),
  ('THAI', ' กฏในการค้นหา หรือหน้าเนื้อหา หากท่านเลือกลงโฆษณา ท่านอาจจะปรับต้องเพิ่มงบประมาณรายวันตา'),
  ('Chinese', '产品的简报和公告 提交该申请后无法进行更改 请确认您的选择是正确的 对于要提交的图书 我确认 我是版权所有者或已得到版权所有者的授权 要更改您的国家 地区 请在此表的最上端更改您的'),
  ('ChineseT', ' 之前為 帳單交易作業區 已變更 廣告內容 之前為 銷售代表 之前為 張貼日期為 百分比之前為 合約 為 目標對象條件已刪除 結束日期之前為'),
  ('Japanese', ' このペ ジでは アカウントに指定された予算の履歴を一覧にしています それぞれの項目には 予算額と特定期間のステ タスが表示されます 現在または今後の予算を設定するには'),
  ('Korean', ' 개별적으로 리포트 액세스 권한을 부여할 수 있습니다 액세스 권한 부여사용자에게 프로필 리포트에 액세스할 수 있는 권한을 부여하시려면 가용 프로필 상자에서 프로필 이름을 선택한 다음'),
  ('AFRIKAANS', ' aam skukuza die naam beteken hy wat skoonvee of hy wat alles onderstebo keer wysig bosveldkampe boskampe is kleiner afgeleë ruskampe wat oor min fasiliteite beskik daar is geen restaurante of winkels nie en slegs oornagbesoekers word toegelaat bateleur'),
  ('ALBANIAN', ' a do të kërkoni nga beogradi që të njohë pavarësinë e kosovës zoti thaçi prishtina është gati ta njoh pavarësinë e serbisë ndërsa natyrisht se do të kërkohet një gjë e tillë që edhe beogradi ta njoh shtetin e pavarur dhe sovran të'),
  ('ARABIC', 'احتيالية بيع أي حساب'),
  ('AZERBAIJANI', ' a az qalıb breyn rinq intellektual oyunu üzrə yarışın zona mərhələləri keçirilib miq un qalıqlarının dənizdən çıxarılması davam edir məhəmməd peyğəmbərin karikaturalarını çap edən qəzetin baş redaktoru iş otağında ölüb'),
  ('BASQUE', ' a den eraso bat honen kontra hortaz eragiketa bakarrik behar dituen eraso batek aes apurtuko luke nahiz eta oraingoz eraso bideraezina izan gaur egungo teknologiaren mugak direla eta oraingoz kezka hauek alde batera utzi daitezke orain arteko indar'),
  ('BELARUSIAN', ' а друкаваць іх не было тэхнічна магчыма бліжэй за вільню тым самым часам нямецкае кіраўніцтва прапаноўвала апроч ўвядзення лацінкі яе'),
  ('BENGALI', 'গ্যালারির ৩৮ বছর পূর্তিতে মূল্যছাড় অর্থনীতি বিএনপির ওয়াক আউট তপন চৌধুরী হারবাল অ্যাসোসিয়েশনের সভাপতি আন্তর্জাতিক পরামর্শক বোর্ড দিয়ে শরিয়াহ্ ইনন্ডেক্স করবে সিএসই মালিকপক্ষের কান্না, শ্রমিকের অনিশ্চয়তা মতিঝিলে সমাবেশ নিষিদ্ধ: এফবিসিসিআইয়ের ধন্যবাদ বিনোদন বিশেষ প্রতিবেদন বাংলালিংকের গ্র্যান্ডমাস্টার সিজন-৩ ব্রাজিলে বিশ্বকাপ ফুটবল আয়োজনবিরোধী বিক্ষোভ দেশের নিরাপত্তার  চেয়ে অনেক বেশি সচেতন । প্রার্থীদের দক্ষতা  ও যোগ্যতার পাশাপাশি তারা জাতীয় ইস্যুগুলোতে প্রাধান্য দিয়েছেন । ” পাঁচটি সিটিতে ২০ লাখ ভোটারদের দিয়ে জাতীয় নির্বাচনে ৮ কোটি ভোটারদের সঙ্গে তুলনা করা যাবে কি একজন দর্শকের এমন প্রশ্নে জবাবে আব্দুল্লাহ আল নোমান বলেন , “ এই পাঁচটি সিটি কর্পোরেশন নির্বাচন দেশের পাঁচটি বড় বিভাগের প্রতিনিধিত্ব করছে । এছাড়া এখানকার ভোটার রা সবাই সচেতন । তারা'),
  ('BIHARI', 'काल में उनका हमला से बचे खाती एहिजा भाग के अइले आ भोजपुर नाम से नगर बसवले. एकरा बारे में विस्तार से जानकारी नीचे दीहल गइल बा. बाकिर आश्चर्यजनक रूप से मालवा के राजा भोज के बिहार आवे आ भोजपुर नगर बसावे आ चाहे भोजपुरी के साथे उनकर कवनो संबंध होखे के कवनो जानकारी भोपाल के भोज संस्थान आ चाहे मध्य प्रदेश के इतिहासकार लोगन के तनिको नइखे. हालांकि ऊ सब लोग एह बात के मानत बा कि एकरा बारे में अबहीं तकले मूर्ति बनवइलें. राजा भोज के जवना जगहा पऽ वाग्देवी के दर्शन भइल रहे, ओही स्थान पऽ एह मूर्ति के स्थापना कइल गइल. अब अगर एह मंदिर के एह शिलालेख के तस्वीर (पृष्ठ संख्या 33 पऽ प्रकाशित) रउआ धेयान से देखीं तऽ एकरा पऽ कैथी लिपि में -सीताराम- लिखल साफ लउकत बा. कैथी भोजपुरी के बहुत प्रचलित लिपि रहल बिया. एकरा बारे में कवनो शंका संदेह बिहार-यूपी के जानकार लोगन में नइखे. एल. एस. एस. वो माले के लिखल पढ़ीं '),
  ('BULGARIAN', ' а дума попада в състояние на изпитание ключовите думи с предсказана малко под то изискване на страниците за търсене в'),
  ('CATALAN', 'al final en un únic lloc nhorabona l correu electrònic està concebut com a eina de productivitat aleshores per què perdre el temps arxivant missatges per després intentar recordar on els veu desar i per què heu d eliminar missatges importants per l'),
  ('CEBUANO', 'Ang Sugbo usa sa mga labing ugmad nga lalawigan sa nasod. Kini ang sentro sa komersyo, edukasyon ug industriya sa sentral ug habagatang dapit sa kapupod-an. Ang mipadayag sa Sugbo isip ikapito nga labing nindot nga pulo sa , ang nag-inusarang pulo sa Pilipinas nga napasidunggan sa maong magasin sukad pa sa tuig'),
  ('CROATIAN', 'Posljednja dva vladara su Kijaksar (Κυαξαρης; 625-585 prije Krista), fraortov sin koji će proširiti teritorij Medije i Astijag. Kijaksar je imao kćer ili unuku koja se zvala Amitis a postala je ženom Nabukodonosora II. kojoj je ovaj izgradio Viseće vrtove Babilona. Kijaksar je modernizirao svoju vojsku i uništio Ninivu 612. prije Krista. Naslijedio ga je njegov sin, posljednji medijski kralj, Astijag, kojega je detronizirao (srušio sa vlasti) njegov unuk Kir Veliki. Zemljom su zavladali Perzijanci.'),
  ('CZECH', ' a akci opakujte film uložen vykreslit gmail tokio smazat obsah adresáře nelze načíst systémový profil jednotky smoot okud používáte pro určení polokoule značky z západ nebo v východ používejte nezáporné hodnoty zeměpisné délky nelze'),
  ('DANISH', ' a z tallene og punktummer der er tilladte log ud angiv den ønskede adgangskode igen november gem personlige oplysninger kontrolspørgsmål det sidste tegn i dit brugernavn skal være et bogstav a z eller tal skriv de tegn du kan se i billedet nedenfor'),
  ('DUTCH', ' a als volgt te werk om een configuratiebestand te maken sitemap gen py ebruik filters om de s op te geven die moeten worden toegevoegd of uitgesloten op basis van de opmaaktaal elke sitemap mag alleen de s bevatten voor een bepaalde opmaaktaal dit'),
  ('ENGLISH', ' a backup credit card by visiting your billing preferences page or visit the adwords help centre for more details https adwords google com support bin answer py answer hl en we were unable to process the payment of for your outstanding google adwords'),
  ('ESTONIAN', ' a niipea kui sinu maksimaalne igakuine krediidi limiit on meie poolt heaks kiidetud on sinu kohustuseks see krediidilimiit'),
  ('FINNISH', ' a joilla olet käynyt tämä kerro meille kuka ä olet ei tunnistettavia käyttötietoja kuten virheraportteja käytetään google desktopin parantamiseen etsi näyttää mukautettuja uutisia google desktop keskivaihto leikkaa voit kaksoisnapsauttaa'),
  ('FRENCH', ' a accès aux collections et aux frontaux qui lui ont été attribués il peut consulter et modifier ses collections et exporter des configurations de collection toutefois il ne peut pas créer ni supprimer des collections enfin il a accès aux fonctions'),
  ('GALICIAN', '  debe ser como mínimo taranto tendas de venda polo miúdo cociñas servizos bordado canadá viaxes parques de vehículos de recreo hotel oriental habitación recibir unha postal no enderezo indicado anteriormente'),
  ('GANDA', ' abaana ba bani lukaaga mu ana mu babiri abaana ba bebayi lukaaga mu abiri mu basatu abaana ba azugaadi lukumi mu ebikumi bibiri mu abiri mu babiri abaana ba adonikamu lukaaga mu nltaaga mu mukaaga abaana ba biguvaayi enkumi bbiri mu ataano mu mukaaga'),
  ('GERMAN', ' abschnitt ordner aktivieren werden die ordnereinstellungen im farbabschnitt deaktiviert öchten sie wirklich fortfahren eldtypen angeben optional n diesem schritt geben sie für jedesfeld aus dem datenset den typ an ieser schritt ist optional eldtypen'),
  ('HAITIAN_CREOLE', ' ak pitit tout sosyete a chita se pou sa leta dwe pwoteje yo nimewo leta fèt pou li pwoteje tout paran ak pitit nan peyi a menm jan kit paran yo marye kit yo pa marye tout manman ki fè pitit leta fèt pou ba yo konkoul menm jan tou pou timoun piti ak pou'),
  ('HEBREW', ' או לערוך את העדפות ההפצה אנא עקוב אחרי השלבים הבאים כנס לחשבון האישי שלך ב'),
  ('HINDI', ' ं ऐडवर्ड्स विज्ञापनों के अनुभव पर आधारित हैं और इनकी मदद से आपको अपने विज्ञापनों का अधिकतम लाभ'),
  ('HMONG', ' Kuv hlub koj txawm lub ntuj yuav si ntshi nphaus los kuv tsis ua siab nkaug txawm ntiab teb yuav si ntshi nphaus los kuv tseem ua lon tsaug vim kuv hlub koj tag lub siab'),
  ('HUNGARIAN', ' a felhasználóim a google azonosító szöveget ikor látják a felhasználóim a google azonosító szöveget felhasználók a google azonosító szöveget fogják látni minden tranzakció után ha a vásárlását regisztrációját oldalunk'),
  ('ICELANDIC', ' a afköst leitarorða þinna leitarorð neikvæð leitarorð auglýsingahópa byggja upp aðallista yfir ný leitarorð fyrir auglýsingahópana og skoða ítarleg gögn um árangur leitarorða eins og samkeppni auglýsenda og leitarmagn er krafist notkun'),
  ('INDONESIAN', 'berdiri setelah pengurusnya yang berusia 83 tahun, Fayzrahman Satarov, mendeklarasikan diri sebagai nabi dan rumahnya sebagai negara Islam Satarov digambarkan sebagai mantan ulama Islam  tahun 1970-an. Pengikutnya didorong membaca manuskripnya dan kebanyakan dilarang meninggalkan tempat persembunyian bawah tanah di dasar gedung delapan lantai mereka. Jaksa membuka penyelidikan kasus kriminal pada kelompok itu dan menyatakan akan membubarkan kelompok kalau tetap melakukan kegiatan ilegal seperti mencegah anggotanya mencari bantuan medis atau pendidikan. Sampai sekarang pihak berwajib belum melakukan penangkapan meskipun polisi mencurigai adanya tindak kekerasan pada anak. Pengadilan selanjutnya akan memutuskan apakah anak-anak diizinkan tetap tinggal dengan orang tua mereka. Kazan yang berada sekitar 800 kilometer di timur Moskow merupakan wilayah Tatarstan yang'),
  ('IRISH', ' a bhfuil na focail go léir i do cheist le fáil orthu ní gá ach focail breise a chur leis na cinn a cuardaíodh cheana chun an cuardach a bheachtú nó a chúngú má chuirtear focal breise isteach aimseofar fo aicme ar leith de na torthaí a fuarthas'),
  ('ITALIAN', ' a causa di un intervento di manutenzione del sistema fino alle ore circa ora legale costa del pacifico del novembre le campagne esistenti continueranno a essere pubblicate come di consueto anche durante questo breve periodo di inattività ci scusiamo per'),
  ('JAVANESE', ' account ten server niki kalian username meniko tanpo judul cacahe account nggonanmu wes pol pesen mu wes diguwak pesenan mu wes di simpen sante wae pesenan mu wes ke kirim mbuh tekan ora pesenan e ke kethok pesenan mu wes ke kirim mbuh tekan ora pesenan'),
  ('KINYARWANDA', ' dore ibyo ukeneye kumenya ukwo watubona ibibazo byinshi abandi babaza ububonero byibibina google onjela ho izina dyikyibina kyawe onjela ho yawe mulugo kulaho ibyandiko byawe shyilaho tegula yawe tulubaka tukongeraho iyanya mishya buliko tulambula'),
  ('LATVIAN', ' a gadskārtējā izpārdošana slēpošana jāņi atlaide izmaiņas trafikā kas saistītas ar sezonas izpārdošanu speciālajām atlaidēm u c ir parastas un atslēgvārdi kas ir populāri noteiktos laika posmos šajā laikā saņems lielāku klikšķu'),
  ('LITHUANIAN', ' a išsijungia mano idėja dėl geriausio laiko po pastarųjų savo santykių pasimokiau penki dalykai be kurių negaliu gyventi mano miegamajame tu surasi ideali pora išsilavinimas aukštoji mokykla koledžas universitetas pagrindinis laipsnis metai'),
  ('MACEDONIAN', ' гласовите коалицијата на вмро дпмне како партија со најмногу освоени гласови ќе добие евра а на сметката на коализијата за македонија'),
  ('MALAY', 'pengampunan beramai-ramai supaya mereka pulang ke rumah masing-masing. Orang-orang besarnya enggan mengiktiraf sultan yang dilantik oleh Belanda sebagai Yang DiPertuan Selangor. Orang ramai pula tidak mahu menjalankan perniagaan bijih timah dengan Belanda, selagi raja yang berhak tidak ditabalkan. Perdagang yang lain dibekukan terus kerana untuk membalas jasa beliau yang membantu Belanda menentang Riau, Johor dan Selangor. Di antara tiga orang Sultan juga dipandang oleh rakyat sebagai seorang sultan yang paling gigih. 1 | 2 SULTAN Sebagai ganti Sultan Ibrahim ditabalkan Raja Muhammad iaitu Raja Muda. Walaupun baginda bukan anak isteri pertama bergelar Sultan Muhammad bersemayam di Kuala Selangor juga. Pentadbiran baginda yang lemah itu menyebabkan Kuala Selangor menjadi sarang ioleh Cina di Lukut tidak diambil tindakan, sedangkan baginda sendiri banyak berhutang kepada 1'),
  ('MALTESE', ' ata ikteb messaġġ lil indirizzi differenti billi tagħżilhom u tagħfas il buttuna ikteb żid numri tfittxijja tal kotba mur print home kotba minn pagni ghal pagna minn ghall ktieb ta aċċessa stieden habib iehor grazzi it tim tal gruppi google'),
  ('MARATHI', 'हैदराबाद  उच्चार ऐका (सहाय्य·माहिती)तेलुगू: హైదరాబాదు , उर्दू: حیدر آباد हे भारतातील आंध्र प्रदेश राज्याच्या राजधानीचे शहर आहे. हैदराबादची लोकसंख्या ७७ लाख ४० हजार ३३४ आहे. मोत्यांचे शहर अशी एकेकाळी ओळख असलेल्या या शहराला ऐतिहासिक, सांस्कृतिक आणि स्थापत्यशास्त्रीय वारसा लाभला आहे. १९९० नंतर शिक्षण आणि माहिती तंत्रज्ञान त्याचप्रमाणे औषधनिर्मिती आणि जैवतंत्रज्ञान क्षेत्रातील उद्योगधंद्यांची वाढ शहरात झाली. दक्षिण मध्य भारतातील पर्यटन आणि तेलुगू चित्रपटनिर्मितीचे हैदराबाद हे केंद्र आहे'),
  ('NEPALI', 'अरू ठाऊँबाटपनि खुलेको छ यो खाता अर अरू ठाऊँबाटपनि खुलेको छ यो खाता अर ू'),
  ('NORWEGIAN', ' a er obligatorisk tidsforskyvning plassering av katalogsøk planinformasjon loggfilbane gruppenavn kontoinformasjon passord domene gruppeinformasjon alle kampanjesporing alternativ bruker grupper oppgaveplanlegger oppgavehistorikk kontosammendrag antall'),
  ('PERSIAN', ' آب خوردن عجله می کردند به جای باز ی کتک کاری می کردند و همه چيز مثل قبل بود فقط من ماندم و يک دنيا حرف و انتظار تا عاقبت رسيد احضاريه ی ای با'),
  ('POLISH', ' a australii będzie widział inne reklamy niż użytkownik z kanady kierowanie geograficzne sprawia że reklamy są lepiej dopasowane do użytkownika twojej strony oznacza to także że możesz nie zobaczyć wszystkich reklam które są wyświetlane na'),
  ('PORTUGUESE', ' a abit prevê que a entrada desses produtos estrangeiros no mercado têxtil e vestuário do brasil possa reduzir os preços em cerca de a partir de má notícia para os empresários que terão que lutar para garantir suas margens de lucro mas boa notícia'),
  ('ROMANIAN', ' a anunţurilor reţineţi nu plătiţi pentru clicuri sau impresii ci numai atunci când pe site ul dvs survine o acţiune dorită site urile negative nu pot avea uri de destinaţie daţi instrucţiuni societăţii dvs bancare sau constructoare să'),
  ('ROMANIAN', 'оперативэ а органелор ши институциилор екзекутиве ши а органелор жудичиаре але путерий де стат фиекэруй орган ал путерий де стат и се'),
  ('RUSSIAN', ' а неправильный формат идентификатора дн назад'),
  ('SCOTS_GAELIC', ' air son is gum bi casg air a h uile briosgaid no gum faigh thu brath nuair a tha briosgaid a tighinn gad rannsachadh ghoogle gu ceart mura bheil briosgaidean ceadaichte cuiridh google briosgaid dha do neach cleachdaidh fa leth tha google a cleachdadh'),
  ('SERBIAN', 'балчак балчак на мапи србије уреди демографија у насељу балчак живи пунолетна становника а просечна старост становништва износи година'),
  ('SERBIAN', 'Društvo | četvrtak 1.08.2013 | 13:43 Krade se i izvorska voda Izvor:  Gornji Milanovac -- U gružanskom selu Belo Polje prošle noći ukradeno je više od 10.000 litara kojima je obijen bazen. Bazen je bio zaključan i propisno obezbeđen.'),
  ('SLOVAK', ' a aktivovať reklamnú kampaň ak chcete kampaň pred spustením ešte prispôsobiť uložte ju ako šablónu a pokračujte v úprave vyberte si jednu z možností nižšie a kliknite na tlačidlo uložiť kampaň nastavenia kampane môžete ľubovoľne'),
  ('SLOVENIAN', ' adsense stanje prijave za google adsense google adsense račun je bil začasno zamrznjen pozdravljeni hvala za vaše zanimanje v google adsense po pregledu vaše prijavnice so naši strokovnjaki ugotovili da spletna stran ki je trenutno povezana z vašim'),
  ('SPANISH', ' a continuación haz clic en el botón obtener ruta también puedes desplazarte hasta el final de la página para cambiar tus opciones de búsqueda gráfico y detalles ésta es una lista de los vídeos que te recomendamos nuestras recomendaciones se basan'),
  ('SWAHILI', ' a ujumbe mpya jumla unda tafuta na angalia vikundi vya kujadiliana na kushiriki mawazo iliyopangwa kwa tarehe watumiaji wapya futa orodha hizi lugha hoja vishikanisho vilivyo dhaminiwa ujumbe sanaa na tamasha toka udhibitisho wa neno kwa haraka fikia'),
  ('SWEDISH', ' a bort objekt från google desktop post äldst meny öretag dress etaljer alternativ för vad är inne yaste google skrivbord plugin program för nyheter google visa nyheter som är anpassade efter de artiklar som du läser om du till exempel läser'),
  ('TAGALOG', ' a na ugma sa google ay nakaka bantog sa gitna nang kliks na nangyayari sa pamamagitan nang ordinaryong paggagamit at sa kliks na likha nang pandaraya o hindi tunay na paggamit bunga nito nasasala namin ang mga kliks na hindi kailangan o hindi gusto nang'),
  ('TURKISH', ' a ayarlarınızı görmeniz ve yönetmeniz içindir eğer kampanyanız için günlük bütçenizi gözden geçirebileceğiniz yeri arıyorsanız kampanya yönetimi ne gidin kampanyanızı seçin ve kampanya ayarlarını düzenle yi tıklayın sunumu'),
  ('UKRAINIAN', ' а більший бюджет щоб забезпечити собі максимум прибутків від переходів відстежуйте свої об яви за датою географічним розташуванням'),
  ('URDU', ' آپ کو کم سے کم ممکنہ رقم چارج کرتا ہے اس کی مثال کے طور پر فرض کریں اگر آپ کی زیادہ سے زیادہ قیمت فی کلِک امریکی ڈالر اور کلِک کرنے کی شرح ہو تو'),
  ('VIETNAMESE', ' adsense cho nội dung nhà cung cấp dịch vụ di động xác minh tín dụng thay đổi nhãn kg các ô xem chi phí cho từ chối các đơn đặt hàng dạng cấp dữ liệu ác minh trang web của bạn để xem'),
  ('WELSH', ' a chofrestru eich cyfrif ymwelwch a unwaith i chi greu eich cyfrif mi fydd yn cael ei hysbysu o ch cyfeiriad ebost newydd fel eich bod yn gallu cadw mewn cysylltiad drwy gmail os nad ydych chi wedi clywed yn barod am gmail mae n gwasanaeth gwebost'),
  ('YIDDISH', 'און פאנטאזיע ער איז באקאנט צים מערסטן פאר זיינע באַלאַדעס ער האָט געוווינט אין ווארשע יעס פאריס ליווערפול און לאנדאן סוף כל סוף איז ער'),

  ('BOSNIAN', 'Novi predsjednik Mešihata Islamske zajednice u Srbiji (IZuS) i muftija dr. Mevlud ef. Dudić izjavio je u intervjuu za Anadolu Agency (AA) kako je uvjeren da će doći do vraćanja jedinstva među muslimanima i unutar Islamske zajednice na prostoru Sandžaka, te da je njegova ruka pružena za povratak svih u okrilje Islamske zajednice u Srbiji nakon skoro sedam godina podjela u tom dijelu Srbije. Dudić je za predsjednika Mešihata IZ u Srbiji izabran 4. januara, a zvanična inauguracija će biti obavljena u prvoj polovini februara. Kako se očekuje, prisustvovat će joj i reisu-l-ulema Islamske zajednice u Srbiji Husein ef. Kavazović koji će i zvanično promovirati Dudića u novog prvog čovjeka IZ u Srbiji. Dudić će danas boraviti u prvoj zvaničnoj posjeti reisu Kavazoviću, što je njegov privi simbolični potez nakon imenovanja. '),
  ('INDONESIAN', 'sukiyaki wikipedia indonesia ensiklopedia bebas berbahasa bebas berbahasa indonesia langsung ke navigasi cari untuk pengertian lain dari sukiyaki lihat sukiyaki irisan tipis daging sapi sayur sayuran dan tahu di dalam panci besi yang dimasak di atas meja makan dengan cara direbus sukiyaki dimakan dengan mence'),
  ('MALAY', 'sukiyaki wikipedia bahasa melayu ensiklopedia bebas sukiyaki dari wikipedia bahasa melayu ensiklopedia bebas lompat ke navigasi gelintar sukiyaki sukiyaki  hirisan tipis daging lembu sayur sayuran dan tauhu di dalam periuk besi yang dimasak di atas meja makan dengan cara rebusan sukiyaki dimakan dengan mence'),
  ('FRENCH', fr_en_Latn),

  # Added 2014.10.15
  ('KAZAKH',  'а билердің өзіне рұқсат берілмеген егер халық талап етсе ғана хан келісім берген өздеріңіз білесіздер қр қыл мыс тық кодексінде жазаның'),
  ('KURDISH',  'Nû pêvajo ya ezmûn ya pêşin di dîtin ku cezayên pêkan bi biryar standin, jûriyên neh zilam û sê jin wê gelektir govanan guhdar bike, bendewarî nav 3-mehan xilas be, ku zilamê Fransî yê 37 salê wê bi berdarî û heta mirinê bi avêtin zindanê.'),         # aka kmr
  ('KYRGYZ',  'агай эле оболу мен садыбакас аганын өзү менен эмес эмгектери менен тааныштым жылдары ташкенде өзбекстан илимдер академиясынын баяны'),
  ('MALAGASY',  'amporisihin i ianao mba hijery ny dika teksta ranofotsiny an ity lahatsoratra ity tsy ilaina ny opérateur efa karohina daholo ny teny rehetra nosoratanao ampiasao anaovana dokambarotra i google telugu datin ny takelaka fikarohana sary renitakelak i'),
  ('MALAYALAM',  'ം അങ്ങനെ ഞങ്ങള് അവരുടെ മുമ്പില് നിന്നു ഔടും ഉടനെ നിങ്ങള് പതിയിരിപ്പില് നിന്നു എഴുന്നേറ്റു'),
  ('BURMESE',  'တက္ကသုိလ္ မ္ဟ ပ္ရန္ လာ္ရပီးေနာက္ န္ဟစ္ အရ္ဝယ္ ဦးသန္ ့သည္ ပန္ းတနော္ အမ္ယုိးသား ေက္ယာင္ း'),
  ('NYANJA',  'Boma ndi gawo la dziko lomwe linapangidwa ndi cholinga chothandiza ntchito yolamulira. Kuŵalako kulikuunikabe mandita, Edipo nyima unalephera kugonjetsa kuŵalako.'),
  ('SINHALESE',  'අනුරාධ මිහිඳුකුල නමින් සකුරා ට ලිපියක් තැපෑලෙන් එවා තිබුණා කි ් රස්ටි ෂෙල්ටන් ප ් රනාන්දු ද'),     # aka SINHALA
  ('SESOTHO',  'bang ba nang le thahasello matshwao a sehlooho thuto e thehilweng hodima diphetho ke tsela ya ho ruta le ho ithuta e totobatsang hantle seo baithuti ba lokelang ho se fihlella ntlhatheo eo e sebetsang ka yona ke ya hore titjhere o hlakisa pele seo'),
  ('SUNDANESE',  'Nu ngatur kahirupan warga, keur kapentingan pamarentahan diatur ku RT, RW jeung Kepala Dusun, sedengkeun urusan adat dipupuhuan ku Kuncen jeung kepala adat. Sanajan Kampung Kuta teu pati anggang jeung lembur sejenna nu aya di wewengkon Desa Pasir Angin, tapi boh wangunan imah atawa tradisi kahirupan masarakatna nenggang ti nu lian.'),
  ('TAJIK',  'адолат ва инсондӯстиро бар фашизм нажодпарастӣ ва адоват тарҷеҳ додааст чоп кунед ба дигарон фиристед чоп кунед ба дигарон фиристед'),
  ('UZBEK',  'abadiylashtirildi aqsh ayol prezidentga tayyormi markaziy osiyo afg onistonga qanday yordam berishi mumkin ukrainada o zbekistonlik muhojirlar tazyiqdan shikoyat qilmoqda gruziya va ukraina hozircha natoga qabul qilinmaydi afg oniston o zbekistonni g'),
  ('UZBEK',  'а гапирадиган бўлсак бунинг иккита йўли бор биринчиси мана шу қуриган сатҳини қумликларни тўхтатиш учун экотизимни мустаҳкамлаш қумга'),

  # This is just the "version marker":
  ('TURKISH', 'qpdbmrmxyzptlkuuddlrlrbas las les qpdbmrmxyzptlkuuddlrlrbas el la qpdbmrmxyzptlkuuddlrlrbas'),
  )
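
# Illustrative helper, not part of the original test suite: a minimal sketch
# that lists the distinct expected language names in a table shaped like
# testData, i.e. a sequence of (languageName, text) pairs. The name
# expectedLanguages is hypothetical. Comparing its result against
# cld2.DETECTED_LANGUAGES is one possible sanity check, assuming every
# expected name is meant to appear in that tuple.
def expectedLanguages(data):
  # Collect each expected name once and return the names in sorted order.
  return sorted(set(name for name, text in data))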

fullTestData = tuple(x for x in testData[:-1] if x[0] != 'KURDISH') + (
  # Moved from small table to full table as of Oct 2014 release:
  ('SOMALI', ' a oo maanta bogga koobaad ugu qoran yahey beesha caalamka laakiin si kata oo beesha caalamku ula guntato soomaaliya waxa aan shaki ku jirin in aakhirataanka dadka soomaalida oo kaliya ay yihiin ku soomaaliya ka saari kara dhibka ay ku jirto'),
  ('IGBO', 'Chineke bụ aha ọzọ ndï omenala Igbo kpọro Chukwu. Mgbe ndị bekee bịara, ha mee ya nke ndi Christian. N\'echiche ndi ekpere chi Omenala Ndi Igbo, Christianity, Judaism, ma Islam, Chineke nwere ọtụtụ utu aha, ma nwee nanị otu aha. Ụzọ abụọ e si akpọ aha ahụ bụ Jehovah ma Ọ bụ Yahweh. Na ọtụtụ Akwụkwọ Nsọ, e wepụla aha Chineke ma jiri utu aha bụ Onyenwe Anyị ma ọ bụ Chineke dochie ya. Ma mgbe e dere akwụkwọ nsọ, aha ahụ bụ Jehova pụtara n’ime ya, ihe dị ka ugboro pụkụ asaa(7,000).'),
  ('HAUSA', ' a cikin a kan sakamako daga sakwannin a kan sakamako daga sakwannin daga ranar zuwa a kan sakamako daga guda daga ranar zuwa a kan sakamako daga shafukan daga ranar zuwa a kan sakamako daga guda a cikin last hour a kan sakamako daga guda daga kafar'),
  ('YORUBA', ' abinibi han ikawe alantakun le ni opolopo ede abinibi ti a to lesese bi eniyan to fe lo se fe lati se atunse jowo mo pe awon oju iwe itakunagbaye miran ti ako ni oniruru ede abinibi le faragba nipa atunse ninu se iwadi blogs ni ori itakun agbaye ti e ba'),
  ('ZULU', ' ana engu uma inkinga iqhubeka siza ubike kwi isexwayiso ngenxa yephutha lomlekeleli sikwazi ukubuyisela emuva kuphela imiphumela engaqediwe ukuthola imiphumela eqediwe zama ukulayisha kabusha leli khasi emizuzwini engu uma inkinga iqhubeka siza uthumele'),

  ('MONGOLIAN', 'ᠦᠭᠡ ᠵᠢᠨ ᠴᠢᠨᠭ᠎ᠠ ᠬᠦᠨᠳᠡᠢ ᠵᠢ ᠢᠯᠭᠠᠬᠣ'),
  ('X_Buginese', 'ᨄᨛᨑᨊᨒ ᨑᨗ ᨔᨒᨗᨓᨛ ᨕᨗᨋᨗᨔᨗ ᨒᨛᨄ ᨑᨛᨔᨛᨆᨗᨊ'),
  ('X_Gothic', '𐌰 𐌰𐌱𐍂𐌰𐌷𐌰𐌼 𐌰𐌲𐌲𐌹𐌻𐌹𐍃𐌺𐍃 𐌸𐌹𐌿𐌳𐌹𐍃𐌺𐍃 𐍆𐍂𐌰𐌲𐌺𐌹𐍃𐌺𐍃'),
  ('ABKHAZIAN', ' а зуа абзиара дақәшәоит ан лыбзиабара ахә амаӡам ауаҩы игәы иҭоу ихы иҿы ианубаалоит аҧҳәыс ҧшӡа ахацәа лышьҭоуп аҿаасҭа лара дрышьҭоуп'),
  ('AFAR', ' nagay tanito nagay tanto nagayna naharsi nahrur nake nala nammay nammay haytu nanu narig ne ni num numu o obare obe obe obisse oggole ogli olloyta ongorowe orbise othoga r rabe rade ra e rage rakub rasitte rasu reyta rog ruddi ruga s sa al bada sa ala'),
  ('AKAN', 'Wɔwoo Hilla Limann Mumu-Ɔpɛnimba 12 afe 1934. Wɔwoo no wɔ Gwollu wɔ Sisala Mantaw mu Nna ne maame yɛ Mma Hayawah. Ne papa so nna ɔyɛ Babini Yomu. Ɔwarr Fulera Limann ? Ne mba yɛ esuon-- Lariba Montia [wɔwoo no Limann]; Baba Limann; Sibi Andan [wɔwoo no Limann]; Lida Limann; Danni Limann; Zilla Limann na Salma Limann. Ɔtenaa ase kɔpemm Sanda-Kwakwa da ɛtɔ so 23 wɔ afe 1998 wɔ ?.'),
  ('AMHARIC', ' ለመጠይቅ ወደ እስክንድርያ ላኩዋቸውና የእስክንድርያ ጳጳስ አቴናስዮስ ፍሬምንጦስን እራሳቸውን ሾመው ልከዋል ከዚያ እስከ ዓ ም ድረስ የኢትዮጵያ አቡነ'),
  ('ASSAMESE', 'অঞ্চল নতুন সদস্যবৃন্দ সকলোৱে ভৰ্তি হব পাৰে মুল পৃষ্ঠা জন লেখক গুগ ল দল সাৰাংশ ই পত্ৰ টা বাৰ্তা এজন'),
  ('AYMARA', ' aru wijar aru ispañula ukaran aru witanam aru kurti aru kalis aru warani aru malta aru yatiyawi niya jakitanaka isluwiñ aru lmir phuran aru masirunan aru purtukal aru kruwat aru jakira urtu aru inklisa pirsan aru suyku aru malay aru jisk aptayma thaya'),
  ('BASHKIR', ' арналђан бындай ђилми эш тіркињлњ тњјге тапєыр нњшер ителњ ғинуар бєхет именлектє етешлектє ауыл ўќмерџєре хеџмєт юлын ћайлаѓанда'),
  ('BISLAMA', ' king wantaem nomo hem i sakem setan mo ol rabis enjel blong hem oli aot long heven oli kamdaon long wol taswe ol samting oli kam nogud olgeta long wol ya stat long revelesen ol faet kakae i sot ol sik mo fasin blong brekem loa oli kam antap olgeta samting'),
  ('BRETON', ' a chom met leuskel a ra e blas da jack irons dilabour hag aet kuit eus what is this dibab a reont da c houde michael beinhorn evit produiñ an trede pladenn kavet e vez ar ganaouennoù buhan ha buhan ganto setu stummet ar bladenn adkavet e vez enni funk'),
  ('BURMESE', ' တက္ကသုိလ္ မ္ဟ ပ္ရန္ လာ္ရပီးေနာက္ န္ဟစ္ အရ္ဝယ္ ဦးသန္ ့သည္ ပန္ းတနော္ အမ္ယုိးသား ေက္ယာင္ း'),
  ('CORSICAN', ' a prupusitu di risultati for utilizà a scatula per ricercà ind issi risultati servore errore u servore ha incuntratu una errore pruvisoria é ùn ha pussutu compie a vostra dumanda per piacè acimenta dinò ind una minuta tuttu listessu ligami truvà i'),
  ('DZONGKHA', ' རྩིས བརྐྱབ ཚུལ ལྡན དང ངེས བདེན སྦ སྟོན ནིའི དོན ལུ ཁྱོད གུག ཤད ལག ལེན འཐབ དགོ ག དང ཨིན པུཊི གྲལ ཐིག གུ'),
  ('ESPERANTO', ' a jarcento refoje per enmetado de koncerna pastro tiam de reformita konfesio ekde refoje ekzistis luteranaj komunumanoj tamen tiuj fondis propran komunumon nur en ambaŭ apartenis ekde al la evangela eklezio en prusio resp ties rejnlanda provinceklezio en'),
  ('FAROESE', ' at verða átaluverdar óhóskandi ella áloypandi vit kunnu ikki garanterða at google leitanin ikki finnur naka sum er áloypandi óhóskandi ella átaluvert og google tekur onga ábyrgd yvir tær síður sum koma við í okkara leitiskipan fá tær ein'),
  ('FIJIAN', ' i kina na i iri ka duatani na matana main a meke wesi se meke mada na meke ni yaqona oqo na meke ka dau vakayagataki ena yaqona vakaturaga e dau caka toka ga kina na vucu ka dau lagati tiko kina na ka e yaco tiko na talo ni wai ni yaqona na lewai ni wai'),
  ('FRISIAN', ' adfertinsjes gewoan lytse adfertinsjes mei besibbe siden dy t fan belang binne foar de ynhâld fan jo berjochten wolle jo mear witte fan gmail foardat jo jo oanmelde gean dan nei wy wurkje eltse dei om gmail te ferbetterjen dêrta sille wy jo sa út en'),
  ('GREENLANDIC', ' at nittartakkalli uani toqqarsimasatta akornanni nittartakkanut allanut ingerlaqqittoqarsinnaavoq kanukoka tassaavoq kommuneqarfiit kattuffiat nuna tamakkerlugu kommunit nittartagaannut ingerlaqqiffiusinnaasoq kisitsiserpassuit nunatsinnut tunngasut'),
  ('GUARANI', ' aháta añe ë ne mbo ehára ndive ajeruréta chupe oporandujey haĝua peëme mba épa pekaru ha áĝa oporandúvo nde eréta avei re paraguaýpe kachíke he i leúpe ndépa re úma kure tatakuápe ha leu ombohovái héë ha ujepéma kachíke he ijey'),
  ('HAWAIIAN', 'He puke noiʻi kūʻikena kūnoa ʻo Wikipikia. E ʻoluʻolu nō, e hāʻawi mai i kāu ʻike, kāu manaʻo, a me kou leo no ke kūkulu ʻana a me ke kākoʻo ʻana mai i ka Wikipikia Hawaiʻi. He kahua pūnaewele Hawaiʻi kēia no ka hoʻoulu ʻana i ka ʻike Hawaiʻi. Inā hiki iā ʻoe ke ʻōlelo Hawaiʻi, e ʻoluʻolu nō, e kōkua mai a e hoʻololi i nā ʻatikala ma ʻaneʻi, a pono e haʻi aku i kou mau hoa aloha e pili ana i ka Wikipikia Hawaiʻi. E ola mau nō ka ʻōlelo Hawaiʻi a mau loa aku.'),
  ('IGBO', 'Chineke bụ aha ọzọ ndï omenala Igbo kpọro Chukwu. Mgbe ndị bekee bịara, ha mee ya nke ndi Christian. N\'echiche ndi ekpere chi Omenala Ndi Igbo, Christianity, Judaism, ma Islam, Chineke nwere ọtụtụ utu aha, ma nwee nanị otu aha. Ụzọ abụọ e si akpọ aha ahụ bụ Jehovah ma Ọ bụ Yahweh. Na ọtụtụ Akwụkwọ Nsọ, e wepụla aha Chineke ma jiri utu aha bụ Onyenwe Anyị ma ọ bụ Chineke dochie ya. Ma mgbe e dere akwụkwọ nsọ, aha ahụ bụ Jehova pụtara n’ime ya, ihe dị ka ugboro pụkụ asaa(7,000).'),
  ('INTERLINGUA', ' super le sitos que tu visita isto es necessari pro render disponibile alcun functionalitates del barra de utensiles a fin que nos pote monstrar informationes ulterior super un sito le barra de utensiles debe dicer a nos le'),
  ('INTERLINGUE', ' abhorre exceptiones in li derivation plu cardinal por un l i es li regularità del flexion conjugation ples comparar latino sine flexione e li antiqui projectes naturalistic queles have quasi null regules de derivation ma si on nu examina li enunciationes'),
  ('INUPIAK', 'sabvaqjuktuq sabvaba atiqaqpa atiqaqpa ibiq iebiq ixafich niuqtulgiññatif uvani natural gas tatpikka ufasiksigiruaq maaffa savaannafarufa mi tatkivani navy qanuqjugugguuq taaptuma inna uqsrunik ivaqjiqhutik       taktuk allualiuqtuq sigukun nanuq puuvraatuq taktuum amugaa kalumnitigun nanuq agliruq allualiuqtuq'),
  ('KASHMIRI', ' ژماں سرابن منز  گرٲن چھِہ خابٕک کھلونہٕ ؤڈراواں   تُلتِھ نِیَس تہٕ گوشہِ گوشہِ مندچھاوى۪س   دِلس چھُہ وون٘ت وُچھان از ستم قلم  صبوٝرٕ وول مسٲفر لیۆکھُن بێتابن منز   ورل سوال چھُہ تراواں جوابن منز    کالہٕ پھۯستہٕ پھن٘ب پگَہہ پہ   پۆت نظر دِژ نہٕ ژھالہٕ مٔت آرن     مٲنز مسول متھان چھےٚ مس والن  وۅن چھےٚ غارن تہِ نارٕ ژھٹھ ژاپان  رێش تۅرگ تراوٕہن تہٕ ون رٹہٕ ہن  ہوشہِ ہێۆچھ نہٕ پوشنوٝلس نِش  مۅہرٕ دی دی زٕلاں چھِ زى۪و حرفن  لۆدرٕ پھٔل ہى۪تھ ملر عازمؔ  سۆدرٕ کھۅنہِ منز منگاں چھُہ ندرى۪ن پن   ژے تھى۪کی یہِ مسٲفر پنن وُڈو تہٕ پڑاو   گٕتَو گٕتَو چھےٚ یہِ کۅل بُتھ تہٕ بانہٕ سٕہہ گۅردٕ چھہِ سپداں دمہٕ پُھٹ  چھِٹہ پونپر پکھہٕ داران سُہ یتى۪ن تۯاوِ  کم نظر دۯاکھ تہٕ باسیوے سُہ مۆہ ہیو یێران  مےٚ ژى۪تُرمُت چھُہ سُلی تس چھےٚ کتى۪ن تھپھ  شاد مس کراں وُچھ مےٚ خون  ژٕ خبر کیازِ کراں دۯاکھ تمِس پى۪ٹھ ماتم  أز کہِ شبہٕ آو مےٚ بێیہِ پیش سفر زانہِ خدا  دارِ پى۪ٹھ ژٲنگ ہنا تھو زِ ژے چھےٚ مێون  أنہٕ کپٹاں چھُہ زٕژن سون مظفّر عازمؔ  پوشہ برگن چھُہ سُواں چاکھ سُہ الماس قلم   لوِ کٔ ڈ نوِ سرٕ سونتس کل   پروِ بۆر بێیہ از بانبرِ ہۆت  یمبرزلہِ ٹارى۪ن منز نار   وزملہِ کۅسہٕ کتھ کٔر اظہار  کچھہِ منزٕ ؤن رووُم اچھہِ  چشمو ژوپُم کٔنڈ انبار   تماشہِ چھہِ تگاں'),
  ('KAZAKH', ' ﺎ ﻗﻴﺎﻧﺎﺕ ﺑﻮﻟﻤﺎﻳﺪﻯ ﺑﯘﻝ ﭘﺮﻭﺗﺴﻪﺳﯩﻦ ﻳﺎﻋﻨﻲ ﻗﺎﻻ ﻭﻣﯩﺮﯨﻨﺪﻩ ﻗﺎﺯﺍﻕ ء ﺗﯩﻠﯩﻨﯩﯔ ﻗﻮﻟﺪﺍﻧﯩﻠﻤﺎﯞﻯ ﻗﺎﺯﺍﻕ ﺟﻪﺭﯨﻨﺪﻩ'),
  ('KAZAKH', ' а билердің өзіне рұқсат берілмеген егер халық талап етсе ғана хан келісім берген өздеріңіз білесіздер қр қыл мыс тық кодексінде жазаның'),
  ('KHASI', ' kaba jem jai sa sngap thuh ia ki bynta ba sharum naka sohbuin jong phi nangta sa pynhiar ia ka kti kadiang jong phi sha ka krung jong phi bad da kaba pyndonkam kumjuh ia ki shympriahti jong phi sa sngap thuh shapoh ka tohtit jong phi pyndonkam ia kajuh ka'),
  ('KURDISH', ' بۆ به ڕێوه بردنی نامه ی که دێتن ڕاسته وخۆ ڕه وان بکه نامه کانی گ مایل بۆ حسابی پۆستێکی تر هێنانی په یوه ندکاره کان له'),
  ('KYRGYZ', ' جانا انى تانۇۇ ۇلۇتۇن تانۇۇ قىرعىزدى بئلۉۉ دەگەندىك اچىق ايتساق ماناستى تاانىعاندىق ۅزۉڭدۉ تاانىعاندىق بۉگۉن تەما جۉكتۅمۅ ق ى رع ى ز ت ى ل ى'),
  ('KYRGYZ', ' агай эле оболу мен садыбакас аганын өзү менен эмес эмгектери менен тааныштым жылдары ташкенде өзбекстан илимдер академиясынын баяны'),
  ('LATIN', ' a deo qui enim nocendi causa mentiri solet si iam consulendi causa mentiatur multum profecit sed aliud est quod per se ipsum laudabile proponitur aliud quod in deterioris comparatione praeponitur aliter enim gratulamur cum sanus est homo aliter cum melius'),
  ('LINGALA', ' abakisamaki ndenge esengeli moyebami abongisamaki solo mpenza kombo ya moyebami elonguamaki kombo ya bayebami elonguamaki kombo eleki molayi po na esika epesameli limbisa esika ya kotia ba kombo esuki boye esengeli olimbola ndako na yo ya mikanda kombo'),
  ('LUXEMBOURGISH', ' a gewerkschaften och hei gefuerdert dir dammen an dir häre vun de gewerkschaften denkt un déi aarm wann der äer fuerderunge formuléiert d sechst congés woch an aarbechtszäitverkierzung hëllefen hinnen net d unhiewe vun de steigerungssäz bei de'),
  ('MALAGASY', ' amporisihin i ianao mba hijery ny dika teksta ranofotsiny an ity lahatsoratra ity tsy ilaina ny opérateur efa karohina daholo ny teny rehetra nosoratanao ampiasao anaovana dokambarotra i google telugu datin ny takelaka fikarohana sary renitakelak i'),
  ('MALAY', 'bilik sebelah berkata julai pada pm ladymariah hmm sume ni terpulang kepada individu mungkin anda bernasib baik selama ini dalam membeli hp yang bagus deli berkata julai pada pm walaupun bukan bahsa baku tp tetap bahasa melayu kan perubahan boleh dibuat'),
  ('MANX', ' and not ripe as i thought yn assyl yn shynnagh as yn lion the ass the fox and the lion va assyl as shynnagh ayns commee son nyn vendeilys as sauchys hie ad magh ayns y cheyll dy shelg cha row ad er gholl feer foddey tra veeit ad rish lion yn shynnagh'),
  ('MAORI', ' haere ki te kainga o o haere ki te kainga o o haere ki te kainga o te rapunga ahua o haere ki te kainga o ka tangohia he ki to rapunga kaore au mohio te tikanga whakatiki o te ra he whakaharuru te pai rapunga a te rapunga ahua a e kainga o nga awhina o te'),
  ('MAURITIAN_CREOLE', 'Anz dir mwa, Sa bann delo ki to trouve la, kot fam prostitie asize, samem bann pep, bann lafoul dimoun, bann nasion ek bann langaz. Sa dis korn ki to finn trouve, ansam avek bebet la, zot pou ena laenn pou prostitie la; zot pou pran tou seki li ena e met li touni, zot pou manz so laser e bril seki reste dan dife. Parski Bondie finn met dan zot leker proze pou realiz so plan. Zot pou met zot dakor pou sed zot pouvwar bebet la ziska ki parol Bondie fini realize.'),
  ('MONGOLIAN', ' а боловсронгуй болгох орон нутгийн ажил үйлсийг уялдуулж зохицуулах дүрэм журам боловсруулах орон нутгийн өмч хөрөнгө санхүүгийн'),
  ('NAURU', ' arcol obabakaen riringa itorere ibibokiei ababaro min kuduwa airumena baoin tokin rowiowet itiket keram damadamit eigirow etoreiy row keitsito boney ibingo itsiw dorerin naoerodelaporte s nauruan dictionary a c a c d g h o p s t y aiquen ion eins aiquen'),
  ('NDEBELE', "ikomiti elawulako yegatja  emhlanganweni walo ]imithetho mgomo ye anc ibekwa malunga wayo begodu ubudosiphambili kugandelela lokho okutjhiwo yi  lokha nayithi abantu ngibo  "),
  ('NORWEGIAN_N', ' a for verktylina til å hjelpa deg å nå oss merk at pagerank syninga ikkje automatisk kjem til å henta inn informasjon frå sider med argument dvs frå sider med eit i en dersom datamaskina di er plassert bak ein mellomtenar for vevsider kan det verka'),
  ('NYANJA', 'Boma ndi gawo la dziko lomwe linapangidwa ndi cholinga chothandiza ntchito yolamulira. Kuŵalako kulikuunikabe mandita, Edipo nyima unalephera kugonjetsa kuŵalako.'),
  ('OCCITAN', '  Pasmens, la classificacion pus admesa uei (segon Juli Ronjat e Pèire Bèc) agropa lei parlars deis Aups dins l\'occitan vivaroaupenc e non dins lo dialècte provençau.'),
  ('OROMO', ' afaan katalaa bork bork bork hiikaa jira hin argamne gareen barbaadame hin argamne gargarsa qube en gar bayee jira garee walitti firooman gareewwan walitti firooman fuula web akka tartiiba qubeetiin agarsiisi akka tartiiba qubeetiin agarsiisaa jira akka'),
  ('PASHTO', ' اتو مستقل رياست جوړ شو او د پخواني ادبي انجمن څانګې ددې رياست جز شوی او ددې انجمن د ژبې مديريت د پښتو ټولنې په لوی مديريت واوښت لوی مدير يې د'),
  ('PEDI', 'Bophara bja Asia ekaba 8.6% bja lefase goba 29.4% bja naga ya lefase (ntle le mawatle). Asia enale badudu bao bakabago dimillione millione tše nne (4 billion) yeo e bago 60% ya badudi ba lefase ka bophara. A bapolelwa rena sefapanong mehleng ya Pontius Pilatus. A hlokofatšwa, A bolokwa, A tsoga ka letšatši la boraro, ka mo mangwalo a bolelago ka gona, a rotogela magodimong, '),
  ('QUECHUA', ' is t ipanakunatapis rikuchinankupaq qanpa simiykipi noqaykoqpa uya jllanakunamanta kunan jamoq simikunaman qelqan tiyan watukuy qpa uyata qanpa llaqtaykipi llank anakuna simimanta yanapakuna simimanta mayqen llaqtallapis kay simimanta t ijray qpa qelqa'),
  ('RHAETO_ROMANCE', ' Cur ch’il chantun Turitg ha dà il dretg da votar a las dunnas (1970) è ella vegnida elegida en il cussegl da vischnanca da Zumikon per la Partida liberaldemocratica svizra (PLD). Da 1974 enfin 1982 è ella stada presidenta da vischnanca da Zumikon. L’onn 1979 è Elisabeth Kopp vegnida elegida en il Cussegl naziunal e reelegida quatter onns pli tard cun in resultat da sur 100 000 vuschs. L’onn 1984 è ella daventada vicepresidenta da la PLD.'),
  ('RUNDI', ' ishaka mu ndero y abana bawe ganira n abigisha nimba hari ingorane izo ari zo zose ushobora gusaba kubonana n umwigisha canke kuvugana nawe kuri terefone inyuma y uko babarungikira urutonde rw amanota i muhira mu bisanzwe amashure aratumira abavyeyi'),
  ('SAMOAN', ' autu mea o lo totonu le e le minaomia matou te tuu i totonu i le faamatalaina o le suesuega i taimi uma mea o lo totonu fuafua i mea e tatau fa afoi tala mai le newsgroup mataupu fa afoi mai tala e ai le mataupu e ai totonu tusitala o le itu o faamatalaga'),
  ('SANGO', ' atâa na âkotta zo me lâkwê angbâ gï tarrango nî âkotta zo tî koddoro nî âde agbû tenne nî na kate töngana mbênî kotta kpalle tî nzönî dutï tî halëzo pëpe atâa sô âla lü gbâ tî ândya tî mâi na sahngo asâra gbâ tî'),
  ('SANSKRIT', ' ं क र्मणस् त स्य य त्कि ङ्चेह करो त्यय ं त स्माल् लोका त्पु नरै ति अस्मै लोका य क र्मण इ ति नु काम'),
  ('SANSKRIT', ' brahmā tatraivāntaradhīyata tataḥ saśiṣyo vālmīkir munir vismayam āyayau tasya śiṣyās tataḥ sarve jaguḥ ślokam imaṃ punaḥ muhur muhuḥ prīyamāṇāḥ prāhuś ca bhṛśavismitāḥ samākṣaraiś caturbhir yaḥ pādair gīto'),
  ('SCOTS', ' a gless an geordie runciman ower a gless an tamson their man preached a hale hoor aboot the glorious memories o forty three an backsliders an profane persons like esau an aboot jeroboam the son o nebat that gaed stravagin to anither kirk an made aa israel'),
  ('SESELWA', 'Sesel ou menm nou sel patri. Kot nou viv dan larmoni. Lazwa, lanmour ek lape. Nou remersye Bondye. Preserv labote nou pei. Larises nou losean. En leritaz byen presye. Pour boner nou zanfan. Reste touzour dan linite. Fer monte nou paviyon. Ansanm pou tou leternite. Koste Seselwa!'),
  ('SESOTHO', ' bang ba nang le thahasello matshwao a sehlooho thuto e thehilweng hodima diphetho ke tsela ya ho ruta le ho ithuta e totobatsang hantle seo baithuti ba lokelang ho se fihlella ntlhatheo eo e sebetsang ka yona ke ya hore titjhere o hlakisa pele seo'),
  ('SHONA', ' chete vanyori vanotevera vakabatsira kunyora zvikamu zvino kumba home tinyorere tsamba chikamu chakumbirwa hachina kuwanikwa chikamu ichi cheninge chakayiswa kuimwe nzvimbo mudhairekitori rino chimwe chikamu chopadhuze pane chinhu chatadza kushanda bad'),
  ('SINDHI', ' اضافو ٿي ٿيو پر اها خبر عثمان کي بعد پيئي ته سگريٽ ڇڪيندڙ مسلمان نه هو بلڪ هندو هو دڪان تي پهچي عثمان ڪسبت کولي گراهڪن جي سيرب لاهڻ شروع ڪئي پر'),
  ('SISWANT', ' bakhokhintsela yesikhashana bafake imininingwane ye akhawunti leliciniso kulelifomu nangabe akukafakwa imininingwane leliciniso imali lekhokhiwe angeke ifakwe kumkhokhintsela lofanele imininingwane ye akhawunti ime ngalendlela lelandzelako inombolo'),
  ('SUNDANESE', 'Nu ngatur kahirupan warga, keur kapentingan pamarentahan diatur ku RT, RW jeung Kepala Dusun, sedengkeun urusan adat dipupuhuan ku Kuncen jeung kepala adat. Sanajan Kampung Kuta teu pati anggang jeung lembur sejenna nu aya di wewengkon Desa Pasir Angin, tapi boh wangunan imah atawa tradisi kahirupan masarakatna nenggang ti nu lian.'),
  ('TAJIK', ' адолат ва инсондӯстиро бар фашизм нажодпарастӣ ва адоват тарҷеҳ додааст чоп кунед ба дигарон фиристед чоп кунед ба дигарон фиристед'),
  ('TATAR', 'ачарга да бирмәде чәт чәт килеп тора безнең абыйнымы олы абыйнымы эштән'),
  ('TATAR', ' alarnı eşkärtü proğramnarın eşläwen däwam itü tatar söylämen buldıru wä sizep alu sistemnarın eşläwen däwat itü häm başqalar yılnıñ mayında tatar internetı ictimağıy oyışması milli ts isemle berençe däräcäle häm tat'),
  ('TIBETAN', ' ་གྱིས་ཁ་ཆེའི་ཕྱག་འཚལ་ཁང་ཞིག་བཤིག་སྲིད་པ། ཡར་ཀླུང་གཙང་པོར་ཆ ུ་མཛོང་བརྒྱག་རྒྱུའི་ལས་འཆར་ལ་རྒྱ་གར་གྱི་སེམས་ཚབས། རྒྱ་གརགྱི་མཚོ་འོག་དམག་གྲུར་སྦར་གས་བྱུང་བ། པ་ཀི་སི་ཏན་གྱིས་རྒྱ་གར་ལ་མི་སེར་བསད་པའི་སྐྱོན་འཛུགས་བྱས་པ། རྩོམ་ཡིག་མང་བ། འབྲེལ་མཐུད་བརྒྱུད་ལམ། ཐོན་སྐྱེད་དང་སྲི་ཞུ། ་ཐོག་དེབ་བཞི་ དཔར་འགྲེམས་གནང་ཡོད་པ་དང་བོད་ཡིག་དྲ་ཚིགས་ཁག་ནང་ལ་ཡང་རྩོམ་ཡང་ཡང་བྲིས་གནང་མཁན་རེད། ལེ་ཚན་ཁག ལེ་ཚན་ཁག འབྲེལ་ཡོད། འགྲེམ་སྟོན། རྒྱུད་ལམ་སྣ་མང་ཡིག་མཛོད། བཀོལ་སྤྱོད་པའི་འཇོག་ཡུལ་དྲ་ངོས། སྔོན་མ། རྗེས་མ། བསྟན་འཛིན་བདེ་སྐྱིད། ཚེ་རིང་རྣམ་རྒྱལ། བསྟན་འཛིན་ངག་དབང་། ཡོལ་གདོང་ཚེ་རིང་ལྷག་པ།  ་དབང་ ཕྱུག་གཉིས་ཀྱིས་བརྗོད་གཞི་བྱེ་བྲག་པ་ཞིག་ལ་བགྲོ་གླེང་གཏིང་ཟབ་བྱེད་པའི་གཟའ་ འཁོར་གཉིས་རེའི་མཚམས་ཀྱི་ལེ་ཚན་ཞིག་ཡིན། དཔྱད་ཞིབ་ཀྱིས་རྒྱ་ནག་ནང་ཁུལ་གྱི་འགྱུར་ལྡོག་དང༌། རྒྱ་ནག་དང་རྒྱལ་སྤྱིའི་འབྲེལ་བར་དམིགས་སུ་བཀར་ནས་བགྲོ་གླེང་བྱེད་ཀྱི་ཡོད།། རྒྱང་སྲིང་དུས་ཚོད།'),
  ('TIGRINYA', ' ሃገር ተረፎም ዘለዉ ኢትዮጵያውያን ኣብቲ ምስ ኢትዮጵያ ዝዳውብ ኣውራጃ ደቡብ ንኽነብሩ ኣይፍቀደሎምን እዩ ካብ ሃገር ንኽትወጽእ ዜጋ ኹን ወጻእተኛ ናይ'),
  ('TONGA', ' a ke kumi oku ikai ke ma u vakai ki hono hokohoko faka alafapeti api pe ko e uluaki peesi a ho o fekumi faka malatihi fekumi ki he lea oku fakaha atu pe ko ha fonua fekumi ki he fekumi ki he peesi oku ngaahi me a oku sai imisi alu ki he ki he ulu aki'),
  ('TSONGA', ' a ku na timhaka leti nga ta vulavuriwa na google google yi hlonipha yi tlhela yi sirheleta vanhu hinkwavo lava tirhisaka google toolbar ku dyondza hi vusireleli eka system ya hina hi kombela u hlaya vusireleli bya hina eka toolbar mbulavulo wu tshikiwile'),
  ('TSWANA', ' go etela batla ditsebe tsa web tse di nang le le batla ditsebe tse di golaganya le tswang mo leka go batla web yotlhe batla mo web yotlhe go bona home page ya google batla mo a o ne o batla gore a o ne o batla ditsebe tsa bihari batla mo re maswabi ga go'),
  ('TURKMEN', ' айдянларына ынанярмыка эхли боз мейданлары сурулип гутарылан тебигы ота гарып гумлукларда миллиондан да артыкмач ири шахлы малы миллиона'),
  ('TURKMEN', ' akyllylyk çyn söýgi üçin böwet däl de tebigylykdyr duýgularyň gödeňsiligi aç açanlygy bahyllygy söýgini betnyşanlyk derejesine düşürýändir söýeni söý söýmedige süýkenme özüni söýmeýändigini görmek ýigit üçin uly'),
  ('AKAN', ' amammui tumidifo no bɛtow ahyɛ atoro som so mpofirim na wɔasɛe no pasaa ma ayɛ nwonwa dɛn na ɛbɛka wɔn ma wɔayɛ saa bible no ma ho mmuae wɔ adiyisɛm nhoma no mu sɛ onyankopɔn na ɔde hyɛɛ wɔn komam sɛ wɔmma ne nsusuwii mmra mu'),
  ('UIGHUR', ' ئالەملەرنىڭ پەرۋەردىگارىدىن تىلەيمەن سىلەر بۇ يەرلەردە باغچىلاردىن بۇلاقلاردىن زىرائەتلەردىن يۇمشاق پىشقان خورمىلاردىن بەھرىمەن بولۇپ'),
  ('UIGHUR', ' а башлиди әмма бу қетимқи канада мәтбуатлириниң хәвәрлиридә илгирикидәк хитай һөкүмәт мәтбуатлиридин нәқил алидиған вә уни көчүрүп'),
  ('UZBEK', ' آرقلی بوتون سیاسی حزب و گروه لرفعالیتیگه رخصت بیرگن اخبارات واسطه لری شو ییل مدتیده مثال سیز ترقی تاپکن و اهالی نینگ اقتصادی وضعیتی اوتمیش'),
  ('UZBEK', ' а гапирадиган бўлсак бунинг иккита йўли бор биринчиси мана шу қуриган сатҳини қумликларни тўхтатиш учун экотизимни мустаҳкамлаш қумга'),
  ('UZBEK', ' abadiylashtirildi aqsh ayol prezidentga tayyormi markaziy osiyo afg onistonga qanday yordam berishi mumkin ukrainada o zbekistonlik muhojirlar tazyiqdan shikoyat qilmoqda gruziya va ukraina hozircha natoga qabul qilinmaydi afg oniston o zbekistonni g'),
  ('VENDA', 'Vho ṱanganedzwa kha Wikipedia nga tshiVenḓa. Vhadivhi vha manwalo a TshiVenda vha talusa divhazwakale na vhubvo ha Vhavenda ngau fhambana. Vha tikedza mbuno dzavho uya nga mawanwa a thoduluso dze vha ita. Vhanwe vha vhatodulusi vhari Vhavenda vho tumbuka Afrika vhukati vha tshimbila vha tshiya Tshipembe ha Afrika, Rhodesia hune ha vho vhidzwa Zimbagwe namusi.'),
  ('VOLAPUK', ' brefik se volapükavol nüm balid äpubon ün dü lif lölik okas redakans älaipübons gasedi at nomöfiko äd ai mu kuratiko pläo timü koup nedäna fa ns deutän kü päproibon fa koupanef me gased at ästeifülom ad propagidön volapüki as sam ün'),
  ('WARAY_PHILIPPINES', 'Amo ini an balay han Winaray o Binisaya nga Lineyte-Samarnon nga Wikipedia, an libre ngan gawasnon nga ensayklopedya nga bisan hin-o puyde magliwat o mag-edit. An Wikipedia syahan gintikang ha Iningles nga yinaknan han tuig 2001. Ini nga bersyon Winaray gintikang han ika-25 han Septyembre 2005 ngan ha yana mayda 514,613 nga artikulo. Kon karuyag niyo magsari o magprobar, pakadto ha . An Gastrotheca pulchra[2] in uska species han Anura nga ginhulagway ni Ulisses Caramaschi ngan Rodrigues hadton 2007. An Gastrotheca pulchra in nahilalakip ha genus nga Gastrotheca, ngan familia nga Hemiphractidae.[3][4] Ginklasipika han IUCN an species komo kulang hin datos.[1] Waray hini subspecies nga nakalista.[3]'),
  ('WOLOF', ' am ak dëgg dëggam ak gëm aji bind ji te gëstu ko te jëfandikoo tegtalu xel ci saxal ko sokraat nag jëfandikoo woon na xeltu ngir tas jikko yu rafet ci biir nit ñi ak dëggu ak soppante sokraat nag ñëw na mook aflaton platon sukkandiku ci ñaari'),
  ('XHOSA', ' a naynga zonke futhi libhengezwa kwiwebsite yebond yasemzantsi afrika izinga elisebenzayo xa usenza olu tyalo mali liya kusebenza de liphele ixesha lotyalo mali lwakho inzala ihlawulwa rhoqo emva kweenyanga ezintandathu ngomhla wamashumi amathathu ananye'),
  ('X_KLINGON', ' a ghuv bid soh naq jih lodni yisov chich wo vamvo qeylis lunge pu chah povpu vodleh a dah ghah cho ej dah wo che pujwi bommu tlhegh darinmohlahchu pu majqa horey so lom qa ip quv law may vad suvtahbogh wa sanid utlh quv pus datu pu a vitu chu pu johwi tar'),
  ('X_PIG_LATIN', ' away ackupbay editcray ardcay ybay isitingvay ouryay illingbay eferencespray agepay orway isitvay ethay adwordsway elphay entrecay orfay oremay etailsday adwordsway ooglegay omcay upportsay'),
  ('ZHUANG', ' dih yinzminz ndaej daengz bujbienq youjyau dih cingzyin caeuq cinhingz diuz daihit boux boux ma daengz lajmbwn couh miz cwyouz cinhyenz caeuq genzli bouxboux bingzdaengj gyoengq vunz miz lijsing caeuq liengzsim wngdang daih gyoengq de lumj beixnuengx'),

  # This is just the "version marker":
  ('ICELANDIC', 'qpdbmrmxyzptlkuuddlrlrbas las les qpdbmrmxyzptlkuuddlrlrbas el la qpdbmrmxyzptlkuuddlrlrbas'),
)

# Simple English with bad UTF-8 (0xC0 0xA9 is an overlong encoding, which UTF-8 forbids)
TEST_EN_LATN_BAD_UTF8 = b'Forty good bytes followed by bad UTF-8:\'\xC0\xA9\' and then good again.'

class TestCLD(unittest.TestCase):

  langsSeen = set()
  fullLangsSeen = set()

  def runOne(self, expectedLangName, s, doFull = False):
    if VERBOSE:
      print('')
      print('Test: %s [%d bytes]' % (expectedLangName, len(s)))
    failed = False
    for isPlainText in False, True:
      if doFull:
        detector = cld2full.detect
      else:
        detector = cld2.detect
      isReliable, textBytesFound, details = detector(s, isPlainText=isPlainText)
      if len(details) > 0:
        detectedLangName, detectedLangCode = details[0][:2]

        if VERBOSE:
          print('  detected: %s' % detectedLangName)
          print('  reliable: %s' % (isReliable != 0))
          print('  textBytes: %s' % textBytesFound)
          print('  details: %s' % str(details))

        try:
          self.assertEqual(expectedLangName, detectedLangName, 'full?=%s %s != %s; details: %s' % (doFull, detectedLangName, expectedLangName, str(details)))
        except AssertionError:
          traceback.print_exc()
          failed = True
          break
        if doFull:
          self.fullLangsSeen.add(detectedLangName)
        else:
          self.langsSeen.add(detectedLangName)
      else:
        try:
          self.fail('no language detected; expected %s' % expectedLangName)
        except AssertionError:
          traceback.print_exc()
          failed = True
          break

    if failed:
      self.fail('some languages were wrong')

  def test_basic(self):
    for lang, text in testData:
      self.runOne(lang, text)
    for lang, text in fullTestData:
      self.runOne(lang, text, True)

  # End of per-language tests; start tests for specific functions:
  def test_vectors(self):
    for detector in cld2, cld2full:
      for lang, text in testData:
        isReliable, textBytesFound, details, vectors = detector.detect(text, returnVectors=True)
        self.assertTrue(textBytesFound > 0)
        if text == fr_en_Latn:
          self.assertEqual(3, len(vectors))
          self.assertEqual(('en', 'fr', 'en'), tuple(x[3] for x in vectors))

  def test_encoding_hint(self):
    for detector in cld2, cld2full:
      for lang, text in testData:
        for encoding in cld2.ENCODINGS:
          detector.detect(text, hintEncoding=encoding)

  def test_language_hint(self):
    for detector in cld2, cld2full:
      for lang, text in testData:
        for langHint in cld2.LANGUAGES:
          detector.detect(text, hintLanguage=langHint[0])
          detector.detect(text, hintLanguage=langHint[1])

  def test_top_level_domain_hint(self):
    for detector in cld2, cld2full:
      for lang, text in testData:
        detector.detect(text, hintTopLevelDomain='edu')
        detector.detect(text, hintTopLevelDomain='com')
        detector.detect(text, hintTopLevelDomain='id')

  def test_language_http_headers_hint(self):
    for detector in cld2, cld2full:
      for lang, text in testData:
        detector.detect(text, hintLanguageHTTPHeaders='mi,en')

  def test_debug_flags(self):
    for detector in cld2, cld2full:
      detector.detect(fr_en_Latn, debugScoreAsQuads=True)
      detector.detect(fr_en_Latn, debugHTML=True)
      detector.detect(fr_en_Latn, debugHTML=True, debugCR=True)
      detector.detect(fr_en_Latn, debugHTML=True, debugQuiet=True)
      detector.detect(fr_en_Latn, debugHTML=True, debugVerbose=True)
      detector.detect(fr_en_Latn, debugHTML=True, debugEcho=True)

  def test_unreliable(self):
    for detector in cld2, cld2full:
      # UTF-8 bytes for 'interaktive infografik über videospielkonsolen':
      isReliable, textBytesFound, details, vectors = detector.detect(b'interaktive infografik \xc3\xbcber videospielkonsolen', returnVectors = True)
      self.assertEqual(3, len(details))

  def test_random_bytes(self):
    for detector in cld2, cld2full:
      for i in range(100):
        # This hits SEGV in versions before 20141016:
        try:
          isReliable, textBytesFound, details, vectors = detector.detect(os.urandom(100), returnVectors = True)
        except detector.error:
          # expected
          pass

  def test_invalid_utf8(self):
    for detector in cld2, cld2full:
      try:
        isReliable, textBytesFound, details, vectors = detector.detect(TEST_EN_LATN_BAD_UTF8, returnVectors = True)
      except detector.error:
        # expected
        pass
      else:
        # Kept outside the try block so this failure is reported rather than
        # swallowed by an exception handler:
        self.fail('did not hit expected exception: %s vs %s' % (textBytesFound, len(TEST_EN_LATN_BAD_UTF8)))

  def test_best_effort(self):
    for detector in cld2, cld2full:
      # UTF-8 bytes for 'interaktive infografik über videospielkonsolen':
      isReliable, textBytesFound, details = detector.detect(b'interaktive infografik \xc3\xbcber videospielkonsolen')

      # Too little text:
      self.assertFalse(isReliable)
      self.assertEqual(details[0][0], 'Unknown')

      # Do it again, forcing bestEffort:
      isReliable, textBytesFound, details = detector.detect(b'interaktive infografik \xc3\xbcber videospielkonsolen', bestEffort=True)
      self.assertTrue(isReliable)
      self.assertNotEqual(details[0][0], 'Unknown')

if __name__ == '__main__':
  try:
    unittest.main()
  finally:

    # Confirm that cld2.DETECTED_LANGUAGES == all languages detected by
    # the test cases:
    for lang in cld2.DETECTED_LANGUAGES:
      # Raises KeyError if lang was never detected by the test:
      TestCLD.langsSeen.remove(lang)
    # Confirm that every language detected by the test is listed in cld2.DETECTED_LANGUAGES:
    if len(TestCLD.langsSeen) != 0:
      raise RuntimeError('unexpected additional languages were detected: %s' % TestCLD.langsSeen)

    if False:
      l = list(TestCLD.fullLangsSeen)
      l.sort()
      for x in l:
        print('PyTuple_SET_ITEM(detLangs, upto++, PyUnicode_FromString("%s"));' % x)

    # Confirm that cld2full.DETECTED_LANGUAGES == all languages detected by
    # the test cases:

    #print('FULL %d: %s' % (len(TestCLD.fullLangsSeen), ', '.join(TestCLD.fullLangsSeen)))
    for lang in cld2full.DETECTED_LANGUAGES:
      # Raises KeyError if lang was never detected by the test:
      TestCLD.fullLangsSeen.remove(lang)
    # Confirm that every language detected by the test is listed in cld2full.DETECTED_LANGUAGES:
    if len(TestCLD.fullLangsSeen) != 0:
      raise RuntimeError('unexpected additional languages were detected: %s' % TestCLD.fullLangsSeen)


================================================
FILE: bindings/test_shuffle.py
================================================
import time
import re

USE_FULL_TABLES = True

if USE_FULL_TABLES:
  import cld2full as cld2detect
else:
  import cld2 as cld2detect

reOneLine = re.compile('^Samp ([^] ]+) /(.*?)/ (.*)$')
lineCount = 0

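# NOTE: some lines contain \r, which must NOT start a new line, so split on \n only:
def readlines(f):
  # f must be opened in binary mode. Yields each \n-terminated line decoded
  # as UTF-8; trailing bytes with no final \n are not yielded.
  buf = bytearray()
  while True:
    c = f.read(1)
    if len(c) == 0:
      return
    buf.append(c[0])
    if c == b'\n':
      yield buf.decode('utf-8')
      buf = bytearray()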

correct = 0
wrong = 0
t0 = time.time()
with open('../cld2/internal/test_shuffle_1000_48_666.utf8', 'rb') as f:
  for lineCount, line in enumerate(readlines(f)):
    m = reOneLine.match(line)
    if m is None:
      raise RuntimeError('malformed line %d: %s' % (lineCount, line))
    lang = m.group(1)
    source = m.group(2)
    text = m.group(3)

    # Ignore odd combinations:
    if lang in ('ar-Latn',  # Arabic
                'hr-Cyrl',  # Croatian
                'ko-Latn',  # Korean
                'fa-Latn'):  # Persian
      print('NOTE: skip odd lang/script combination %s: source=%s, text=%s' % (lang, source, text))
      continue
    
    isReliable, textBytesFound, details = cld2detect.detect(text, isPlainText=True)
    langCode = lang.split('-')[0]
    if langCode == details[0][1]:
    #if langCode in [x[1] for x in details]:
      correct += 1
    else:
      wrong += 1
      print("wrong: %s vs %s: %s" % (langCode, details, text))
    #print('%s: %s, %s' % (lang, isReliable, details))
    #print('%s: %s' % (lang, source))

t1 = time.time()
total = correct + wrong
print('Took %.1f sec (%.3f msec per test); %d correct of %d total: %.3f %% accuracy' %
      (t1-t0,
       1000*(t1-t0)/total,
       correct,
       total,
       100.*correct/total))


================================================
FILE: cld2/LICENSE
================================================

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: cld2/docs/CLD2UnitTestFullOutput.html
================================================
<html><meta charset="UTF-8"><body>
<style media="print" type="text/css"> :root { -webkit-print-color-adjust: exact; } </style>
<span style="font-size: 7pt">
file = cld2_unittest<br>
[en] <span style="background:#FFFFF4;color:#000000;">
confiscation of goods is assigned as the penalty part most of the courts consist of members and when it </span><br>
[] <span style="background:#FFFFF4;color:#000000;">
is necessary to bring public cases before a jury of members two courts combine for the purpose the most important cases of all are brought jurors or </span><br>
DocTote::Dump
[ 0]  en    253B   320p 25300R,
  2 chunks scored<br>
en.100R(99%) 254 bytes = ENGLISH  <br><br>
[hy] <span style="background:#F8FFD8;color:#007F1F;">
ա յ եվ նա հիացած աչքերով նայում է հինգհարկանի շենքի տարօրինակ փոքրիկ քառակուսի պատուհաններին դեռ մենք շատ ենք հետամնաց ասում է նա այսպես է </span><br>
DocTote::Dump
[ 1]  hy    255B   255p 25500R,
  1 chunks scored<br>
hy.100R(100%) 255 bytes = ARMENIAN  <br><br>
[chr] <span style="background:#FFEBD8;color:#007F1F;">
ᎠᎢᏍᎩ ᎠᏟᎶᏍᏗ ᏥᏄᏍᏛᎩ ᎦᎫᏍᏛᏅᎯ ᎾᎥᎢ </span><br>
DocTote::Dump
[11] chr     75B    75p 7500R,
  1 chunks scored<br>
chr.100R(100%) 75 bytes = CHEROKEE  <br><br>
[dv] <span style="background:#FFD8F7;color:#007F1F;">
ހިންދީ ބަހުން ވާހަކަ ދައްކާއިރު ދެވަނަ ބަހެއްގެ ގޮތުގައާއި އެނޫން ގޮތްގޮތުން ހިންދީ ބަހުން ވާހަކަ ދައްކާ މީހުންގެ އަދަދު މިލިއަނަށް </span><br>
DocTote::Dump
[10]  dv    249B   249p 24900R,
  1 chunks scored<br>
dv.100R(100%) 249 bytes = DHIVEHI  <br><br>
[ka] <span style="background:#FFEBD8;color:#3F7F00;">
ა ბირთვიდან მიღებული ელემენტი მენდელეევის პერიოდულ სიტემაში გადაინაცვლებს ორი უჯრით </span><br>
DocTote::Dump
[11]  ka    233B   233p 23300R,
  1 chunks scored<br>
ka.100R(100%) 233 bytes = GEORGIAN  <br><br>
[el] <span style="background:#D8FFE7;color:#7F2F00;">
ή αρνητική αναζήτηση λέξης κλειδιού καταστήστε τις μεμονωμένες λέξεις κλειδιά περισσότερο στοχοθετημένες με τη μετατροπή τους σε </span><br>
DocTote::Dump
[ 2]  el    242B   242p 24200R,
  1 chunks scored<br>
el.100R(100%) 242 bytes = GREEK  <br><br>
[gu] <span style="background:#EFD8FF;color:#6F7F00;">
આના પરિણામ પ્રમાણસર ફોન્ટ અવતરણ ચિન્હવાળા પાઠને છુપાવો બધા સમૂહો શોધાયા હાલનો જ સંદેશ વિષયની </span><br>
DocTote::Dump
[ 4]  gu    250B   250p 25000R,
  1 chunks scored<br>
gu.100R(100%) 250 bytes = GUJARATI  <br><br>
[iu] <span style="background:#D8FFF3;color:#007F7F;">
ᐃᑯᒪᒻᒪᑦ ᕿᓈᖏᓐᓇᓲᖑᒻᒪᑦ ᑎᑎᖅᑕᓕᒫᖅᓃᕕᑦ ᑎᑦᕆᐊᑐᓐᖏᑦᑕᑎᑦ ᑎᑎᖅᑕᑉᐱᑦ ᓯᕗᓂᖓᓂ ᑎᑎᖅᖃᖅ ᑎᑎᕆᐊᑐᓐᖏᑕᐃᑦ ᕿᓂᓲᖑᔪᒍᑦ ᑎᑎᖅᑕᓕᒫᖅᓃᕕᑦ </span><br>
DocTote::Dump
[13]  iu    254B   254p 25400R,
  1 chunks scored<br>
iu.100R(100%) 254 bytes = INUKTITUT  <br><br>
[kn] <span style="background:#FFEBD8;color:#6F7F00;">
ಂಠಯ್ಯನವರು ತುಮಕೂರು ಜಿಲ್ಲೆಯ ಚಿಕ್ಕನಾಯಕನಹಳ್ಳಿ ತಾಲ್ಲೂಕಿನ ತೀರ್ಥಪುರ ವೆಂಬ ಸಾಧಾರಣ ಹಳ್ಳಿಯ ಶ್ಯಾನುಭೋಗರ </span><br>
DocTote::Dump
[11]  kn    254B   254p 25400R,
  1 chunks scored<br>
kn.100R(100%) 254 bytes = KANNADA  <br><br>
[km] <span style="background:#D8FFFF;color:#007F1F;">
ក ខ គ ឃ ង ច ឆ ជ ឈ ញ ដ ឋ ឌ ឍ ណ ត ថ ទ ធ ន ប ផ ព ភ ម យ រ ល វ ស ហ ឡ អ ឥ ឦ ឧ ឪ ឫ ឬ ឯ ឱ ទាំងអស់ </span><br>
DocTote::Dump
[ 8]  km    187B   187p 18700R,
  1 chunks scored<br>
km.100R(100%) 187 bytes = KHMER  <br><br>
[lo] <span style="background:#D8FFE7;color:#007F1F;">
ກຫາທົ່ວທັງເວັບ ແລະໃນເວັບໄຮ້ສາຍ ທຳອິດໃຫ້ທຳການຊອກຫາກ່ອນ ຈາກນັ້ນ ໃຫ້ກົດປຸ່ມເມນູ ໃນໜ້າຜົນໄດ້ </span><br>
DocTote::Dump
[ 2]  lo    256B   256p 25600R,
  1 chunks scored<br>
lo.100R(100%) 256 bytes = LAOTHIAN  <br><br>
[lif] <span style="background:#D8FFF3;color:#007F1F;">
ᤁᤡᤖᤠᤳ ᤕᤠᤰᤌᤢᤱ ᤆᤢᤶᤗᤢᤱᤖᤧ ᤛᤥᤎᤢᤱᤃᤧᤴ ᤀᤡᤔᤠᤴᤛᤡᤱ ᤆᤧᤶᤈᤱᤗᤧ ᤁᤢᤔᤡᤱᤅᤥ ᤏᤠᤈᤡᤖᤡ ᤋᤱᤒᤣ ᤒᤠ ᤈᤏᤘᤖᤡ ᤗᤠᤏᤢᤀᤠᤱ ᤁ᤹ᤏᤠ ᤋᤱᤒᤣ ᤁᤠᤰ ᤏᤠ᤺ᤳᤋᤢ ᤕᤢᤖᤢᤒᤠ ᤀᤡᤔᤠᤴᤛᤡᤱ ᤋᤱᤃᤡᤵᤛᤡᤱ ᤌᤡᤶᤒᤣᤴ ᤂᤠᤃᤴ ᤛᤡᤛᤣ᤺ᤰᤗᤠ ᤂᤧᤴ ᤀᤡᤛᤡᤰ ᤈᤏᤘᤖᤡ ᤀᤥ ᤏᤠᤛᤢᤵ ᤆᤥ᤺ᤰᤔᤠ ᤌᤡᤶᤒᤣ ᤋᤱᤃᤠᤶᤛᤡᤱᤗ ᤐᤳᤐᤠ ᤀᤡᤱᤄᤱ ᤘᤠ᤹ </span><br>
DocTote::Dump
[13] lif    580B   580p 58000R,
  1 chunks scored<br>
lif.100R(100%) 580 bytes = LIMBU  <br><br>
[ml] <span style="background:#E3D8FF;color:#7F5F00;">
ം അങ്ങനെ ഞങ്ങള് അവരുടെ മുമ്പില് നിന്നു ഔടും ഉടനെ നിങ്ങള് പതിയിരിപ്പില് നിന്നു എഴുന്നേറ്റു </span><br>
DocTote::Dump
[ 9]  ml    247B   247p 24700R,
  1 chunks scored<br>
ml.100R(100%) 247 bytes = MALAYALAM  <br><br>
[mn] <span style="background:#FFD8D8;color:#007F1F;">
ᠦᠭᠡ ᠵᠢᠨ ᠴᠢᠨᠭ ᠠ ᠬᠦᠨᠳᠡᠢ ᠵᠢ ᠢᠯᠭᠠᠬᠣ </span><br>
DocTote::Dump
[ 0]  mn     83B    83p 8300R,
  1 chunks scored<br>
mn.100R(100%) 83 bytes = MONGOLIAN  <br><br>
[or] <span style="background:#D8E7FF;color:#007F1F;">
ଅକ୍ଟୋବର ଡିସେମ୍ବର </span><br>
DocTote::Dump
[14]  or     48B    48p 4800R,
  1 chunks scored<br>
or.100R(100%) 48 bytes = ORIYA  <br><br>
[pa] <span style="background:#EFFFD8;color:#6F7F00;">
ਂ ਦਿਨਾਂ ਵਿਚ ਭਾਈ ਸਾਹਿਬ ਦੀ ਬੁੱਚੜ ਗੋਬਿੰਦ ਰਾਮ ਨਾਲ ਅੜਫਸ ਚੱਲ ਰਹੀ ਸੀ ਗੋਬਿੰਦ ਰਾਮ ਨੇ ਭਾਈ ਸਾਹਿਬ ਦੀਆਂ ਭੈਣਾ </span><br>
DocTote::Dump
[12]  pa    247B   247p 24700R,
  1 chunks scored<br>
pa.100R(100%) 247 bytes = PUNJABI  <br><br>
[si] <span style="background:#F8D8FF;color:#3F7F00;">
අනුරාධ මිහිඳුකුල නමින් සකුරා ට ලිපියක් තැපෑලෙන් එවා තිබුණා කි ් රස්ටි ෂෙල්ටන් ප ් රනාන්දු ද </span><br>
DocTote::Dump
[15]  si    243B   243p 24300R,
  1 chunks scored<br>
si.100R(100%) 243 bytes = SINHALESE  <br><br>
[syr] <span style="background:#EFFFD8;color:#007F1F;">
ܐܕܪܝܣ ܓܛܘ ܫܘܪܝܐ ܡܢ ܦܪܢܣܐ ܡܢ ܐܣܦܢܝܐ ܚܐܪܘܬܐ ܒܐܕܪ ܒܢܝܣܢ ܫܛܝܚܘܬܐ ܟܠܢܝܐ ܡܝ̈ܐ ܒܥܠܡܐ </span><br>
DocTote::Dump
[12] syr    143B   143p 14300R,
  1 chunks scored<br>
syr.100R(100%) 143 bytes = SYRIAC  <br><br>
[tl] <span style="background:#FFD8D8;color:#7F5F00;">
ᜋᜇ᜔ ᜐᜓᜎᜆ᜔ ᜃ ᜈᜅ᜔ ᜊᜌ᜔ᜊᜌᜒᜈ᜔ ᜂᜉᜅ᜔᜔ ᜋᜐᜈᜌ᜔ ᜎᜅ᜔ ᜁᜐ ᜉᜅ᜔ ᜀᜃ᜔ᜎᜆ᜔ ᜆᜓᜅ᜔ᜃᜓᜎ᜔ ᜐ ᜊᜌ᜔ᜊᜌᜒᜈ᜔ ᜐ ᜆᜒᜅᜒᜈ᜔ ᜃᜓ </span><br>
DocTote::Dump
[ 0]  tl    228B   228p 22800R,
  1 chunks scored<br>
tl.100R(100%) 228 bytes = TAGALOG  <br><br>
[ta] <span style="background:#D8E7FF;color:#7F5F00;">
அங்கு ராஜேந்திர சோழனால் கட்டப்பட்ட பிரம்மாண்டமான சிவன் கோவில் ஒன்றும் உள்ளது தொகு </span><br>
DocTote::Dump
[14]  ta    227B   227p 22700R,
  1 chunks scored<br>
ta.100R(100%) 227 bytes = TAMIL  <br><br>
[te] <span style="background:#EFFFD8;color:#7F5F00;">
ఁ దనర జయించిన తత్వ మరసి చూడఁ దాన యగును రాజయోగి యిట్లు తేజరిల్లుచు నుండు విశ్వదాభిరామ వినర వేమ </span><br>
DocTote::Dump
[12]  te    253B   253p 25300R,
  1 chunks scored<br>
te.100R(100%) 253 bytes = TELUGU  <br><br>
[th] <span style="background:#FFD8EB;color:#6F7F00;">
กฏในการค้นหา หรือหน้าเนื้อหา หากท่านเลือกลงโฆษณา ท่านอาจจะปรับต้องเพิ่มงบประมาณรายวันตา </span><br>
DocTote::Dump
[ 5]  th    257B   257p 25700R,
  1 chunks scored<br>
th.100R(100%) 257 bytes = THAI  <br><br>
[xx-Bugi] <span style="background:#FFD8EB;color:#6F7F00;">
ᨄᨛᨑᨊᨒ ᨑᨗ ᨔᨒᨗᨓᨛ ᨕᨗᨋᨗᨔᨗ ᨒᨛᨄ ᨑᨛᨔᨛᨆᨗᨊ </span><br>
DocTote::Dump
[ 5] xx-Bugi     91B    91p 9100R,
  1 chunks scored<br>
xx-Bugi.100R(100%) 91 bytes = X_Buginese  <br><br>
[xx-Goth] <span style="background:#FFF7D8;color:#7F5F00;">
𐌰 𐌰𐌱𐍂𐌰𐌷𐌰𐌼 𐌰𐌲𐌲𐌹𐌻𐌹𐍃𐌺𐍃 𐌸𐌹𐌿𐌳𐌹𐍃𐌺𐍃 𐍆𐍂𐌰𐌲𐌺𐌹𐍃𐌺𐍃 </span><br>
DocTote::Dump
[ 6] xx-Goth    142B   142p 14200R,
  1 chunks scored<br>
xx-Goth.100R(100%) 142 bytes = X_Gothic  <br><br>
[zh] <span style="background:#FFD8D8;color:#7F2F00;">
产品的简报和公告 提交该申请后无法进行更改 请确认您的选择是正确的 对于要提交的图书 我</span><br>
[] <span style="background:#FFD8D8;color:#7F2F00;">
确认 我是版权所有者或已得到版权所有者的授权 要更改您的国家 地区 请在此表的最上端更改您的 </span><br>
DocTote::Dump
[ 0]  zh    255B   506p 25500R,
  2 chunks scored<br>
zh.100R(99%) 256 bytes = Chinese  <br><br>
[zh-Hant] <span style="background:#FFD8EB;color:#3F7F00;">
之前為 帳單交易作業區 已變更 廣告內容 之前為 銷售代表 之前為 張貼日期為 百分比之前為 合約 為 目標對象條件已刪除 結束日期之前為 </span><br>
DocTote::Dump
[ 5] zh-Hant    184B   343p 18400R,
  1 chunks scored<br>
zh-Hant.100R(99%) 185 bytes = ChineseT  <br><br>
[ja] <span style="background:#D8FFFF;color:#000000;">
このペ ジでは アカウントに指定された予算の履歴を一覧にしています それぞれの項目に</span><br>
[] <span style="background:#D8FFFF;color:#000000;">
は 予算額と特定期間のステ タスが表示されます 現在または今後の予算を設定するには </span><br>
DocTote::Dump
[ 8]  ja    238B   766p 23800R,
  2 chunks scored<br>
ja.100R(99%) 239 bytes = Japanese  <br><br>
[ko] <span style="background:#E3D8FF;color:#000000;">
개별적으로 리포트 액세스 권한을 부여할 수 있습니다 액세스 권한 부여사용자에게 프로필 리포</span><br>
[] <span style="background:#E3D8FF;color:#000000;">
트에 액세스할 수 있는 권한을 부여하시려면 가용 프로필 상자에서 프로필 이름을 선택한 다음 </span><br>
DocTote::Dump
[ 9]  ko    255B   924p 25500R,
  2 chunks scored<br>
ko.100R(99%) 256 bytes = Korean  <br><br>
[ab] <span style="background:#D8FFE7;color:#007F7F;">
а зуа абзиара дақәшәоит ан лыбзиабара ахә амаӡам ауаҩы </span><br>
[] <span style="background:#D8FFE7;color:#007F7F;">
игәы иҭоу ихы иҿы ианубаалоит аҧҳәыс ҧшӡа ахацәа лышьҭоуп аҿаасҭа лара дрышьҭоуп </span><br>
DocTote::Dump
[ 2]  ab    251B   205p 25100R,
  2 chunks scored<br>
ab.100R(99%) 252 bytes = ABKHAZIAN  <br><br>
[aa] <span style="background:#D8F3FF;color:#007F7F;">
nagay tanito nagay tanto nagayna naharsi nahrur nake nala nammay nammay haytu nanu narig ne ni num numu o obare obe obe </span><br>
[] <span style="background:#D8F3FF;color:#007F7F;">
obisse oggole ogli olloyta ongorowe orbise othoga r rabe rade ra e rage rakub rasitte rasu reyta rog ruddi ruga s sa al bada sa ala </span><br>
DocTote::Dump
[ 3]  aa    252B   158p 25200R,
  2 chunks scored<br>
aa.100R(99%) 253 bytes = AFAR  <br><br>
[af] <span style="background:#FFD8EB;color:#007F1F;">
aam skukuza die naam beteken hy wat skoonvee of hy wat alles onderstebo keer wysig bo</span><br>
[] <span style="background:#FFD8EB;color:#007F1F;">
sveldkampe boskampe is kleiner afgeleë ruskampe wat oor min fasiliteite beskik daar is geen re</span><br>
[] <span style="background:#FFD8EB;color:#007F1F;">
staurante of winkels nie en slegs oornagbesoekers word toegelaat bateleur </span><br>
DocTote::Dump
[ 5]  af    254B   215p 25400R,
  3 chunks scored<br>
af.100R(99%) 255 bytes = AFRIKAANS  <br><br>
[ak] <span style="background:#F8FFD8;color:#001F7F;">
wɔwoo hilla limann mumu ɔpɛnimba afe wɔwoo no wɔ gwollu wɔ sisala mantaw mu nna ne </span><br>
[] <span style="background:#F8FFD8;color:#001F7F;">
maame yɛ mma hayawah ne papa so nna ɔyɛ babini yomu ɔwarr fulera limann ne mba yɛ esuon la</span><br>
[ak*.52/en.17] <span style="background:#F8FFD8;color:#001F7F;">
riba montia wɔwoo no limann baba limann sibi andan wɔwoo no limann lida limann danni limann zilla limann na salma limann ɔtenaa ase kɔpemm sanda kwakwa da ɛtɔ so wɔ afe wɔ </span><br>
DocTote::Dump
[ 1]  ak    364B   164p 20195R,
  3 chunks scored<br>
ak.55R(99%) 365 bytes = AKAN  <br><br>
[sq] <span style="background:#D8FFF3;color:#7F5F00;">
a do të kërkoni nga beogradi që të njohë pavarësinë e kosovës zoti tha</span><br>
[] <span style="background:#D8FFF3;color:#7F5F00;">
çi prishtina është gati ta njoh pavarësinë e serbisë ndërsa natyrisht se do të </span><br>
[] <span style="background:#D8FFF3;color:#7F5F00;">
kërkohet një gjë e tillë që edhe beogradi ta njoh shtetin e pavarur dhe sovran të </span><br>
DocTote::Dump
[13]  sq    253B   391p 25126R,
  3 chunks scored<br>
sq.99R(99%) 254 bytes = ALBANIAN  <br><br>
[am] <span style="background:#E3D8FF;color:#3F7F00;">
ለመጠይቅ ወደ እስክንድርያ ላኩዋቸውና የእስክንድርያ ጳጳስ አቴናስዮስ ፍሬምንጦስን እራሳቸውን ሾመው ልከዋል ከዚያ እስከ ዓ ም ድረስ የኢትዮጵያ አቡነ </span><br>
DocTote::Dump
[ 9]  am    249B    78p 24900R,
  1 chunks scored<br>
am.100R(99%) 250 bytes = AMHARIC  <br><br>
[ar*.28/ps.7] <span style="background:#FFF7D8;color:#6F7F00;">
احتيالية بيع أي حساب </span><br>
DocTote::Dump
[ 6]  ar     38B    28p 2736R,
  1 chunks scored<br>
ar.72R(97%) 39 bytes = ARABIC  <br><br>
[as] <span style="background:#F8D8FF;color:#007F1F;">
অঞ্চল নতুন সদস্যবৃন্দ সকলোৱে ভৰ্তি হব পাৰে মুল পৃষ্ঠা জন লেখক গুগ ল দল সাৰাংশ ই পত্ৰ টা বাৰ্তা এজন </span><br>
DocTote::Dump
[15]  as    257B    75p 25700R,
  1 chunks scored<br>
as.100R(99%) 258 bytes = ASSAMESE  <br><br>
[ay] <span style="background:#EFD8FF;color:#007F7F;">
aru wijar aru ispañula ukaran aru witanam aru kurti aru kalis aru warani aru malta aru yatiyawi niya jaki</span><br>
[ay*.55/et.24] <span style="background:#EFD8FF;color:#007F7F;">
tanaka isluwiñ aru lmir phuran aru masirunan aru purtukal aru kruwat aru jakira urtu aru inklisa pirsan aru suyku aru malay aru jisk aptayma thaya </span><br>
DocTote::Dump
[ 4]  ay    254B   108p 17996R,
  2 chunks scored<br>
ay.70R(99%) 255 bytes = AYMARA  <br><br>
[az] <span style="background:#FFD8F7;color:#3F7F00;">
a az qalıb breyn rinq intellektual oyunu üzrə yarışın zona mərhələləri keçiri</span><br>
[] <span style="background:#FFD8F7;color:#3F7F00;">
lib miq un qalıqlarının dənizdən çıxarılması davam edir məhə</span><br>
[] <span style="background:#FFD8F7;color:#3F7F00;">
mməd peyğəmbərin karikaturalarını çap edən qəzetin baş redaktoru iş otağında ölüb </span><br>
DocTote::Dump
[10]  az    256B   337p 25600R,
  3 chunks scored<br>
az.100R(99%) 257 bytes = AZERBAIJANI  <br><br>
[ba*.81/tt.31] <span style="background:#FFD8EB;color:#007F7F;">
арналђан бындай ђилми эш тіркињлњ тњјге тапєыр нњшер ителњ ғинуар бєхет именлектє етешлектє ауыл ўќмерџєре хеџмєт юлын ћайлаѓанда </span><br>
DocTote::Dump
[ 5]  ba    242B    81p 15972R,
  1 chunks scored<br>
ba.66R(99%) 243 bytes = BASHKIR  <br><br>
[eu] <span style="background:#E3D8FF;color:#6F7F00;">
a den eraso bat honen kontra hortaz eragiketa bakarrik behar dituen eraso batek aes </span><br>
[] <span style="background:#E3D8FF;color:#6F7F00;">
apurtuko luke nahiz eta oraingoz eraso bideraezina izan gaur egungo teknologiaren mugak </span><br>
[] <span style="background:#E3D8FF;color:#6F7F00;">
direla eta oraingoz kezka hauek alde batera utzi daitezke orain arteko indar </span><br>
DocTote::Dump
[ 9]  eu    249B   331p 24900R,
  3 chunks scored<br>
eu.100R(99%) 250 bytes = BASQUE  <br><br>
[be] <span style="background:#F8D8FF;color:#7F5F00;">
а друкаваць іх не было тэхнічна магчыма бліжэй за вільню тым самым ча</span><br>
[] <span style="background:#F8D8FF;color:#7F5F00;">
сам нямецкае кіраўніцтва прапаноўвала апроч ўвядзення лацінкі яе </span><br>
DocTote::Dump
[15]  be    248B   269p 24800R,
  2 chunks scored<br>
be.100R(99%) 249 bytes = BELARUSIAN  <br><br>
[bn] <span style="background:#FFD8EB;color:#7F5F00;">
ংখ্যা নমুনায়ন বিন্যাস পরিসংখ্যানিক মডেল পরিসংখ্যানিক সিদ্ধান্ত ফাংশন পরিসংখ্যানিক </span><br>
DocTote::Dump
[ 5]  bn    231B   164p 23100R,
  1 chunks scored<br>
bn.100R(99%) 232 bytes = BENGALI  <br><br>
[hi] <span style="background:#D8F3FF;color:#7F5F00;">
विकिपीडिया इंटरनेट आधारित एक मुक्त ज्ञानकोष परियोजना ह ई विकि के रुप मेँ </span><br>
[bh] <span style="background:#D8F3FF;color:#6F7F00;">
बा यानी एगो अईसन जाल पृष्ठ जे सभन के संपादन करे के छूट देवेला विकिपी</span><br>
[] <span style="background:#D8F3FF;color:#6F7F00;">
डिया शब्द विकि अउर इनसाइक्लोपीडिया ज्ञानकोष शब्दन के मिला के बनल बा विकिपीडिया </span><br>
[hi] <span style="background:#D8F3FF;color:#7F5F00;">
एक बहुभाषीय प्रकल्प ह अउर स्वयंसेवकन के सहकार से निर्मित बा जेहु के भी इंटर</span><br>
[bh] <span style="background:#D8F3FF;color:#6F7F00;">
नेट तक पहुँच बा ऊ विकिपीडिया पर लिख सकत बा अउर लेखन के संपादन कर सकत बा </span><br>
DocTote::Dump
[ 3]  hi    390B   209p 38803R,
[11]  bh    569B   396p 56900R,
  5 chunks scored<br>
{CloseLangPair: hi.99R,390B => bh}<br>
bh.99R(99%) 960 bytes = BIHARI  <br><br>
[bi] <span style="background:#FFF7D8;color:#007F7F;">
king wantaem nomo hem i sakem setan mo ol rabis enjel blong hem oli aot long heven oli kamdaon long </span><br>
[] <span style="background:#FFF7D8;color:#007F7F;">
wol taswe ol samting oli kam nogud olgeta long wol ya stat long revelesen ol faet </span><br>
[] <span style="background:#FFF7D8;color:#007F7F;">
kakae i sot ol sik mo fasin blong brekem loa oli kam antap olgeta samting </span><br>
DocTote::Dump
[ 6]  bi    256B   368p 25600R,
  3 chunks scored<br>
bi.100R(99%) 257 bytes = BISLAMA  <br><br>
[br] <span style="background:#E3D8FF;color:#0F7F00;">
a chom met leuskel a ra e blas da jack irons dilabour hag aet kuit eus what is this dibab a reont da c houde michael beinhorn </span><br>
[] <span style="background:#E3D8FF;color:#0F7F00;">
evit produiñ an trede pladenn kavet e vez ar ganaouennoù buhan ha buhan ganto setu stummet ar bladenn adkavet e vez enni funk </span><br>
DocTote::Dump
[ 9]  br    254B   238p 24140R,
  2 chunks scored<br>
br.95R(99%) 255 bytes = BRETON  <br><br>
[bg] <span style="background:#FFEBD8;color:#7F2F00;">
а дума попада в състояние на изпитание ключовите думи с предсказана малко под то изискване на страниците за търсене в </span><br>
DocTote::Dump
[11]  bg    216B   146p 21600R,
  1 chunks scored<br>
bg.100R(99%) 217 bytes = BULGARIAN  <br><br>
[my] <span style="background:#E3FFD8;color:#007F1F;">
တက္ကသုိလ္ မ္ဟ ပ္ရန္ လာ္ရပီးေနာက္ န္ဟစ္ အရ္ဝယ္ ဦးသန္ ့သည္ ပန္ းတနော္ အမ္ယုိးသား ေက္ယာင္ း </span><br>
DocTote::Dump
[ 7]  my    242B   242p 24200R,
  1 chunks scored<br>
my.100R(100%) 242 bytes = BURMESE  <br><br>
[ca] <span style="background:#E3FFD8;color:#6F7F00;">
al final en un únic lloc nhorabona l correu electrònic està concebut com a eina de productivitat aleshores per què perdre el temps ar</span><br>
[] <span style="background:#E3FFD8;color:#6F7F00;">
xivant missatges per després intentar recordar on els veu desar i per què heu d eliminar missatges importants per l </span><br>
DocTote::Dump
[ 7]  ca    255B   173p 25500R,
  2 chunks scored<br>
ca.100R(99%) 256 bytes = CATALAN  <br><br>
[ceb*.63/war.57] <span style="background:#FFD8EB;color:#001F7F;">
ang sugbo usa sa mga labing ugmad nga lalawigan sa nasod kini </span><br>
[] <span style="background:#FFD8EB;color:#001F7F;">
ang sentro sa komersyo edukasyon ug industriya sa sentral ug habagatang dapit sa kapupod an ang mipada</span><br>
[] <span style="background:#FFD8EB;color:#001F7F;">
yag sa sugbo isip ikapito nga labing nindot nga pulo sa ang nag </span><br>
[war*.43/ceb.39] <span style="background:#FFF7D8;color:#0F007F;">
inusarang pulo sa pilipinas nga napasidunggan sa maong magasin sukad pa sa tuig </span><br>
DocTote::Dump
[ 5] ceb    228B   212p 19700R,
[ 6] war     80B    43p 3200R,
  4 chunks scored<br>
{Unreli war.40R,80B} ceb.86R(73%) 309 bytes = CEBUANO* <br><br>
[co] <span style="background:#FFD8D8;color:#007F4F;">
a prupusitu di risultati for utilizà a scatula per ricercà ind issi risultati servore errore u servore ha incuntratu una er</span><br>
[] <span style="background:#FFD8D8;color:#007F4F;">
rore pruvisoria é ùn ha pussutu compie a vostra dumanda per piacè acimenta dinò ind una minuta tuttu listessu ligami truvà i </span><br>
DocTote::Dump
[ 0]  co    255B   165p 25500R,
  2 chunks scored<br>
co.100R(99%) 256 bytes = CORSICAN  <br><br>
[sr] <span style="background:#D8FFF3;color:#7F2F00;">
posljednja dva vladara su kijaksar </span><br>
[el] <span style="background:#D8FFE7;color:#7F2F00;">
κυαξαρης </span><br>
[hr] <span style="background:#EFFFD8;color:#7F2F00;">
prije krista fraortov sin koji će proširiti teritorij medije i astijag kijaksar je imao kćer ili unuku </span><br>
[] <span style="background:#EFFFD8;color:#7F2F00;">
koja se zvala amitis a postala je ženom nabukodonosora ii kojoj je ovaj izgradio viseće vrtove babilona kijaksar je modernizirao svoju vojsku i </span><br>
[] <span style="background:#EFFFD8;color:#7F2F00;">
uništio ninivu prije krista naslijedio ga je njegov sin posljednji medijski kralj asti</span><br>
[sr] <span style="background:#D8FFF3;color:#7F2F00;">
jag kojega je detronizirao srušio sa vlasti njegov unuk kir veliki zemljom su zavladali perzijanci </span><br>
DocTote::Dump
[ 2]  el     18B    18p 1800R,
[12]  hr    339B   234p 33900R,
[13]  sr    135B   122p 13500R,
  6 chunks scored<br>
{CloseLangPair: sr.100R,135B => hr}<br>
hr.100R(95%) el.100R(4%) 494 bytes = CROATIAN  <br><br>
[cs] <span style="background:#F8FFD8;color:#7F2F00;">
a akci opakujte film uložen vykreslit gmail tokio smazat obsah adresáře nelze </span><br>
[] <span style="background:#F8FFD8;color:#7F2F00;">
načíst systémový profil jednotky smoot okud používáte pro určení polokoule značky z </span><br>
[] <span style="background:#F8FFD8;color:#7F2F00;">
západ nebo v východ používejte nezáporné hodnoty zeměpisné délky nelze </span><br>
DocTote::Dump
[ 1]  cs    255B   320p 25176R,
  3 chunks scored<br>
cs.98R(99%) 256 bytes = CZECH  <br><br>
[da] <span style="background:#F8FFD8;color:#000000;">
a z tallene og punktummer der er tilladte log ud angiv den ønskede adgangskode igen november gem personlige oplysni</span><br>
[] <span style="background:#F8FFD8;color:#000000;">
nger kontrolspørgsmål det sidste tegn i dit brugernavn skal være et bogstav a z eller tal skriv de tegn du kan se i billedet nedenfor </span><br>
DocTote::Dump
[ 1]  da    253B   214p 25300R,
  2 chunks scored<br>
da.100R(99%) 254 bytes = DANISH  <br><br>
[nl] <span style="background:#D8FFE7;color:#000000;">
a als volgt te werk om een configuratiebestand te maken sitemap gen py ebruik filters om de s op te geven die </span><br>
[] <span style="background:#D8FFE7;color:#000000;">
moeten worden toegevoegd of uitgesloten op basis van de opmaaktaal elke sitemap mag alleen de s bevatten voor een bepaalde opmaaktaal dit </span><br>
DocTote::Dump
[ 2]  nl    248B   226p 24800R,
  2 chunks scored<br>
nl.100R(99%) 249 bytes = DUTCH  <br><br>
[dz] <span style="background:#E3FFD8;color:#007F7F;">
རྩིས བརྐྱབ ཚུལ ལྡན དང ངེས བདེན སྦ སྟོན ནིའི དོན ལུ ཁྱོད གུག ཤད ལག ལེན འཐབ དགོ ག དང ཨིན པུཊི གྲལ ཐིག གུ </span><br>
DocTote::Dump
[ 7]  dz    257B   171p 25700R,
  1 chunks scored<br>
dz.100R(99%) 258 bytes = DZONGKHA  <br><br>
[en] <span style="background:#FFFFF4;color:#000000;">
a backup credit card by visiting your billing preferences page or visit the adwords help centre for more de</span><br>
[] <span style="background:#FFFFF4;color:#000000;">
tails https adwords google com support bin answer py answer </span><br>
[] <span style="background:#FFFFF4;color:#000000;">
hl en we were unable to process the payment of for your outstanding google adwords </span><br>
DocTote::Dump
[ 0]  en    250B   247p 25000R,
  3 chunks scored<br>
en.100R(99%) 251 bytes = ENGLISH  <br><br>
[eo] <span style="background:#D8FFFF;color:#6F7F00;">
a jarcento refoje per enmetado de koncerna pastro tiam de reformita konfesio ekde refoje ekzistis luteranaj komunumanoj tamen tiuj </span><br>
[] <span style="background:#D8FFFF;color:#6F7F00;">
fondis propran komunumon nur en ambaŭ apartenis ekde al la evangela eklezio en prusio resp ties rejnlanda provinceklezio en </span><br>
DocTote::Dump
[ 8]  eo    256B   143p 25350R,
  2 chunks scored<br>
eo.99R(99%) 257 bytes = ESPERANTO  <br><br>
[et] <span style="background:#D8FFFF;color:#7F2F00;">
a niipea kui sinu maksimaalne igakuine krediidi limiit on meie poolt heaks kiidetud on sinu kohustuseks see krediidilimiit </span><br>
DocTote::Dump
[ 8]  et    123B   121p 12300R,
  1 chunks scored<br>
et.100R(99%) 124 bytes = ESTONIAN  <br><br>
[fo] <span style="background:#FFF7D8;color:#3F7F00;">
at verða átaluverdar óhóskandi ella áloypandi vit kunnu ikki garanterða at google leita</span><br>
[] <span style="background:#FFF7D8;color:#3F7F00;">
nin ikki finnur naka sum er áloypandi óhóskandi ella átaluvert og google </span><br>
[] <span style="background:#FFF7D8;color:#3F7F00;">
tekur onga ábyrgd yvir tær síður sum koma við í okkara leitiskipan fá tær ein </span><br>
DocTote::Dump
[ 6]  fo    256B   240p 25600R,
  3 chunks scored<br>
fo.100R(99%) 257 bytes = FAROESE  <br><br>
[fj] <span style="background:#D8FFFF;color:#007F7F;">
i kina na i iri ka duatani na matana main a meke wesi se meke mada na meke ni yaqona oqo na meke ka dau vakayagataki </span><br>
[] <span style="background:#D8FFFF;color:#007F7F;">
ena yaqona vakaturaga e dau caka toka ga kina na vucu ka dau lagati tiko kina na ka e yaco tiko na talo ni wai ni yaqona na lewai ni wai </span><br>
DocTote::Dump
[ 8]  fj    254B   241p 25400R,
  2 chunks scored<br>
fj.100R(99%) 255 bytes = FIJIAN  <br><br>
[fi] <span style="background:#D8F3FF;color:#000000;">
a joilla olet käynyt tämä kerro meille kuka ä olet ei tunnistettavia käyttötietoja kuten vi</span><br>
[] <span style="background:#D8F3FF;color:#000000;">
rheraportteja käytetään google desktopin parantamiseen etsi näyttää mukau</span><br>
[] <span style="background:#D8F3FF;color:#000000;">
tettuja uutisia google desktop keskivaihto leikkaa voit kaksoisnapsauttaa </span><br>
DocTote::Dump
[ 3]  fi    250B   300p 25000R,
  3 chunks scored<br>
fi.100R(99%) 251 bytes = FINNISH  <br><br>
[fr] <span style="background:#EFD8FF;color:#000000;">
a accès aux collections et aux frontaux qui lui ont été attribués il peut consulter </span><br>
[] <span style="background:#EFD8FF;color:#000000;">
et modifier ses collections et exporter des configurations de collection toutefois il ne </span><br>
[] <span style="background:#EFD8FF;color:#000000;">
peut pas créer ni supprimer des collections enfin il a accès aux fonctions </span><br>
DocTote::Dump
[ 4]  fr    254B   236p 25323R,
  3 chunks scored<br>
fr.99R(99%) 255 bytes = FRENCH  <br><br>
[fy] <span style="background:#D8F3FF;color:#3F7F00;">
adfertinsjes gewoan lytse adfertinsjes mei besibbe siden dy t </span><br>
[] <span style="background:#D8F3FF;color:#3F7F00;">
fan belang binne foar de ynhâld fan jo berjochten wolle jo mear witte fan gmail foardat jo jo oa</span><br>
[] <span style="background:#D8F3FF;color:#3F7F00;">
nmelde gean dan nei wy wurkje eltse dei om gmail te ferbetterjen dêrta sille wy jo sa út en </span><br>
DocTote::Dump
[ 3]  fy    253B   263p 24621R,
  3 chunks scored<br>
fy.97R(99%) 254 bytes = FRISIAN  <br><br>
[gl] <span style="background:#F8D8FF;color:#7F2F00;">
debe ser como mínimo taranto tendas de venda polo miúdo cociñas servizos bordado canadá </span><br>
[] <span style="background:#F8D8FF;color:#7F2F00;">
viaxes parques de vehículos de recreo hotel oriental habitación recibir unha postal no enderezo indicado anteriormente </span><br>
DocTote::Dump
[15]  gl    213B   152p 20932R,
  2 chunks scored<br>
gl.98R(99%) 214 bytes = GALICIAN  <br><br>
[lg] <span style="background:#D8E7FF;color:#004F7F;">
abaana ba bani lukaaga mu ana mu babiri abaana ba bebayi lukaaga mu abiri mu basatu abaana ba azugaadi lukumi mu ebikumi </span><br>
[] <span style="background:#D8E7FF;color:#004F7F;">
bibiri mu abiri mu babiri abaana ba adonikamu lukaaga mu nltaaga mu mukaaga abaana ba biguvaayi enkumi bbiri mu ataano mu mukaaga </span><br>
DocTote::Dump
[14]  lg    251B   204p 25100R,
  2 chunks scored<br>
lg.100R(99%) 252 bytes = GANDA  <br><br>
[de] <span style="background:#FFD8EB;color:#000000;">
abschnitt ordner aktivieren werden die ordnereinstellungen im farbabschnitt deaktiviert öchten </span><br>
[] <span style="background:#FFD8EB;color:#000000;">
sie wirklich fortfahren eldtypen angeben optional n diesem schritt ge</span><br>
[] <span style="background:#FFD8EB;color:#000000;">
ben sie für jedesfeld aus dem datenset den typ an ieser schritt ist optional eldtypen </span><br>
DocTote::Dump
[ 5]  de    252B   240p 25200R,
  3 chunks scored<br>
de.100R(99%) 253 bytes = GERMAN  <br><br>
[kl] <span style="background:#E3D8FF;color:#007F7F;">
at nittartakkalli uani toqqarsimasatta akornanni nittartakkanut allanut ingerlaqqitto</span><br>
[] <span style="background:#E3D8FF;color:#007F7F;">
qarsinnaavoq kanukoka tassaavoq kommuneqarfiit kattuffiat nuna tamakkerlu</span><br>
[] <span style="background:#E3D8FF;color:#007F7F;">
gu kommunit nittartagaannut ingerlaqqiffiusinnaasoq kisitsiserpassuit nunatsinnut tunngasut </span><br>
DocTote::Dump
[ 9]  kl    250B   439p 25000R,
  3 chunks scored<br>
kl.100R(99%) 251 bytes = GREENLANDIC  <br><br>
[gn] <span style="background:#FFD8EB;color:#0F7F00;">
aháta añe ë ne mbo ehára ndive ajeruréta chupe oporandujey haĝua peëme </span><br>
[] <span style="background:#FFD8EB;color:#0F7F00;">
mba épa pekaru ha áĝa oporandúvo nde eréta avei re paraguaýpe kachíke he i leúpe </span><br>
[] <span style="background:#FFD8EB;color:#0F7F00;">
ndépa re úma kure tatakuápe ha leu ombohovái héë ha ujepéma kachíke he ijey </span><br>
DocTote::Dump
[ 5]  gn    251B   306p 25100R,
  3 chunks scored<br>
gn.100R(99%) 252 bytes = GUARANI  <br><br>
[ht] <span style="background:#FFEBD8;color:#007F7F;">
ak pitit tout sosyete a chita se pou sa leta dwe pwoteje yo nimewo leta fèt pou li pwoteje </span><br>
[] <span style="background:#FFEBD8;color:#007F7F;">
tout paran ak pitit nan peyi a menm jan kit paran yo marye kit yo pa marye tout </span><br>
[] <span style="background:#FFEBD8;color:#007F7F;">
manman ki fè pitit leta fèt pou ba yo konkoul menm jan tou pou timoun piti ak pou </span><br>
DocTote::Dump
[11]  ht    256B   342p 25600R,
  3 chunks scored<br>
ht.100R(99%) 257 bytes = HAITIAN_CREOLE  <br><br>
[ha] <span style="background:#FFD8F7;color:#007F7F;">
a cikin a kan sakamako daga sakwannin a kan sakamako daga sakwannin daga ranar zuwa a kan sakamako </span><br>
[] <span style="background:#FFD8F7;color:#007F7F;">
daga guda daga ranar zuwa a kan sakamako daga shafukan daga ranar zuwa a kan sakamako daga guda a cikin last hour a kan sakamako daga guda daga kafar </span><br>
DocTote::Dump
[10]  ha    249B   227p 24900R,
  2 chunks scored<br>
ha.100R(99%) 250 bytes = HAUSA  <br><br>
[haw] <span style="background:#EFD8FF;color:#001F7F;">
he puke noi i kū ikena kūnoa o wikipikia e olu olu nō e hā awi mai i kāu ike kāu mana o a me kou leo no ke kūkulu </span><br>
[] <span style="background:#EFD8FF;color:#001F7F;">
ana a me ke kāko o ana mai i ka wikipikia hawai i he kahua pūnaewele hawai i kēia no ka ho oulu </span><br>
[] <span style="background:#EFD8FF;color:#001F7F;">
ana i ka ike hawai i inā hiki iā oe ke ōlelo hawai i e olu olu nō e kōkua mai a e ho ololi i nā atikala ma ane i a pono e ha i </span><br>
[] <span style="background:#EFD8FF;color:#001F7F;">
aku i kou mau hoa aloha e pili ana i ka wikipikia hawai i e ola mau nō ka ōlelo hawai i a mau loa aku </span><br>
DocTote::Dump
[ 4] haw    457B   328p 45700R,
  4 chunks scored<br>
haw.100R(99%) 458 bytes = HAWAIIAN  <br><br>
[iw] <span style="background:#FFF7D8;color:#000000;">
או לערוך את העדפות ההפצה אנא עקוב אחרי השלבים הבאים כנס לחשבון האישי שלך ב </span><br>
DocTote::Dump
[ 6]  iw    135B   117p 13500R,
  1 chunks scored<br>
iw.100R(99%) 136 bytes = HEBREW  <br><br>
[hi] <span style="background:#D8F3FF;color:#7F5F00;">
ं ऐडवर्ड्स विज्ञापनों के अनुभव पर आधारित हैं और इनकी मदद से आपको अपने विज्ञापनों का अधिकतम लाभ </span><br>
DocTote::Dump
[ 3]  hi    249B   182p 24900R,
  1 chunks scored<br>
hi.100R(99%) 250 bytes = HINDI  <br><br>
[blu] <span style="background:#D8FFFF;color:#001F7F;">
kuv hlub koj txawm lub ntuj yuav si ntshi nphaus los kuv tsis ua siab nkaug txa</span><br>
[] <span style="background:#D8FFFF;color:#001F7F;">
wm ntiab teb yuav si ntshi nphaus los kuv tseem ua lon tsaug vim kuv hlub koj tag lub siab </span><br>
DocTote::Dump
[ 8] blu    170B   354p 17000R,
  2 chunks scored<br>
blu.100R(99%) 171 bytes = HMONG  <br><br>
[hu] <span style="background:#E3FFD8;color:#7F2F00;">
a felhasználóim a google azonosító szöveget ikor látják a felhasználóim a google azo</span><br>
[] <span style="background:#E3FFD8;color:#7F2F00;">
nosító szöveget felhasználók a google azonosító szöveget fo</span><br>
[] <span style="background:#E3FFD8;color:#7F2F00;">
gják látni minden tranzakció után ha a vásárlását regisztrációját oldalunk </span><br>
DocTote::Dump
[ 7]  hu    246B   348p 24600R,
  3 chunks scored<br>
hu.100R(99%) 247 bytes = HUNGARIAN  <br><br>
[is] <span style="background:#D8F3FF;color:#7F2F00;">
a afköst leitarorða þinna leitarorð neikvæð leitarorð auglýsingahópa byggja upp aðallista yfir ný leitarorð fyrir </span><br>
[] <span style="background:#D8F3FF;color:#7F2F00;">
auglýsingahópana og skoða ítarleg gögn um árangur leitarorða eins og samkeppni auglýsenda og leitarmagn er krafist notkun </span><br>
DocTote::Dump
[ 3]  is    256B   295p 25600R,
  2 chunks scored<br>
is.100R(99%) 257 bytes = ICELANDIC  <br><br>
[ig] <span style="background:#D8FFE7;color:#001F7F;">
chineke bụ aha ọzọ ndï omenala igbo kpọro chukwu mgbe ndị bekee bịara ha mee ya nke ndi christian n </span><br>
[] <span style="background:#D8FFE7;color:#001F7F;">
echiche ndi ekpere chi omenala ndi igbo christianity judaism ma islam chineke nwere ọtụtụ utu aha ma nwee nanị </span><br>
[] <span style="background:#D8FFE7;color:#001F7F;">
otu aha ụzọ abụọ e si akpọ aha ahụ bụ jehovah ma ọ bụ yahweh na ọtụtụ akwụkwọ nsọ e wepụla </span><br>
[] <span style="background:#D8FFE7;color:#001F7F;">
aha chineke ma jiri utu aha bụ onyenwe anyị ma ọ bụ chineke dochie ya ma mgbe e </span><br>
[] <span style="background:#D8FFE7;color:#001F7F;">
dere akwụkwọ nsọ aha ahụ bụ jehova pụtara n ime ya ihe dị ka ugboro pụkụ asaa </span><br>
DocTote::Dump
[ 2]  ig    539B   602p 53900R,
  5 chunks scored<br>
ig.100R(99%) 540 bytes = IGBO  <br><br>
[id] <span style="background:#FFF7D8;color:#7F5F00;">
geng pengembaraan bermula adalah film animasi d cgi pertama yang diproduksi di malaysia film ini dibuat oleh les co</span><br>
[] <span style="background:#FFF7D8;color:#7F5F00;">
paque production lcp dan dirilis di bioskop bioskop seluruh malaysia pada februari film geng pertama kali diluncurkan da</span><br>
[] <span style="background:#FFF7D8;color:#7F5F00;">
lam sebuah acara peluncuran pada september bersama dengan serial animasi pendek upin ipin yang berhubungan dengan </span><br>
[] <span style="background:#FFF7D8;color:#7F5F00;">
film tersebut pembuatan film ini didukung oleh berbagai pihak </span><br>
[] <span style="background:#FFF7D8;color:#7F5F00;">
seperti kementerian sains teknologi dan inovasi malaysia mosti dengan memberi bantuan berupa dana sebesar rm juta </span><br>
DocTote::Dump
[ 6]  id    525B   526p 51818R,
  5 chunks scored<br>
id.98R(99%) 526 bytes = INDONESIAN  <br><br>
[ia] <span style="background:#FFD8F7;color:#6F7F00;">
super le sitos que tu visita isto es necessari pro render disponibile alcun functionalitates del barra de utensiles a fin que nos pote monstrar informationes ulterior super un sito le barra de utensiles debe dicer a nos le </span><br>
DocTote::Dump
[10]  ia    223B    83p 22300R,
  1 chunks scored<br>
ia.100R(99%) 224 bytes = INTERLINGUA  <br><br>
[ie] <span style="background:#F8FFD8;color:#007F4F;">
abhorre exceptiones in li derivation plu cardinal por un l i es li regularità del flexion conjugation ples comparar </span><br>
[] <span style="background:#F8FFD8;color:#007F4F;">
latino sine flexione e li antiqui projectes naturalistic queles have quasi null regules de derivation ma si on nu examina li enunciationes </span><br>
DocTote::Dump
[ 1]  ie    256B   151p 23793R,
  2 chunks scored<br>
ie.92R(99%) 257 bytes = INTERLINGUE  <br><br>
[om*.21/so.18] <span style="background:#D8FFE7;color:#004F7F;">
kuubuuraqabniqsuq ataruamik colville mi aasii tavrani siku kilaabman sulukpaukkat makua niksisu</span><br>
[kl*.27/ik.16] <span style="background:#E3D8FF;color:#007F7F;">
grufagivut tavrani sunaimña atifa quaqqat ii quaqqat aasii ukiabmagu </span><br>
[ik*.24/kl.18] <span style="background:#EFFFD8;color:#007F7F;">
utiqhuta tamaufa utqiabvifñun aasiiñ tatpaaffaqapta tuvaaqatinifarufa aasiiñ </span><br>
DocTote::Dump
[ 2]  om     95B    21p    0R,
[ 9]  kl     70B    27p    0R,
[12]  ik     80B    24p 4800R,
  3 chunks scored<br>
{Unreli om.0R,95B} {Unreli kl.0R,70B} ik.60R(32%) 246 bytes = INUPIAK* <br><br>
[ga] <span style="background:#D8E7FF;color:#7F2F00;">
a bhfuil na focail go léir i do cheist le fáil orthu ní gá ach focail breise a chur </span><br>
[] <span style="background:#D8E7FF;color:#7F2F00;">
leis na cinn a cuardaíodh cheana chun an cuardach a bheachtú nó a chúngú má chuir</span><br>
[] <span style="background:#D8E7FF;color:#7F2F00;">
tear focal breise isteach aimseofar fo aicme ar leith de na torthaí a fuarthas </span><br>
DocTote::Dump
[14]  ga    255B   350p 25500R,
  3 chunks scored<br>
ga.100R(99%) 256 bytes = IRISH  <br><br>
[it] <span style="background:#E3FFD8;color:#000000;">
a causa di un intervento di manutenzione del sistema fino alle ore circa ora legale costa del pacifico del novembre le campagne esistenti continueranno a essere pubblicate come di consueto anche durante questo breve periodo di inattività ci scusiamo per </span><br>
DocTote::Dump
[ 7]  it    255B   120p 25500R,
  1 chunks scored<br>
it.100R(99%) 256 bytes = ITALIAN  <br><br>
[jw] <span style="background:#FFD8D8;color:#6F7F00;">
account ten server niki kalian username meniko tanpo judul cacahe account nggonanmu wes pol pesen mu wes di</span><br>
[] <span style="background:#FFD8D8;color:#6F7F00;">
guwak pesenan mu wes di simpen sante wae pesenan mu wes ke kirim mbuh tekan ora pesenan e ke kethok pesenan mu wes ke kirim mbuh tekan ora pesenan </span><br>
DocTote::Dump
[ 0]  jw    254B   144p 25400R,
  2 chunks scored<br>
jw.100R(99%) 255 bytes = JAVANESE  <br><br>
[ks] <span style="background:#D8E7FF;color:#007F7F;">
پیٹھ سٮ۪اگت آکھ آزاد گیانکوشٖٔ ہۄ کٲنٛسِہ تِہ ہٮ۪کُن اٮ۪ڑِٹ تور چھک مَضموٗنن منز کٲشُر ویکیپیٖڈیا چھُ آکھ مَنصوٗبہٕ خٲطرٕ بنَاوُن آکھ گیانکوشٖٔ سۭتۍ آزاد منز زَبانَن تٔمِس یۄسہٕ ژٕ سۭتۍ تُہُنٛد گیان ہُرٮ۪ر کَرُن ہٮ۪کُن </span><br>
DocTote::Dump
[14]  ks    402B    83p 40200R,
  1 chunks scored<br>
ks.100R(99%) 403 bytes = KASHMIRI  <br><br>
[kk] <span style="background:#D8FFE7;color:#007F4F;">
ﺎ ﻗﻴﺎﻧﺎﺕ ﺑﻮﻟﻤﺎﻳﺪﻯ ﺑﯘﻝ ﭘﺮﻭﺗﺴﻪﺳﯩﻦ ﻳﺎﻋﻨﻲ ﻗﺎﻻ ﻭﻣﯩﺮﯨﻨﺪﻩ ﻗﺎﺯﺍﻕ ء ﺗﯩﻠﯩﻨﯩﯔ ﻗﻮﻟﺪﺍﻧﯩﻠﻤﺎﯞﻯ ﻗﺎﺯﺍﻕ ﺟﻪﺭﯨﻨﺪﻩ </span><br>
DocTote::Dump
[ 2]  kk    253B   131p 25300R,
  1 chunks scored<br>
kk.100R(99%) 254 bytes = KAZAKH  <br><br>
[kk] <span style="background:#D8FFE7;color:#007F4F;">
а билердің өзіне рұқсат берілмеген егер халық талап етсе ғана хан </span><br>
[] <span style="background:#D8FFE7;color:#007F4F;">
келісім берген өздеріңіз білесіздер қр қыл мыс тық кодексінде жазаның </span><br>
DocTote::Dump
[ 2]  kk    251B   235p 25100R,
  2 chunks scored<br>
kk.100R(99%) 252 bytes = KAZAKH  <br><br>
[kha] <span style="background:#EFFFD8;color:#004F7F;">
kaba jem jai sa sngap thuh ia ki bynta ba sharum naka sohbuin jong phi nangta sa pynhiar ia ka kti </span><br>
[] <span style="background:#EFFFD8;color:#004F7F;">
kadiang jong phi sha ka krung jong phi bad da kaba pyndonkam kumjuh ia ki shympriahti jong phi sa sngap thuh shapoh ka tohtit jong phi pyndonkam ia kajuh ka </span><br>
DocTote::Dump
[12] kha    256B   269p 25600R,
  2 chunks scored<br>
kha.100R(99%) 257 bytes = KHASI  <br><br>
[rw] <span style="background:#F8D8FF;color:#007F7F;">
dore ibyo ukeneye kumenya ukwo watubona ibibazo byinshi abandi babaza ububonero byibibina google on</span><br>
[rw*.60/lg.41] <span style="background:#F8D8FF;color:#007F7F;">
jela ho izina dyikyibina kyawe onjela ho yawe mulugo kulaho ibyandiko byawe shyilaho tegula yawe tulubaka tukongeraho iyanya mishya buliko tulambula </span><br>
DocTote::Dump
[15]  rw    248B   175p 17797R,
  2 chunks scored<br>
rw.71R(99%) 249 bytes = KINYARWANDA  <br><br>
[ku] <span style="background:#F8D8FF;color:#0F7F00;">
بۆ به ڕێوه بردنی نامه ی که دێتن ڕاسته وخۆ ڕه وان بکه نامه </span><br>
[] <span style="background:#F8D8FF;color:#0F7F00;">
کانی گ مایل بۆ حسابی پۆستێکی تر هێنانی په یوه ندکاره کان له </span><br>
DocTote::Dump
[15]  ku    209B   200p 20900R,
  2 chunks scored<br>
ku.100R(99%) 210 bytes = KURDISH  <br><br>
[ky] <span style="background:#D8FFFF;color:#0F7F00;">
جانا انى تانۇۇ ۇلۇتۇن تانۇۇ قىرعىزدى بئلۉۉ دەگەندىك اچىق ايتساق ما</span><br>
[] <span style="background:#D8FFFF;color:#0F7F00;">
ناستى تاانىعاندىق ۅزۉڭدۉ تاانىعاندىق بۉگۉن تەما جۉكتۅمۅ ق ى رع ى ز ت ى ل ى </span><br>
DocTote::Dump
[ 8]  ky    256B   225p 25600R,
  2 chunks scored<br>
ky.100R(99%) 257 bytes = KYRGYZ  <br><br>
[ky] <span style="background:#D8FFFF;color:#0F7F00;">
агай эле оболу мен садыбакас аганын өзү менен эмес эмгектери менен </span><br>
[] <span style="background:#D8FFFF;color:#0F7F00;">
тааныштым жылдары ташкенде өзбекстан илимдер академиясынын баяны </span><br>
DocTote::Dump
[ 8]  ky    246B   195p 24600R,
  2 chunks scored<br>
ky.100R(99%) 247 bytes = KYRGYZ  <br><br>
[la] <span style="background:#E3FFD8;color:#7F5F00;">
a deo qui enim nocendi causa mentiri solet si iam consulendi causa mentiatur multum profecit sed aliud est </span><br>
[] <span style="background:#E3FFD8;color:#7F5F00;">
quod per se ipsum laudabile proponitur aliud quod in deterioris comparatione praeponitur aliter enim gratulamur cum sanus est homo aliter cum melius </span><br>
DocTote::Dump
[ 7]  la    256B   254p 24171R,
  2 chunks scored<br>
la.94R(99%) 257 bytes = LATIN  <br><br>
[lv] <span style="background:#EFD8FF;color:#7F2F00;">
a gadskārtējā izpārdošana slēpošana jāņi atlaide izmaiņas trafikā </span><br>
[] <span style="background:#EFD8FF;color:#7F2F00;">
kas saistītas ar sezonas izpārdošanu speciālajām atlaidēm u c ir parastas un atsl</span><br>
[] <span style="background:#EFD8FF;color:#7F2F00;">
ēgvārdi kas ir populāri noteiktos laika posmos šajā laikā saņems lielāku klikšķu </span><br>
DocTote::Dump
[ 4]  lv    255B   352p 25500R,
  3 chunks scored<br>
lv.100R(99%) 256 bytes = LATVIAN  <br><br>
[ln] <span style="background:#D8F3FF;color:#007F4F;">
abakisamaki ndenge esengeli moyebami abongisamaki solo mpenza kombo ya moyebami elonguamaki kombo ya bayebami </span><br>
[] <span style="background:#D8F3FF;color:#007F4F;">
elonguamaki kombo eleki molayi po na esika epesameli limbisa esika ya kotia ba kombo esuki boye esengeli olimbola ndako na yo ya mikanda kombo </span><br>
DocTote::Dump
[ 3]  ln    253B   277p 23133R,
  2 chunks scored<br>
ln.91R(99%) 254 bytes = LINGALA  <br><br>
[lt] <span style="background:#FFD8EB;color:#7F2F00;">
a išsijungia mano idėja dėl geriausio laiko po pastarųjų savo santykių pasimokiau penki </span><br>
[] <span style="background:#FFD8EB;color:#7F2F00;">
dalykai be kurių negaliu gyventi mano miegamajame tu surasi ide</span><br>
[] <span style="background:#FFD8EB;color:#7F2F00;">
ali pora išsilavinimas aukštoji mokykla koledžas universitetas pagrindinis laipsnis metai </span><br>
DocTote::Dump
[ 5]  lt    251B   253p 25100R,
  3 chunks scored<br>
lt.100R(99%) 252 bytes = LITHUANIAN  <br><br>
[lb] <span style="background:#FFF7D8;color:#007F1F;">
a gewerkschaften och hei gefuerdert dir dammen an dir häre vun de gewe</span><br>
[] <span style="background:#FFF7D8;color:#007F1F;">
rkschaften denkt un déi aarm wann der äer fuerderunge formuléiert d sechst congés </span><br>
[] <span style="background:#FFF7D8;color:#007F1F;">
woch an aarbechtszäitverkierzung hëllefen hinnen net d unhiewe vun de steigerungssäz bei de </span><br>
DocTote::Dump
[ 6]  lb    252B   239p 25200R,
  3 chunks scored<br>
lb.100R(99%) 253 bytes = LUXEMBOURGISH  <br><br>
[mk] <span style="background:#EFD8FF;color:#7F5F00;">
гласовите коалицијата на вмро дпмне како партија со најмногу </span><br>
[] <span style="background:#EFD8FF;color:#7F5F00;">
освоени гласови ќе добие евра а на сметката на коализијата за македонија </span><br>
DocTote::Dump
[ 4]  mk    247B   264p 24700R,
  2 chunks scored<br>
mk.100R(99%) 248 bytes = MACEDONIAN  <br><br>
[mg] <span style="background:#FFD8D8;color:#004F7F;">
amporisihin i ianao mba hijery ny dika teksta ranofotsiny an ity </span><br>
[] <span style="background:#FFD8D8;color:#004F7F;">
lahatsoratra ity tsy ilaina ny opérateur efa karohina daholo ny teny re</span><br>
[] <span style="background:#FFD8D8;color:#004F7F;">
hetra nosoratanao ampiasao anaovana dokambarotra i google telugu datin ny takelaka fikarohana sary renitakelak i </span><br>
DocTote::Dump
[ 0]  mg    250B   343p 25000R,
  3 chunks scored<br>
mg.100R(99%) 251 bytes = MALAGASY  <br><br>
[ms] <span style="background:#D8FFFF;color:#7F5F00;">
daripada dirinya hirako shinji seorang pemuda merujuk diri mereka sebagai vi</span><br>
[] <span style="background:#D8FFFF;color:#7F5F00;">
zard shinji telah cuba untuk menyakinkan ichigo untuk menyertai kumpulan me</span><br>
[] <span style="background:#D8FFFF;color:#7F5F00;">
reka mengatakan bahawa hanya dia sahaja yang mampu mengajar ichigo teknik untuk mengawal hollow </span><br>
DocTote::Dump
[ 8]  ms    247B   367p 24700R,
  3 chunks scored<br>
ms.100R(99%) 248 bytes = MALAY  <br><br>
[ms] <span style="background:#D8FFFF;color:#7F5F00;">
bilik sebelah berkata julai pada pm ladymariah hmm sume ni terpulang kepada individu mu</span><br>
[] <span style="background:#D8FFFF;color:#7F5F00;">
ngkin anda bernasib baik selama ini dalam membeli hp yang bagus deli berkata julai </span><br>
[] <span style="background:#D8FFFF;color:#7F5F00;">
pada pm walaupun bukan bahsa baku tp tetap bahasa melayu kan perubahan boleh dibuat </span><br>
DocTote::Dump
[ 8]  ms    254B   337p 25400R,
  3 chunks scored<br>
ms.100R(99%) 255 bytes = MALAY  <br><br>
[mt] <span style="background:#F8FFD8;color:#3F7F00;">
ata ikteb messaġġ lil indirizzi differenti billi tagħżilhom u tagħfas il buttuna ik</span><br>
[] <span style="background:#F8FFD8;color:#3F7F00;">
teb żid numri tfittxijja tal kotba mur print home kotba minn pagni ghal pagna </span><br>
[] <span style="background:#F8FFD8;color:#3F7F00;">
minn ghall ktieb ta aċċessa stieden habib iehor grazzi it tim tal gruppi google </span><br>
DocTote::Dump
[ 1]  mt    249B   294p 24900R,
  3 chunks scored<br>
mt.100R(99%) 250 bytes = MALTESE  <br><br>
[en] <span style="background:#FFFFF4;color:#000000;">
and not ripe as i thought yn assyl yn shynnagh as yn lion the ass the fox </span><br>
[gv] <span style="background:#F8D8FF;color:#004F7F;">
and the lion va assyl as shynnagh ayns commee son nyn vendeilys as sauchys hie </span><br>
[] <span style="background:#F8D8FF;color:#004F7F;">
ad magh ayns y cheyll dy shelg cha row ad er gholl feer foddey tra veeit ad rish lion yn shynnagh </span><br>
DocTote::Dump
[ 0]  en     74B    71p 7400R,
[15]  gv    177B   210p 17700R,
  3 chunks scored<br>
gv.100R(70%) en.100R(29%) 252 bytes = MANX  <br><br>
[mi] <span style="background:#FFD8D8;color:#007F7F;">
haere ki te kainga o o haere ki te kainga o o haere ki te kainga o te rapunga ahua o haere ki te kai</span><br>
[] <span style="background:#FFD8D8;color:#007F7F;">
nga o ka tangohia he ki to rapunga kaore au mohio te tikanga whakatiki o te ra he whakaharuru te pai rapunga a te rapunga ahua a e kainga o nga awhina o te </span><br>
DocTote::Dump
[ 0]  mi    256B   265p 25600R,
  2 chunks scored<br>
mi.100R(99%) 257 bytes = MAORI  <br><br>
[mr] <span style="background:#FFD8D8;color:#3F7F00;">
हैदराबाद उच्चार ऐका सहाय्य माहिती तेलुगू </span><br>
[te] <span style="background:#EFFFD8;color:#7F5F00;">
హైదరాబాదు </span><br>
[hi] <span style="background:#D8F3FF;color:#7F5F00;">
उर्दू </span><br>
[ur*.11/sd.5] <span style="background:#D8FFE7;color:#6F7F00;">
حیدر آباد </span><br>
[mr] <span style="background:#FFD8D8;color:#3F7F00;">
हे भारतातील आंध्र प्रदेश राज्याच्या राजधानीचे शहर आहे है</span><br>
[] <span style="background:#FFD8D8;color:#3F7F00;">
दराबादची लोकसंख्या लाख हजार आहे मोत्यांचे शहर अशी एकेकाळी ओळख असले</span><br>
[] <span style="background:#FFD8D8;color:#3F7F00;">
ल्या या शहराला ऐतिहासिक सांस्कृतिक आणि स्थापत्यशास्त्रीय वारसा लाभला आहे </span><br>
[] <span style="background:#FFD8D8;color:#3F7F00;">
नंतर शिक्षण आणि माहिती तंत्रज्ञान त्याचप्रमाणे औषधनिर्मिती आणि जैवत</span><br>
[] <span style="background:#FFD8D8;color:#3F7F00;">
ंत्रज्ञान क्षेत्रातील उद्योगधंद्यांची वाढ शहरात झाली दक्षिण मध्य भा</span><br>
[] <span style="background:#FFD8D8;color:#3F7F00;">
रतातील पर्यटन आणि तेलुगू चित्रपटनिर्मितीचे हैदराबाद हे केंद्र आहे </span><br>
DocTote::Dump
[ 0]  mr   1190B   767p 119000R,
[ 2]  ur     18B    11p  648R,
[ 3]  hi     16B     8p 1600R,
[12]  te     29B    29p 2900R,
  10 chunks scored<br>
{CloseLangPair: hi.100R,16B => mr}<br>
{Unreli ur.36R,18B} mr.100R(95%) te.100R(3%) 1257 bytes = MARATHI  <br><br>
[mfe*.66/crs.61] <span style="background:#D8F3FF;color:#001F7F;">
anz dir mwa sa bann delo ki to trouve la kot fam prostitie asize samem bann pep bann la</span><br>
[] <span style="background:#D8F3FF;color:#001F7F;">
foul dimoun bann nasion ek bann langaz sa dis korn ki to finn trouve ansam avek bebet la zot </span><br>
[] <span style="background:#D8F3FF;color:#001F7F;">
pou ena laenn pou prostitie la zot pou pran tou seki li ena e met li touni zot pou manz so laser e </span><br>
[] <span style="background:#D8F3FF;color:#001F7F;">
bril seki reste dan dife parski bondie finn met dan zot leker proze pou realiz so plan </span><br>
[] <span style="background:#D8F3FF;color:#001F7F;">
zot pou met zot dakor pou sed zot pouvwar bebet la ziska ki parol bondie fini realize </span><br>
DocTote::Dump
[ 3] mfe    452B   406p 38347R,
  5 chunks scored<br>
mfe.84R(99%) 453 bytes = MAURITIAN_CREOLE  <br><br>
[mn] <span style="background:#FFD8D8;color:#007F1F;">
а боловсронгуй болгох орон нутгийн ажил үйлсийг уялдуулж зо</span><br>
[] <span style="background:#FFD8D8;color:#007F1F;">
хицуулах дүрэм журам боловсруулах орон нутгийн өмч хөрөнгө санхүүгийн </span><br>
DocTote::Dump
[ 0]  mn    241B   315p 24100R,
  2 chunks scored<br>
mn.100R(99%) 242 bytes = MONGOLIAN  <br><br>
[na] <span style="background:#F8FFD8;color:#004F7F;">
arcol obabakaen riringa itorere ibibokiei ababaro min kuduwa airumena baoin tokin rowio</span><br>
[] <span style="background:#F8FFD8;color:#004F7F;">
wet itiket keram damadamit eigirow etoreiy row keitsito boney ibingo it</span><br>
[] <span style="background:#F8FFD8;color:#004F7F;">
siw dorerin naoerodelaporte s nauruan dictionary a c a c d g h o p s t y aiquen ion eins aiquen </span><br>
DocTote::Dump
[ 1]  na    254B   229p 25400R,
  3 chunks scored<br>
na.100R(99%) 255 bytes = NAURU  <br><br>
[ne] <span style="background:#FFEBD8;color:#7F5F00;">
अरू ठाऊँबाटपनि खुलेको छ यो खाता अर अरू ठाऊँबाटपनि खुलेको छ यो खाता अर ू </span><br>
DocTote::Dump
[11]  ne    186B    68p 18600R,
  1 chunks scored<br>
ne.100R(99%) 187 bytes = NEPALI  <br><br>
[no] <span style="background:#FFD8F7;color:#000000;">
a er obligatorisk tidsforskyvning plassering av katalogsøk planinformasjon loggfilbane gruppenavn kontoinfo</span><br>
[] <span style="background:#FFD8F7;color:#000000;">
rmasjon passord domene gruppeinformasjon alle kampanjesporing alternativ bruker grupper oppgaveplanlegger oppgavehistorikk kontosammendrag antall </span><br>
DocTote::Dump
[10]  no    254B   207p 25400R,
  2 chunks scored<br>
no.100R(99%) 255 bytes = NORWEGIAN  <br><br>
[nn] <span style="background:#FFD8D8;color:#0F7F00;">
a for verktylina til å hjelpa deg å nå oss merk at pagerank syninga ikkje automatisk kjem til å </span><br>
[] <span style="background:#FFD8D8;color:#0F7F00;">
henta inn informasjon frå sider med argument dvs frå sider med eit i en dersom datamaskina di er plassert bak ein mellomtenar for vevsider kan det verka </span><br>
DocTote::Dump
[ 0]  nn    255B   246p 25500R,
  2 chunks scored<br>
nn.100R(99%) 256 bytes = NORWEGIAN_N  <br><br>
[ny] <span style="background:#D8E7FF;color:#001F7F;">
boma ndi gawo la dziko lomwe linapangidwa ndi cholinga chothandiza ntchito yolamu</span><br>
[] <span style="background:#D8E7FF;color:#001F7F;">
lira kuŵalako kulikuunikabe mandita edipo nyima unalephera kugonjetsa kuŵalako </span><br>
DocTote::Dump
[14]  ny    162B   143p 16200R,
  2 chunks scored<br>
ny.100R(99%) 163 bytes = NYANJA  <br><br>
[oc] <span style="background:#F8FFD8;color:#6F7F00;">
pasmens la classificacion pus admesa uei segon juli ronjat e pèire bèc agropa lei parlars deis aups dins l occitan vivaroaupenc e non dins lo dialècte provençau </span><br>
DocTote::Dump
[ 1]  oc    165B    89p 16500R,
  1 chunks scored<br>
oc.100R(99%) 166 bytes = OCCITAN  <br><br>
[om] <span style="background:#D8FFE7;color:#004F7F;">
afaan katalaa bork bork bork hiikaa jira hin argamne gareen barbaadame hin argamne gargarsa qube en gar bayee </span><br>
[] <span style="background:#D8FFE7;color:#004F7F;">
jira garee walitti firooman gareewwan walitti firooman fuula web akka ta</span><br>
[] <span style="background:#D8FFE7;color:#004F7F;">
rtiiba qubeetiin agarsiisi akka tartiiba qubeetiin agarsiisaa jira akka </span><br>
DocTote::Dump
[ 2]  om    254B   313p 25400R,
  3 chunks scored<br>
om.100R(99%) 255 bytes = OROMO  <br><br>
[ps] <span style="background:#FFD8EB;color:#007F4F;">
اتو مستقل رياست جوړ شو او د پخواني ادبي انجمن څانګې ددې رياست جز شوی </span><br>
[] <span style="background:#FFD8EB;color:#007F4F;">
او ددې انجمن د ژبې مديريت د پښتو ټولنې په لوی مديريت واوښت لوی مدير يې د </span><br>
DocTote::Dump
[ 5]  ps    252B   222p 25200R,
  2 chunks scored<br>
ps.100R(99%) 253 bytes = PASHTO  <br><br>
[nso] <span style="background:#F8FFD8;color:#0F007F;">
bophara bja asia ekaba bja lefase goba bja naga ya lefase ntle le mawatle asia enale badudu bao baka</span><br>
[] <span style="background:#F8FFD8;color:#0F007F;">
bago dimillione millione tše nne billion yeo e bago ya badudi ba lefase ka bophara a bapolelwa rena sefapanong mehleng ya po</span><br>
[] <span style="background:#F8FFD8;color:#0F007F;">
ntius pilatus a hlokofatšwa a bolokwa a tsoga ka letšatši la boraro ka mo mangwalo a bolelago ka gona a rotogela magodimong </span><br>
DocTote::Dump
[ 1] nso    352B   176p 30793R,
  3 chunks scored<br>
nso.87R(99%) 353 bytes = PEDI  <br><br>
[fa] <span style="background:#D8FFF3;color:#3F7F00;">
آب خوردن عجله می کردند به جای باز ی کتک کاری می کردند و همه چيز </span><br>
[] <span style="background:#D8FFF3;color:#3F7F00;">
مثل قبل بود فقط من ماندم و يک دنيا حرف و انتظار تا عاقبت رسيد احضاريه ی ای با </span><br>
DocTote::Dump
[13]  fa    249B   199p 24900R,
  2 chunks scored<br>
fa.100R(99%) 250 bytes = PERSIAN  <br><br>
[pl] <span style="background:#FFEBD8;color:#000000;">
a australii będzie widział inne reklamy niż użytkownik z kanady kierowanie geogra</span><br>
[] <span style="background:#FFEBD8;color:#000000;">
ficzne sprawia że reklamy są lepiej dopasowane do użytkownika twojej st</span><br>
[] <span style="background:#FFEBD8;color:#000000;">
rony oznacza to także że możesz nie zobaczyć wszystkich reklam które są wyświetlane na </span><br>
DocTote::Dump
[11]  pl    253B   431p 25300R,
  3 chunks scored<br>
pl.100R(99%) 254 bytes = POLISH  <br><br>
[pt] <span style="background:#EFFFD8;color:#000000;">
a abit prevê que a entrada desses produtos estrangeiros no mercado têxtil e vestuário do bra</span><br>
[] <span style="background:#EFFFD8;color:#000000;">
sil possa reduzir os preços em cerca de a partir de má notícia para os empresários que terão que lutar para garantir suas margens de lucro mas boa notícia </span><br>
DocTote::Dump
[12]  pt    256B   241p 25600R,
  2 chunks scored<br>
pt.100R(99%) 257 bytes = PORTUGUESE  <br><br>
[qu] <span style="background:#FFF7D8;color:#007F4F;">
is t ipanakunatapis rikuchinankupaq qanpa simiykipi no</span><br>
[] <span style="background:#FFF7D8;color:#007F4F;">
qaykoqpa uya jllanakunamanta kunan jamoq simikunaman qelqan tiyan watukuy qpa </span><br>
[] <span style="background:#FFF7D8;color:#007F4F;">
uyata qanpa llaqtaykipi llank anakuna simimanta yanapakuna si</span><br>
[] <span style="background:#FFF7D8;color:#007F4F;">
mimanta mayqen llaqtallapis kay simimanta t ijray qpa qelqa </span><br>
DocTote::Dump
[ 6]  qu    253B   386p 25246R,
  4 chunks scored<br>
qu.99R(99%) 254 bytes = QUECHUA  <br><br>
[rm] <span style="background:#EFD8FF;color:#007F1F;">
cur ch il chantun turitg ha dà il dretg da votar a las dunnas è ella vegnida elegida en il cu</span><br>
[] <span style="background:#EFD8FF;color:#007F1F;">
ssegl da vischnanca da zumikon per la partida liberaldemocratica svizra pld da enfin è ella stada presidenta da vischnanca da zumikon l </span><br>
[] <span style="background:#EFD8FF;color:#007F1F;">
onn è elisabeth kopp vegnida elegida en il cussegl naziunal e reelegida quatter onns pli tard cun in resultat da sur vuschs l onn è ella daventada vicepresidenta da la pld </span><br>
DocTote::Dump
[ 4]  rm    406B   324p 40220R,
  3 chunks scored<br>
rm.99R(99%) 407 bytes = RHAETO_ROMANCE  <br><br>
[ro] <span style="background:#FFF7D8;color:#7F2F00;">
оперативэ а органелор ши институциилор екзекутиве ши а органелор </span><br>
[] <span style="background:#FFF7D8;color:#7F2F00;">
жудичиаре але путерий де стат фиекэруй орган ал путерий де стат и се </span><br>
DocTote::Dump
[ 6]  ro    246B   206p 24600R,
  2 chunks scored<br>
ro.100R(99%) 247 bytes = ROMANIAN  <br><br>
[ro] <span style="background:#FFF7D8;color:#7F2F00;">
a anunţurilor reţineţi nu plătiţi pentru clicuri sau impresii ci numai atunci </span><br>
[] <span style="background:#FFF7D8;color:#7F2F00;">
când pe site ul dvs survine o acţiune dorită site urile negative nu pot avea uri de destinaţie daţi instrucţiuni societăţii dvs bancare sau constructoare să </span><br>
DocTote::Dump
[ 6]  ro    249B   241p 24651R,
  2 chunks scored<br>
ro.99R(99%) 250 bytes = ROMANIAN  <br><br>
[rw] <span style="background:#F8D8FF;color:#007F7F;">
ishaka mu ndero y abana bawe ganira n abigisha nimba hari ingorane izo ari zo zose ushobora gusaba kubo</span><br>
[rn] <span style="background:#D8F3FF;color:#004F7F;">
nana n umwigisha canke kuvugana nawe kuri terefone inyuma y uko babarungikira urutonde rw amanota i muhira mu bisanzwe amashure aratumira abavyeyi </span><br>
DocTote::Dump
[ 3]  rn    147B   134p 14700R,
[15]  rw    103B    99p 10300R,
  2 chunks scored<br>
{CloseLangPair: rw.100R,103B => rn}<br>
rn.100R(99%) 251 bytes = RUNDI  <br><br>
[ru] <span style="background:#D8FFF3;color:#000000;">
а неправильный формат идентификатора дн назад </span><br>
DocTote::Dump
[13]  ru     86B    35p 6966R,
  1 chunks scored<br>
ru.81R(98%) 87 bytes = RUSSIAN  <br><br>
[sm] <span style="background:#EFD8FF;color:#004F7F;">
autu mea o lo totonu le e le minaomia matou te tuu i totonu i le faamatalaina o le suesuega i taimi </span><br>
[] <span style="background:#EFD8FF;color:#004F7F;">
uma mea o lo totonu fuafua i mea e tatau fa afoi tala mai le newsgroup mataupu fa afoi mai tala e ai le mataupu e ai totonu tusitala o le itu o faamatalaga </span><br>
DocTote::Dump
[ 4]  sm    256B   286p 25600R,
  2 chunks scored<br>
sm.100R(99%) 257 bytes = SAMOAN  <br><br>
[sg] <span style="background:#FFD8EB;color:#004F7F;">
atâa na âkotta zo me lâkwê angbâ gï tarrango nî âkotta zo tî koddoro nî âde agbû tenne </span><br>
[] <span style="background:#FFD8EB;color:#004F7F;">
nî na kate töngana mbênî kotta kpalle tî nzönî dutï tî halëzo pëpe </span><br>
[] <span style="background:#FFD8EB;color:#004F7F;">
atâa sô âla lü gbâ tî ândya tî mâi na sahngo asâra gbâ tî </span><br>
DocTote::Dump
[ 5]  sg    247B   390p 24700R,
  3 chunks scored<br>
sg.100R(99%) 248 bytes = SANGO  <br><br>
[sa] <span style="background:#FFF7D8;color:#004F7F;">
ं क र्मणस् त स्य य त्कि ङ्चेह करो त्यय ं त स्माल् लोका त्पु नरै ति अस्मै लोका य क र्मण इ ति नु काम </span><br>
DocTote::Dump
[ 6]  sa    245B    52p 20090R,
  1 chunks scored<br>
sa.82R(99%) 246 bytes = SANSKRIT  <br><br>
[sa] <span style="background:#FFF7D8;color:#004F7F;">
brahmā tatraivāntaradhīyata tataḥ saśiṣyo vālmīkir munir vismayam āyayau </span><br>
[] <span style="background:#FFF7D8;color:#004F7F;">
tasya śiṣyās tataḥ sarve jaguḥ ślokam imaṃ punaḥ muhur muhuḥ prīyamāṇāḥ prāhuś ca bhṛśavismitāḥ samākṣaraiś caturbhir yaḥ pādair gīto </span><br>
DocTote::Dump
[ 6]  sa    256B   274p 25600R,
  2 chunks scored<br>
sa.100R(99%) 257 bytes = SANSKRIT  <br><br>
[sco] <span style="background:#D8FFF3;color:#004F7F;">
a gless an geordie runciman ower a gless an tamson their man preached a hale hoor </span><br>
[en*.71/sco.67] <span style="background:#FFFFF4;color:#000000;">
aboot the glorious memories o forty three an backsliders an profane persons like esau an abo</span><br>
[sco] <span style="background:#D8FFF3;color:#004F7F;">
ot jeroboam the son o nebat that gaed stravagin to anither kirk an made aa israel </span><br>
DocTote::Dump
[ 0]  en     92B    71p 3036R,
[13] sco    164B   133p 14350R,
  3 chunks scored<br>
{Unreli en.33R,92B => sco} sco.135R(63%) 257 bytes = SCOTS* <br><br>
[gd] <span style="background:#D8FFF3;color:#6F7F00;">
air son is gum bi casg air a h uile briosgaid no gum faigh thu brath nua</span><br>
[] <span style="background:#D8FFF3;color:#6F7F00;">
ir a tha briosgaid a tighinn gad rannsachadh ghoogle gu ceart mura bheil briosgai</span><br>
[] <span style="background:#D8FFF3;color:#6F7F00;">
dean ceadaichte cuiridh google briosgaid dha do neach cleachdaidh fa leth tha google a cleachdadh </span><br>
DocTote::Dump
[13]  gd    251B   331p 25100R,
  3 chunks scored<br>
gd.100R(99%) 252 bytes = SCOTS_GAELIC  <br><br>
[sr] <span style="background:#D8FFF3;color:#7F2F00;">
балчак балчак на мапи србије уреди демографија у насељу балчак живи пуноле</span><br>
[] <span style="background:#D8FFF3;color:#7F2F00;">
тна становника а просечна старост становништва износи година </span><br>
DocTote::Dump
[13]  sr    251B   296p 24302R,
  2 chunks scored<br>
sr.96R(99%) 252 bytes = SERBIAN  <br><br>
[sr] <span style="background:#D8FFF3;color:#7F2F00;">
autonomnih pokrajina saveznim zakonom može se propisati poseban sastav organizacija i delokrug saveta za poslove narodne odbrane članove saveta federa</span><br>
[] <span style="background:#D8FFF3;color:#7F2F00;">
cije bira na predlog predsedništva savezna skupština iz reda društveno političkih i drugih javnih </span><br>
DocTote::Dump
[13]  sr    254B   217p 25094R,
  2 chunks scored<br>
sr.98R(99%) 255 bytes = SERBIAN  <br><br>
[crs] <span style="background:#D8F3FF;color:#0F007F;">
sesel ou menm nou sel patri kot nou viv dan larmoni lazwa lanmour ek lape nou remersye bo</span><br>
[] <span style="background:#D8F3FF;color:#0F007F;">
ndye preserv labote nou pei larises nou losean en leritaz byen presye pour boner nou </span><br>
[] <span style="background:#D8F3FF;color:#0F007F;">
zanfan reste touzour dan linite fer monte nou paviyon ansanm pou tou leternite koste seselwa </span><br>
DocTote::Dump
[ 3] crs    267B   258p 26700R,
  3 chunks scored<br>
crs.100R(99%) 268 bytes = SESELWA  <br><br>
[st] <span style="background:#FFF7D8;color:#0F7F00;">
bang ba nang le thahasello matshwao a sehlooho thuto e thehilweng hodima diphetho </span><br>
[] <span style="background:#FFF7D8;color:#0F7F00;">
ke tsela ya ho ruta le ho ithuta e totobatsang hantle seo baithuti ba lokelang ho se </span><br>
[] <span style="background:#FFF7D8;color:#0F7F00;">
fihlella ntlhatheo eo e sebetsang ka yona ke ya hore titjhere o hlakisa pele seo </span><br>
DocTote::Dump
[ 6]  st    248B   251p 24800R,
  3 chunks scored<br>
st.100R(99%) 249 bytes = SESOTHO  <br><br>
[sn] <span style="background:#E3FFD8;color:#007F4F;">
chete vanyori vanotevera vakabatsira kunyora zvikamu zvino kumba home tinyo</span><br>
[] <span style="background:#E3FFD8;color:#007F4F;">
rere tsamba chikamu chakumbirwa hachina kuwanikwa chikamu ichi cheninge chakayiswa kui</span><br>
[] <span style="background:#E3FFD8;color:#007F4F;">
mwe nzvimbo mudhairekitori rino chimwe chikamu chopadhuze pane chinhu chatadza kushanda bad </span><br>
DocTote::Dump
[ 7]  sn    253B   288p 24025R,
  3 chunks scored<br>
sn.94R(99%) 254 bytes = SHONA  <br><br>
[sd] <span style="background:#D8F3FF;color:#007F1F;">
اضافو ٿي ٿيو پر اها خبر عثمان کي بعد پيئي ته سگريٽ ڇڪيندڙ مسلمان نه هو بلڪ هندو هو دڪان تي پهچي عثمان ڪسبت کولي گراهڪن جي سيرب لاهڻ شروع ڪئي پر </span><br>
DocTote::Dump
[ 3]  sd    256B   182p 25600R,
  1 chunks scored<br>
sd.100R(99%) 257 bytes = SINDHI  <br><br>
[ss] <span style="background:#E3FFD8;color:#004F7F;">
bakhokhintsela yesikhashana bafake imininingwane ye akhawunti leliciniso kule</span><br>
[] <span style="background:#E3FFD8;color:#004F7F;">
lifomu nangabe akukafakwa imininingwane leliciniso imali lekhokhiwe angeke ifakwe kumk</span><br>
[] <span style="background:#E3FFD8;color:#004F7F;">
hokhintsela lofanele imininingwane ye akhawunti ime ngalendlela lelandzelako inombolo </span><br>
DocTote::Dump
[ 7]  ss    249B   283p 24737R,
  3 chunks scored<br>
ss.99R(99%) 250 bytes = SISWANT  <br><br>
[sk] <span style="background:#EFD8FF;color:#3F7F00;">
a aktivovať reklamnú kampaň ak chcete kampaň pred spustením ešte prispôsobiť uložte ju ako ša</span><br>
[] <span style="background:#EFD8FF;color:#3F7F00;">
blónu a pokračujte v úprave vyberte si jednu z možností nižšie a kliknite na tlačidlo uložiť kampaň nastavenia kampane môžete ľubovoľne </span><br>
DocTote::Dump
[ 4]  sk    254B   317p 25400R,
  2 chunks scored<br>
sk.100R(99%) 255 bytes = SLOVAK  <br><br>
[sl] <span style="background:#F8D8FF;color:#6F7F00;">
adsense stanje prijave za google adsense google adsense račun je bil začasno zamrznjen </span><br>
[] <span style="background:#F8D8FF;color:#6F7F00;">
pozdravljeni hvala za vaše zanimanje v google adsense po pregledu vaše prijavnice so naši strokovnjaki ugotovili da spletna stran ki je trenutno povezana z vašim </span><br>
DocTote::Dump
[15]  sl    255B   182p 25233R,
  2 chunks scored<br>
sl.98R(99%) 256 bytes = SLOVENIAN  <br><br>
[so] <span style="background:#D8FFF3;color:#0F7F00;">
a oo maanta bogga koobaad ugu qoran yahey beesha caalamka laakiin si kata oo beesha </span><br>
[] <span style="background:#D8FFF3;color:#0F7F00;">
caalamku ula guntato soomaaliya waxa aan shaki ku jirin in aakhirataanka da</span><br>
[] <span style="background:#D8FFF3;color:#0F7F00;">
dka soomaalida oo kaliya ay yihiin ku soomaaliya ka saari kara dhibka ay ku jirto </span><br>
DocTote::Dump
[13]  so    241B   348p 24100R,
  3 chunks scored<br>
so.100R(99%) 242 bytes = SOMALI  <br><br>
[es] <span style="background:#D8E7FF;color:#000000;">
a continuación haz clic en el botón obtener ruta también puedes desplazarte hasta el final </span><br>
[] <span style="background:#D8E7FF;color:#000000;">
de la página para cambiar tus opciones de búsqueda gráfico y detalles ésta es una lista de los vídeos que te recomendamos nuestras recomendaciones se basan </span><br>
DocTote::Dump
[14]  es    255B   237p 24090R,
  2 chunks scored<br>
es.94R(99%) 256 bytes = SPANISH  <br><br>
[su] <span style="background:#E3FFD8;color:#3F7F00;">
alus gampang deuih uhun im gmail obrolan ulah disimpen na koropak kuring simpen obrolan dina koropak kuring obrolan obrolan </span><br>
[en*.51/su.48] <span style="background:#FFFFF4;color:#000000;">
anjeun teu boga arsip obrolan slovak slovenia vietnam catalan czech estonia hindi lithuania romania tagalog thai turkish édit iber </span><br>
DocTote::Dump
[ 0]  en    132B    51p 2772R,
[ 7]  su    124B    75p 12400R,
  2 chunks scored<br>
{Unreli en.21R,132B} su.100R(48%) 257 bytes = SUNDANESE* <br><br>
[sw] <span style="background:#D8E7FF;color:#6F7F00;">
a ujumbe mpya jumla unda tafuta na angalia vikundi vya kujadiliana na kushiriki mawazo iliyopangwa kwa </span><br>
[] <span style="background:#D8E7FF;color:#6F7F00;">
tarehe watumiaji wapya futa orodha hizi lugha hoja vishikanisho vilivyo dhaminiwa ujumbe sanaa na tamasha toka udhibitisho wa neno kwa haraka fikia </span><br>
DocTote::Dump
[14]  sw    251B   279p 25100R,
  2 chunks scored<br>
sw.100R(99%) 252 bytes = SWAHILI  <br><br>
[sv] <span style="background:#F8D8FF;color:#000000;">
a bort objekt från google desktop post äldst meny öretag dress etaljer alternativ för vad är inne yaste </span><br>
[] <span style="background:#F8D8FF;color:#000000;">
google skrivbord plugin program för nyheter google visa nyheter som är anpassade efter de artiklar som du läser om du till exempel läser </span><br>
DocTote::Dump
[15]  sv    250B   164p 25000R,
  2 chunks scored<br>
sv.100R(99%) 251 bytes = SWEDISH  <br><br>
[tl] <span style="b
SYMBOL INDEX (421 symbols across 87 files)

FILE: bindings/encodings.cc
  type cld_encoding (line 22) | struct cld_encoding {
  function strcasecmp (line 106) | inline int
  function EncodingFromName (line 119) | CLD2::Encoding EncodingFromName(const char *name) {

FILE: bindings/pycldmodule.cc
  type cld_encoding (line 37) | struct cld_encoding {
  type CLD2 (line 43) | namespace CLD2 {
  type PYCLDState (line 48) | struct PYCLDState {
  type PYCLDState (line 56) | struct PYCLDState
  function PyObject (line 59) | static PyObject *
  function cld_traverse (line 354) | static int cld_traverse(PyObject *m, visitproc visit, void *arg) {
  function cld_clear (line 359) | static int cld_clear(PyObject *m) {
  type PyModuleDef (line 364) | struct PyModuleDef
  type PYCLDState (line 368) | struct PYCLDState
  function PyMODINIT_FUNC (line 389) | PyMODINIT_FUNC

FILE: bindings/test.py
  class TestCLD (line 266) | class TestCLD(unittest.TestCase):
    method runOne (line 271) | def runOne(self, expectedLangName, s, doFull = False):
    method test_basic (line 312) | def test_basic(self):
    method test_vectors (line 319) | def test_vectors(self):
    method test_encoding_hint (line 328) | def test_encoding_hint(self):
    method test_language_hint (line 334) | def test_language_hint(self):
    method test_top_level_domain_hint (line 341) | def test_top_level_domain_hint(self):
    method test_language_http_headers_hint (line 348) | def test_language_http_headers_hint(self):
    method test_debug_flags (line 353) | def test_debug_flags(self):
    method test_unreliable (line 362) | def test_unreliable(self):
    method test_random_bytes (line 367) | def test_random_bytes(self):
    method test_invalid_utf8 (line 377) | def test_invalid_utf8(self):
    method test_best_effort (line 388) | def test_best_effort(self):

FILE: bindings/test_shuffle.py
  function readlines (line 15) | def readlines(f):

FILE: cld2/internal/cld2_do_score.cc
  function Language (line 34) | Language ScoreOneLine(const char* buffer, int buffer_length,
  function ReadLine (line 72) | bool ReadLine(FILE* infile, char* buffer, size_t maxlen) {
  function IsComment (line 85) | bool IsComment(const char* buffer) {
  function SkipOneField (line 95) | int SkipOneField(const string& src, int pos) {
  function GetLangScript (line 107) | void GetLangScript(const string& src,
  function GetTextBeginPos (line 146) | int GetTextBeginPos(const string& src) {
  function Divisor (line 169) | inline double Divisor(double x) {
  function Flush (line 173) | void Flush(Language cur_lang, ULScript ulscript,
  function BytesPer1KB (line 192) | int BytesPer1KB(int i, int j) {
  function main (line 197) | int main(int argc, char *argv[]) {

FILE: cld2/internal/cld2_dynamic_data.cc
  type CLD2DynamicData (line 20) | namespace CLD2DynamicData {
    function setDebug (line 22) | void setDebug(int debug) {
    function mem_compare (line 26) | bool mem_compare(const void* data1, const void* data2, const int lengt...
    function calculateHeaderSize (line 45) | CLD2::uint32 calculateHeaderSize(CLD2::uint32 numTables) {
    function dumpHeader (line 51) | void dumpHeader(FileHeader* header) {
    function verify (line 113) | bool verify(const CLD2::ScoringTables* realData,
    function isLittleEndian (line 198) | bool isLittleEndian() {
    function coreAssumptionsOk (line 206) | bool coreAssumptionsOk() {

FILE: cld2/internal/cld2_dynamic_data.h
  function namespace (line 135) | namespace CLD2DynamicData {

FILE: cld2/internal/cld2_dynamic_data_extractor.cc
  type CLD2DynamicDataExtractor (line 23) | namespace CLD2DynamicDataExtractor {
    function setDebug (line 25) | void setDebug(int debug) {
    function advance (line 29) | int advance(FILE* f, CLD2::uint32 position) {
    function writeChunk (line 39) | void writeChunk(FILE *f, const void* data, CLD2::uint32 startAt, CLD2:...
    function writeDataFile (line 46) | void writeDataFile(const CLD2::ScoringTables* data,
    function initTableHeaders (line 162) | void initTableHeaders(const CLD2::CLD2TableSummary** summaries,
    function alignAll (line 199) | void alignAll(CLD2DynamicData::FileHeader* header, const int alignment) {
    function initDeltaHeaders (line 333) | void initDeltaHeaders(CLD2DynamicData::FileHeader* header, const CLD2:...
    function initUtf8Headers (line 338) | void initUtf8Headers(CLD2DynamicData::FileHeader* header, const CLD2::...

FILE: cld2/internal/cld2_dynamic_data_extractor.h
  function namespace (line 24) | namespace CLD2DynamicDataExtractor {

FILE: cld2/internal/cld2_dynamic_data_loader.cc
  type CLD2DynamicDataLoader (line 30) | namespace CLD2DynamicDataLoader {
    function unloadDataFile (line 141) | void unloadDataFile(CLD2::ScoringTables** scoringTables,
    function unloadDataRaw (line 155) | void unloadDataRaw(CLD2::ScoringTables** scoringTables) {

FILE: cld2/internal/cld2_dynamic_data_loader.h
  function namespace (line 22) | namespace CLD2DynamicDataLoader {

FILE: cld2/internal/cld2_dynamic_data_tool.cc
  type CLD2 (line 32) | namespace CLD2 {
  function main (line 52) | int main(int argc, char** argv) {

FILE: cld2/internal/cld2_generated_cjk_compatible.cc
  type CLD2 (line 20) | namespace CLD2 {

FILE: cld2/internal/cld2_generated_deltaocta0122.cc
  type CLD2 (line 37) | namespace CLD2 {

FILE: cld2/internal/cld2_generated_deltaocta0527.cc
  type CLD2 (line 37) | namespace CLD2 {

FILE: cld2/internal/cld2_generated_deltaoctachrome.cc
  type CLD2 (line 45) | namespace CLD2 {

FILE: cld2/internal/cld2_generated_deltaoctachrome0122.cc
  type CLD2 (line 45) | namespace CLD2 {

FILE: cld2/internal/cld2_generated_deltaoctachrome0614.cc
  type CLD2 (line 41) | namespace CLD2 {

FILE: cld2/internal/cld2_generated_distinctocta0122.cc
  type CLD2 (line 36) | namespace CLD2 {

FILE: cld2/internal/cld2_generated_distinctocta0527.cc
  type CLD2 (line 36) | namespace CLD2 {

FILE: cld2/internal/cld2_generated_distinctoctachrome.cc
  type CLD2 (line 44) | namespace CLD2 {

FILE: cld2/internal/cld2_generated_distinctoctachrome0122.cc
  type CLD2 (line 44) | namespace CLD2 {

FILE: cld2/internal/cld2_generated_distinctoctachrome0604.cc
  type CLD2 (line 43) | namespace CLD2 {

FILE: cld2/internal/cld2_generated_octa2_dummy.cc
  type CLD2 (line 20) | namespace CLD2 {

FILE: cld2/internal/cld2_generated_quadchrome0122_16.cc
  type CLD2 (line 46) | namespace CLD2 {

FILE: cld2/internal/cld2_generated_quadchrome0122_19.cc
  type CLD2 (line 46) | namespace CLD2 {

FILE: cld2/internal/cld2_generated_quadchrome0122_2.cc
  type CLD2 (line 44) | namespace CLD2 {

FILE: cld2/internal/cld2_generated_quadchrome0715.cc
  type CLD2 (line 38) | namespace CLD2 {

FILE: cld2/internal/cld2_generated_quadchrome_16.cc
  type CLD2 (line 46) | namespace CLD2 {

FILE: cld2/internal/cld2_generated_quadchrome_2.cc
  type CLD2 (line 46) | namespace CLD2 {

FILE: cld2/internal/cld2_unittest.cc
  function OneTest (line 193) | bool OneTest(int flags, bool get_vector,
  function InitHtmlOut (line 265) | void InitHtmlOut(int flags) {
  function FinishHtmlOut (line 279) | void FinishHtmlOut(int flags) {
  function RunTests (line 288) | int RunTests (int flags, bool get_vector, const char* data_file) {
  function main (line 428) | int main(int argc, char** argv) {

FILE: cld2/internal/cld2_unittest_full.cc
  type CLD2 (line 31) | namespace CLD2 {
    function OneTest (line 287) | bool OneTest(int flags, bool get_vector,
    function InitHtmlOut (line 351) | void InitHtmlOut(int flags) {
    function FinishHtmlOut (line 365) | void FinishHtmlOut(int flags) {
    function RunTests (line 373) | int RunTests (int flags, bool get_vector) {
  function main (line 400) | int main(int argc, char** argv) {

FILE: cld2/internal/cld2tablesummary.h
  function namespace (line 25) | namespace CLD2 {

FILE: cld2/internal/cld_generated_cjk_delta_bi_32.cc
  type CLD2 (line 21) | namespace CLD2 {

FILE: cld2/internal/cld_generated_cjk_delta_bi_4.cc
  type CLD2 (line 41) | namespace CLD2 {

FILE: cld2/internal/cld_generated_cjk_uni_prop_80.cc
  type CLD2 (line 31) | namespace CLD2 {

FILE: cld2/internal/cld_generated_score_quad_octa_0122.cc
  type CLD2 (line 25) | namespace CLD2 {

FILE: cld2/internal/cld_generated_score_quad_octa_0122_2.cc
  type CLD2 (line 16) | namespace CLD2 {

FILE: cld2/internal/cld_generated_score_quad_octa_1024_256.cc
  type CLD2 (line 54) | namespace CLD2 {

FILE: cld2/internal/cld_generated_score_quad_octa_2.cc
  type CLD2 (line 25) | namespace CLD2 {

FILE: cld2/internal/cldutil.cc
  type CLD2 (line 28) | namespace CLD2 {
    function ProcessProbV2Tote (line 128) | void ProcessProbV2Tote(uint32 probs, Tote* tote) {
    function GetLangScore (line 141) | int GetLangScore(uint32 probs, uint8 pslang) {
    function DoBigramScoreV3 (line 163) | int DoBigramScoreV3(const CLD2TableSummary* bigram_obj,
    function GetUniHits (line 201) | int GetUniHits(const char* text,
    function GetBiHits (line 248) | void GetBiHits(const char* text,
    function GetQuadHits (line 315) | int GetQuadHits(const char* text,
    function GetOctaHits (line 416) | void GetOctaHits(const char* text,
    function ReliabilityDelta (line 553) | int ReliabilityDelta(int value1, int value2, int gramcount) {
    function ReliabilityExpected (line 587) | int ReliabilityExpected(int actual_score_1kb, int expected_score_1kb) {
    function uint32 (line 610) | uint32 MakeLangProb(Language lang, int qprob) {

FILE: cld2/internal/cldutil.h
  function namespace (line 28) | namespace CLD2 {

FILE: cld2/internal/cldutil_offline.cc
  type CLD2 (line 29) | namespace CLD2 {
    function ProcessProbV2Tote (line 35) | void ProcessProbV2Tote(uint32 probs, Tote* tote) {
    function uint32 (line 48) | uint32 GetNextLangprob(ULScriptRType rtype,
    function DoWordScore (line 99) | void DoWordScore(const char* isrc, int srclen, ULScript ulscript,
    function uint8 (line 137) | uint8 FindBestProb3Match(const uint8* prob3) {
    function GetProb (line 159) | int GetProb(Language lang, uint32 probs) {
    function uint32 (line 177) | uint32 ApproxProb3(int propval) {
    function uint32 (line 184) | uint32 ProbPackV2(uint8* plang3, uint8* prob3) {
    function ProbUnpackV2 (line 202) | void ProbUnpackV2(uint32 prob, uint8* plang3, uint8* prob3) {

FILE: cld2/internal/cldutil_offline.h
  function namespace (line 28) | namespace CLD2 {

FILE: cld2/internal/cldutil_shared.cc
  type CLD2 (line 27) | namespace CLD2 {
    function uint32 (line 107) | uint32 BiHashV2(const char* word_ptr, int bytecount) {
    function uint32 (line 167) | uint32 QuadHashV2Mix(const char* word_ptr, int bytecount, uint32 prepo...
    function uint32 (line 196) | uint32 QuadHashV2(const char* word_ptr, int bytecount) {
    function uint32 (line 208) | uint32 QuadHashV2Underscore(const char* word_ptr, int bytecount) {
    function uint64 (line 234) | uint64 OctaHash40Mix(const char* word_ptr, int bytecount, uint64 prepo...
    function uint64 (line 348) | uint64 OctaHash40(const char* word_ptr, int bytecount) {
    function uint64 (line 364) | uint64 OctaHash40underscore(const char* word_ptr, int bytecount) {
    function uint64 (line 384) | uint64 PairHash(uint64 worda_hash, uint64 wordb_hash) {
    function UniLen (line 396) | int UniLen(const char* src) {
    function BiLen (line 403) | int BiLen(const char* src) {
    function QuadLen (line 411) | int QuadLen(const char* src) {
    function OctaLen (line 421) | int OctaLen(const char* src) {

FILE: cld2/internal/cldutil_shared.h
  function namespace (line 27) | namespace CLD2 {

FILE: cld2/internal/compact_lang_det.cc
  type CLD2 (line 28) | namespace CLD2 {
    function Language (line 44) | Language DetectLanguageCheckUTF8(
    function Language (line 59) | Language DetectLanguage(
    function Language (line 98) | Language DetectLanguageSummary(
    function Language (line 138) | Language DetectLanguageSummary(
    function Language (line 181) | Language ExtDetectLanguageSummary(
    function Language (line 221) | Language ExtDetectLanguageSummary(
    function Language (line 261) | Language ExtDetectLanguageSummary(
    function Language (line 317) | Language ExtDetectLanguageSummaryCheckUTF8(
    function Language (line 372) | Language ExtDetectLanguageSummary(

FILE: cld2/internal/compact_lang_det_hint_code.cc
  type CLD2 (line 29) | namespace CLD2 {
    function SetCLDPriorWeight (line 925) | inline void SetCLDPriorWeight(int w, OneCLDLangPrior* olp) {
    function SetCLDPriorLang (line 928) | inline void SetCLDPriorLang(Language lang, OneCLDLangPrior* olp) {
    function OneCLDLangPrior (line 932) | OneCLDLangPrior PackCLDPriorLangWeight(Language lang, int w) {
    function MaxInt (line 936) | inline int MaxInt(int a, int b) {
    function MergeCLDLangPriorsMax (line 941) | void MergeCLDLangPriorsMax(OneCLDLangPrior olp, CLDLangPriors* lps) {
    function MergeCLDLangPriorsBoost (line 958) | void MergeCLDLangPriorsBoost(OneCLDLangPrior olp, CLDLangPriors* lps) {
    function TrimCLDLangPriors (line 975) | void TrimCLDLangPriors(int max_entries, CLDLangPriors* lps) {
    function CountCommas (line 998) | int CountCommas(const string& langtags) {
    function LangTagLookup (line 1007) | const LangTagLookup* DoLangTagLookup(const char* key,
    function TLDLookup (line 1027) | const TLDLookup* DoTLDLookup(const char* key,
    function string (line 1050) | string TrimCLDLangTagsHint(const string& langtags) {
    function int32 (line 1209) | int32 FindTagStart(const char* utf8_body, int32 pos, int32 max_pos) {
    function int32 (line 1231) | int32 FindTagEnd(const char* utf8_body, int32 pos, int32 max_pos) {
    function int32 (line 1244) | int32 FindQuoteStart(const char* utf8_body, int32 pos, int32 max_pos) {
    function int32 (line 1255) | int32 FindQuoteEnd(const char* utf8_body, int32 pos, int32 max_pos) {
    function int32 (line 1269) | int32 FindEqualSign(const char* utf8_body, int32 pos, int32 max_pos) {
    function FindBefore (line 1306) | bool FindBefore(const char* utf8_body,
    function FindAfter (line 1328) | bool FindAfter(const char* utf8_body,
    function string (line 1355) | string CopyOneQuotedString(const char* utf8_body,
    function string (line 1382) | string CopyQuotedString(const char* utf8_body,
    function SetCLDLangTagsHint (line 1394) | void SetCLDLangTagsHint(const string& langtags, CLDLangPriors* langpri...
    function SetCLDContentLangHint (line 1439) | void SetCLDContentLangHint(const char* contentlang, CLDLangPriors* lan...
    function SetCLDTLDHint (line 1446) | void SetCLDTLDHint(const char* tld, CLDLangPriors* langpriors) {
    function SetCLDEncodingHint (line 1466) | void SetCLDEncodingHint(Encoding enc, CLDLangPriors* langpriors) {
    function SetCLDLanguageHint (line 1503) | void SetCLDLanguageHint(Language lang, CLDLangPriors* langpriors) {
    function string (line 1510) | string DumpCLDLangPriors(const CLDLangPriors* langpriors) {
    function string (line 1557) | string GetLangTagsFromHtml(const char* utf8_body, int32 utf8_body_len,

FILE: cld2/internal/compact_lang_det_hint_code.h
  function namespace (line 28) | namespace CLD2 {

FILE: cld2/internal/compact_lang_det_impl.cc
  type CLD2 (line 44) | namespace CLD2 {
    function SpanInterchangeValid (line 74) | int SpanInterchangeValid(const char* src, int byte_length) {
    function isDataLoaded (line 105) | bool isDataLoaded() { return dynamicDataLoaded; }
    function isDataDynamic (line 106) | bool isDataDynamic() { return true; }
    function loadDataFromFile (line 108) | void loadDataFromFile(const char* fileName) {
    function loadDataFromRawAddress (line 123) | void loadDataFromRawAddress(const void* rawAddress, const uint32_t len...
    function unloadData (line 138) | void unloadData() {
    function isDataLoaded (line 169) | bool isDataLoaded() { return true; }
    function isDataDynamic (line 170) | bool isDataDynamic() { return false; }
    function loadDataFromFile (line 172) | void loadDataFromFile(const char* fileName) {
    function loadDataFromRawAddress (line 176) | void loadDataFromRawAddress(const void* rawAddress, const uint32_t len...
    function unloadData (line 180) | void unloadData() {
    function FlagFinish (line 433) | inline bool FlagFinish(int flags) {return (flags & kCLDFlagFinish) != 0;}
    function FlagSqueeze (line 434) | inline bool FlagSqueeze(int flags) {return (flags & kCLDFlagSqueeze) !...
    function FlagRepeats (line 435) | inline bool FlagRepeats(int flags) {return (flags & kCLDFlagRepeats) !...
    function FlagTop40 (line 436) | inline bool FlagTop40(int flags) {return (flags & kCLDFlagTop40) != 0;}
    function FlagShort (line 437) | inline bool FlagShort(int flags) {return (flags & kCLDFlagShort) != 0;}
    function FlagHint (line 438) | inline bool FlagHint(int flags) {return (flags & kCLDFlagHint) != 0;}
    function FlagUseWords (line 439) | inline bool FlagUseWords(int flags) {return (flags & kCLDFlagUseWords)...
    function FlagBestEffort (line 440) | inline bool FlagBestEffort(int flags) {
    function DemoteNotTop40 (line 467) | void DemoteNotTop40(Tote* chunk_tote, uint16 psplus_one) {
    function PrintText (line 471) | void PrintText(FILE* f, Language cur_lang, const string& temp) {
    function BackscanToSpace (line 491) | int BackscanToSpace(const char* src, int limit) {
    function ForwardscanToSpace (line 509) | int ForwardscanToSpace(const char* src, int limit) {
    function CountPredictedBytes (line 541) | int CountPredictedBytes(const char* isrc, int src_len, int* hash, int*...
    function CountSpaces4 (line 586) | int CountSpaces4(const char* src, int src_len) {
    function CheapRepWordsInplace (line 610) | int CheapRepWordsInplace(char* isrc, int src_len, int* hash, int* tbl) {
    function CheapRepWordsInplaceOverwrite (line 697) | int CheapRepWordsInplaceOverwrite(char* isrc, int src_len, int* hash, ...
    function CheapSqueezeInplace (line 785) | int CheapSqueezeInplace(char* isrc,
    function CheapSqueezeInplaceOverwrite (line 869) | int CheapSqueezeInplaceOverwrite(char* isrc,
    function CheapSqueezeTriggerTest (line 952) | bool CheapSqueezeTriggerTest(const char* src, int src_len, int testsiz...
    function RemoveExtendedLanguages (line 977) | void RemoveExtendedLanguages(DocTote* doc_tote) {
    function RemoveUnreliableLanguages (line 997) | void RemoveUnreliableLanguages(DocTote* doc_tote,
    function MoveLang1ToLang2 (line 1105) | void MoveLang1ToLang2(Language lang1, Language lang2,
    function RefineScoredClosePairs (line 1154) | void RefineScoredClosePairs(DocTote* doc_tote,
    function ApplyAllLanguageHints (line 1206) | void ApplyAllLanguageHints(Tote* chunk_tote, int tote_grams,
    function PrintHtmlEscapedText (line 1211) | void PrintHtmlEscapedText(FILE* f, const char* txt, int len) {
    function PrintLang (line 1216) | void PrintLang(FILE* f, Tote* chunk_tote,
    function PrintTopLang (line 1227) | void PrintTopLang(Language top_lang) {
    function PrintTopLangSpeculative (line 1236) | void PrintTopLangSpeculative(Language top_lang) {
    function PrintLangs (line 1247) | void PrintLangs(FILE* f, const Language* language3, const int* percent3,
    function GetNormalizedScore (line 1269) | double GetNormalizedScore(Language lang, ULScript ulscript,
    function ExtractLangEtc (line 1276) | void ExtractLangEtc(DocTote* doc_tote, int total_text_bytes,
    function IsFIGS (line 1386) | bool IsFIGS(Language lang) {
    function IsEFIGS (line 1394) | bool IsEFIGS(Language lang) {
    function CalcSummaryLang (line 1414) | void CalcSummaryLang(DocTote* doc_tote, int total_text_bytes,
    function AddLangPriorBoost (line 1524) | void AddLangPriorBoost(Language lang, uint32 langprob,
    function AddOneWhack (line 1545) | void AddOneWhack(Language whacker_lang, Language whackee_lang,
    function AddCloseLangWhack (line 1563) | void AddCloseLangWhack(Language lang, ScoringContext* scoringcontext) {
    function ApplyHints (line 1587) | void ApplyHints(const char* buffer,
    function FinishResultVector (line 1688) | void FinishResultVector(int lo, int hi, ResultChunkVector* vec) {
    function Language (line 1707) | Language DetectLanguageSummaryV2(

FILE: cld2/internal/compact_lang_det_impl.h
  function namespace (line 28) | namespace CLD2 {

FILE: cld2/internal/compact_lang_det_test.cc
  type CLD2 (line 36) | namespace CLD2 {
    function uint64 (line 66) | static inline uint64 Microseconds(const struct timeval& t) {
    function Readline (line 74) | bool Readline(FILE* infile, char* buffer) {
    function IsComment (line 87) | bool IsComment(char* buffer) {
    function DumpExtLang (line 97) | void DumpExtLang(int flags,
    function DumpLanguages (line 159) | void DumpLanguages(Language summary_lang,
    function main (line 204) | int main(int argc, char** argv) {
  function main (line 413) | int main(int argc, char *argv[]) {

FILE: cld2/internal/debug.cc
  type CLD2 (line 29) | namespace CLD2 {
    function string (line 32) | string GetUniAt(const char* text) {
    function string (line 41) | string GetBiAt(const char* text) {
    function string (line 50) | string GetQuadAt(const char* text) {
    function string (line 61) | string GetOctaAt(const char* text) {
    function string (line 72) | string GetOcta2At(const char* text) {
    function string (line 87) | string FmtLP(ULScript ulscript, uint8 pslang, uint8 qprob) {
    function string (line 99) | string GetLangProbTxt(const ScoringContext* scoringcontext, uint32 lan...
    function string (line 127) | string GetScoreTxt(const ScoringContext* scoringcontext,
    function GetBackColor (line 171) | static int GetBackColor(Language lang, bool lighten) {
    function GetTextColor (line 191) | static int GetTextColor(Language lang, bool lighten) {
    function string (line 209) | string GetPlainEscapedText(const string& txt) {
    function string (line 225) | string GetHtmlEscapedText(const string& txt) {
    function string (line 251) | string GetColorHtmlEscapedText(Language lang, const string& txt) {
    function string (line 262) | string GetLangColorHtmlEscapedText(Language lang, const string& txt) {
    function CLD2_Debug (line 275) | void CLD2_Debug(const char* text,
    function CLD2_Debug2 (line 411) | void CLD2_Debug2(const char* text,
    function DumpResultChunkVector (line 463) | void DumpResultChunkVector(FILE* f, const char* src,

FILE: cld2/internal/debug.h
  function namespace (line 27) | namespace CLD2 {

FILE: cld2/internal/debug_empty.cc
  type CLD2 (line 25) | namespace CLD2 {
    function string (line 27) | string GetPlainEscapedText(const string& txt) {return string("");}
    function string (line 29) | string GetHtmlEscapedText(const string& txt) {return string("");}
    function string (line 31) | string GetColorHtmlEscapedText(Language lang, const string& txt) {
    function string (line 35) | string GetLangColorHtmlEscapedText(Language lang, const string& txt) {
    function CLD2_Debug (line 44) | void CLD2_Debug(const char* text,
    function CLD2_Debug2 (line 54) | void CLD2_Debug2(const char* text,
    function DumpResultChunkVector (line 60) | void DumpResultChunkVector(FILE* f, const char* src,

FILE: cld2/internal/fixunicodevalue.cc
  type CLD2 (line 22) | namespace CLD2 {
    function char32 (line 29) | char32 FixUnicodeValue(char32 uv) {

FILE: cld2/internal/fixunicodevalue.h
  function namespace (line 29) | namespace CLD2 {

FILE: cld2/internal/generated_distinct_bi_0.cc
  type CLD2 (line 20) | namespace CLD2 {

FILE: cld2/internal/generated_entities.cc
  type CLD2 (line 22) | namespace CLD2 {

FILE: cld2/internal/generated_language.cc
  type CLD2 (line 24) | namespace CLD2 {

FILE: cld2/internal/generated_language.h
  function namespace (line 27) | namespace CLD2 {

FILE: cld2/internal/generated_ulscript.cc
  type CLD2 (line 24) | namespace CLD2 {

FILE: cld2/internal/generated_ulscript.h
  function namespace (line 24) | namespace CLD2 {

FILE: cld2/internal/getonescriptspan.cc
  type CLD2 (line 33) | namespace CLD2 {
    function runetochar (line 249) | int runetochar(char *str, const char32 *rune) {
    function LookupEntity (line 292) | int LookupEntity(const char* entity_name, int entity_len) {
    function ascii_isdigit (line 303) | bool ascii_isdigit(char c) {
    function ascii_isxdigit (line 306) | bool ascii_isxdigit(char c) {
    function ascii_isalnum (line 312) | bool ascii_isalnum(char c) {
    function hex_digit_to_int (line 318) | int hex_digit_to_int(char c) {
    function int32 (line 325) | static int32 strto32_base10(const char* nptr, const char* limit,
    function int32 (line 357) | static int32 strto32_base16(const char* nptr, const char* limit,
    function ReadEntity (line 393) | int ReadEntity(const char* src, int srcn, int* src_consumed) {
    function EntityToBuffer (line 454) | void EntityToBuffer(const char* src, int len, char* dst,
    function IsSpecial (line 471) | bool inline IsSpecial(char c) {
    function ScanToLetterOrSpecial (line 480) | int ScanToLetterOrSpecial(const char* src, int len) {
    function ScanToPossibleLetter (line 503) | int ScanToPossibleLetter(const char* isrc, int len, int max_exit_state) {
    function EqCase (line 649) | inline bool EqCase(char uplow, char c) {
    function NeqLetter (line 655) | inline bool NeqLetter(char c) {
    function WS (line 661) | inline bool WS(char c) {
    function GetUTF8LetterScriptNum (line 1085) | int GetUTF8LetterScriptNum(const char* src) {

FILE: cld2/internal/getonescriptspan.h
  function namespace (line 27) | namespace CLD2 {

FILE: cld2/internal/integral_types.h
  function namespace (line 16) | namespace CLD2 {

FILE: cld2/internal/lang_script.cc
  type CLD2 (line 32) | namespace CLD2 {
    function ULScriptRType (line 154) | ULScriptRType ULScriptRecognitionType(ULScript ulscript) {
    function ULScript (line 228) | ULScript LanguageRecognizedScript(Language lang, int n) {
    function LanguageCloseSet (line 261) | int LanguageCloseSet(Language lang) {
    function Language (line 314) | Language DefaultLanguage(ULScript ulscript) {
    function uint8 (line 320) | uint8 PerScriptNumber(ULScript ulscript, Language lang) {
    function Language (line 328) | Language FromPerScriptNumber(ULScript ulscript, uint8 perscript_number) {
    function IsLatnLanguage (line 344) | bool IsLatnLanguage(Language lang) {
    function IsOthrLanguage (line 350) | bool IsOthrLanguage(Language lang) {
    function BinarySearch (line 361) | int BinarySearch(const char* key, int lo, int hi, const CharIntPair* c...
    function Language (line 376) | Language MakeLang(int i) {return static_cast<Language>(i);}
    function Language (line 380) | Language GetLanguageFromName(const char* src) {
    function ULScript (line 462) | ULScript MakeULScr(int i) {return static_cast<ULScript>(i);}
    function ULScript (line 464) | ULScript GetULScriptFromName(const char* src) {
    function LScript4 (line 552) | int LScript4(ULScript ulscript) {

FILE: cld2/internal/lang_script.h
  function namespace (line 71) | namespace CLD2 {

FILE: cld2/internal/langspan.h
  function namespace (line 26) | namespace CLD2 {

FILE: cld2/internal/offsetmap.cc
  type CLD2 (line 28) | namespace CLD2 {
    function OpPart (line 57) | static inline char OpPart(const char c) {
    function LenPart (line 60) | static inline char LenPart(const char c) {

FILE: cld2/internal/offsetmap.h
  function namespace (line 46) | namespace CLD2 {

FILE: cld2/internal/port.h
  function namespace (line 25) | namespace CLD2 {

FILE: cld2/internal/scoreonescriptspan.cc
  type CLD2 (line 31) | namespace CLD2 {
    function AddLangProb (line 35) | void AddLangProb(uint32 langprob, Tote* chunk_tote) {
    function ZeroPSLang (line 39) | void ZeroPSLang(uint32 langprob, Tote* chunk_tote) {
    function SameCloseSet (line 44) | bool SameCloseSet(uint16 lang1, uint16 lang2) {
    function SameCloseSet (line 51) | bool SameCloseSet(Language lang1, Language lang2) {
    function SetChunkSummary (line 60) | void SetChunkSummary(ULScript ulscript, int first_linear_in_chunk,
    function IsSingleLang (line 99) | bool IsSingleLang(uint32 langprob) {
    function AddDistinctBoost1 (line 105) | void AddDistinctBoost1(uint32 langprob, ScoringContext* scoringcontext) {
    function AddDistinctBoost2 (line 112) | void AddDistinctBoost2(uint32 langprob, ScoringContext* scoringcontext) {
    function ScoreBoosts (line 125) | void ScoreBoosts(const ScoringContext* scoringcontext, Tote* chunk_tot...
    function GetTextSpanOffsets (line 164) | void GetTextSpanOffsets(const ScoringHitBuffer* hitbuffer,
    function DiffScore (line 187) | int DiffScore(const CLD2TableSummary* obj, int indirect,
    function ScoreOneChunk (line 208) | void ScoreOneChunk(const char* text, ULScript ulscript,
    function ScoreAllHits (line 265) | void ScoreAllHits(const char* text,  ULScript ulscript,
    function SummaryBufferToDocTote (line 305) | void SummaryBufferToDocTote(const SummaryBuffer* summarybuffer,
    function ItemToVector (line 323) | void ItemToVector(ScriptScanner* scanner,
    function uint16 (line 362) | uint16 PriorVecLang(const ResultChunkVector* vec) {
    function uint16 (line 367) | uint16 NextChunkLang(const SummaryBuffer* summarybuffer, int i) {
    function SummaryBufferToVector (line 389) | void SummaryBufferToVector(ScriptScanner* scanner, const char* text,
    function JustOneItemToVector (line 513) | void JustOneItemToVector(ScriptScanner* scanner, const char* text,
    function PrintableIndirect (line 555) | inline int PrintableIndirect(int x) {
    function DumpHitBuffer (line 561) | void DumpHitBuffer(FILE* df, const char* text,
    function DumpLinearBuffer (line 613) | void DumpLinearBuffer(FILE* df, const char* text,
    function DumpChunkSummary (line 636) | void DumpChunkSummary(FILE* df, const ChunkSummary* cs) {
    function DumpSummaryBuffer (line 652) | void DumpSummaryBuffer(FILE* df, const SummaryBuffer* summarybuffer) {
    function BetterBoundary (line 671) | int BetterBoundary(const char* text,
    function SharpenBoundaries (line 780) | void SharpenBoundaries(const char* text,
    function uint32 (line 848) | uint32 DefaultLangProb(ULScript ulscript) {
    function LinearizeAll (line 856) | void LinearizeAll(ScoringContext* scoringcontext, bool score_cjk,
    function ChunkAll (line 978) | void ChunkAll(int letter_offset, bool score_cjk, ScoringHitBuffer* hit...
    function LinearizeHitBuffer (line 1043) | void LinearizeHitBuffer(int letter_offset,
    function ProcessHitBuffer (line 1067) | void ProcessHitBuffer(const LangSpan& scriptspan,
    function SpliceHitBuffer (line 1118) | void SpliceHitBuffer(ScoringHitBuffer* hitbuffer, int next_offset) {
    function ScoreEntireScriptSpan (line 1132) | void ScoreEntireScriptSpan(const LangSpan& scriptspan,
    function ScoreCJKScriptSpan (line 1163) | void ScoreCJKScriptSpan(const LangSpan& scriptspan,
    function ScoreQuadScriptSpan (line 1231) | void ScoreQuadScriptSpan(const LangSpan& scriptspan,
    function ScoreOneScriptSpan (line 1302) | void ScoreOneScriptSpan(const LangSpan& scriptspan,

FILE: cld2/internal/scoreonescriptspan.h
  function namespace (line 87) | namespace CLD2 {
  type ScoringHit (line 166) | typedef struct {
  type LinearHitType (line 171) | typedef enum {
  type LangprobHit (line 179) | typedef struct {
  type ScoringHitBuffer (line 186) | typedef struct {
  type ChunkSpan (line 229) | typedef struct {
  type ChunkSummary (line 240) | typedef struct {
  type SummaryBuffer (line 261) | typedef struct {

FILE: cld2/internal/scoreutf8text.cc
  type CLD2 (line 34) | namespace CLD2 {
    function ReadLine (line 67) | bool ReadLine(FILE* infile, char* buffer, size_t maxlen) {
    function IsComment (line 80) | bool IsComment(char* buffer) {
    function SkipOneField (line 92) | int SkipOneField(const string& src, int pos) {
    function GetStatedLangScript (line 104) | void GetStatedLangScript(const string& src, string* lang_script, strin...
    function GetTextBeginPos (line 138) | int GetTextBeginPos(const string& src) {
    function CarefulMatch (line 161) | bool CarefulMatch(const char* in_langscript,
    function int32 (line 224) | int32 MapToSmallInt(const string& s, StringIntMap* smap, int* next_sma...
    function InitResult (line 236) | void InitResult() {
    function RecordCLDResult (line 250) | void RecordCLDResult(const char* buffer, const char* in_langscript,
    function FinishResult (line 291) | void FinishResult() {
    function SkipMe (line 379) | bool SkipMe(char c) {
    function Trim (line 386) | int Trim(char* buffer) {
    function LangDetLinesOfFile (line 393) | void LangDetLinesOfFile(int flags, bool get_vector, const char* fname) {
    function main (line 502) | int main (int argc, char *argv[])
  function main (line 544) | int main(int argc, char *argv[]) {

FILE: cld2/internal/stringpiece.h
  type stringpiece_ssize_type (line 28) | typedef int stringpiece_ssize_type;
  function class (line 30) | class StringPiece {
  function remove_suffix (line 62) | void remove_suffix(stringpiece_ssize_type n) {

FILE: cld2/internal/tote.cc
  type CLD2 (line 25) | namespace CLD2 {

FILE: cld2/internal/tote.h
  function namespace (line 25) | namespace CLD2 {

FILE: cld2/internal/utf8acceptinterchange.h
  function namespace (line 30) | namespace CLD2 {

FILE: cld2/internal/utf8prop_lettermarkscriptnum.h
  function namespace (line 34) | namespace CLD2 {

FILE: cld2/internal/utf8repl_lettermarklower.h
  function namespace (line 33) | namespace CLD2 {

FILE: cld2/internal/utf8scannot_lettermarkspecial.h
  function namespace (line 33) | namespace CLD2 {

FILE: cld2/internal/utf8statetable.cc
  type CLD2 (line 38) | namespace CLD2 {
    function InStateZero (line 164) | static inline bool InStateZero(const UTF8ScanObj* st, const uint8* Tbl) {
    function InStateZero_2 (line 169) | static inline bool InStateZero_2(const UTF8ReplaceObj_2* st,
    function IsPropObj (line 179) | static bool IsPropObj(const UTF8StateMachineObj& obj) {
    function IsPropObj_2 (line 184) | static bool IsPropObj_2(const UTF8StateMachineObj_2& obj) {
    function IsScanObj (line 189) | static bool IsScanObj(const UTF8StateMachineObj& obj) {
    function IsReplaceObj (line 194) | static bool IsReplaceObj(const UTF8StateMachineObj& obj) {
    function IsReplaceObj_2 (line 200) | static bool IsReplaceObj_2(const UTF8StateMachineObj_2& obj) {
    function uint8 (line 207) | uint8 UTF8GenericProperty(const UTF8PropObj* st,
    function UTF8HasGenericProperty (line 258) | bool UTF8HasGenericProperty(const UTF8PropObj& st, const char* src) {
    function uint8 (line 299) | uint8 UTF8GenericPropertyBigOneByte(const UTF8PropObj* st,
    function UTF8HasGenericPropertyBigOneByte (line 352) | bool UTF8HasGenericPropertyBigOneByte(const UTF8PropObj& st, const cha...
    function uint8 (line 390) | uint8 UTF8GenericPropertyTwoByte(const UTF8PropObj_2* st,
    function UTF8HasGenericPropertyTwoByte (line 442) | bool UTF8HasGenericPropertyTwoByte(const UTF8PropObj_2& st, const char...
    function UTF8GenericScan (line 488) | int UTF8GenericScan(const UTF8ScanObj* st,
    function UTF8GenericScanFastAscii (line 588) | int UTF8GenericScanFastAscii(const UTF8ScanObj* st,
    function DoSpecialFixup (line 625) | static int DoSpecialFixup(const unsigned char c,
    function UTF8GenericReplaceInternal (line 636) | static int UTF8GenericReplaceInternal(const UTF8ReplaceObj* st,
    function UTF8GenericReplaceInternalTwoByte (line 905) | static int UTF8GenericReplaceInternalTwoByte(const UTF8ReplaceObj_2* st,
    function UTF8GenericReplace (line 1166) | int UTF8GenericReplace(const UTF8ReplaceObj* st,
    function UTF8GenericReplace (line 1200) | int UTF8GenericReplace(const UTF8ReplaceObj* st,
    function UTF8GenericReplace (line 1218) | int UTF8GenericReplace(const UTF8ReplaceObj* st,
    function UTF8GenericReplaceTwoByte (line 1242) | int UTF8GenericReplaceTwoByte(const UTF8ReplaceObj_2* st,
    function UTF8GenericReplaceTwoByte (line 1278) | int UTF8GenericReplaceTwoByte(const UTF8ReplaceObj_2* st,
    function UTF8GenericReplaceTwoByte (line 1296) | int UTF8GenericReplaceTwoByte(const UTF8ReplaceObj_2* st,
    function UTF8TrimToChars (line 1319) | void UTF8TrimToChars(StringPiece* istr) {

FILE: cld2/internal/utf8statetable.h
  function namespace (line 30) | namespace CLD2 {

FILE: cld2/public/compact_lang_det.h
  function namespace (line 70) | namespace CLD2 {

FILE: cld2/public/encodings.h
  function namespace (line 22) | namespace CLD2 {

FILE: test_pycld2.py
  class CLDTest (line 13) | class CLDTest(unittest.TestCase):
    method test_takes_only_bytes_str (line 14) | def test_takes_only_bytes_str(self):
    method _test_detect_one (line 20) | def _test_detect_one(self, expected_lang_name, s):
    method test_detect (line 29) | def test_detect(self):
    method test_combined_lang_str (line 42) | def test_combined_lang_str(self):
    method test_unreliable (line 49) | def test_unreliable(self):
    method test_random_bytes (line 56) | def test_random_bytes(self):
    method test_bad_bytes (line 63) | def test_bad_bytes(self):
    method test_cext_attrs (line 70) | def test_cext_attrs(self):
Condensed preview — 123 files, each showing path, character count, and a content snippet. Download the .json file for the full structured content (54,729K chars).
[
  {
    "path": ".github/workflows/pythonpackage.yaml",
    "chars": 2718,
    "preview": "name: Python Package\n\non:\n  push:\n    branches:\n      - \"**\"\n    tags:\n      - \"v*.*.*\"\n\npermissions:\n  id-token: write "
  },
  {
    "path": ".gitignore",
    "chars": 380,
    "preview": "*.py[cod]\n\n# C extensions\n*.so\n\n# Packages\n*.egg\n*.egg-info\ndist\nbuild\neggs\nparts\nbin\nvar\nsdist\ndevelop-eggs\n.installed."
  },
  {
    "path": ".travis.yml",
    "chars": 415,
    "preview": "# Config file for automatic testing at travis-ci.org\n\nlanguage: python\n\npython:\n  - \"3.7\"\n  - \"3.6\"\n  - \"3.5\"\n  - \"3.4\"\n"
  },
  {
    "path": "LICENSE",
    "chars": 11385,
    "preview": "\n                                 Apache License\n                           Version 2.0, January 2004\n                  "
  },
  {
    "path": "MANIFEST.in",
    "chars": 177,
    "preview": "include cld2/internal/*.cc\ninclude cld2/internal/*.h\ninclude cld2/public/*.h\ninclude bindings/*\ninclude README*\ninclude "
  },
  {
    "path": "Makefile",
    "chars": 349,
    "preview": "clean:\n\trm -rf build/\n\trm -rf dist/\n\trm -rf pycld2.egg-info/\n\trm -rf pycld2/__pycache__/\n\trm -f pycld2/_pycld2*.so\n\nbuil"
  },
  {
    "path": "README.md",
    "chars": 6149,
    "preview": "# PYCLD2 - Python Bindings to CLD2\n\nPython bindings for the Compact Langauge Detect 2 (CLD2).\n\n[![Downloads](https://img"
  },
  {
    "path": "bindings/README",
    "chars": 1869,
    "preview": "Dick Sites (and others) at Google graciously provided a new version\n2.0 of the compact language detector, here:\n\n  https"
  },
  {
    "path": "bindings/encodings.cc",
    "chars": 4257,
    "preview": "//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n// you may not use this file except in compliance "
  },
  {
    "path": "bindings/gen_enc.py",
    "chars": 1058,
    "preview": "#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance wit"
  },
  {
    "path": "bindings/gen_test.py",
    "chars": 4415,
    "preview": "# coding=utf-8\n\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except i"
  },
  {
    "path": "bindings/pycldmodule.cc",
    "chars": 27476,
    "preview": "//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n// you may not use this file except in compliance "
  },
  {
    "path": "bindings/test.py",
    "chars": 57287,
    "preview": "# coding=utf-8\n\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except i"
  },
  {
    "path": "bindings/test_shuffle.py",
    "chars": 1754,
    "preview": "import time\nimport re\n\nUSE_FULL_TABLES = True\n\nif USE_FULL_TABLES:\n  import cld2full as cld2detect\nelse:\n  import cld2 a"
  },
  {
    "path": "cld2/LICENSE",
    "chars": 11358,
    "preview": "\n                                 Apache License\n                           Version 2.0, January 2004\n                  "
  },
  {
    "path": "cld2/docs/CLD2UnitTestFullOutput.html",
    "chars": 85617,
    "preview": "<html><meta charset=\"UTF-8\"><body>\n<style media=\"print\" type=\"text/css\"> :root { -webkit-print-color-adjust: exact; } </"
  },
  {
    "path": "cld2/docs/CLD2UnitTestOutput.html",
    "chars": 41749,
    "preview": "<html><meta charset=\"UTF-8\"><body>\n<style media=\"print\" type=\"text/css\"> :root { -webkit-print-color-adjust: exact; } </"
  },
  {
    "path": "cld2/docs/CLD2UnitTestOutputVerbose.html",
    "chars": 490494,
    "preview": "<html><meta charset=\"UTF-8\"><body>\n<style media=\"print\" type=\"text/css\"> :root { -webkit-print-color-adjust: exact; } </"
  },
  {
    "path": "cld2/docs/a_little_french_test_input.html",
    "chars": 2107,
    "preview": "Création d'un portail soutenant les wikipédias dont les langues sont\n\"traditionnellement présentes dans l'espace étatiqu"
  },
  {
    "path": "cld2/docs/evaluate_cld1_small_20110406.txt",
    "chars": 7307,
    "preview": "  Evaluate CLD1 20110406 256k\n Language\t\t\t\tPrecision\t\t\t\tRecall\t\t\t\tF-measure\n Name\tCode\tScript\t\tTop five\tN.det\t%\t\tTop fiv"
  },
  {
    "path": "cld2/docs/evaluate_cld2_large_20130720.txt",
    "chars": 15654,
    "preview": "  Evaluate CLD2 large 20130720 1024k\n Language\t\t\t\tPrecision\t\t\t\tRecall\t\t\t\tF-measure\n Name\tCode\tScript\t\tTop five\tN.det\t%\t\t"
  },
  {
    "path": "cld2/docs/evaluate_cld2_large_20140122.txt",
    "chars": 15471,
    "preview": "  Evaluate CLD2 20140122 1024k\n Language\t\t\t\tPrecision\t\t\t\tRecall\t\t\t\tF-measure\n Name\tCode\tScript\t\tTop five\tN.det\t%\t\tTop fi"
  },
  {
    "path": "cld2/docs/evaluate_cld2_small_20130715.txt",
    "chars": 7700,
    "preview": "  Evaluate CLD2 small 20130715 256k\n Language\t\t\t\tPrecision\t\t\t\tRecall\t\t\t\tF-measure\n Name\tCode\tScript\t\tTop five\tN.det\t%\t\tT"
  },
  {
    "path": "cld2/docs/evaluate_cld2_small_20140122.txt",
    "chars": 7362,
    "preview": "  Evaluate CLD2 small 20140122 256k\n Language\t\t\t\tPrecision\t\t\t\tRecall\t\t\t\tF-measure\n Name\tCode\tScript\t\tTop five\tN.det\t%\t\tT"
  },
  {
    "path": "cld2/docs/test_version.html",
    "chars": 465,
    "preview": "<html><meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\"><body>\nqpdbmrmxyzptlkuuddlrlrbas las el qpdbmr"
  },
  {
    "path": "cld2/docs/test_version.txt",
    "chars": 370,
    "preview": "qpdbmrmxyzptlkuuddlrlrbas las el qpdbmrmxyzptlkuuddlrlrbas les la qpdbmrmxyzptlkuuddlrlrbas\nqpdbmrmxyzptlkuuddlrlrbas la"
  },
  {
    "path": "cld2/internal/cld2_do_score.cc",
    "chars": 8688,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld2_dynamic_compat.h",
    "chars": 1347,
    "preview": "// Copyright 2014 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld2_dynamic_data.cc",
    "chars": 10175,
    "preview": "// Copyright 2014 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld2_dynamic_data.h",
    "chars": 11005,
    "preview": "// Copyright 2014 Google Inc. All Rights Reserved.                                                  \n//                 "
  },
  {
    "path": "cld2/internal/cld2_dynamic_data_extractor.cc",
    "chars": 16297,
    "preview": "// Copyright 2014 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld2_dynamic_data_extractor.h",
    "chars": 3201,
    "preview": "// Copyright 2014 Google Inc. All Rights Reserved.                                                  \n//                 "
  },
  {
    "path": "cld2/internal/cld2_dynamic_data_loader.cc",
    "chars": 11113,
    "preview": "//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n// you may not use this file except in compliance "
  },
  {
    "path": "cld2/internal/cld2_dynamic_data_loader.h",
    "chars": 4375,
    "preview": "// Copyright 2014 Google Inc. All Rights Reserved.                                                  \n//                 "
  },
  {
    "path": "cld2/internal/cld2_dynamic_data_tool.cc",
    "chars": 6353,
    "preview": "// Copyright 2014 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld2_generated_cjk_compatible.cc",
    "chars": 17649,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld2_generated_deltaocta0122.cc",
    "chars": 6586053,
    "preview": "//\n// Copyright 2014 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License"
  },
  {
    "path": "cld2/internal/cld2_generated_deltaocta0527.cc",
    "chars": 1682368,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld2_generated_deltaoctachrome.cc",
    "chars": 436037,
    "preview": "// Copyright 2014 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld2_generated_deltaoctachrome0122.cc",
    "chars": 429149,
    "preview": "//\n// Copyright 2014 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License"
  },
  {
    "path": "cld2/internal/cld2_generated_deltaoctachrome0614.cc",
    "chars": 423383,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld2_generated_distinctocta0122.cc",
    "chars": 767407,
    "preview": "//\n// Copyright 2014 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License"
  },
  {
    "path": "cld2/internal/cld2_generated_distinctocta0527.cc",
    "chars": 418340,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld2_generated_distinctoctachrome.cc",
    "chars": 190635,
    "preview": "// Copyright 2014 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld2_generated_distinctoctachrome0122.cc",
    "chars": 190065,
    "preview": "//\n// Copyright 2014 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License"
  },
  {
    "path": "cld2/internal/cld2_generated_distinctoctachrome0604.cc",
    "chars": 186517,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld2_generated_octa2_dummy.cc",
    "chars": 1594,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld2_generated_quadchrome0122_16.cc",
    "chars": 4483262,
    "preview": "//\n// Copyright 2014 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License"
  },
  {
    "path": "cld2/internal/cld2_generated_quadchrome0122_19.cc",
    "chars": 5417404,
    "preview": "//\n// Copyright 2014 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License"
  },
  {
    "path": "cld2/internal/cld2_generated_quadchrome0122_2.cc",
    "chars": 6894623,
    "preview": "//\n// Copyright 2014 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License"
  },
  {
    "path": "cld2/internal/cld2_generated_quadchrome0715.cc",
    "chars": 6936631,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld2_generated_quadchrome_16.cc",
    "chars": 4613011,
    "preview": "// Copyright 2014 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld2_generated_quadchrome_2.cc",
    "chars": 7232407,
    "preview": "// Copyright 2014 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld2_unittest.cc",
    "chars": 15203,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld2_unittest_full.cc",
    "chars": 13149,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld2tablesummary.h",
    "chars": 2061,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld_generated_cjk_delta_bi_32.cc",
    "chars": 585694,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld_generated_cjk_delta_bi_4.cc",
    "chars": 79819,
    "preview": "//\n// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License"
  },
  {
    "path": "cld2/internal/cld_generated_cjk_uni_prop_80.cc",
    "chars": 372307,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld_generated_score_quad_octa_0122.cc",
    "chars": 27371,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld_generated_score_quad_octa_0122_2.cc",
    "chars": 26993,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld_generated_score_quad_octa_1024_256.cc",
    "chars": 28609,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cld_generated_score_quad_octa_2.cc",
    "chars": 27393,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cldutil.cc",
    "chars": 25031,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cldutil.h",
    "chars": 2902,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cldutil_offline.cc",
    "chars": 7422,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cldutil_offline.h",
    "chars": 2494,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cldutil_shared.cc",
    "chars": 14916,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/cldutil_shared.h",
    "chars": 16711,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/clean.sh",
    "chars": 1118,
    "preview": "#!/bin/sh\n#\n#  Copyright 2014 Google Inc. All Rights Reserved.\n#\n#  Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "cld2/internal/compact_lang_det.cc",
    "chars": 15058,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/compact_lang_det_hint_code.cc",
    "chars": 55510,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/compact_lang_det_hint_code.h",
    "chars": 3107,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/compact_lang_det_impl.cc",
    "chars": 81444,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/compact_lang_det_impl.h",
    "chars": 7308,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/compact_lang_det_test.cc",
    "chars": 12495,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/compile.sh",
    "chars": 4008,
    "preview": "#!/bin/sh\n#\n#  Copyright 2014 Google Inc. All Rights Reserved.\n#\n#  Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "cld2/internal/compile_and_test_all.sh",
    "chars": 2614,
    "preview": "#!/bin/sh\n#\n#  Copyright 2014 Google Inc. All Rights Reserved.\n#\n#  Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "cld2/internal/compile_dynamic.sh",
    "chars": 3929,
    "preview": "#!/bin/sh\n#\n#  Copyright 2013 Google Inc. All Rights Reserved.\n#\n#  Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "cld2/internal/compile_full.sh",
    "chars": 3219,
    "preview": "#!/bin/sh\n#\n#  Copyright 2013 Google Inc. All Rights Reserved.\n#\n#  Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "cld2/internal/compile_libs.sh",
    "chars": 2385,
    "preview": "#!/bin/sh\n#\n#  Copyright 2013 Google Inc. All Rights Reserved.\n#\n#  Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "cld2/internal/debug.cc",
    "chars": 15751,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/debug.h",
    "chars": 2015,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/debug_empty.cc",
    "chars": 2084,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/fixunicodevalue.cc",
    "chars": 1583,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/fixunicodevalue.h",
    "chars": 3141,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/generated_distinct_bi_0.cc",
    "chars": 1782,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/generated_entities.cc",
    "chars": 6278,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/generated_language.cc",
    "chars": 141928,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/generated_language.h",
    "chars": 28159,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/generated_ulscript.cc",
    "chars": 26715,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/generated_ulscript.h",
    "chars": 5839,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/getonescriptspan.cc",
    "chars": 38122,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/getonescriptspan.h",
    "chars": 4192,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/integral_types.h",
    "chars": 945,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/lang_script.cc",
    "chars": 20840,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/lang_script.h",
    "chars": 8326,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/langspan.h",
    "chars": 1403,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/offsetmap.cc",
    "chars": 18230,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/offsetmap.h",
    "chars": 5578,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/port.h",
    "chars": 4548,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/scoreonescriptspan.cc",
    "chars": 51789,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/scoreonescriptspan.h",
    "chars": 12114,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/scoreutf8text.cc",
    "chars": 17017,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/stringpiece.h",
    "chars": 2337,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/tote.cc",
    "chars": 6828,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/tote.h",
    "chars": 4074,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/unittest_data.h",
    "chars": 218003,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/utf8acceptinterchange.h",
    "chars": 22278,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/utf8prop_lettermarkscriptnum.h",
    "chars": 82751,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/utf8repl_lettermarklower.h",
    "chars": 40027,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/utf8scannot_lettermarkspecial.h",
    "chars": 70834,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/utf8statetable.cc",
    "chars": 48954,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/internal/utf8statetable.h",
    "chars": 10072,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/public/compact_lang_det.h",
    "chars": 19900,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "cld2/public/encodings.h",
    "chars": 7056,
    "preview": "// Copyright 2013 Google Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");"
  },
  {
    "path": "pycld2/__init__.py",
    "chars": 222,
    "preview": "from ._pycld2 import (\n    DETECTED_LANGUAGES,\n    ENCODINGS,\n    LANGUAGES,\n    VERSION,\n    detect,\n    error,\n    __v"
  },
  {
    "path": "requirements.txt",
    "chars": 14,
    "preview": "wheel==0.23.0\n"
  },
  {
    "path": "setup.cfg",
    "chars": 117,
    "preview": "[metadata]\nlicense_file = LICENSE\n\n[flake8]\nmax-line-length = 120\nhang-close = False\n\n[tool.black]\nline-length = 120\n"
  },
  {
    "path": "setup.py",
    "chars": 4367,
    "preview": "#!/usr/bin/env python\n# -*- coding: utf-8 -*-\n\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you"
  },
  {
    "path": "test_pycld2.py",
    "chars": 78764,
    "preview": "#!/usr/bin/env python\n# -*- coding: utf-8 -*-\n\nimport os\nimport unittest\n\nimport pycld2 as cld2\n\n# This is just a re-arr"
  }
]

// ... and 2 more files (download for full content)

About this extraction

This page contains the full source code of the aboSamoor/pycld2 GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 123 files (100.6 MB), approximately 12.6M tokens, and a symbol index with 421 extracted functions, classes, methods, constants, and types.

