[
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.md",
    "content": "---\nname: Bug report\nabout: Create a report to help us improve\ntitle: ''\nlabels: bug\nassignees: ''\n\n---\n\n**Describe the bug**\nA clear and concise description of what the bug is.\n\n:warning: If this is a question about Wn or how to use it, please create a [discussion](https://github.com/goodmami/wn/discussions) instead of an issue.\n\n**To Reproduce**\nPlease enter a minimal working example of the command or Python code that illustrates the problem. To avoid formatting issues, enter the code in a Markdown code block:\n\n```console\n$ python -m wn ...\noutput...\n```\n\nor\n\n```pycon\n>>> import wn\n>>> ...\noutput\n```\n\n**Expected behavior**\nA clear and concise description of what you expected to happen.\n\n**Environment**\nPlease enter the versions of Python and Wn you are using as well as the installed lexicons. You can find these by executing the following commands (adjust your platform-specific Python command as necessary, e.g., `python3` or `py -3`):\n\n```console\npython --version\npython -m wn --version\npython -m wn lexicons\n```\n\n**Additional context**\nAdd any other context about the problem here.\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/data-issue.md",
    "content": "---\nname: Data issue\nabout: Report an issue Wn's data index\ntitle: ''\nlabels: data\nassignees: ''\n\n---\n\n**If your issue is regarding the contents of the data** (e.g., a lexicon is missing a word, synset, relation, etc.), then please find the upstream project and file the issue there. You can find links to the projects on Wn's [README](https://github.com/goodmami/wn/). Projects without links are probably managed by the [Open Multilingual Wordnet](https://github.com/omwn/omw-data).\n\n**Use this issue template for the following kinds of issues:**\n1. Request a wordnet lexicon (including new versions of existing lexicons) to be indexed by Wn\n\n   Please provide:\n   - the project name\n   - the name and contact info of the current maintainer\n   - the language of the lexicon (BCP-47 code preferred)\n   - a URL to the project (e.g., on GitHub or other homepage)\n   - a URL to the [WN-LMF](https://github.com/globalwordnet/schemas/) resource\n\n2. Report an issue with an indexed lexicon (e.g., the source URL has changed)\n\n   Please indicate the lexicon id and version and the correct project information, if available.\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature_request.md",
    "content": "---\nname: Feature request\nabout: Suggest an idea for this project\ntitle: ''\nlabels: enhancement\nassignees: ''\n\n---\n\n**Is your feature request related to a problem? Please describe.**\nA clear and concise description of what the problem is. Ex. I'm always frustrated when [...]\n\n**Describe the solution you'd like**\nA clear and concise description of what you want to happen.\n\n**Describe alternatives you've considered**\nA clear and concise description of any alternative solutions or features you've considered.\n\n**Additional context**\nAdd any other context or screenshots about the feature request here.\n"
  },
  {
    "path": ".github/workflows/checks.yml",
    "content": "name: tests\n\non:\n  push:\n    branches: [main]\n  pull_request:\n    branches: [main]\n\njobs:\n  lint:\n    runs-on: ubuntu-latest\n    steps:\n    - uses: actions/checkout@v4\n    - name: Set up Python\n      uses: actions/setup-python@v4\n      with:\n        python-version: \"3.10\"\n    - name: Install Hatch\n      run: pipx install hatch\n    - name: Lint\n      run: hatch fmt --linter --check\n    - name: Type Check\n      run: hatch run mypy:check\n    - name: Check Buildable\n      run: hatch build\n\n  tests:\n    runs-on: ${{ matrix.os }}\n    strategy:\n      matrix:\n        python-version: [\"3.10\", \"3.11\", \"3.12\", \"3.13\", \"3.14\"]\n        os: [ubuntu-latest, windows-latest]\n    steps:\n    - uses: actions/checkout@v4\n    - name: Set up Python ${{ matrix.python-version }}\n      uses: actions/setup-python@v4\n      with:\n        python-version: ${{ matrix.python-version }}\n    - name: Install Hatch\n      run: pipx install hatch\n    - name: Test\n      run: hatch test\n"
  },
  {
    "path": ".github/workflows/publish.yml",
    "content": "name: Build and Publish to PyPI or TestPyPI\n\non: push\n\njobs:\n  build:\n    name: Build distribution\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v4\n      - name: Set up Python\n        uses: actions/setup-python@v4\n        with:\n          python-version: \"3.x\"\n      - name: Install Hatch\n        run: pipx install hatch\n      - name: Build\n        run: hatch build\n      - name: Store the distribution packages\n        uses: actions/upload-artifact@v4\n        with:\n          name: python-package-distributions\n          path: dist/\n\n  publish-to-pypi:\n    name: Publish distributions to PyPI\n    if: startsWith(github.ref, 'refs/tags/')  # only publish to PyPI on tag pushes\n    needs:\n      - build\n    runs-on: ubuntu-latest\n    environment:\n      name: pypi\n      url: https://pypi.org/p/wn\n    permissions:\n      id-token: write  # IMPORTANT: mandatory for trusted publishing\n    steps:\n      - name: Download the dists\n        uses: actions/download-artifact@v4.1.8\n        with:\n          name: python-package-distributions\n          path: dist/\n      - name: Publish to PyPI\n        uses: pypa/gh-action-pypi-publish@release/v1\n\n  publish-to-testpypi:\n    name: Publish distributions to TestPyPI\n    needs:\n      - build\n    runs-on: ubuntu-latest\n    environment:\n      name: testpypi\n      url: https://test.pypi.org/p/wn\n    permissions:\n      id-token: write  # IMPORTANT: mandatory for trusted publishing\n    steps:\n      - name: Download the dists\n        uses: actions/download-artifact@v4.1.8\n        with:\n          name: python-package-distributions\n          path: dist/\n      - name: Publish to TestPyPI\n        uses: pypa/gh-action-pypi-publish@release/v1\n        with:\n          repository-url: https://test.pypi.org/legacy/\n          skip-existing: true\n"
  },
  {
    "path": ".gitignore",
    "content": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packaging\n.Python\nbuild/\ndevelop-eggs/\ndist/\ndownloads/\neggs/\n.eggs/\nlib/\nlib64/\nparts/\nsdist/\nvar/\nwheels/\npip-wheel-metadata/\nshare/python-wheels/\n*.egg-info/\n.installed.cfg\n*.egg\nMANIFEST\n\n# Unit test / coverage reports\nhtmlcov/\n.tox/\n.nox/\n.coverage\n.coverage.*\n.cache\nnosetests.xml\ncoverage.xml\n*.cover\n*.py,cover\n.hypothesis/\n.pytest_cache/\n\n# Ruff (has its own .gitignore, but in case that ever changes...)\n.ruff_cache\n\n# Sphinx documentation\ndocs/_build/\n\n# Jupyter Notebook\n.ipynb_checkpoints\n\n# Environments\n.env\n.venv\nenv/\nvenv/\nENV/\nenv.bak/\nvenv.bak/\n\n# mypy\n.mypy_cache/\n.dmypy.json\ndmypy.json\n\n# PyCharm\n.idea/\n\n# VS Code\n.vscode/\n\n# benchmarking results\n.benchmarks/"
  },
  {
    "path": "CHANGELOG.md",
    "content": "# Change Log\n\n## [Unreleased][unreleased]\n\n\n## [v1.1.0]\n\n**Release date: 2026-03-21**\n\n### Added\n\n* `cache` subcommand ([#313])\n* `wn.config.list_cache_entries()` method ([#313])\n* Support for `WN_DATA_DIR` environment variable ([#314])\n\n### Changed\n\n* The schema hashing function is now resilient to ordering and SQL DB\n  operations ([#319])\n\n### Fixed\n\n* `Synset.translate()` resets the lexicon configuration of the\n  translated sysets ([#316]); `Sense.translate()` and\n  `Word.translate()` derive from `Synset.translate()` so nothing\n  special needs to be done for them.\n\n\n## [v1.0.0]\n\n**Release date: 2026-01-31**\n\nNotable changes in this release:\n* A new version of the database schema requires a database rebuild\n* A new `wn.ili` module deals with ILI files and objects; interlingual\n  queries still use the `Synset.ili` member, which is now a simple `str`\n* The Open English Wordnet versions 2025 and 2025+ are added to the index\n* The Open Multilingual Wordnet 2.0 is added to the index\n\n### Index\n\n* Add `oewn:2025` ([#294])\n* Add `oewn:2025+` ([#294])\n* Add `omw:2.0`, including `2.0` versions of individual OMW lexicons ([#300])\n\n### Schema\n\n* Add `specifier` column to `lexicon` table ([#234])\n* Remove `lexicalized` column from `synsets` and `senses` ([#248])\n* Add `unlexicalized_synsets` and `unlexicalized_senses` tables ([#248])\n* Add `lexicon_rowid` column to `pronunciations` and `tags` ([#303])\n\n### Added\n\n* `wn.lemmas()` function and `Wordnet.lemmas()` method to query all\n  lemmas at once.\n* Support for WN-LMF 1.4 ([#260])\n  - Sense ordering: `index` on `<LexicalEntry>` and `n` on `<Sense>`\n  - New sense relations:\n    - `metaphor`\n    - `has_metaphor`\n    - `metonym`\n    - `has_metonym`\n    - `agent`\n    - `material`\n    - `event`\n    - `instrument`\n    - `location`\n    - `by_means_of`\n    - `undergoer`\n    - `property`\n    - `result`\n    - `state`\n    - `uses`\n    - `destination`\n    - `body_part`\n    - `vehicle`\n  - `ref` attribute for `<Requires>` and `<Extends>` ([#301])\n* `wn.ili` module\n* `wn.Sense.synset_relations()` ([#271])\n* `wn.Pronunciation.lexicon()` method ([#303])\n* `wn.Tag.lexicon()` method ([#303])\n* Support for exporting lexicon extensions ([#103])\n* `wn.compat.sensekey` supports the `oewn-v2` flavor for escaping and\n  unescaping for the scheme used by OEWN 2025 ([#292])\n* `wn.compat.sensekey` supports the `oewn:2025` and `oewn:2025+` lexicons for\n  the `sense_key_getter` and `sense_getter` functions ([#292])\n* `wn.reset_database()` function for reinitializing an outdated database.\n\n### Removed\n\n* `wn.web` module ([#295])\n* `wn.Synset.relation_map()` method ([#271])\n* `wn.Sense.relation_map()` method ([#271])\n\n### Changed\n\n* Default form normalizer uses casefold instead of lower ([#233])\n* `Synset.ili` is a `str` instead of an `ILI` object.\n* `Wordnet.synsets()` method and `wn.synsets()` function's only accepts `ili`\n  `str` arguments for the `ili` parameter again, reverting a change from\n  v0.12.0. 
This is because `Synset.ili` is now a simple string and `ILI`\n  objects are no longer part of the core `wn` package namespace.\n* `wn.Synset.relations()`: return `wn.Relation` to `wn.Synset` mapping when\n  using `data=True` ([#271])\n* `wn.Sense.relations()`: return `wn.Relation` to `wn.Sense` mapping when\n  using `data=True` ([#271])\n* Queries of relations can specify different lexicons for source and target\n  (part of [#103]; not a user-facing change)\n\n### Fixed\n\n* WN-LMF 1.1+ `<Pronunciation>` exported properly ([#302])\n* WN-LMF 1.1+ `subcat` attribute exported properly ([#302])\n\n### Documentation\n\n* Correct docstring for `wn.taxonomy.taxonomy_depth()` ([#291])\n\n\n## [v0.14.0]\n\n**Release date: 2025-11-16**\n\n### Python Support\n\n* Removed support for Python 3.9\n* Added support for Python 3.14\n\n### Added\n\n* Preliminary XML-only support for WN-LMF 1.4 ([#260])\n* `lexicon()` method on `Form`, `Example`, `Definition`, and `Count` ([#286])\n* `confidence()` method ([#263])\n  - On `Lexicon` defaults to 1.0\n  - On existing `ILI`, defaults to 1.0\n  - On `Word`, `Sense`, `Synset`, `Relation`, `Example`, `Definition`, and\n    `Count`, defaults to the confidence of their lexicon.\n* `/` (index) and `/health` endpoints for `wn.web` (see [#268])\n\n### Changed\n\n* `wn.web`: returns `JSONResponse` on most errors ([#277])\n\n### Fixed\n\n* Encode example metadata on export ([#285])\n* Update LMF to use `https` in `dc` namespace\n\n### Maintenance\n\n* Added `py.typed` file to repository ([#266])\n* Use `tomllib` instead of `tomli` for Python 3.11+\n\n\n## [v0.13.0]\n\n**Release date: 2025-06-13**\n\n### Added\n\n* Support for WN-LMF 1.4 ([#260])\n* `wn.compat` namespace (see [#55])\n* `wn.compat.sensekey` module ([#55]) with methods:\n  - `sense_key_getter()`\n  - `sense_getter()`\n  - `unescape_oewn_sense_key()`\n  - `escape_oewn_sense_key()`\n* `wn.project.get_project()` ([#53])\n* `wn.project.Project` ([#53])\n* `wn.project.ResourceOnlyPackage` ([#53])\n* `path` property on `wn.project.Project` classes ([#53])\n* `delete` parameter on `wn.project.iterpackages()` ([#53])\n\n### Changed\n\n* `wn.add()` allows synset members to be lexical entry IDs for rank\n  calculations ([#255])\n* `wn.add()` no longer requires `partOfSpeech` on synsets; this was\n  not a requirement of WN-LMF nor was it enforced in the database\n* `wn.export()` defaults to `version=\"1.4\"` instead of `\"1.0\"`\n\n\n## [v0.12.0]\n\n**Release date: 2025-04-22**\n\n### Added\n\n* `wn.add_lexical_resource()` to add result of `wn.lmf.load()` to\n  database rather than from a file (pertinent to [#98])\n* `bench/` directory with benchmark tests ([#98])\n* `Synset.definitions()` ([#246])\n\n### Fixed\n\n* `wn.web` casts URL objects to strings for JSON serialization ([#238])\n* Setting `wn.config.data_directory` to an uninitialized directory no\n  longer raises a `sqlite3.OperationalError` ([#250])\n\n### Changed\n\n* `Wordnet` and module-level query functions now issue a warning when\n  the `lang` argument matches more than one lexicon ([#241])\n* `Wordnet.synsets()` now accepts `wn.ILI` objects for the `ili`\n  parameter ([#235])\n* DB-internal rowids are no longer used outside of SQL queries ([#226])\n* The following methods now return standard `str` objects by default\n  and custom classes with a `data=True` argument ([#246]):\n  - `Word.lemma()`\n  - `Word.forms()`\n  - `Sense.examples()`\n  - `Synset.examples()`\n  - `Synset.definition()`\n* `Sense.counts()` now returns a standard `int` object by 
default and\n  a custom class with a `data=True` argument ([#246])\n* The following classes no longer subclass standard `str` or `int`\n  types and therefore no longer inherit their behavior or interface\n  ([#246]):\n  - `Form`\n  - `Example`\n  - `Definition`\n  - `Count`\n\n\n## [v0.11.0]\n\n**Release date: 2024-12-11**\n\n### Index\n\n* Added `oewn:2024` ([#221])\n\n### Added\n\n* `Relation` class ([#216])\n* `Sense.relation_map()` method ([#216])\n* `Synset.relation_map()` method ([#167], [#216])\n* `W305` blank definition on synset validation ([#151])\n* `W306` blank example on synset validation ([#151])\n* `W307` repeated definition on synset validation ([#151])\n\n### Fixed\n\n* Enumerate repeated entry, sense, synset IDs for validation ([#228])\n\n\n## [v0.10.1]\n\n**Release date: 2024-10-29**\n\n### Fixed\n\n* Follow redirects with `httpx.Client` in `wn._download` ([#211])\n* Remove reverse relations for `pertainym` and `also` ([#213])\n* Validate redundant relations considering `dc:type` ([#215])\n\n### Maintenance\n\n* Added `docs/.readthedocs.yaml` for building docs ([#214])\n\n\n## [v0.10.0]\n\n**Release date: 2024-10-29**\n\n### Python Support\n\n* Removed support for Python 3.8 ([#202])\n* Added support for Python 3.13 ([#202])\n\n### Added\n\n* Support for WN-LMF 1.2 and 1.3 ([#200])\n\n### Fixed\n\n* Don't assume 'id' on form elements in WN-LMF 1.2+ ([#207])\n\n### Maintenance\n\n* Switched packaging from flit to Hatch ([#201])\n* Updated dependencies, CI warnings, old workarounds ([#203])\n* Change CI publishing to OIDC trusted publishing\n\n\n## [v0.9.5]\n\n**Release date: 2023-12-05**\n\n### Python Support\n\n* Removed support for Python 3.7 ([#191])\n* Added support for Python 3.12 ([#191])\n\n### Index\n\n* Added `oewn:2023` ([#194])\n\n\n## [v0.9.4]\n\n**Release date: 2023-05-07**\n\n### Index\n\n* Added `oewn:2022` ([#181])\n\n\n## [v0.9.3]\n\n**Release date: 2022-11-13**\n\n### Python Support\n\n* Removed support for Python 3.6\n* Added support for Python 3.11\n\n### Fixed\n\n* `wn.Synset.relations()` no longer raises a `KeyError` when no\n  relation types are given and relations are found via ILI ([#177])\n\n\n## [v0.9.2]\n\n**Release date: 2022-10-02**\n\n### Provisional Changes\n\n* The `editor` installation extra installs the `wn-editor`\n  package. This is not a normal way of using extras, as it installs a\n  dependent and not a dependency, and may be removed. ([#17])\n\n### Fixed\n\n* `wn.download()` no longer uses Python features unavailable in 3.7\n  when recovering from download errors\n* `Sense.synset()` now creates a `Synset` properly linked to the same\n  `Wordnet` object ([#157], [#168])\n* `Sense.word()` now creates a `Word` properly linked to the same\n  `Wordnet` object ([#157])\n* `Synset.relations()` uses the correct relation type for those\n  obtained from expand lexicons ([#169])\n\n\n## [v0.9.1]\n\n**Release date: 2021-11-23**\n\n### Fixed\n\n* Correctly add syntactic behaviours for WN-LMF 1.1 lexicons ([#156])\n\n\n## [v0.9.0]\n\n**Release date: 2021-11-17**\n\n### Added\n\n* `wn.constants.REVERSE_RELATIONS`\n* `wn.validate` module ([#143])\n* `validate` subcommand ([#143])\n* `wn.Lexicon.describe()` ([#144])\n* `wn.Wordnet.describe()` ([#144])\n* `wn.ConfigurationError`\n* `wn.ProjectError`\n\n### Fixed\n\n* WN-LMF 1.0 Syntactic Behaviours with no `senses` are now assigned to\n  all senses in the lexical entry. 
If a WN-LMF 1.1 lexicon extension\n  puts Syntactic Behaviour elements on lexical entries (which it\n  shouldn't), they will only be assigned to the senses and external senses\n  listed.\n* `wn.Form` now always hashes like `str`, so things like\n  `set.__contains__` work as expected.\n* `wn.download()` raises an exception on bad responses ([#147])\n* Avoid returning duplicate matches when a lemmatizer is used ([#154])\n\n### Removed\n\n* `wn.lmf.dump()` no longer has the `version` parameter\n\n### Changed\n\n* `wn.lmf.load()`\n  - returns a dictionary for the resource instead of a\n    list of lexicons, now including the WN-LMF version, as below:\n    ```python\n    {\n        'lmf_version': '...',\n        'lexicons': [...]\n    }\n    ```\n  - returned lexicons are modeled with Python lists and dicts instead\n    of custom classes ([#80])\n* `wn.lmf.scan_lexicons()` only returns info about present lexicons,\n  not element counts ([#113])\n* Improper configurations (e.g., invalid data directory, malformed\n  index) now raise a `wn.ConfigurationError`\n* Attempting to get an unknown project or version now raises\n  `wn.ProjectError` instead of `wn.Error` or `KeyError`\n* Projects and versions in the index now take an `error` key. Calling\n  `wn.config.get_project_info()` on such an entry will raise\n  `wn.ProjectError`. Such entries may not also specify a url. The\n  entry can still be viewed without triggering the error via\n  `wn.config.index`. ([#146])\n* Project versions in the index may specify multiple, space-separated\n  URLs on the url key. If one fails, the next will be attempted when\n  downloading. ([#142])\n* `wn.config.get_project_info()` now returns a `resource_urls` key\n  mapped to a list of URLs instead of `resource_url` mapped to a\n  single URL. 
([#142])\n* `wn.config.get_cache_path()` now only accepts URL arguments\n* The `lexicon` parameter in many functions now allows glob patterns\n  like `omw-*:1.4` ([#155])\n\n### Index\n\n* Added `oewn:2021` (new ID, previously `ewn`) ([#152])\n* Added `own`, `own-pt`, and `own-en` ([#97])\n* Added `odenet:1.4`\n* Added `omw:1.4`, including `omw-en`, formerly `pwn:3.0` ([#152])\n* Added `omw-en31:1.4`, formerly `pwn:3.1` ([#152])\n* Removed `omw:1.3`, `pwn:3.0`, and `pwn:3.1` ([#152])\n* Added `kurdnet:1.0` ([#140])\n\n\n## [v0.8.3]\n\n**Release date: 2021-11-03**\n\n### Fixed\n\n* `wn.lmf` now serializes DC and non-DC metadata correctly ([#148])\n\n\n## [v0.8.2]\n\n**Release date: 2021-11-01**\n\nThis release only resolves some dependency issues with the previous\nrelease.\n\n\n## [v0.8.1]\n\n**Release date: 2021-10-29**\n\nNote: the release on PyPI was yanked because a dependency was not\nspecified properly.\n\n### Fixed\n\n* `wn.lmf` uses `https://` for the `dc` namespace instead of\n  `http://`, following the DTD\n\n\n## [v0.8.0]\n\n**Release date: 2021-07-07**\n\n### Added\n\n* `wn.ic` module ([#40])\n* `wn.taxonomy` module ([#125])\n* `wn.similarity.res` Resnik similarity ([#122])\n* `wn.similarity.jcn` Jiang-Conrath similarity ([#123])\n* `wn.similarity.lin` Lin similarity ([#124])\n* `wn.util.synset_id_formatter` ([#119])\n\n### Changed\n\n* Taxonomy methods on `wn.Synset` are moved to `wn.taxonomy`, but\n  shortcut methods remain for compatibility ([#125]).\n* Similarity metrics in `wn.similarity` now raise an error when\n  synsets come from different parts of speech.\n\n\n## [v0.7.0]\n\n**Release date: 2021-06-09**\n\n### Added\n\n* Support for approximate word searches; on by default, configurable\n  only by instantiating a `wn.Wordnet` object ([#105])\n* `wn.morphy` ([#19])\n* `wn.Wordnet.lemmatizer` attribute ([#8])\n* `wn.web` ([#116])\n* `wn.Sense.relations()` ([#82])\n* `wn.Synset.relations()` ([#82])\n\n### Changed\n\n* `wn.lmf.load()` now takes a `progress_handler` parameter ([#46])\n* `wn.lmf.scan_lexicons()` no longer returns sets of relation types or\n  lexfiles; `wn.add()` now gets these from loaded lexicons instead\n* `wn.util.ProgressHandler`\n  - Now has a `refresh_interval` parameter; updates only trigger a\n    refresh after the counter hits the threshold set by the interval\n  - The `update()` method now takes a `force` parameter to trigger a\n    refresh regardless of the refresh interval\n* `wn.Wordnet`\n  - Initialization now takes a `normalizer` parameter ([#105])\n  - Initialization now takes a `lemmatizer` parameter ([#8])\n  - Initialization now takes a `search_all_forms` parameter ([#115])\n  - `Wordnet.words()`, `Wordnet.senses()` and `Wordnet.synsets()` now\n    use any specified lemmatization or normalization functions to\n    expand queries on word forms ([#105])\n\n### Fixed\n\n* `wn.Synset.ili` for proposed ILIs now works again ([#117])\n\n\n## [v0.6.2]\n\n**Release date: 2021-03-22**\n\n### Fixed\n\n* Disable `sqlite3` progress reporting after `wn.remove()` ([#108])\n\n\n## [v0.6.1]\n\n**Release date: 2021-03-05**\n\n### Added\n\n* `wn.DatabaseError` as a more specific error type for schema changes\n  ([#106])\n\n\n## [v0.6.0]\n\n**Release date: 2021-03-04**\n\n**Notice:** This release introduces backwards-incompatible changes to\nthe schema that require users upgrading from previous versions to\nrebuild their database.\n\n### Added\n\n* For WN-LMF 1.0 support ([#65])\n  - `wn.Sense.frames()`\n  - `wn.Sense.adjposition()`\n  - `wn.Tag`\n  - 
`wn.Form.tags()`\n  - `wn.Count`\n  - `wn.Sense.counts()`\n* For ILI modeling ([#23])\n  - `wn.ILI` class\n  - `wn.Wordnet.ili()`\n  - `wn.Wordnet.ilis()`\n  - `wn.ili()`\n  - `wn.ilis()`\n  - `wn.project.Package.type` property\n  - Index entries of different types; default is `'wordnet'`, `'ili'`\n    is also available\n  - Support for detecting and loading ILI tab-separated-value exports;\n    not directly accessible through the public API at this time\n  - Support for adding ILI resources to the database\n  - A CILI index entry ([#23])\n* `wn.lmf` WN-LMF 1.1 support ([#7])\n  - `<Requires>`\n  - `<LexiconExtension>`, `<Extends>`, `<ExternalSynset>`,\n    `<ExternalLexicalEntry>`, `<ExternalSense>`,\n    `<ExternalLemma>`, `<ExternalForm>`\n  - `subcat` on `<Sense>`\n  - `members` on `<Synset>`\n  - `lexfile` on `<Synset>`\n  - `<Pronunciation>`\n  - `id` on `<Form>`\n  - New relations\n* Other WN-LMF 1.1 support\n  - `wn.Lexicon.requires()`\n  - `wn.Lexicon.extends()` ([#99])\n  - `wn.Lexicon.extensions()` ([#99])\n  - `wn.Pronunciation` ([#7])\n  - `wn.Form.pronunciations()` ([#7])\n  - `wn.Form.id` ([#7])\n  - `wn.Synset.lexfile()`\n* `wn.constants.SENSE_SYNSET_RELATIONS`\n* `wn.WnWarning` (related to [#92])\n* `wn.Lexicon.modified()` ([#17])\n\n### Fixed\n\n* Adding a wordnet with sense relations with invalid target IDs now\n  raises an error instead of ignoring the relation.\n* Detect LMF-vs-CILI projects even when files are uncompressed ([#104])\n\n### Changed\n\n* WN-LMF 1.0 entities now modeled and exported to XML ([#65]):\n  - Syntactic behaviour ([#65])\n  - Adjpositions ([#65])\n  - Form tags\n  - Sense counts\n  - Definition source senses\n  - ILI definitions\n* WN-LMF 1.1 entities now modeled and exported to XML ([#89]):\n  - Lexicon requirements and extensions ([#99])\n  - Form pronunciations\n  - Lexicographer files via the `lexfile` attribute\n  - Form ids\n* `wn.Synset.ili` now returns an `ILI` object\n* `wn.remove()` now takes a `progress_handler` parameter\n* `wn.util.ProgressBar` uses a simpler formatting string with two new\n  computed variables\n* `wn.project.is_package_directory()` and\n  `wn.project.is_collection_directory()` now detect\n  packages/collections with ILI resource files ([#23])\n* `wn.project.iterpackages()` now includes ILI packages\n* `wn.Wordnet` now sets the default `expand` value to a lexicon's\n  dependencies if they are specified (related to [#92])\n\n### Schema\n\n* General changes:\n  - Parts of speech are stored as text\n  - Added indexes and `ON DELETE` actions to speed up `wn.remove()`\n  - All extendable tables are now linked to their lexicon ([#91])\n  - Added rowid to tables with metadata\n  - Preemptively added a `modified` column to `lexicons` table ([#17])\n  - Preemptively added a `normalized_form` column to `forms` ([#105])\n  - Relation type tables are combined for synsets and senses ([#75])\n* ILI-related changes ([#23]):\n  - ILIs now have an integer rowid and a status\n  - Proposed ILIs also have an integer rowid for metadata access\n  - Added a table for ILI statuses\n* WN-LMF 1.0 changes ([#65]):\n  - SyntacticBehaviour (previously unused) no longer requires an ID and\n    does not use it in the primary key\n  - Added table for adjposition values\n  - Added source-sense to definitions table\n* WN-LMF 1.1 changes ([#7], [#89]):\n  - Added a table for lexicon dependencies\n  - Added a table for lexicon extensions ([#99])\n  - Added `logo` column to `lexicons` table\n  - Added a `synset_rank` column to `senses` table\n  - Added a 
`pronunciations` table\n  - Added column for lexicographer files to the `synsets` table\n  - Added a table for lexicographer file names\n  - Added an `id` column to `forms` table\n\n\n## [v0.5.1]\n\n**Release date: 2021-01-29**\n\n### Fixed\n\n* `wn.lmf` specifies `utf-8` when opening files ([#95])\n* `wn.lmf.dump()` casts attribute values to strings\n\n\n## [v0.5.0]\n\n**Release date: 2021-01-28**\n\n### Added\n\n* `wn.Lexicon.specifier()`\n* `wn.config.allow_multithreading` ([#86])\n* `wn.util` module for public-API utilities\n* `wn.util.ProgressHandler` ([#87])\n* `wn.util.ProgressBar` ([#87])\n\n### Removed\n\n* `wn.Wordnet.lang`\n\n### Changed\n\n* `wn.Synset.get_related()` does same-lexicon traversals first, then\n  ILI expansions ([#90])\n* `wn.Synset.get_related()` only targets the source synset lexicon in\n  default mode ([#90], [#92])\n* `wn.Wordnet` has a \"default mode\" when no lexicon or language is\n  selected, which searches any lexicon but relation traversals only\n  target the lexicon of the source synset ([#92])\n* `wn.Wordnet` has an empty expand set when a lexicon or language is\n  specified and no expand set is specified ([#92])\n* `wn.Wordnet` now allows versions in lexicon specifiers when the id\n  is `*` (e.g., `*:1.3+omw`)\n* `wn.Wordnet` class signature has `lexicon` first, `lang` is\n  keyword-only ([#93])\n* `lang` and `lexicon` parameters are keyword-only on `wn.lexicons()`,\n  `wn.word()`, `wn.words()`, `wn.sense()`, `wn.senses()`,\n  `wn.synset()`, `wn.synsets()`, and the `translate()` methods of\n  `wn.Word`, `wn.Sense`, and `wn.Synset` ([#93])\n\n\n## [v0.4.1]\n\n**Release date: 2021-01-19**\n\n### Removed\n\n* `wn.config.database_filename` (only `wn.config.data_directory` is\n  configurable now)\n\n### Changed\n\n* Schema validation is now done when creating a new connection,\n  instead of on import of `wn`\n* One connection is shared per database path, rather than storing\n  connections on the modeling classes ([#81])\n\n### Fixed\n\n* More robustly check for LMF validity ([#83])\n\n\n## [v0.4.0]\n\n**Release date: 2020-12-29**\n\n### Added\n\n* `wn.export()` to export lexicon(s) from the database ([#15])\n* `wn.lmf.dump()` to dump WN-LMF lexicons to disk ([#15])\n* `metadata` method on `wn.Word`, `wn.Sense`, and `wn.Synset`\n* `lexicalized` method on `wn.Sense` and `wn.Synset`\n* `wn.Form` class ([#79])\n* `--verbose` / `-v` option for the command-line interface ([#71])\n\n### Changed\n\n* `wn.Lexicon.metadata` is now a method\n* `wn.Word.lemma()` returns a `wn.Form` object ([#79])\n* `wn.Word.forms()` returns a list of `wn.Form` objects ([#79])\n* `wn.project.iterpackages()` raises `wn.Error` on decompression\n  problems ([#77])\n* `wn.lmf.LMFError` now inherits from `wn.Error`\n* `wn.lmf.scan_lexicons()` raises `LMFError` on XML parsing errors\n  ([#77])\n* `wn.download()` reraises caught `wn.Error` with more informative\n  message ([#77])\n* `wn.add()` improves the error message when lexicons are already added\n  ([#77])\n* Basic logging added for `wn.download()` and `wn.add()` ([#71])\n* `Synset.get_related()` and `Sense.get_related()` may take a `'*'`\n  parameter to get all relations\n* `wn.Wordnet` objects keep an open connection to the database ([#81])\n\n### Fixed\n\n* `wn.project.iterpackages()` tries harder to prevent potential race\n  conditions when reading temporary files ([#76])\n* `wn.Lexicon.metadata` now returns a dictionary ([#78])\n\n\n## [v0.3.0]\n\n**Release date: 2020-12-16**\n\n### Added\n\n* 
`add` parameter to `wn.download()` ([#73])\n* `--no-add` option to `wn download` command ([#73])\n* `progress_handler` parameter to `wn.download()` ([#70])\n* `progress_handler` parameter to `wn.add()` ([#70])\n\n### Fixed\n\n* `Synset.shortest_path()` no longer includes starting node ([#63])\n* `Synset.closure()`/`Sense.closure()` may take multiple relations\n  ([#74])\n* `Synset.hypernym_paths(simulate_root=True)` returns just the fake\n  root node if no paths were found (related to [#64])\n* `wn.lexicons()` returns empty list on unknown lang/lexicon ([#59])\n\n### Changed\n\n* Renamed `lgcode` parameter to `lang` throughout ([#66])\n* Renamed `Wordnet.lgcode` property to `Wordnet.lang` ([#66])\n* Renamed `--lgcode` command-line option to `--lang` ([#66])\n* Use better-performing/less-safe database options when adding\n  lexicons ([#69])\n\n\n## [v0.2.0]\n\n**Release date: 2020-12-02**\n\n### Added\n\n* `wn.config.get_cache_path()` returns the path of a cached resource\n* `wn.projects()` returns the info about known projects ([#60])\n* `projects` subcommand to command-line interface ([#60])\n* Open German WordNet 1.3 to the index\n\n### Changed\n\n* On import, Wn now raises an error if the database has an outdated\n  schema ([#61])\n* `wn.config.get_project_info()` now includes a `cache` key\n* Output of `lexicons` CLI subcommand now tab-delimited\n\n\n## [v0.1.1]\n\n**Release date: 2020-11-26**\n\n### Added\n\n* Command-line interface for downloading and listing lexicons ([#47])\n\n### Fixed\n\n* Cast `pathlib.Path` to `str` for `sqlite3.connect()` ([#58])\n* Pass `lgcode` to `Wordnet` object in `wn.synset()`\n\n\n## [v0.1.0]\n\n**Release date: 2020-11-25**\n\nThis is the initial release of the new Wn library. On PyPI it replaces\nthe https://github.com/nltk/wordnet/ code which had been effectively\nabandoned, but this is an entirely new codebase.\n\n\n[v1.1.0]: ../../releases/tag/v1.1.0\n[v1.0.0]: ../../releases/tag/v1.0.0\n[v0.14.0]: ../../releases/tag/v0.14.0\n[v0.13.0]: ../../releases/tag/v0.13.0\n[v0.12.0]: ../../releases/tag/v0.12.0\n[v0.11.0]: ../../releases/tag/v0.11.0\n[v0.10.1]: ../../releases/tag/v0.10.1\n[v0.10.0]: ../../releases/tag/v0.10.0\n[v0.9.5]: ../../releases/tag/v0.9.5\n[v0.9.4]: ../../releases/tag/v0.9.4\n[v0.9.3]: ../../releases/tag/v0.9.3\n[v0.9.2]: ../../releases/tag/v0.9.2\n[v0.9.1]: ../../releases/tag/v0.9.1\n[v0.9.0]: ../../releases/tag/v0.9.0\n[v0.8.3]: ../../releases/tag/v0.8.3\n[v0.8.2]: ../../releases/tag/v0.8.2\n[v0.8.1]: ../../releases/tag/v0.8.1\n[v0.8.0]: ../../releases/tag/v0.8.0\n[v0.7.0]: ../../releases/tag/v0.7.0\n[v0.6.2]: ../../releases/tag/v0.6.2\n[v0.6.1]: ../../releases/tag/v0.6.1\n[v0.6.0]: ../../releases/tag/v0.6.0\n[v0.5.1]: ../../releases/tag/v0.5.1\n[v0.5.0]: ../../releases/tag/v0.5.0\n[v0.4.1]: ../../releases/tag/v0.4.1\n[v0.4.0]: ../../releases/tag/v0.4.0\n[v0.3.0]: ../../releases/tag/v0.3.0\n[v0.2.0]: ../../releases/tag/v0.2.0\n[v0.1.1]: ../../releases/tag/v0.1.1\n[v0.1.0]: ../../releases/tag/v0.1.0\n[unreleased]: ../../tree/main\n\n[#7]: https://github.com/goodmami/wn/issues/7\n[#8]: https://github.com/goodmami/wn/issues/8\n[#15]: https://github.com/goodmami/wn/issues/15\n[#17]: https://github.com/goodmami/wn/issues/17\n[#19]: https://github.com/goodmami/wn/issues/19\n[#23]: https://github.com/goodmami/wn/issues/23\n[#40]: https://github.com/goodmami/wn/issues/40\n[#46]: https://github.com/goodmami/wn/issues/46\n[#47]: https://github.com/goodmami/wn/issues/47\n[#53]: https://github.com/goodmami/wn/issues/53\n[#55]: 
https://github.com/goodmami/wn/issues/55\n[#58]: https://github.com/goodmami/wn/issues/58\n[#59]: https://github.com/goodmami/wn/issues/59\n[#60]: https://github.com/goodmami/wn/issues/60\n[#61]: https://github.com/goodmami/wn/issues/61\n[#63]: https://github.com/goodmami/wn/issues/63\n[#64]: https://github.com/goodmami/wn/issues/64\n[#65]: https://github.com/goodmami/wn/issues/65\n[#66]: https://github.com/goodmami/wn/issues/66\n[#69]: https://github.com/goodmami/wn/issues/69\n[#70]: https://github.com/goodmami/wn/issues/70\n[#71]: https://github.com/goodmami/wn/issues/71\n[#73]: https://github.com/goodmami/wn/issues/73\n[#74]: https://github.com/goodmami/wn/issues/74\n[#75]: https://github.com/goodmami/wn/issues/75\n[#76]: https://github.com/goodmami/wn/issues/76\n[#77]: https://github.com/goodmami/wn/issues/77\n[#78]: https://github.com/goodmami/wn/issues/78\n[#79]: https://github.com/goodmami/wn/issues/79\n[#80]: https://github.com/goodmami/wn/issues/80\n[#81]: https://github.com/goodmami/wn/issues/81\n[#82]: https://github.com/goodmami/wn/issues/82\n[#83]: https://github.com/goodmami/wn/issues/83\n[#86]: https://github.com/goodmami/wn/issues/86\n[#87]: https://github.com/goodmami/wn/issues/87\n[#89]: https://github.com/goodmami/wn/issues/89\n[#90]: https://github.com/goodmami/wn/issues/90\n[#91]: https://github.com/goodmami/wn/issues/91\n[#92]: https://github.com/goodmami/wn/issues/92\n[#93]: https://github.com/goodmami/wn/issues/93\n[#95]: https://github.com/goodmami/wn/issues/95\n[#97]: https://github.com/goodmami/wn/issues/97\n[#98]: https://github.com/goodmami/wn/issues/98\n[#99]: https://github.com/goodmami/wn/issues/99\n[#103]: https://github.com/goodmami/wn/issues/103\n[#104]: https://github.com/goodmami/wn/issues/104\n[#105]: https://github.com/goodmami/wn/issues/105\n[#106]: https://github.com/goodmami/wn/issues/106\n[#108]: https://github.com/goodmami/wn/issues/108\n[#113]: https://github.com/goodmami/wn/issues/113\n[#115]: https://github.com/goodmami/wn/issues/115\n[#116]: https://github.com/goodmami/wn/issues/116\n[#117]: https://github.com/goodmami/wn/issues/117\n[#119]: https://github.com/goodmami/wn/issues/119\n[#122]: https://github.com/goodmami/wn/issues/122\n[#123]: https://github.com/goodmami/wn/issues/123\n[#124]: https://github.com/goodmami/wn/issues/124\n[#125]: https://github.com/goodmami/wn/issues/125\n[#140]: https://github.com/goodmami/wn/issues/140\n[#142]: https://github.com/goodmami/wn/issues/142\n[#143]: https://github.com/goodmami/wn/issues/143\n[#144]: https://github.com/goodmami/wn/issues/144\n[#146]: https://github.com/goodmami/wn/issues/146\n[#147]: https://github.com/goodmami/wn/issues/147\n[#148]: https://github.com/goodmami/wn/issues/148\n[#151]: https://github.com/goodmami/wn/issues/151\n[#152]: https://github.com/goodmami/wn/issues/152\n[#154]: https://github.com/goodmami/wn/issues/154\n[#155]: https://github.com/goodmami/wn/issues/155\n[#156]: https://github.com/goodmami/wn/issues/156\n[#157]: https://github.com/goodmami/wn/issues/157\n[#167]: https://github.com/goodmami/wn/issues/167\n[#168]: https://github.com/goodmami/wn/issues/168\n[#169]: https://github.com/goodmami/wn/issues/169\n[#177]: https://github.com/goodmami/wn/issues/177\n[#181]: https://github.com/goodmami/wn/issues/181\n[#191]: https://github.com/goodmami/wn/issues/191\n[#194]: https://github.com/goodmami/wn/issues/194\n[#200]: https://github.com/goodmami/wn/issues/200\n[#201]: https://github.com/goodmami/wn/issues/201\n[#202]: https://github.com/goodmami/wn/issues/202\n[#203]: 
https://github.com/goodmami/wn/issues/203\n[#207]: https://github.com/goodmami/wn/issues/207\n[#211]: https://github.com/goodmami/wn/issues/211\n[#213]: https://github.com/goodmami/wn/issues/213\n[#214]: https://github.com/goodmami/wn/issues/214\n[#215]: https://github.com/goodmami/wn/issues/215\n[#216]: https://github.com/goodmami/wn/issues/216\n[#221]: https://github.com/goodmami/wn/issues/221\n[#226]: https://github.com/goodmami/wn/issues/226\n[#228]: https://github.com/goodmami/wn/issues/228\n[#233]: https://github.com/goodmami/wn/issues/233\n[#234]: https://github.com/goodmami/wn/issues/234\n[#235]: https://github.com/goodmami/wn/issues/235\n[#238]: https://github.com/goodmami/wn/issues/238\n[#241]: https://github.com/goodmami/wn/issues/241\n[#246]: https://github.com/goodmami/wn/issues/246\n[#248]: https://github.com/goodmami/wn/issues/248\n[#250]: https://github.com/goodmami/wn/issues/250\n[#255]: https://github.com/goodmami/wn/issues/255\n[#260]: https://github.com/goodmami/wn/issues/260\n[#263]: https://github.com/goodmami/wn/issues/263\n[#266]: https://github.com/goodmami/wn/issues/266\n[#268]: https://github.com/goodmami/wn/pull/268\n[#271]: https://github.com/goodmami/wn/issues/271\n[#277]: https://github.com/goodmami/wn/issues/277\n[#285]: https://github.com/goodmami/wn/issues/285\n[#286]: https://github.com/goodmami/wn/issues/286\n[#291]: https://github.com/goodmami/wn/issues/291\n[#292]: https://github.com/goodmami/wn/issues/292\n[#294]: https://github.com/goodmami/wn/issues/294\n[#295]: https://github.com/goodmami/wn/issues/295\n[#300]: https://github.com/goodmami/wn/issues/300\n[#301]: https://github.com/goodmami/wn/issues/301\n[#302]: https://github.com/goodmami/wn/issues/302\n[#303]: https://github.com/goodmami/wn/issues/303\n[#313]: https://github.com/goodmami/wn/issues/313\n[#314]: https://github.com/goodmami/wn/issues/314\n[#316]: https://github.com/goodmami/wn/issues/316\n[#319]: https://github.com/goodmami/wn/issues/319\n"
  },
  {
    "path": "CITATION.cff",
    "content": "cff-version: 1.2.0\ntitle: Wn\nmessage: >-\n  Please cite this software using the metadata from\n  'preferred-citation'.\ntype: software\nauthors:\n  - given-names: Michael Wayne\n    family-names: Goodman\n    email: goodman.m.w@gmail.com\n    orcid: 'https://orcid.org/0000-0002-2896-5141'\n  - given-names: Francis\n    family-names: Bond\n    email: bond@ieee.org\n    orcid: 'https://orcid.org/0000-0003-4973-8068'\nrepository-code: 'https://github.com/goodmami/wn/'\npreferred-citation:\n  type: conference-paper\n  authors:\n  - given-names: Michael Wayne\n    family-names: Goodman\n    email: goodmami@uw.edu\n    orcid: 'https://orcid.org/0000-0002-2896-5141'\n    affiliation: Nanyang Technological University\n  - given-names: Francis\n    family-names: Bond\n    email: bond@ieee.org\n    orcid: 'https://orcid.org/0000-0003-4973-8068'\n    affiliation: Nanyang Technological University\n  start: 100  # First page number\n  end: 107  # Last page number\n  conference:\n      name: \"Proceedings of the 11th Global Wordnet Conference\"\n  title: \"Intrinsically Interlingual: The Wn Python Library for Wordnets\"\n  year: 2021\n  month: 1\n  url: 'https://aclanthology.org/2021.gwc-1.12/'\n  publisher: \"Global Wordnet Association\"\n"
  },
  {
    "path": "CONTRIBUTING.md",
    "content": "# Contributing to Wn\n\nThanks for helping to make Wn better!\n\n**Quick Links:**\n\n- [Report a bug or request a features](https://github.com/goodmami/wn/issues/new)\n- [Ask a question](https://github.com/goodmami/wn/discussions)\n- [View documentation](https://wn.readthedocs.io/)\n\n**Developer Information:**\n\n- Versioning scheme: [Semantic Versioning](https://semver.org/)\n- Branching scheme: [GitHub Flow](https://guides.github.com/introduction/flow/)\n- Changelog: [keep a changelog](https://keepachangelog.com/en/1.0.0/)\n- Documentation framework: [Sphinx](https://www.sphinx-doc.org/)\n- Docstring style: [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings) (via [sphinx.ext.napoleon](https://www.sphinx-doc.org/en/master/usage/extensions/napoleon.html))\n- Unit/regression testing: [pytest](https://pytest.org/)\n- Benchmarking: [pytest-benchmark](https://pytest-benchmark.readthedocs.io/)\n- Packaging framework: [Hatch](https://hatch.pypa.io/)\n- Coding style: [PEP-8](https://www.python.org/dev/peps/pep-0008/) (via [Ruff](https://beta.ruff.rs/docs/))\n- Type checking: [Mypy](http://mypy-lang.org/)\n\n\n## Get Help\n\nConfused about wordnets in general? See the [Global Wordnet\nAssociation Documentation](https://globalwordnet.github.io/gwadoc/)\n\nConfused about using Wn or wish to share some tips? [Start a\ndiscussion](https://github.com/goodmami/wn/discussions)\n\nEncountering a problem with Wn or wish to propose a new features? [Raise an\nissue](https://github.com/goodmami/wn/issues/new)\n\n\n## Report a Bug\n\nWhen reporting a bug, please provide enough information for someone to\nreproduce the problem. This might include the version of Python you're\nrunning, the version of Wn you have installed, the wordnet lexicons\nyou have installed, and possibly the platform (Linux, Windows, macOS)\nyou're on. Please give a minimal working example that illustrates the\nproblem. For example:\n\n> I'm using Wn 0.9.5 with Python 3.11 on Linux and [description of\n> problem...]. Here's what I have tried:\n>\n> ```pycon\n> >>> import wn\n> >>> # some code\n> ... # some result or error\n> ```\n\n\n## Request a Feature\n\nIf there's a feature that you think would make a good addition to Wn,\nraise an issue describing what the feature is and what problems it\nwould address.\n\n## Guidelines for Contributing\n\nSee the \"developer information\" above for a brief description of\nguidelines and conventions used in Wn. If you have a fix, please\nsubmit a pull request to the `main` branch. In general, every pull\nrequest should have an associated issue.\n\nDevelopers should run and test Wn locally from source using\n[Hatch](https://hatch.pypa.io/). Hatch may be installed\nsystem-wide or within a virtual environment:\n\n```bash\n$ pip install hatch\n```\n\nYou can then use the `hatch` commands like the following:\n\n```console\n$ hatch shell           # activate a Wn virtual environment\n$ hatch fmt --check     # lint the code and check code style\n$ hatch run mypy:check  # type check with mypy\n$ hatch test            # run unit tests\n$ hatch test bench      # run benchmarks\n$ hatch build           # build a source distribution and wheel\n$ hatch publish         # publish build artifacts to PyPI\n```\n"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2020 Michael Wayne Goodman\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "\n\n<p align=\"center\">\n  <img src=\"https://raw.githubusercontent.com/goodmami/wn/main/docs/_static/wn-logo.svg\" alt=\"Wn logo\">\n  <br>\n  <strong>a Python library for wordnets</strong>\n  <br>\n  <a href=\"https://pypi.org/project/wn/\"><img src=\"https://img.shields.io/pypi/v/wn.svg?style=flat-square\" alt=\"PyPI link\"></a>\n  <img src=\"https://img.shields.io/pypi/pyversions/wn.svg?style=flat-square\" alt=\"Python Support\">\n  <a href=\"https://github.com/goodmami/wn/actions?query=workflow%3A%22tests%22\"><img src=\"https://github.com/goodmami/wn/workflows/tests/badge.svg\" alt=\"tests\"></a>\n  <a href=\"https://wn.readthedocs.io/en/latest/?badge=latest\"><img src=\"https://readthedocs.org/projects/wn/badge/?version=latest&style=flat-square\" alt=\"Documentation Status\"></a>\n  <br>\n  <a href=\"https://github.com/goodmami/wn#available-wordnets\">Available Wordnets</a>\n  | <a href=\"https://wn.readthedocs.io/\">Documentation</a>\n  | <a href=\"https://wn.readthedocs.io/en/latest/faq.html\">FAQ</a>\n  | <a href=\"https://wn.readthedocs.io/en/latest/guides/nltk-migration.html\">Migrating from NLTK</a>\n  |  <a href=\"https://github.com/goodmami/wn#citation\">Citation</a>\n</p>\n\n---\n\nWn is a Python library for exploring information in wordnets.\n\n## Installation\n\nInstall it from PyPI using **pip**:\n\n```sh\npip install wn\n```\n\nor **uv**:\n\n```\nuv add wn\n```\n\n> [!IMPORTANT]\n> Existing users of Wn may encounter an error about an incompatible database schema.\n> The remedy is to rebuild the database. There is a new function to help with this:\n> ```pycon\n> >>> wn.reset_database(rebuild=True)  # re-add any indexed lexicons\n> ```\n> or\n> ```pycon\n> >>> wn.reset_database()  # initialize without re-adding; start from scratch\n> ```\n\n## Getting Started\n\nFirst, download some data:\n\n```sh\npython -m wn download oewn:2025+  # the Open English WordNet 2025+\n```\n\nNow start exploring:\n\n```python\n>>> import wn\n>>> en = wn.Wordnet('oewn:2025+')       # Create Wordnet object to query\n>>> ss = en.synsets('win', pos='v')[0]  # Get the first synset for 'win'\n>>> ss.definition()                     # Get the synset's definition\n'be the winner in a contest or competition; be victorious'\n```\n\n## Features\n\n- Multilingual by design; first-class support for wordnets in any language\n- Interlingual queries via the [Collaborative Interlingual Index](https://github.com/globalwordnet/cili/)\n- Six [similarity metrics](https://wn.readthedocs.io/en/latest/api/wn.similarity.html)\n- Functions for [exploring taxonomies](https://wn.readthedocs.io/en/latest/api/wn.taxonomy.html)\n- Support for [lemmatization] ([Morphy] for English is built-in) and unicode [normalization]\n- Full support of the [WN-LMF 1.4](https://globalwordnet.github.io/schemas/) format, including word pronunciations and lexicon extensions\n- SQL-based backend offers very fast startup and improved performance on many kinds of queries\n\n[lemmatization]: https://wn.readthedocs.io/en/latest/guides/lemmatization.html#lemmatization\n[normalization]: https://wn.readthedocs.io/en/latest/guides/lemmatization.html#normalization\n[Morphy]: https://wn.readthedocs.io/en/latest/api/wn.morphy.html\n\n\n## Available Wordnets\n\nAny WN-LMF-formatted wordnet can be added to Wn's database from a local\nfile or remote URL, but Wn also maintains an index (see\n[wn/index.toml](https://github.com/goodmami/wn/blob/main/wn/index.toml))\nof available projects, similar to a package manager for software, to aid\nin 
the discovery and downloading of new wordnets. The projects in this\nindex are listed below.\n\n### English Wordnets\n\nThere are several English wordnets available. In general it is\nrecommended to use the latest [Open English Wordnet], but if you have\nstricter compatibility needs for, e.g., experiment replicability, you\nmay try the [OMW English Wordnet based on WordNet 3.0] (compatible with\nthe Princeton WordNet 3.0 and with the [NLTK]), or [OpenWordnet-EN] (for\nuse with the Portuguese wordnet [OpenWordnet-PT]).\n\n| Name                                         | Specifier              | # Synsets | Notes |\n| -------------------------------------------- | ---------------------- | --------: | ----- |\n| [Open English WordNet]                       | `oewn:2025+`<br/> `oewn:2025`</br> `oewn:2024`<br/> `oewn:2023`<br/> `oewn:2022`<br/> `oewn:2021`<br/> `ewn:2020`<br/> `ewn:2019` | 120564<br/>107519<br/>120630<br/>120135<br/>120068<br/>120039<br/>120053<br/>117791 | ← Recommended<br/>&nbsp;<br/>&nbsp;<br/>&nbsp;<br/>&nbsp;<br/>&nbsp;<br/>&nbsp;<br/>&nbsp; |\n| [OMW English Wordnet based on WordNet 1.5]   | `omw-en15:2.0`   | 91591 |  |\n| [OMW English Wordnet based on WordNet 1.6]   | `omw-en16:2.0`   | 99642 |  |\n| [OMW English Wordnet based on WordNet 1.7]   | `omw-en17:2.0`   | 109377 |  |\n| [OMW English Wordnet based on WordNet 1.7.1] | `omw-en171:2.0`  | 111223 |  |\n| [OMW English Wordnet based on WordNet 2.0]   | `omw-en20:2.0`   | 115424 |  |\n| [OMW English Wordnet based on WordNet 2.1]   | `omw-en21:2.0`   | 117597 |  |\n| [OMW English Wordnet based on WordNet 3.0]   | `omw-en:2.0`</br> `omw-en:1.4` | 117659</br> 117659 | Included with `omw:2.0`<br/> Included with `omw:1.4` |\n| [OMW English Wordnet based on WordNet 3.1]   | `omw-en31:2.0`</br> `omw-en31:1.4` | 117791</br> 117791 |  |\n| [OpenWordnet-EN]                             | `own-en:1.0.0`   | 117659 | Included with `own:1.0.0` |\n\n[Open English WordNet]: https://en-word.net\n[Open Multilingual Wordnet]: https://github.com/omwn\n[OMW English Wordnet based on WordNet 1.5]: https://github.com/omwn/omw-data\n[OMW English Wordnet based on WordNet 1.6]: https://github.com/omwn/omw-data\n[OMW English Wordnet based on WordNet 1.7]: https://github.com/omwn/omw-data\n[OMW English Wordnet based on WordNet 1.7.1]: https://github.com/omwn/omw-data\n[OMW English Wordnet based on WordNet 2.0]: https://github.com/omwn/omw-data\n[OMW English Wordnet based on WordNet 2.1]: https://github.com/omwn/omw-data\n[OMW English Wordnet based on WordNet 3.0]: https://github.com/omwn/omw-data\n[OMW English Wordnet based on WordNet 3.1]: https://github.com/omwn/omw-data\n[OpenWordnet-EN]: https://github.com/own-pt/openWordnet-PT\n[OpenWordnet-PT]: https://github.com/own-pt/openWordnet-PT\n[NLTK]: https://www.nltk.org/\n\n### Other Wordnets and Collections\n\nThese are standalone non-English wordnets and collections. 
The wordnets\nof each collection are listed further down.\n\n| Name                                       | Specifier                     | # Synsets       | Language         |\n| ------------------------------------------ | ----------------------------- | --------------: | ---------------- |\n| [Open Multilingual Wordnet]                | `omw:1.4`                     | n/a             | multiple [[mul]] |\n| [Open German WordNet]                      | `odenet:1.4`<br/>`odenet:1.3` | 36268<br/>36159 | German [de]      |\n| [Open Wordnets for Portuguese and English] | `own:1.0.0`                   | n/a             | multiple [[mul]] |\n| [KurdNet]                                  | `kurdnet:1.0`                 |            2144 | Kurdish [ckb]    |\n\n[Open English WordNet]: https://github.com/globalwordnet/english-wordnet\n[Open Multilingual Wordnet]: https://github.com/omwn\n[OMW English Wordnet based on WordNet 3.0]: https://github.com/omwn\n[OMW English Wordnet based on WordNet 3.1]: https://github.com/omwn\n[Open German WordNet]: https://github.com/hdaSprachtechnologie/odenet\n[Open Wordnets for Portuguese and English]: https://github.com/own-pt\n[mul]: https://iso639-3.sil.org/code/mul\n[KurdNet]: https://sinaahmadi.github.io/resources/kurdnet.html\n\n### Open Multilingual Wordnet (OMW) Collection\n\nThe *Open Multilingual Wordnet* collection (`omw:1.4`) installs the\nfollowing lexicons (from\n[here](https://github.com/omwn/omw-data/releases/tag/v1.4)) which can\nalso be downloaded and installed independently:\n\n| Name                                     | Specifier                        | # Synsets          | Language                         |\n| ---------------------------------------- | -------------------------------- | -----------------: | -------------------------------- |\n| Albanet                                  | `omw-sq:2.0`<br/> `omw-sq:1.4`   |     4679<br/> 4675 | Albanian [sq]                    |\n| Arabic WordNet (AWN v2)                  | `omw-arb:2.0`<br/> `omw-arb:1.4` |     9916<br/> 9916 | Arabic [arb]                     |\n| BulTreeBank Wordnet (BTB-WN)             | `omw-bg:2.0`<br/> `omw-bg:1.4`   |     4959<br/> 4959 | Bulgarian [bg]                   |\n| Chinese Open Wordnet                     | `omw-cmn:2.0`<br/> `omw-cmn:1.4` |   42300<br/> 42312 | Mandarin (Simplified) [cmn-Hans] |\n| Croatian Wordnet                         | `omw-hr:2.0`<br/> `omw-hr:1.4`   |   23115<br/> 23120 | Croatian [hr]                    |\n| DanNet                                   | `omw-da:2.0`<br/> `omw-da:1.4`   |     4476<br/> 4476 | Danish [da]                      |\n| FinnWordNet                              | `omw-fi:2.0`<br/> `omw-fi:1.4`   | 116763<br/> 116763 | Finnish [fi]                     |\n| Greek Wordnet                            | `omw-el:2.0`<br/> `omw-el:1.4`   |   18113<br/> 18049 | Greek [el]                       |\n| Hebrew Wordnet                           | `omw-he:2.0`<br/> `omw-he:1.4`   |     5448<br/> 5448 | Hebrew [he]                      |\n| IceWordNet                               | `omw-is:2.0`<br/> `omw-is:1.4`   |     4951<br/> 4951 | Icelandic [is]                   |\n| Italian Wordnet                          | `omw-iwn:2.0`<br/> `omw-iwn:1.4` |   15563<br/> 15563 | Italian [it]                     |\n| Japanese Wordnet                         | `omw-ja:2.0`<br/> `omw-ja:1.4`   |  117659<br/> 57184 | Japanese [ja]                    |\n| Lithuanian  WordNet                      | `omw-lt:2.0`<br/> `omw-lt:1.4`   |     
9462<br/> 9462 | Lithuanian [lt]                  |\n| Multilingual Central Repository          | `omw-ca:2.0`<br/> `omw-ca:1.4`   |   60765<br/> 45826 | Catalan [ca]                     |\n| Multilingual Central Repository          | `omw-eu:2.0`<br/> `omw-eu:1.4`   |   29420<br/> 29413 | Basque [eu]                      |\n| Multilingual Central Repository          | `omw-gl:2.0`<br/> `omw-gl:1.4`   |   34776<br/> 19312 | Galician [gl]                    |\n| Multilingual Central Repository          | `omw-es:2.0`<br/> `omw-es:1.4`   |   78948<br/> 38512 | Spanish [es]                     |\n| MultiWordNet                             | `omw-it:2.0`<br/> `omw-it:1.4`   |   35001<br/> 35001 | Italian [it]                     |\n| Norwegian Wordnet                        | `omw-nb:2.0`<br/> `omw-nb:1.4`   |     4455<br/> 4455 | Norwegian (Bokmål) [nb]          |\n| Norwegian Wordnet                        | `omw-nn:2.0`<br/> `omw-nn:1.4`   |     3671<br/> 3671 | Norwegian (Nynorsk) [nn]         |\n| OMW English Wordnet based on WordNet 3.0 | `omw-en:2.0`<br/> `omw-en:1.4`   | 117659<br/> 117659 | English [en]                     |\n| Open Dutch WordNet                       | `omw-nl:2.0`<br/> `omw-nl:1.4`   |   30177<br/> 30177 | Dutch [nl]                       |\n| OpenWN-PT                                | `omw-pt:2.0`<br/> `omw-pt:1.4`   |   43895<br/> 43895 | Portuguese [pt]                  |\n| plWordNet                                | `omw-pl:2.0`<br/> `omw-pl:1.4`   |   33826<br/> 33826 | Polish [pl]                      |\n| Romanian Wordnet                         | `omw-ro:2.0`<br/> `omw-ro:1.4`   |   58754<br/> 56026 | Romanian [ro]                    |\n| Slovak WordNet                           | `omw-sk:2.0`<br/> `omw-sk:1.4`   |   18507<br/> 18507 | Slovak [sk]                      |\n| sloWNet                                  | `omw-sl:2.0`<br/> `omw-sl:1.4`   |   42590<br/> 42583 | Slovenian [sl]                   |\n| Swedish (SALDO)                          | `omw-sv:2.0`<br/> `omw-sv:1.4`   |     6796<br/> 6796 | Swedish [sv]                     |\n| Thai Wordnet                             | `omw-th:2.0`<br/> `omw-th:1.4`   |   73350<br/> 73350 | Thai [th]                        |\n| WOLF (Wordnet Libre du Français)         | `omw-fr:2.0`<br/> `omw-fr:1.4`   |   59091<br/> 59091 | French [fr]                      |\n| Wordnet Bahasa                           | `omw-id:2.0`<br/> `omw-id:1.4`   |   46774<br/> 38085 | Indonesian [id]                  |\n| Wordnet Bahasa                           | `omw-zsm:2.0`<br/> `omw-zsm:1.4` |   36911<br/> 36911 | Malaysian [zsm]                  |\n\n### Open Wordnet (OWN) Collection\n\nThe *Open Wordnets for Portuguese and English* collection (`own:1.0.0`)\ninstalls the following lexicons (from\n[here](https://github.com/own-pt/openWordnet-PT/releases/tag/v1.0.0))\nwhich can also be downloaded and installed independently:\n\n| Name           | Specifier      | # Synsets | Language        |\n| -------------- | -------------- | --------: | --------------- |\n| OpenWordnet-PT | `own-pt:1.0.0` |     52670 | Portuguese [pt] |\n| OpenWordnet-EN | `own-en:1.0.0` |    117659 | English [en]    |\n\n### Collaborative Interlingual Index\n\nWhile not a wordnet, the [Collaborative Interlingual Index] (CILI)\nrepresents the interlingual backbone of many wordnets. 
Wn will function without CILI loaded, even for interlingual queries,\nbut adding it to the database makes available the full list of\nconcepts, their status (active, deprecated, etc.), and their\ndefinitions.\n\n| Name                               | Specifier  | # Concepts |\n| ---------------------------------- | ---------- | ---------: |\n| [Collaborative Interlingual Index] | `cili:1.0` |     117659 |\n\n[Collaborative Interlingual Index]: https://github.com/globalwordnet/cili/\n\n\n## Changes to the Index\n\n### `ewn` → `oewn`\n\nThe 2021 version of the *Open English WordNet* (`oewn:2021`) has\nchanged its lexicon ID from `ewn` to `oewn`, so the index is updated\naccordingly. The previous versions are still available as `ewn:2019`\nand `ewn:2020`.\n\n### `pwn` → `omw-en`, `omw-en31`\n\nThe wordnet formerly called the *Princeton WordNet* (`pwn:3.0`,\n`pwn:3.1`) is now called the *OMW English Wordnet based on WordNet\n3.0* (`omw-en`) and the *OMW English Wordnet based on WordNet 3.1*\n(`omw-en31`). This is more accurate, as it is an OMW-produced\nderivative of the original WordNet data, and it also avoids license or\ntrademark issues.\n\n### `*wn` → `omw-*` for OMW wordnets\n\nAll OMW wordnets have changed their ID scheme from `...wn` to `omw-...` and the version no longer\nincludes `+omw` (e.g., `bulwn:1.3+omw` is now `omw-bg:1.4`).\n\n## Citation\n\nMichael Wayne Goodman and Francis Bond. 2021. [Intrinsically Interlingual: The Wn Python Library for Wordnets](https://aclanthology.org/2021.gwc-1.12/). In *Proceedings of the 11th Global Wordnet Conference*, pages 100–107, University of South Africa (UNISA). Global Wordnet Association.\n"
  },
  {
    "path": "bench/README.md",
    "content": "# Wn Benchmarking\n\nThis directory contains code and data for running benchmarks for\nWn. The benchmarks are implemented using\n[pytest-benchmarks](https://github.com/ionelmc/pytest-benchmark/), so\nthey are run using pytest as follows (from the top-level project\ndirectory):\n\n```console\n$ hatch test bench/  # run the benchmarks\n$ hatch test bench/ --benchmark-autosave  # run benchmarks and store results\n$ hatch test bench/ --benchmark-compare  # run benchmarks and compare to stored result\n$ hatch test -- --help  # get help on options (look for those prefixed `--benchmark-`)\n```\n\nNotes:\n\n* The tests are not exhaustive; when making a change that may affect\n  performance, consider making a new test if one doesn't exist\n  already. It would be helpful to check in the test to Git, but not\n  the benchmark results since those are dependent on the machine.\n* Benchmark the code before and after the changes. Store the results\n  locally for comparison.\n* Ensure the testing environment has a steady load (wait for\n  long-running processes to finish, close any active web browser tabs,\n  etc.) prior to and while running the test.\n* Expect high variance for IO-bound tasks.\n\n"
  },
  {
    "path": "bench/conftest.py",
    "content": "from collections.abc import Iterator\nfrom itertools import cycle, product\nfrom pathlib import Path\n\nimport pytest\n\nimport wn\nfrom wn import lmf\n\n\n@pytest.fixture\ndef clean_db():\n    def clean_db():\n        wn.remove(\"*\")\n        dummy_lex = lmf.Lexicon(\n            id=\"dummy\",\n            version=\"1\",\n            label=\"placeholder to initialize the db\",\n            language=\"zxx\",\n            email=\"\",\n            license=\"\",\n        )\n        wn.add_lexical_resource(\n            lmf.LexicalResource(lmf_version=\"1.3\", lexicons=[dummy_lex])\n        )\n\n    return clean_db\n\n\n@pytest.fixture(scope=\"session\")\ndef datadir():\n    return Path(__file__).parent.parent / \"tests\" / \"data\"\n\n\n@pytest.fixture\ndef empty_db(clean_db, tmp_path):\n    dir = tmp_path / \"wn_data_empty\"\n    with pytest.MonkeyPatch.context() as m:\n        m.setattr(wn.config, \"data_directory\", dir)\n        clean_db()\n        yield\n\n\n@pytest.fixture(scope=\"session\")\ndef mock_lmf():\n    synsets: list[lmf.Synset] = [\n        *_make_synsets(\"n\", 20000),\n        *_make_synsets(\"v\", 10000),\n        *_make_synsets(\"a\", 2000),\n        *_make_synsets(\"r\", 1000),\n    ]\n    entries = _make_entries(synsets)\n    lexicon = lmf.Lexicon(\n        id=\"mock\",\n        version=\"1\",\n        label=\"\",\n        language=\"zxx\",\n        email=\"\",\n        license=\"\",\n        entries=entries,\n        synsets=synsets,\n    )\n    return lmf.LexicalResource(lmf_version=\"1.3\", lexicons=[lexicon])\n\n\n@pytest.fixture(scope=\"session\")\ndef mock_db_dir(mock_lmf, tmp_path_factory):\n    dir = tmp_path_factory.mktemp(\"wn_data_empty\")\n    with pytest.MonkeyPatch.context() as m:\n        m.setattr(wn.config, \"data_directory\", dir)\n        wn.add_lexical_resource(mock_lmf, progress_handler=None)\n        wn._db.clear_connections()\n\n    return Path(dir)\n\n\n@pytest.fixture\ndef mock_db(monkeypatch, mock_db_dir):\n    with monkeypatch.context() as m:\n        m.setattr(wn.config, \"data_directory\", mock_db_dir)\n        yield\n        wn._db.clear_connections()\n\n\ndef _make_synsets(pos: str, n: int) -> list[lmf.Synset]:\n    synsets: list[lmf.Synset] = [\n        lmf.Synset(\n            id=f\"{i}-{pos}\",\n            ili=\"\",\n            partOfSpeech=pos,\n            relations=[],\n            meta={},\n        )\n        for i in range(1, n + 1)\n    ]\n    # add relations for nouns and verbs\n    if pos in \"nv\":\n        total = len(synsets)\n        tgt_i = 1  # index of next target synset\n        n = cycle([2])  # how many targets to relate\n        for cur_i in range(total):\n            if tgt_i <= cur_i:\n                tgt_i = cur_i + 1\n            source = synsets[cur_i]\n            for cur_k in range(tgt_i, tgt_i + next(n)):\n                if cur_k >= total:\n                    break\n                target = synsets[cur_k]\n                source[\"relations\"].append(\n                    lmf.Relation(target=target[\"id\"], relType=\"hyponym\", meta={})\n                )\n                target[\"relations\"].append(\n                    lmf.Relation(target=source[\"id\"], relType=\"hypernym\", meta={})\n                )\n            tgt_i = cur_k + 1\n\n    return synsets\n\n\ndef _words() -> Iterator[str]:\n    consonants = \"kgtdpbfvszrlmnhw\"\n    vowels = \"aeiou\"\n    while True:\n        yield from map(\"\".join, product(consonants, vowels, consonants, vowels))\n\n\ndef _make_entries(synsets: 
list[lmf.Synset]) -> list[lmf.LexicalEntry]:\n    words = _words()\n    member_count = cycle(range(1, 4))  # 1, 2, or 3 synset members\n    entries: dict[str, lmf.LexicalEntry] = {}\n    prev_synsets: list[lmf.Synset] = []\n    for synset in synsets:\n        ssid = synset[\"id\"]\n        pos = synset[\"partOfSpeech\"]\n\n        for _ in range(next(member_count)):\n            word = next(words)\n            senses = [lmf.Sense(id=f\"{word}-{ssid}\", synset=ssid, meta={})]\n            # add some polysemy\n            if prev_synsets:\n                ssid2 = prev_synsets.pop()[\"id\"]\n                senses.append(lmf.Sense(id=f\"{word}-{ssid2}\", synset=ssid2, meta={}))\n            eid = f\"{word}-{pos}\"\n            if eid not in entries:\n                entries[eid] = lmf.LexicalEntry(\n                    id=eid,\n                    lemma=lmf.Lemma(\n                        writtenForm=word,\n                        partOfSpeech=pos,\n                    ),\n                    senses=[],\n                    meta={},\n                )\n            entries[eid][\"senses\"].extend(senses)\n\n        prev_synsets.append(synset)\n\n    return list(entries.values())\n"
  },
  {
    "path": "bench/test_bench.py",
    "content": "import pytest\n\nimport wn\nfrom wn import lmf\n\n\n@pytest.mark.benchmark(group=\"lmf.load\", warmup=True)\ndef test_load(datadir, benchmark):\n    benchmark(lmf.load, datadir / \"mini-lmf-1.0.xml\")\n\n\n@pytest.mark.benchmark(group=\"wn.add_lexical_resource\")\n@pytest.mark.usefixtures(\"empty_db\")\ndef test_add_lexical_resource(mock_lmf, benchmark):\n    # TODO: when pytest-benchmark's teardown option is released, use\n    # that here with more rounds\n    benchmark.pedantic(\n        wn.add_lexical_resource,\n        args=(mock_lmf,),\n        # teardown=clean_db,\n        iterations=1,\n        rounds=1,\n    )\n\n\n@pytest.mark.benchmark(group=\"wn.add_lexical_resource\")\n@pytest.mark.usefixtures(\"empty_db\")\ndef test_add_lexical_resource_no_progress(mock_lmf, benchmark):\n    # TODO: when pytest-benchmark's teardown option is released, use\n    # that here with more rounds\n    benchmark.pedantic(\n        wn.add_lexical_resource,\n        args=(mock_lmf,),\n        kwargs={\"progress_handler\": None},\n        # teardown=clean_db,\n        iterations=1,\n        rounds=1,\n    )\n\n\n@pytest.mark.benchmark(group=\"primary queries\")\n@pytest.mark.usefixtures(\"mock_db\")\ndef test_synsets(benchmark):\n    benchmark(wn.synsets)\n\n\n@pytest.mark.benchmark(group=\"primary queries\")\n@pytest.mark.usefixtures(\"mock_db\")\ndef test_words(benchmark):\n    benchmark(wn.words)\n\n\n@pytest.mark.benchmark(group=\"secondary queries\")\n@pytest.mark.usefixtures(\"mock_db\")\ndef test_word_senses_no_wordnet(benchmark):\n    word = wn.words()[0]\n    benchmark(word.senses)\n\n\n@pytest.mark.benchmark(group=\"secondary queries\")\n@pytest.mark.usefixtures(\"mock_db\")\ndef test_word_senses_with_wordnet(benchmark):\n    w = wn.Wordnet(\"mock:1\")\n    word = w.words()[0]\n    benchmark(word.senses)\n"
  },
  {
    "path": "docs/.readthedocs.yaml",
    "content": "# .readthedocs.yaml\n# Read the Docs configuration file\n# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details\n\n# Required\nversion: 2\n\n# Set the version of Python and other tools you might need\nbuild:\n  os: ubuntu-22.04\n  tools:\n    python: \"3.12\"\n\n# Build documentation in the docs/ directory with Sphinx\nsphinx:\n  configuration: docs/conf.py\n\n# We recommend specifying your dependencies to enable reproducible builds:\n# https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html\npython:\n  install:\n    - requirements: docs/requirements.txt\n\nformats:\n  - pdf\n  - epub\n"
  },
  {
    "path": "docs/Makefile",
    "content": "# Minimal makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line, and also\n# from the environment for the first two.\nSPHINXOPTS    ?=\nSPHINXBUILD   ?= sphinx-build\nSOURCEDIR     = .\nBUILDDIR      = _build\n\n# Put it first so that \"make\" without argument is like \"make help\".\nhelp:\n\t@$(SPHINXBUILD) -M help \"$(SOURCEDIR)\" \"$(BUILDDIR)\" $(SPHINXOPTS) $(O)\n\n.PHONY: help Makefile\n\n# Catch-all target: route all unknown targets to Sphinx using the new\n# \"make mode\" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).\n%: Makefile\n\t@$(SPHINXBUILD) -M $@ \"$(SOURCEDIR)\" \"$(BUILDDIR)\" $(SPHINXOPTS) $(O)\n"
  },
  {
    "path": "docs/_static/css/svg.css",
    "content": "svg {\n    width: 500px;\n    height: 300px;\n\t\n    position: relative;\n    left: 20%;\n    -webkit-transform: translateX(-20%);\n    -ms-transform: translateX(-20%);\n    transform: translateX(-20%);\n\t\n    }\n\t\n"
  },
  {
    "path": "docs/_static/demo.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"![logo](https://raw.githubusercontent.com/goodmami/wn/main/docs/_static/wn-logo-rotate.svg)\\n\",\n    \"\\n\",\n    \"# Wn Demonstration\\n\",\n    \"\\n\",\n    \"This is a demonstration of the [Wn](https://github.com/goodmami/wn/) library for working with wordnets in Python. To run this notebook locally, you will need to install the `wn` and `jupyter` packages, and download some wordnet data:\\n\",\n    \"\\n\",\n    \"* Linux/macOS\\n\",\n    \"\\n\",\n    \"  ```console\\n\",\n    \"  $ python3 -m pip install wn jupyter\\n\",\n    \"  $ python3 -m wn download omw oewn:2021\\n\",\n    \"  ```\\n\",\n    \"  \\n\",\n    \"* Windows\\n\",\n    \"\\n\",\n    \"  ```console\\n\",\n    \"  > py -3 -m pip install wn jupyter\\n\",\n    \"  > py -3 -m wn download omw oewn:2021\\n\",\n    \"  ```\\n\",\n    \"\\n\",\n    \"Now you should be able to import the `wn` package:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 1,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import wn\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Primary Queries\\n\",\n    \"\\n\",\n    \"A **primary query** of the database is when basic parameters such as word forms, parts of speech, or public identifiers (e.g., synset IDs) are used to retrieve basic wordnet entities. You can perform these searches via module-level functions such as [wn.words()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.words), [wn.senses()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.senses), and [wn.synsets()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.synsets):\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 2,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"[Word('oewn-Malacca-n')]\"\n      ]\n     },\n     \"execution_count\": 2,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"wn.words(\\\"Malacca\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 3,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"[Synset('oewn-08985168-n')]\"\n      ]\n     },\n     \"execution_count\": 3,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"wn.synsets(\\\"Malacca\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Filtering by Language / Lexicon\\n\",\n    \"\\n\",\n    \"Once you've added multiple wordnets, however, you will often get many results for such queries. 
If that's not clear, then the following will give you some idea(s):\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 4,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"[Word('omw-en-idea-n'),\\n\",\n       \" Word('omw-sk-idea-n'),\\n\",\n       \" Word('omw-pl-idea-n'),\\n\",\n       \" Word('omw-is-ídea-n'),\\n\",\n       \" Word('omw-zsm-idea-n'),\\n\",\n       \" Word('omw-iwn-idea-n'),\\n\",\n       \" Word('omw-it-idea-n'),\\n\",\n       \" Word('omw-gl-idea-n'),\\n\",\n       \" Word('omw-fi-idea-n'),\\n\",\n       \" Word('omw-ca-idea-n'),\\n\",\n       \" Word('omw-eu-idea-n'),\\n\",\n       \" Word('omw-es-idea-n'),\\n\",\n       \" Word('oewn-idea-n')]\"\n      ]\n     },\n     \"execution_count\": 4,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"wn.words(\\\"idea\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"You can filter down the results by language, but that may not be enough if you have multiple wordnets for the same language (e.g., the [OMW English Wordnet based on WordNet 3.0](https://github.com/omwn/omw-data/) and the [Open English WordNet](https://en-word.net/)):\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 5,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"[Word('omw-en-idea-n'), Word('oewn-idea-n')]\"\n      ]\n     },\n     \"execution_count\": 5,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"wn.words(\\\"idea\\\", lang=\\\"en\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"The [wn.lexicons()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.lexicons) function can show which lexicons have been added for a language:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 6,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"[<Lexicon omw-en:1.4 [en]>, <Lexicon oewn:2021 [en]>]\"\n      ]\n     },\n     \"execution_count\": 6,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"wn.lexicons(lang=\\\"en\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"You can use the `id:version` string to restrict queries to a particular lexicon:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 7,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"[Word('omw-en-idea-n')]\"\n      ]\n     },\n     \"execution_count\": 7,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"wn.words(\\\"idea\\\", lexicon=\\\"omw-en:1.4\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"But it can become tedious to enter these specifiers each time. 
Instead, a [wn.Wordnet](https://wn.readthedocs.io/en/latest/api/wn.html#the-wordnet-class) object can be used to make the language/lexicon filters persistent:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 8,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"[Word('omw-en-idea-n')]\"\n      ]\n     },\n     \"execution_count\": 8,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"en = wn.Wordnet(lexicon=\\\"omw-en:1.4\\\")\\n\",\n    \"en.words(\\\"idea\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Filtering by Word Form and Part of Speech\\n\",\n    \"\\n\",\n    \"Even within a single lexicon a word may return multiple results:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 9,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"[Word('omw-en-pencil-n'), Word('omw-en-pencil-v')]\"\n      ]\n     },\n     \"execution_count\": 9,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"en.words(\\\"pencil\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"You can restrict results by part of speech, as well. E.g., to get the verbal sense of *pencil* (e.g., *to pencil in an appointment*), use the `pos` filter:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 10,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"[Word('omw-en-pencil-v')]\"\n      ]\n     },\n     \"execution_count\": 10,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"en.words(\\\"pencil\\\", pos=\\\"v\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"This works for getting senses and synsets, too:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 11,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"[Sense('omw-en-pencil-03908204-n'),\\n\",\n       \" Sense('omw-en-pencil-14796748-n'),\\n\",\n       \" Sense('omw-en-pencil-13863020-n'),\\n\",\n       \" Sense('omw-en-pencil-03908456-n'),\\n\",\n       \" Sense('omw-en-pencil-01688604-v')]\"\n      ]\n     },\n     \"execution_count\": 11,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"en.senses(\\\"pencil\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 12,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"[Sense('omw-en-pencil-01688604-v')]\"\n      ]\n     },\n     \"execution_count\": 12,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"en.senses(\\\"pencil\\\", pos=\\\"v\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 13,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"[Synset('omw-en-01688604-v')]\"\n      ]\n     },\n     \"execution_count\": 13,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"en.synsets(\\\"pencil\\\", pos=\\\"v\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": 
[\n    \"The wordform itself is just a filter on the results. Leaving it off, you can get all results for a particular part of speech:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 14,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"11531\"\n      ]\n     },\n     \"execution_count\": 14,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"len(en.words(pos=\\\"v\\\"))\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Or all results, regardless of the part of speech:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 15,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"156584\"\n      ]\n     },\n     \"execution_count\": 15,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"len(en.words())\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Secondary Queries\\n\",\n    \"\\n\",\n    \"**Secondary queries** are used when you want to get additional information from a retrieved entity, such as the forms of a word or the definition of a synset. They are also used for finding links between entities, such as the senses of a word or the relations of a sense or synset.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 16,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'pencil'\"\n      ]\n     },\n     \"execution_count\": 16,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"pencil = en.words(\\\"pencil\\\", pos=\\\"v\\\")[0]\\n\",\n    \"pencil.lemma()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 17,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"['pencil', 'pencilled', 'pencilling']\"\n      ]\n     },\n     \"execution_count\": 17,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"pencil.forms()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 18,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'v'\"\n      ]\n     },\n     \"execution_count\": 18,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"pencil.pos\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 19,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"[Sense('omw-en-pencil-01688604-v')]\"\n      ]\n     },\n     \"execution_count\": 19,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"pencil.senses()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 20,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"Synset('omw-en-01688604-v')\"\n      ]\n     },\n     \"execution_count\": 20,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"pencil.senses()[0].synset()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 21,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       
\"[Synset('omw-en-01688604-v')]\"\n      ]\n     },\n     \"execution_count\": 21,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"pencil.synsets()  # shorthand for the above\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 22,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'write, draw, or trace with a pencil'\"\n      ]\n     },\n     \"execution_count\": 22,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"pencil_ss = pencil.synsets()[0]\\n\",\n    \"pencil_ss.definition()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 23,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"['he penciled a figure']\"\n      ]\n     },\n     \"execution_count\": 23,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"pencil_ss.examples()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 24,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"[Synset('omw-en-01690294-v')]\"\n      ]\n     },\n     \"execution_count\": 24,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"pencil_ss.hypernyms()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 25,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"['draw']\"\n      ]\n     },\n     \"execution_count\": 25,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"pencil_ss.hypernyms()[0].lemmas()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Taxonomy Queries\\n\",\n    \"\\n\",\n    \"A common usage of wordnets is exploring the taxonomic structure via hypernym and hyponym relations. These operations thus have some more dedicated functions. For instance, path functions show the synsets from the starting synset to some other synset or the taxonomic root, such as [Synset.hypernym_paths()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.Synset.hypernym_paths):\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 26,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \" Synset('omw-en-01690294-v') ['draw']\\n\",\n      \"   Synset('omw-en-01686132-v') ['represent', 'interpret']\\n\",\n      \"     Synset('omw-en-01619354-v') ['re-create']\\n\",\n      \"       Synset('omw-en-01617192-v') ['make', 'create']\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"for path in pencil_ss.hypernym_paths():\\n\",\n    \"    for i, ss in enumerate(path):\\n\",\n    \"        print(\\\"  \\\" * i, ss, ss.lemmas())\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Paths do not include the starting synset, so the length of the path (i.e., number of edges) is the length of the list of synsets. The length from a synset to the root is called the *depth*. However, as some synsets have multiple paths to the root, there is not always one single depth. 
Instead, the [Synset.min_depth()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.Synset.min_depth) and [Synset.max_depth()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.Synset.max_depth) methods find the lengths of the shortest and longest paths.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 27,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"2\"\n      ]\n     },\n     \"execution_count\": 27,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"dog = en.synsets(\\\"dog\\\", pos=\\\"n\\\")[0]\\n\",\n    \"len(dog.hypernym_paths())  # two paths\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 28,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"(8, 13)\"\n      ]\n     },\n     \"execution_count\": 28,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"dog.min_depth(), dog.max_depth()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"It is also possible to find paths between two synsets by their lowest common hypernym (also called *least common subsumer*). Here I compare the verbs *pencil* and *pen*:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 29,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \" Synset('omw-en-01697816-v') ['create verbally']\\n\",\n      \"   Synset('omw-en-01617192-v') ['make', 'create']\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"pen_ss = en.synsets(\\\"pen\\\", pos=\\\"v\\\")[0]\\n\",\n    \"for path in pen_ss.hypernym_paths():\\n\",\n    \"    for i, ss in enumerate(path):\\n\",\n    \"        print(\\\"  \\\" * i, ss, ss.lemmas())\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 30,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"[Synset('omw-en-01617192-v')]\"\n      ]\n     },\n     \"execution_count\": 30,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"pencil_ss.lowest_common_hypernyms(pen_ss)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 31,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Synset('omw-en-01690294-v') ['draw']\\n\",\n      \"Synset('omw-en-01686132-v') ['represent', 'interpret']\\n\",\n      \"Synset('omw-en-01619354-v') ['re-create']\\n\",\n      \"Synset('omw-en-01617192-v') ['make', 'create']\\n\",\n      \"Synset('omw-en-01697816-v') ['create verbally']\\n\",\n      \"Synset('omw-en-01698271-v') ['write', 'compose', 'pen', 'indite']\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"for ss in pencil_ss.shortest_path(pen_ss):\\n\",\n    \"    print(ss, ss.lemmas())\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Interlingual Queries\\n\",\n    \"\\n\",\n    \"In Wn, each wordnet (lexicon) added to the database is given its own, independent structure. 
All queries that traverse across wordnets make use of the Interlingual index (ILI) on synsets.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 32,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"'a thin cylindrical pointed writing implement; a rod of marking substance encased in wood'\"\n      ]\n     },\n     \"execution_count\": 32,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"pencil_ss = en.synsets(\\\"pencil\\\", pos=\\\"n\\\")[0]  # for this we'll use the nominal sense\\n\",\n    \"pencil_ss.definition()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"To get the corresponding words, senses, or synsets in some other lexicon, use the [Word.translate()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.Word.translate), [Sense.translate()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.Sense.translate), and [Synset.translate()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.Synset.translate) functions. Of these, the function on the sense is the most natural, as it translates a specific meaning of a specific word, although all translations go through the synsets. As a word may have many senses, translating a word returns a mapping of each sense to its list of translations.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 33,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"['lapis', 'matita']\"\n      ]\n     },\n     \"execution_count\": 33,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"pencil_ss.translate(lang=\\\"it\\\")[0].lemmas()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 34,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"['ペンシル', '木筆', '鉛筆']\"\n      ]\n     },\n     \"execution_count\": 34,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"pencil_ss.translate(lexicon=\\\"omw-ja\\\")[0].lemmas()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 35,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"{Sense('omw-en-pencil-03908204-n'): [Word('omw-ja-ペンシル-n'),\\n\",\n       \"  Word('omw-ja-木筆-n'),\\n\",\n       \"  Word('omw-ja-鉛筆-n')],\\n\",\n       \" Sense('omw-en-pencil-14796748-n'): [Word('omw-ja-鉛筆-n')],\\n\",\n       \" Sense('omw-en-pencil-13863020-n'): [],\\n\",\n       \" Sense('omw-en-pencil-03908456-n'): [Word('omw-ja-ペンシル-n')]}\"\n      ]\n     },\n     \"execution_count\": 35,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"en.words(\\\"pencil\\\", pos=\\\"n\\\")[0].translate(lexicon=\\\"omw-ja\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Interlingual synsets are also used to traversing relations from another wordnet. For instance, many of the lexicons in the [Open Multilingual Wordnet](https://github.com/omwn/omw-data) were created using the *expand* method where only words were translated on top of Princeton WordNet synsets. All relations (hypernyms, hyponyms, etc.) then depend on those from WordNet. 
In Wn, a [Wordnet](https://wn.readthedocs.io/en/latest/api/wn.html#the-wordnet-class) object may be instantiated with an `expand` parameter which selects lexicons containing such relations. By default, all lexicons are used (i.e., `expand='*'`), but you can tell Wn to not use any expand lexicons (`expand=''`) or to use a specific lexicon (`expand='omw-en:1.4'`). By being specific, you can better control the behaviour of your program, e.g., for experimental reproducibility.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 36,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"[Synset('omw-ja-14796575-n')]\"\n      ]\n     },\n     \"execution_count\": 36,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"# by default, any other installed lexicon may be used\\n\",\n    \"wn.Wordnet(lexicon=\\\"omw-ja\\\").synsets(\\\"鉛筆\\\")[0].hypernyms()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 37,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"[]\"\n      ]\n     },\n     \"execution_count\": 37,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"# disable interlingual query expansion\\n\",\n    \"wn.Wordnet(lexicon=\\\"omw-ja\\\", expand=\\\"\\\").synsets(\\\"鉛筆\\\")[0].hypernyms()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 38,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"[Synset('omw-ja-14796575-n')]\"\n      ]\n     },\n     \"execution_count\": 38,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"# specify the expand set\\n\",\n    \"wn.Wordnet(lexicon=\\\"omw-ja\\\", expand=\\\"omw-en:1.4\\\").synsets(\\\"鉛筆\\\")[0].hypernyms()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": []\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3 (ipykernel)\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.9.5\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 4\n}\n"
  },
  {
    "path": "docs/api/wn.compat.rst",
    "content": "wn.compat\n=========\n\nCompatibility modules for Wn.\n\nThis subpackage is a namespace for compatibility modules when working\nwith particular lexicons. Wn is designed to be agnostic to the\nlanguage or lexicon and not favor one over the other (with the\nexception of :mod:`wn.morphy`, which is English-specific). However,\nthere are some kinds of functionality that would be useful to\ninclude in Wn, even if they don't generalize to all lexicons.\n\nIncluded modules\n----------------\n\n.. toctree::\n   :maxdepth: 1\n\n   wn.compat.sensekey.rst\n\n"
  },
  {
    "path": "docs/api/wn.compat.sensekey.rst",
    "content": "wn.compat.sensekey\n==================\n\n.. automodule:: wn.compat.sensekey\n\n.. autofunction:: escape\n.. autofunction:: unescape\n.. autofunction:: sense_key_getter\n.. autofunction:: sense_getter\n"
  },
  {
    "path": "docs/api/wn.constants.rst",
    "content": "wn.constants\n============\n\n.. automodule:: wn.constants\n\nSynset Relations\n----------------\n\n.. data:: SYNSET_RELATIONS\n\n   - ``agent``\n   - ``also``\n   - ``attribute``\n   - ``be_in_state``\n   - ``causes``\n   - ``classified_by``\n   - ``classifies``\n   - ``co_agent_instrument``\n   - ``co_agent_patient``\n   - ``co_agent_result``\n   - ``co_instrument_agent``\n   - ``co_instrument_patient``\n   - ``co_instrument_result``\n   - ``co_patient_agent``\n   - ``co_patient_instrument``\n   - ``co_result_agent``\n   - ``co_result_instrument``\n   - ``co_role``\n   - ``direction``\n   - ``domain_region``\n   - ``domain_topic``\n   - ``exemplifies``\n   - ``entails``\n   - ``eq_synonym``\n   - ``has_domain_region``\n   - ``has_domain_topic``\n   - ``is_exemplified_by``\n   - ``holo_location``\n   - ``holo_member``\n   - ``holo_part``\n   - ``holo_portion``\n   - ``holo_substance``\n   - ``holonym``\n   - ``hypernym``\n   - ``hyponym``\n   - ``in_manner``\n   - ``instance_hypernym``\n   - ``instance_hyponym``\n   - ``instrument``\n   - ``involved``\n   - ``involved_agent``\n   - ``involved_direction``\n   - ``involved_instrument``\n   - ``involved_location``\n   - ``involved_patient``\n   - ``involved_result``\n   - ``involved_source_direction``\n   - ``involved_target_direction``\n   - ``is_caused_by``\n   - ``is_entailed_by``\n   - ``location``\n   - ``manner_of``\n   - ``mero_location``\n   - ``mero_member``\n   - ``mero_part``\n   - ``mero_portion``\n   - ``mero_substance``\n   - ``meronym``\n   - ``similar``\n   - ``other``\n   - ``patient``\n   - ``restricted_by``\n   - ``restricts``\n   - ``result``\n   - ``role``\n   - ``source_direction``\n   - ``state_of``\n   - ``target_direction``\n   - ``subevent``\n   - ``is_subevent_of``\n   - ``antonym``\n   - ``feminine``\n   - ``has_feminine``\n   - ``masculine``\n   - ``has_masculine``\n   - ``young``\n   - ``has_young``\n   - ``diminutive``\n   - ``has_diminutive``\n   - ``augmentative``\n   - ``has_augmentative``\n   - ``anto_gradable``\n   - ``anto_simple``\n   - ``anto_converse``\n   - ``ir_synonym``\n\n\nSense Relations\n---------------\n\n.. data:: SENSE_RELATIONS\n\n   - ``antonym``\n   - ``also``\n   - ``participle``\n   - ``pertainym``\n   - ``derivation``\n   - ``domain_topic``\n   - ``has_domain_topic``\n   - ``domain_region``\n   - ``has_domain_region``\n   - ``exemplifies``\n   - ``is_exemplified_by``\n   - ``similar``\n   - ``other``\n   - ``feminine``\n   - ``has_feminine``\n   - ``masculine``\n   - ``has_masculine``\n   - ``young``\n   - ``has_young``\n   - ``diminutive``\n   - ``has_diminutive``\n   - ``augmentative``\n   - ``has_augmentative``\n   - ``anto_gradable``\n   - ``anto_simple``\n   - ``anto_converse``\n   - ``simple_aspect_ip``\n   - ``secondary_aspect_ip``\n   - ``simple_aspect_pi``\n   - ``secondary_aspect_pi``\n\n\n.. data:: SENSE_SYNSET_RELATIONS\n\n   - ``domain_topic``\n   - ``domain_region``\n   - ``exemplifies``\n   - ``other``\n\n\n.. data:: REVERSE_RELATIONS\n\n   .. 
code-block:: python\n\n      {\n          'hypernym': 'hyponym',\n          'hyponym': 'hypernym',\n          'instance_hypernym': 'instance_hyponym',\n          'instance_hyponym': 'instance_hypernym',\n          'antonym': 'antonym',\n          'eq_synonym': 'eq_synonym',\n          'similar': 'similar',\n          'meronym': 'holonym',\n          'holonym': 'meronym',\n          'mero_location': 'holo_location',\n          'holo_location': 'mero_location',\n          'mero_member': 'holo_member',\n          'holo_member': 'mero_member',\n          'mero_part': 'holo_part',\n          'holo_part': 'mero_part',\n          'mero_portion': 'holo_portion',\n          'holo_portion': 'mero_portion',\n          'mero_substance': 'holo_substance',\n          'holo_substance': 'mero_substance',\n          'also': 'also',\n          'state_of': 'be_in_state',\n          'be_in_state': 'state_of',\n          'causes': 'is_caused_by',\n          'is_caused_by': 'causes',\n          'subevent': 'is_subevent_of',\n          'is_subevent_of': 'subevent',\n          'manner_of': 'in_manner',\n          'in_manner': 'manner_of',\n          'attribute': 'attribute',\n          'restricts': 'restricted_by',\n          'restricted_by': 'restricts',\n          'classifies': 'classified_by',\n          'classified_by': 'classifies',\n          'entails': 'is_entailed_by',\n          'is_entailed_by': 'entails',\n          'domain_topic': 'has_domain_topic',\n          'has_domain_topic': 'domain_topic',\n          'domain_region': 'has_domain_region',\n          'has_domain_region': 'domain_region',\n          'exemplifies': 'is_exemplified_by',\n          'is_exemplified_by': 'exemplifies',\n          'role': 'involved',\n          'involved': 'role',\n          'agent': 'involved_agent',\n          'involved_agent': 'agent',\n          'patient': 'involved_patient',\n          'involved_patient': 'patient',\n          'result': 'involved_result',\n          'involved_result': 'result',\n          'instrument': 'involved_instrument',\n          'involved_instrument': 'instrument',\n          'location': 'involved_location',\n          'involved_location': 'location',\n          'direction': 'involved_direction',\n          'involved_direction': 'direction',\n          'target_direction': 'involved_target_direction',\n          'involved_target_direction': 'target_direction',\n          'source_direction': 'involved_source_direction',\n          'involved_source_direction': 'source_direction',\n          'co_role': 'co_role',\n          'co_agent_patient': 'co_patient_agent',\n          'co_patient_agent': 'co_agent_patient',\n          'co_agent_instrument': 'co_instrument_agent',\n          'co_instrument_agent': 'co_agent_instrument',\n          'co_agent_result': 'co_result_agent',\n          'co_result_agent': 'co_agent_result',\n          'co_patient_instrument': 'co_instrument_patient',\n          'co_instrument_patient': 'co_patient_instrument',\n          'co_result_instrument': 'co_instrument_result',\n          'co_instrument_result': 'co_result_instrument',\n          'pertainym': 'pertainym',\n          'derivation': 'derivation',\n          'simple_aspect_ip': 'simple_aspect_pi',\n          'simple_aspect_pi': 'simple_aspect_ip',\n          'secondary_aspect_ip': 'secondary_aspect_pi',\n          'secondary_aspect_pi': 'secondary_aspect_ip',\n          'feminine': 'has_feminine',\n          'has_feminine': 'feminine',\n          'masculine': 'has_masculine',\n          'has_masculine': 
'masculine',\n          'young': 'has_young',\n          'has_young': 'young',\n          'diminutive': 'has_diminutive',\n          'has_diminutive': 'diminutive',\n          'augmentative': 'has_augmentative',\n          'has_augmentative': 'augmentative',\n          'anto_gradable': 'anto_gradable',\n          'anto_simple': 'anto_simple',\n          'anto_converse': 'anto_converse',\n          'ir_synonym': 'ir_synonym',\n      }\n\n.. _parts-of-speech:\n\nParts of Speech\n---------------\n\n.. data:: PARTS_OF_SPEECH\n\n   - ``n`` -- Noun\n   - ``v`` -- Verb\n   - ``a`` -- Adjective\n   - ``r`` -- Adverb\n   - ``s`` -- Adjective Satellite\n   - ``t`` -- Phrase\n   - ``c`` -- Conjunction\n   - ``p`` -- Adposition\n   - ``x`` -- Other\n   - ``u`` -- Unknown\n\n.. autodata:: NOUN\n.. autodata:: VERB\n.. autodata:: ADJECTIVE\n.. data:: ADJ\n\n   Alias of :py:data:`ADJECTIVE`\n\n.. autodata:: ADJECTIVE_SATELLITE\n.. data:: ADJ_SAT\n\n   Alias of :py:data:`ADJECTIVE_SATELLITE`\n\n.. autodata:: PHRASE\n.. autodata:: CONJUNCTION\n.. data:: CONJ\n\n   Alias of :py:data:`CONJUNCTION`\n\n.. autodata:: ADPOSITION\n.. autodata:: ADP\n\n   Alias of :py:data:`ADPOSITION`\n\n.. autodata:: OTHER\n.. autodata:: UNKNOWN\n\n\nAdjective Positions\n-------------------\n\n.. data:: ADJPOSITIONS\n\n   - ``a`` -- Attributive\n   - ``ip``  -- Immediate Postnominal\n   - ``p`` -- Predicative\n\n\nLexicographer Files\n-------------------\n\n.. data:: LEXICOGRAPHER_FILES\n\n   .. code-block:: python\n\n      {\n          'adj.all': 0,\n          'adj.pert': 1,\n          'adv.all': 2,\n          'noun.Tops': 3,\n          'noun.act': 4,\n          'noun.animal': 5,\n          'noun.artifact': 6,\n          'noun.attribute': 7,\n          'noun.body': 8,\n          'noun.cognition': 9,\n          'noun.communication': 10,\n          'noun.event': 11,\n          'noun.feeling': 12,\n          'noun.food': 13,\n          'noun.group': 14,\n          'noun.location': 15,\n          'noun.motive': 16,\n          'noun.object': 17,\n          'noun.person': 18,\n          'noun.phenomenon': 19,\n          'noun.plant': 20,\n          'noun.possession': 21,\n          'noun.process': 22,\n          'noun.quantity': 23,\n          'noun.relation': 24,\n          'noun.shape': 25,\n          'noun.state': 26,\n          'noun.substance': 27,\n          'noun.time': 28,\n          'verb.body': 29,\n          'verb.change': 30,\n          'verb.cognition': 31,\n          'verb.communication': 32,\n          'verb.competition': 33,\n          'verb.consumption': 34,\n          'verb.contact': 35,\n          'verb.creation': 36,\n          'verb.emotion': 37,\n          'verb.motion': 38,\n          'verb.perception': 39,\n          'verb.possession': 40,\n          'verb.social': 41,\n          'verb.stative': 42,\n          'verb.weather': 43,\n          'adj.ppl': 44,\n      }\n"
  },
  {
    "path": "docs/api/wn.ic.rst",
    "content": "\nwn.ic\n=====\n\n.. automodule:: wn.ic\n\nThe mathematical formulae for information content are defined in\n`Formal Description`_, and the corresponding Python API function are\ndescribed in `Calculating Information Content`_. These functions\nrequire information content weights obtained either by `computing them\nfrom a corpus <Computing Corpus Weights_>`_, or by `loading\npre-computed weights from a file <Reading Pre-computed Information\nContent Files_>`_.\n\n.. note::\n\n   The term *information content* can be ambiguous. It often, and most\n   accurately, refers to the result of the :func:`information_content`\n   function (:math:`\\text{IC}(c)` in the mathematical notation), but\n   is also sometimes used to refer to the corpus frequencies/weights\n   (:math:`\\text{freq}(c)` in the mathematical notation) returned by\n   :func:`load` or :func:`compute`, as these weights are the basis of\n   the value computed by :func:`information_content`. The Wn\n   documentation tries to consistently refer to former as the\n   *information content value*, or just *information content*, and the\n   latter as *information content weights*, or *weights*.\n\n\nFormal Description\n------------------\n\nThe Information Content (IC) of a concept (synset) is a measure of its\nspecificity computed from the wordnet's taxonomy structure and corpus\nfrequencies. It is defined by Resnik 1995 ([RES95]_), following\ninformation theory, as the negative log-probability of a concept:\n\n.. math::\n\n   \\text{IC}(c) = -\\log{p(c)}\n\nA concept's probability is the empirical probability over a corpus:\n\n.. math::\n\n   p(c) = \\frac{\\text{freq}(c)}{N}\n\nHere, :math:`N` is the total count of words of the same category as\nconcept :math:`c` ([RES95]_ only considered nouns) where each word has\nsome representation in the wordnet, and :math:`\\text{freq}` is defined\nas the sum of corpus counts of words in :math:`\\text{words}(c)`, which\nis the set of words subsumed by concept :math:`c`:\n\n.. math::\n\n   \\text{freq}(c) = \\sum_{w \\in \\text{words}(c)}{\\text{count}(w)}\n\nIt is common for :math:`\\text{freq}` to not contain actual frequencies\nbut instead weights distributed evenly among the synsets for a\nword. These weights are calculated as the word frequency divided by\nthe number of synsets for the word:\n\n.. math::\n\n   \\text{freq}_{\\text{distributed}}(c)\n   = \\sum_{w \\in \\text{words}(c)}{\\frac{\\text{count}(w)}{|\\text{synsets}(w)|}}\n\n.. [RES95] Resnik, Philip. \"Using information content to evaluate\n   semantic similarity.\" In Proceedings of the 14th International\n   Joint Conference on Artificial Intelligence (IJCAI-95), Montreal,\n   Canada, pp. 448-453. 1995.\n\n\nExample\n-------\n\nIn the Princeton WordNet 3.0 (hereafter *WordNet*, but note that the\nequivalent lexicon in Wn is the *OMW English Wordnet based on WordNet\n3.0* with specifier ``omw-en:1.4``), the frequency of a concept like\n**stone fruit** is not just the number of occurrences of *stone\nfruit*, but also includes the counts of the words for its hyponyms\n(*almond*, *olive*, etc.) and other taxonomic descendants (*Jordan\nalmond*, *green olive*, etc.). The word *almond* has two synsets: one\nfor the fruit or nut, another for the plant. 
Thus, if the word\n*almond* is encountered :math:`n` times in a corpus, then the weight\n(either the frequency :math:`n` or distributed weight\n:math:`\\frac{n}{2}`) is added to the total weights for both synsets\nand to those of their ancestors, but not for descendant synsets, such\nas for **Jordan almond**. The fruit/nut synset of almond has two\nhypernym paths which converge on **fruit**:\n\n1. **almond** ⊃ **stone fruit** ⊃ **fruit**\n2. **almond** ⊃ **nut** ⊃ **seed** ⊃ **fruit**\n\nThe weight is added to each ancestor (**stone fruit**, **nut**,\n**seed**, **fruit**, ...) once. That is, the weight is not added to\nthe convergent ancestor for **fruit** twice, but only once.\n\n\nCalculating Information Content\n-------------------------------\n\n.. autofunction:: information_content\n.. autofunction:: synset_probability\n\n\nComputing Corpus Weights\n------------------------\n\nIf pre-computed weights are not available for a wordnet or for some\ndomain, they can be computed given a corpus and a wordnet.\n\nThe corpus is an iterable of words. For large corpora it may help to\nuse a generator for this iterable, but the entire vocabulary (i.e.,\nunique words and counts) will be held at once in memory. Multi-word\nexpressions are also possible if they exist in the wordnet. For\ninstance, WordNet has *stone fruit*, with a single space delimiting\nthe words, as an entry.\n\nThe :class:`wn.Wordnet` object must be instantiated with a single\nlexicon, although it may have expand-lexicons for relation\ntraversal. For best results, the wordnet should use a lemmatizer to\nhelp it deal with inflected wordforms from running text.\n\n.. autofunction:: compute\n\n\nReading Pre-computed Information Content Files\n----------------------------------------------\n\nThe :func:`load` function reads pre-computed information content\nweights files as used by the `WordNet::Similarity\n<http://wn-similarity.sourceforge.net/>`_ Perl module or the `NLTK\n<http://www.nltk.org/>`_ Python package. These files are computed for\na specific version of a wordnet using the synset offsets from the\n`WNDB <https://wordnet.princeton.edu/documentation/wndb5wn>`_ format,\nwhich Wn does not use. These offsets therefore must be converted into\nan identifier that matches those used by the wordnet. By default,\n:func:`load` uses the lexicon identifier from its *wordnet* argument\nwith synset offsets (padded with 0s to make 8 digits) and\nparts-of-speech from the weights file to format an identifier, such as\n``omw-en-00001174-n``. For wordnets that use a different identifier\nscheme, the *get_synset_id* parameter of :func:`load` can be given a\ncallable created with :func:`wn.util.synset_id_formatter`. It can also\nbe given another callable with the same signature as shown below:\n\n.. code-block:: python\n\n   get_synset_id(*, offset: int, pos: str) -> str\n\n\nWhen loading pre-computed information content files, it is recommended\nto use the ones with smoothing (i.e., ``*-add1.dat`` or\n``*-resnik-add1.dat``) to avoid math domain errors when computing the\ninformation content value.\n\n.. warning::\n\n   The weights files are only valid for the version of wordnet for\n   which they were created. Files created for WordNet 3.0 do not work\n   for WordNet 3.1 because the offsets used in its identifiers are\n   different, although the *get_synset_id* parameter of :func:`load`\n   could be given a function that performs a suitable mapping. 
Some\n   `Open Multilingual Wordnet <https://github.com/omwn/omw-data>`_\n   wordnets use the WordNet 3.0 offsets in their identifiers and can\n   therefore technically use the weights, but this usage is\n   discouraged because the distributional properties of text in\n   another language and the structure of the other wordnet will not be\n   compatible with that of the English WordNet. For these cases, it is\n   recommended to compute new weights using :func:`compute`.\n\n.. autofunction:: load\n"
  },
  {
    "path": "docs/api/wn.ili.rst",
    "content": "wn.ili\n======\n\n.. automodule:: wn.ili\n\n.. note::\n\n   See :doc:`../guides/interlingual` for background and usage information about\n   ILIs.\n\n\nFunctions for Getting ILI Objects\n---------------------------------\n\nThe following functions are for getting individual :class:`ILI` and\n:class:`ProposedILI` objects from ILI identifiers or synsets, respectively, or\nto list all such known objects.\n\n.. autofunction:: get\n.. autofunction:: get_all\n.. autofunction:: get_proposed\n.. autofunction:: get_all_proposed\n\n\nILI Status\n----------\n\nThe status of an ILI object (:attr:`ILI.status` or :attr:`ProposedILI.status`)\nindicates what is known about its validity. Explicit information about ILIs can\nbe added to Wn with :func:`wn.add` (e.g., :python:`wn.add(\"cili\")`), but without\nit Wn can only make a guess.\n\nIf a lexicon has synsets referencing some ILI identifier and no ILI file has\nbeen loaded, that ILI would have a status of :attr:`ILIStatus.PRESUPPOSED`. If\nan ILI file has been loaded that lists the identifier, it would have a status of\n:attr:`ILIStatus.ACTIVE`, whether or not a lexicon has been added that uses\nthe ILI. Both of these cases use :class:`ILI` objects.\n\nA synset in the WN-LMF format may also propose a new ILI. It won't have an\nidentifier, but it should have a definition. These have the status of\n:attr:`ILIStatus.PROPOSED`. The :class:`ProposedILI` is used for these objects,\nand that is the only status they have.\n\nThe :attr:`ILIStatus.UNKNOWN` status is just a default (e.g., when manually\ncreating an :class:`ILI` object) and won't be encountered in normal scenarios.\n\n.. autoclass:: ILIStatus\n\n   .. autoattribute:: UNKNOWN\n   .. autoattribute:: ACTIVE\n   .. autoattribute:: PRESUPPOSED\n   .. autoattribute:: PROPOSED\n\n\nILI Classes\n-----------\n\n.. autoclass:: ILI\n\n   .. autoattribute:: id\n\n      The ILI identifier.\n\n   .. autoattribute:: status\n\n      The status of the ILI.\n\n   .. automethod:: definition\n\n\n.. autoclass:: ProposedILI\n\n   .. autoproperty:: id\n   .. autoproperty:: status\n   .. automethod:: definition\n   .. automethod:: synset\n   .. automethod:: lexicon\n\n\nILI Definitions\n---------------\n\nMost likely someone inspecting the definition of an :class:`ILI` or\n:class:`ProposedILI` only cares about the definition text, but for\ncompleteness' sake the :class:`ILIDefinition` object models the text\nalong with any metadata that may have appeared in the WN-LMF lexicon\nfile. ILI files do not currently model metadata.\n\n.. autoclass:: ILIDefinition\n\n   .. autoattribute:: text\n   .. automethod:: metadata\n"
  },
  {
    "path": "docs/api/wn.lmf.rst",
    "content": "\nwn.lmf\n======\n\n.. automodule:: wn.lmf\n\n.. autofunction:: load\n.. autofunction:: scan_lexicons\n.. autofunction:: is_lmf\n\n"
  },
  {
    "path": "docs/api/wn.morphy.rst",
    "content": "\nwn.morphy\n=========\n\n.. automodule:: wn.morphy\n\n.. seealso::\n\n   The Princeton WordNet `documentation\n   <https://wordnet.princeton.edu/documentation/morphy7wn>`_ describes\n   the original implementation of Morphy.\n\n   The :doc:`../guides/lemmatization` guide describes how Wn handles\n   lemmatization in general.\n\n\nInitialized and Uninitialized Morphy\n------------------------------------\n\nThere are two ways of using Morphy in Wn: initialized and\nuninitialized.\n\nUnintialized Morphy is a simple callable that returns lemma\n*candidates* for some given wordform. That is, the results might not\nbe valid lemmas, but this is not a problem in practice because\nsubsequent queries against the database will filter out the invalid\nones. This callable is obtained by creating a :class:`Morphy` object\nwith no arguments:\n\n>>> from wn import morphy\n>>> m = morphy.Morphy()\n\nAs an uninitialized Morphy cannot predict which lemmas in the result\nare valid, it always returns the original form and any transformations\nit can find for each part of speech:\n\n>>> m('lemmata', pos='n')  # exceptional form\n{'n': {'lemmata'}}\n>>> m('lemmas', pos='n')   # regular morphology with part-of-speech\n{'n': {'lemma', 'lemmas'}}\n>>> m('lemmas')            # regular morphology for any part-of-speech\n{None: {'lemmas'}, 'n': {'lemma'}, 'v': {'lemma'}}\n>>> m('wolves')            # invalid forms may be returned\n{None: {'wolves'}, 'n': {'wolf', 'wolve'}, 'v': {'wolve', 'wolv'}}\n\n\nThis lemmatizer can also be used with a :class:`wn.Wordnet` object to\nexpand queries:\n\n>>> import wn\n>>> ewn = wn.Wordnet('ewn:2020')\n>>> ewn.words('lemmas')\n[]\n>>> ewn = wn.Wordnet('ewn:2020', lemmatizer=morphy.Morphy())\n>>> ewn.words('lemmas')\n[Word('ewn-lemma-n')]\n\nAn initialized Morphy is created with a :class:`wn.Wordnet` object as\nits argument. It then uses the wordnet to build lists of valid lemmas\nand exceptional forms (this takes a few seconds). Once this is done,\nit will only return lemmas it knows about:\n\n>>> ewn = wn.Wordnet('ewn:2020')\n>>> m = morphy.Morphy(ewn)\n>>> m('lemmata', pos='n')  # exceptional form\n{'n': {'lemma'}}\n>>> m('lemmas', pos='n')   # regular morphology with part-of-speech\n{'n': {'lemma'}}\n>>> m('lemmas')            # regular morphology for any part-of-speech\n{'n': {'lemma'}}\n>>> m('wolves')            # invalid forms are pre-filtered\n{'n': {'wolf'}}\n\nIn order to use an initialized Morphy lemmatizer with a\n:class:`wn.Wordnet` object, it must be assigned to the object after\ncreation:\n\n>>> ewn = wn.Wordnet('ewn:2020')  # default: lemmatizer=None\n>>> ewn.words('lemmas')\n[]\n>>> ewn.lemmatizer = morphy.Morphy(ewn)\n>>> ewn.words('lemmas')\n[Word('ewn-lemma-n')]\n\nThere is little to no difference in the results obtained from a\n:class:`wn.Wordnet` object using an initialized or uninitialized\n:class:`Morphy` object, but there may be slightly different\nperformance profiles for future queries.\n\n\nDefault Morphy Lemmatizer\n-------------------------\n\nAs a convenience, an uninitialized Morphy lemmatizer is provided in\nthis module via the :data:`morphy` member.\n\n.. data:: morphy\n\n   A :class:`Morphy` object created without a :class:`wn.Wordnet`\n   object.\n\n\nThe Morphy Class\n----------------\n\n.. autoclass:: Morphy\n"
  },
  {
    "path": "docs/api/wn.project.rst",
    "content": "wn.project\n==========\n\n.. automodule:: wn.project\n\n.. autofunction:: get_project\n.. autofunction:: iterpackages\n.. autofunction:: is_package_directory\n.. autofunction:: is_collection_directory\n\nProject Classes\n---------------\n\nProjects can be simple resource files, :class:`Package` directories,\nor :class:`Collection` directories. For API consistency, resource\nfiles are modeled as a virtual package (:class:`ResourceOnlyPackage`).\n\n.. class:: Project\n\n   The base class for packages and collections.\n\n   This class is not used directly, but all subclasses will implement\n   the methods listed here.\n\n   .. autoproperty:: path\n   .. automethod:: readme\n   .. automethod:: license\n   .. automethod:: citation\n\n.. autoclass:: Package\n   :show-inheritance:\n\n   .. autoproperty:: type\n   .. automethod:: resource_file\n\n.. autoclass:: ResourceOnlyPackage\n   :show-inheritance:\n\n.. autoclass:: Collection\n   :show-inheritance:\n\n   .. automethod:: packages\n"
  },
  {
    "path": "docs/api/wn.rst",
    "content": "\nwn\n===\n\n.. automodule:: wn\n\n\nProject Management Functions\n----------------------------\n\n.. autofunction:: download\n.. autofunction:: add\n.. autofunction:: add_lexical_resource\n.. autofunction:: remove\n.. autofunction:: export\n.. autofunction:: projects\n.. autofunction:: reset_database\n\n\nWordnet Query Functions\n-----------------------\n\nWhile it is best to first instantiate a :class:`Wordnet` object with a\nspecific lexicon and use that for querying (see :ref:`default-mode`),\nthe following functions are also available for quick and simple\nqueries.\n\n.. autofunction:: word\n.. autofunction:: words\n.. autofunction:: lemmas\n.. autofunction:: sense\n.. autofunction:: senses\n.. autofunction:: synset\n.. autofunction:: synsets\n.. autofunction:: lexicons\n\n\nThe Wordnet Class\n-----------------\n\n.. autoclass:: Wordnet\n\n   .. automethod:: word\n   .. automethod:: words\n   .. automethod:: lemmas\n   .. automethod:: sense\n   .. automethod:: senses\n   .. automethod:: synset\n   .. automethod:: synsets\n   .. automethod:: lexicons\n   .. automethod:: expanded_lexicons\n   .. automethod:: describe\n\n\nWords, Senses, and Synsets\n--------------------------\n\nThe results of primary queries against a lexicon are :class:`Word`,\n:class:`Sense`, or :class:`Synset` objects. See\n:doc:`../guides/wordnet` for more information about the concepts these\nobject represent.\n\nWord Objects\n''''''''''''\n\n.. class:: Word\n\n   :class:`Word` (or \"lexical entry\") objects encode information about\n   word forms independent from their meaning.\n\n   .. autoattribute:: id\n\n      The identifier used within a lexicon.\n\n   .. autoattribute:: pos\n\n      The part of speech of the Word.\n\n   .. automethod:: lemma\n   .. automethod:: forms\n   .. automethod:: senses\n   .. automethod:: synsets\n   .. automethod:: lexicon\n   .. automethod:: metadata\n   .. automethod:: confidence\n   .. automethod:: derived_words\n   .. automethod:: translate\n\n\nSense Objects\n'''''''''''''\n\n.. class:: Sense\n\n   :class:`Sense` objects represent a pairing of a :class:`Word` and a\n   :class:`Synset`.\n\n   .. autoattribute:: id\n\n      The identifier used within a lexicon.\n\n   .. automethod:: word\n   .. automethod:: synset\n   .. automethod:: examples\n   .. automethod:: lexicalized\n   .. automethod:: adjposition\n   .. automethod:: frames\n   .. automethod:: counts\n   .. automethod:: lexicon\n   .. automethod:: metadata\n   .. automethod:: confidence\n   .. automethod:: relations\n   .. automethod:: synset_relations\n   .. automethod:: get_related\n   .. automethod:: get_related_synsets\n   .. automethod:: closure\n   .. automethod:: relation_paths\n   .. automethod:: translate\n\n\nSynset Objects\n''''''''''''''\n\n.. class:: Synset\n\n   :class:`Synset` objects represent a set of words that share a\n   meaning.\n\n   .. autoattribute:: id\n\n      The identifier used within a lexicon.\n\n   .. autoattribute:: pos\n\n      The part of speech of the Synset.\n\n   .. autoproperty:: ili\n\n      The interlingual index of the Synset.\n\n   .. automethod:: definition\n   .. automethod:: definitions\n   .. automethod:: examples\n   .. automethod:: senses\n   .. automethod:: lexicalized\n   .. automethod:: lexfile\n   .. automethod:: lexicon\n   .. automethod:: metadata\n   .. automethod:: confidence\n   .. automethod:: words\n   .. automethod:: lemmas\n   .. automethod:: hypernyms\n   .. automethod:: hyponyms\n   .. automethod:: holonyms\n   .. automethod:: meronyms\n   .. 
automethod:: relations\n   .. automethod:: get_related\n   .. automethod:: closure\n   .. automethod:: relation_paths\n   .. automethod:: translate\n\n   .. The taxonomy methods below have been moved to wn.taxonomy\n\n   .. method:: hypernym_paths(simulate_root=False)\n\n      Shortcut for :func:`wn.taxonomy.hypernym_paths`.\n\n   .. method:: min_depth(simulate_root=False)\n\n      Shortcut for :func:`wn.taxonomy.min_depth`.\n\n   .. method:: max_depth(simulate_root=False)\n\n      Shortcut for :func:`wn.taxonomy.max_depth`.\n\n   .. method:: shortest_path(other, simulate_root=False)\n\n      Shortcut for :func:`wn.taxonomy.shortest_path`.\n\n   .. method:: common_hypernyms(other, simulate_root=False)\n\n      Shortcut for :func:`wn.taxonomy.common_hypernyms`.\n\n   .. method:: lowest_common_hypernyms(other, simulate_root=False)\n\n      Shortcut for :func:`wn.taxonomy.lowest_common_hypernyms`.\n\n\nRelations\n---------\n\nThe :meth:`Sense.relation_map` and :meth:`Synset.relation_map` methods\nreturn a dictionary mapping :class:`Relation` objects to resolved\ntarget senses or synsets. They differ from :meth:`Sense.relations`\nand :meth:`Synset.relations` in two main ways:\n\n1. Relation objects map 1-to-1 to their targets instead of to a list\n   of targets sharing the same relation name.\n2. Relation objects encode not just relation names, but also the\n   identifiers of sources and targets, the lexicons they came from, and\n   any metadata they have.\n\nOne reason why :class:`Relation` objects are useful is for inspecting\nrelation metadata, particularly in order to distinguish ``other``\nrelations that differ only by the value of their ``dc:type`` metadata:\n\n>>> oewn = wn.Wordnet('oewn:2024')\n>>> alloy = oewn.senses(\"alloy\", pos=\"v\")[0]\n>>> alloy.relations()  # appears to only have one 'other' relation\n{'derivation': [Sense('oewn-alloy__1.27.00..')], 'other': [Sense('oewn-alloy__1.27.00..')]}\n>>> for rel in alloy.relation_map():  # but in fact there are two\n...     print(rel, rel.subtype)\n... \nRelation('derivation', 'oewn-alloy__2.30.00..', 'oewn-alloy__1.27.00..') None\nRelation('other', 'oewn-alloy__2.30.00..', 'oewn-alloy__1.27.00..') material\nRelation('other', 'oewn-alloy__2.30.00..', 'oewn-alloy__1.27.00..') result\n\nAnother reason why they are useful is to determine the source of a\nrelation used in :doc:`interlingual queries <../guides/interlingual>`.\n\n>>> es = wn.Wordnet(\"omw-es\", expand=\"omw-en\")\n>>> mapa = es.synsets(\"mapa\", pos=\"n\")[0]\n>>> rel, tgt = next(iter(mapa.relation_map().items()))\n>>> rel, rel.lexicon()  # relation comes from omw-en\n(Relation('hypernym', 'omw-en-03720163-n', 'omw-en-04076846-n'), <Lexicon omw-en:1.4 [en]>)\n>>> tgt, tgt.words(), tgt.lexicon()  # target is in omw-es\n(Synset('omw-es-04076846-n'), [Word('omw-es-representación-n')], <Lexicon omw-es:1.4 [es]>)\n\n.. class:: Relation\n\n   :class:`Relation` objects model relations between senses or synsets.\n\n   .. attribute:: name\n\n      The name of the relation. Also called the relation \"type\".\n\n   .. attribute:: source_id\n\n      The identifier of the source entity of the relation.\n\n   .. attribute:: target_id\n\n      The identifier of the target entity of the relation.\n\n   .. autoattribute:: subtype\n   .. automethod:: lexicon\n   .. automethod:: metadata\n   .. automethod:: confidence\n\n\nAdditional Classes\n------------------\n\n.. 
class:: Form\n\n   :class:`Form` objects are returned by :meth:`Word.lemma` and\n   :meth:`Word.forms` when the :python:`data=True` argument is used,\n   and they make accessible several optional properties of word forms.\n   The word form itself is available via the :attr:`value` attribute.\n\n   >>> inu = wn.words('犬', lexicon='wnja')[0]\n   >>> inu.forms(data=True)[3]\n   Form(value='いぬ')\n   >>> inu.forms(data=True)[3].script\n   'hira'\n\n   The :attr:`script` is often unspecified (i.e., :python:`None`) and\n   this carries the implicit meaning that the form uses the canonical\n   script for the word's language or wordnet, whatever it may be.\n\n   .. attribute:: value\n\n      The word form string.\n\n   .. attribute:: id\n\n      An optional form identifier used within a lexicon. These\n      identifiers are often :python:`None`.\n\n   .. attribute:: script\n\n      The script of the word form. This should be an `ISO 15924\n      <https://en.wikipedia.org/wiki/ISO_15924>`_ code, or :python:`None`.\n\n   .. method:: pronunciations\n\n      Return the list of :class:`Pronunciation` objects.\n\n   .. method:: tags\n\n      Return the list of :class:`Tag` objects.\n\n   .. automethod:: lexicon\n\n\n.. class:: Pronunciation\n\n   :class:`Pronunciation` objects encode a text or audio\n   representation of how a word is pronounced. They are returned by\n   :meth:`Form.pronunciations`.\n\n   .. autoattribute:: value\n\n      The encoded pronunciation.\n\n   .. autoattribute:: variety\n\n      The language variety this pronunciation belongs to.\n\n   .. autoattribute:: notation\n\n      The notation used to encode the pronunciation. For example: the\n      International Phonetic Alphabet (IPA).\n\n   .. autoattribute:: phonemic\n\n      :python:`True` when the encoded pronunciation is a generalized\n      phonemic description, or :python:`False` for more precise\n      phonetic transcriptions.\n\n   .. autoattribute:: audio\n\n      A URI to an associated audio file.\n\n   .. automethod:: lexicon\n\n\n.. autoclass:: Tag\n\n   :class:`Tag` objects encode categorical information about word\n   forms. They are returned by :meth:`Form.tags`.\n\n   .. autoattribute:: tag\n\n      The text value of the tag.\n\n   .. autoattribute:: category\n\n      The category, or kind, of the tag.\n\n   .. automethod:: lexicon\n\n\n.. autoclass:: Count\n\n   :class:`Count` objects model sense counts previously computed over\n   some corpus. They are returned by :meth:`Sense.counts`.\n   \n   .. autoattribute:: value\n\n      The count of sense occurrences.\n\n   .. automethod:: lexicon\n   .. automethod:: metadata\n   .. automethod:: confidence\n\n\n.. class:: Example\n\n   :class:`Example` objects model example phrases for senses and\n   synsets. They are returned by :meth:`Sense.examples` and\n   :meth:`Synset.examples` when the :python:`data=True` argument is\n   given.\n\n   .. autoattribute:: text\n      \n      The example text.\n\n   .. autoattribute:: language\n\n      The language of the example.\n\n   .. automethod:: lexicon\n   .. automethod:: metadata\n   .. automethod:: confidence\n\n\n.. class:: Definition\n\n   :class:`Definition` objects model synset definitions. They are\n   returned by :meth:`Synset.definition` when the :python:`data=True`\n   argument is given.\n   \n   .. autoattribute:: text\n      \n      The definition text.\n\n   .. autoattribute:: language\n\n      The language of the definition.\n\n   .. 
autoattribute:: source_sense_id\n\n      The id of the particular sense the definition is for.\n\n   .. automethod:: lexicon\n   .. automethod:: metadata\n   .. automethod:: confidence\n\n\nInterlingual Indices\n--------------------\n\nAs of Wn v1.0.0, see :mod:`wn.ili` classes and functions for ILIs\n\n\nLexicon Objects\n---------------\n\n.. class:: Lexicon\n\n   Lexicon objects contain attributes and metadata about a single\n   :doc:`lexicon <../guides/lexicons>`.\n\n   .. autoattribute:: id\n\n      The lexicon's identifier.\n\n   .. autoattribute:: label\n\n      The full name of lexicon.\n\n   .. autoattribute:: language\n\n      The BCP 47 language code of lexicon.\n\n   .. autoattribute:: email\n\n      The email address of the wordnet maintainer.\n\n   .. autoattribute:: license\n\n      The URL or name of the wordnet's license.\n\n   .. autoattribute:: version\n\n      The version string of the resource.\n\n   .. autoattribute:: url\n\n      The project URL of the wordnet.\n\n   .. autoattribute:: citation\n\n      The canonical citation for the project.\n\n   .. autoattribute:: logo\n\n      A URL or path to a project logo.\n\n   .. automethod:: metadata\n   .. automethod:: confidence\n   .. automethod:: specifier\n   .. automethod:: modified\n   .. automethod:: requires\n   .. automethod:: extends\n   .. automethod:: extensions\n   .. automethod:: describe\n\n\nThe wn.config Object\n--------------------\n\nWn's data storage and retrieval can be configured through the\n:data:`wn.config` object.\n\n.. seealso::\n\n   :doc:`../setup` describes how to configure Wn using the\n   :data:`wn.config` instance.\n\n.. autodata:: config\n\nIt is an instance of the :class:`~wn._config.WNConfig` class, which is\ndefined in a non-public module and is not meant to be instantiated\ndirectly. Configuration should occur through the single\n:data:`wn.config` instance.\n\n.. autoclass:: wn._config.WNConfig\n\n   .. autoattribute:: data_directory\n   .. autoattribute:: database_path\n   .. attribute:: allow_multithreading\n\n      If set to :python:`True`, the database connection may be shared\n      across threads. In this case, it is the user's responsibility to\n      ensure that multiple threads don't try to write to the database\n      at the same time. The default is :python:`False`.\n\n   .. autoattribute:: downloads_directory\n   .. automethod:: add_project\n   .. automethod:: add_project_version\n   .. automethod:: get_project_info\n   .. automethod:: get_cache_path\n   .. automethod:: list_cache_entries\n   .. automethod:: update\n   .. automethod:: load_index\n\n\nAuxiliary WNConfig Types\n''''''''''''''''''''''''\n\nThe following classes are argument or return types of\n:class:`~wn._config.WNConfig` objects. They are documented here for reference,\nbut are not meant to be created directly.\n\n.. autoclass:: wn._config.ResourceType\n\n   Enumeration of resource types.\n\n   .. autoattribute:: WORDNET\n   .. autoattribute:: ILI\n\n\n.. autoclass:: wn._config.ProjectInfo\n   :members:\n   :undoc-members:\n\n   Dictionary of information about a project.\n\n\n.. autoclass:: wn._config.VersionInfo\n   :members:\n   :undoc-members:\n\n   Dictionary of information about a resource version.\n\n\n.. autoclass:: wn._config.ResolvedProjectInfo\n   :members:\n   :undoc-members:\n\n   Dictionary of information about a specific project resource.\n\n\n.. 
autoclass:: wn._config.CacheEntry\n   :members:\n   :undoc-members:\n\n   Dictionary of information about files in the download cache.\n\n\nExceptions\n----------\n\n.. autoexception:: Error\n.. autoexception:: DatabaseError\n.. autoexception:: WnWarning\n"
  },
  {
    "path": "docs/api/wn.similarity.rst",
    "content": "wn.similarity\n=============\n\n.. automodule:: wn.similarity\n\nTaxonomy-based Metrics\n----------------------\n\nThe `Path <Path Similarity_>`_, `Leacock-Chodorow <Leacock-Chodorow\nSimilarity_>`_, and `Wu-Palmer <Wu-Palmer Similarity_>`_ similarity\nmetrics work by finding path distances in the hypernym/hyponym\ntaxonomy. As such, they are most useful when the synsets are, in fact,\narranged in a taxonomy. For the Princeton WordNet and derivative\nwordnets, such as the `Open English Wordnet`_ and `OMW English Wordnet\nbased on WordNet 3.0`_ available to Wn, synsets for nouns and verbs\nare arranged taxonomically: the nouns mostly form a single structure\nwith a single root while verbs form many smaller structures with many\nroots. Synsets for the other parts of speech do not use\nhypernym/hyponym relations at all. This situation may be different for\nother wordnet projects or future versions of the English wordnets.\n\n.. _Open English Wordnet: https://en-word.net\n.. _OMW English Wordnet based on WordNet 3.0: https://github.com/omwn/omw-data\n\nThe similarity metrics tend to fail when the synsets are not connected\nby some path. When the synsets are in different parts of speech, or\neven in separate lexicons, this failure is acceptable and\nexpected. But for cases like the verbs in the Princeton WordNet, it\nmight be more useful to pretend that there is some unique root for all\nverbs so as to create a path connecting any two of them. For this\npurpose, the *simulate_root* parameter is available on the\n:func:`path`, :func:`lch`, and :func:`wup` functions, where it is\npassed on to calls to :meth:`wn.Synset.shortest_path` and\n:meth:`wn.Synset.lowest_common_hypernyms`. Setting *simulate_root* to\n:python:`True` can, however, give surprising results if the words are\nfrom a different lexicon. Currently, computing similarity for synsets\nfrom a different part of speech raises an error.\n\n\nPath Similarity\n'''''''''''''''\n\nWhen :math:`p` is the length of the shortest path between two synsets,\nthe path similarity is:\n\n.. math::\n\n   \\frac{1}{p + 1}\n\nThe similarity score ranges between 0.0 and 1.0, where the higher the\nscore is, the more similar the synsets are. The score is 1.0 when a\nsynset is compared to itself, and 0.0 when there is no path between\nthe two synsets (i.e., the path distance is infinite).\n\n.. autofunction:: path\n\n\n.. _leacock-chodorow-similarity:\n\nLeacock-Chodorow Similarity\n'''''''''''''''''''''''''''\n\nWhen :math:`p` is the length of the shortest path between two synsets\nand :math:`d` is the maximum taxonomy depth, the Leacock-Chodorow\nsimilarity is:\n\n.. math::\n\n   -\\text{log}\\left(\\frac{p + 1}{2d}\\right)\n\n.. autofunction:: lch\n\n\nWu-Palmer Similarity\n''''''''''''''''''''\n\nWhen *LCS* is the lowest common hypernym (also called \"least common\nsubsumer\") between two synsets, :math:`i` is the shortest path\ndistance from the first synset to *LCS*, :math:`j` is the shortest\npath distance from the second synset to *LCS*, and :math:`k` is the\nnumber of nodes (distance + 1) from *LCS* to the root node, then the\nWu-Palmer similarity is:\n\n.. math::\n\n   \\frac{2k}{i + j + 2k}\n\n.. autofunction:: wup\n\n\nInformation Content-based Metrics\n---------------------------------\n\nThe `Resnik <Resnik Similarity_>`_, `Jiang-Conrath <Jiang-Conrath\nSimilarity_>`_, and `Lin <Lin Similarity_>`_ similarity metrics work\nby computing the information content of the synsets and/or that of\ntheir lowest common hypernyms. 
They therefore require information\ncontent weights (see :mod:`wn.ic`), and the values returned\nnecessarily depend on the weights used.\n\n\nResnik Similarity\n'''''''''''''''''\n\nThe Resnik similarity (`Resnik 1995\n<https://arxiv.org/pdf/cmp-lg/9511007.pdf>`_) is the maximum\ninformation content value of the common subsumers (hypernym ancestors)\nof the two synsets. Formally it is defined as follows, where\n:math:`c_1` and :math:`c_2` are the two synsets being compared:\n\n.. math::\n\n   \\text{max}_{c \\in \\text{S}(c_1, c_2)} \\text{IC}(c)\n\nSince a synset's information content is always equal to or greater\nthan the information content of its hypernyms, :math:`S(c_1, c_2)`\nabove is more efficiently computed using the lowest common hypernyms\ninstead of all common hypernyms.\n\n.. autofunction:: res\n\n\nJiang-Conrath Similarity\n''''''''''''''''''''''''\n\nThe Jiang-Conrath similarity metric (`Jiang and Conrath, 1997\n<https://www.aclweb.org/anthology/O97-1002.pdf>`_) combines the ideas\nof the taxonomy-based and information content-based metrics. It is\ndefined as follows, where :math:`c_1` and :math:`c_2` are the two\nsynsets being compared and :math:`c_0` is the lowest common hypernym\nof the two with the highest information content weight:\n\n.. math::\n\n   \\frac{1}{\\text{IC}(c_1) + \\text{IC}(c_2) - 2(\\text{IC}(c_0))}\n\nThis equation is the simplified form given in the paper, where\nseveral parameterized terms are cancelled out, since the full form is\nnot often used in practice.\n\nThere are two special cases:\n\n1. If the information content values of :math:`c_0`, :math:`c_1`, and\n   :math:`c_2` are all zero, the metric returns zero. This occurs when\n   both :math:`c_1` and :math:`c_2` are the root node, but it can also\n   occur if the synsets did not occur in the corpus and the smoothing\n   value was set to zero.\n\n2. Otherwise, if\n   :math:`\\text{IC}(c_1) + \\text{IC}(c_2) = 2(\\text{IC}(c_0))`, the\n   metric returns infinity. This occurs when the two synsets are the\n   same, one is a descendant of the other, etc., such that they have\n   the same frequency as each other and as their lowest common\n   hypernym.\n\n.. autofunction:: jcn\n\n\nLin Similarity\n''''''''''''''\n\nAnother formulation of information content-based similarity is the Lin\nmetric (`Lin 1997 <https://www.aclweb.org/anthology/P97-1009.pdf>`_),\nwhich is defined as follows, where :math:`c_1` and :math:`c_2` are the\ntwo synsets being compared and :math:`c_0` is the lowest common\nhypernym with the highest information content weight:\n\n.. math::\n\n   \\frac{2(\\text{IC}(c_0))}{\\text{IC}(c_1) + \\text{IC}(c_2)}\n\nOne special case is if either synset has an information content value\nof zero, in which case the metric returns zero.\n\n.. autofunction:: lin\n"
  },
  {
    "path": "docs/api/wn.taxonomy.rst",
    "content": "\nwn.taxonomy\n===========\n\n.. automodule:: wn.taxonomy\n\n\nOverview\n--------\n\nAmong the valid synset relations for wordnets (see\n:data:`wn.constants.SYNSET_RELATIONS`), those used for describing\n*is-a* `taxonomies <https://en.wikipedia.org/wiki/Taxonomy>`_ are\ngiven special treatment and they are generally the most\nwell-developed relations in any wordnet. Typically these are the\n``hypernym`` and ``hyponym`` relations, which encode *is-a-type-of*\nrelationships (e.g., a *hermit crab* is a type of *decapod*, which is\na type of *crustacean*, etc.). They also include ``instance_hypernym``\nand ``instance_hyponym``, which encode *is-an-instance-of*\nrelationships (e.g., *Oregon* is an instance of *American state*).\n\nThe taxonomy forms a multiply-inheriting hierarchy with the synsets as\nnodes. In the English wordnets, such as the Princeton WordNet and its\nderivatives, nearly all nominal synsets form such a hierarchy with\nsingle root node, while verbal synsets form many smaller hierarchies\nwithout a common root. Other wordnets may have different properties,\nbut as many are based off of the Princeton WordNet, they tend to\nfollow this structure.\n\nFunctions to find paths within the taxonomies form the basis of all\n:mod:`wordnet similarity measures <wn.similarity>`. For instance, the\n:ref:`leacock-chodorow-similarity` measure uses both\n:func:`shortest_path` and (indirectly) :func:`taxonomy_depth`.\n\n\nWordnet-level Functions\n-----------------------\n\nRoot and leaf synsets in the taxonomy are those with no ancestors\n(``hypernym``, ``instance_hypernym``, etc.) or hyponyms (``hyponym``,\n``instance_hyponym``, etc.), respectively.\n\nFinding root and leaf synsets\n'''''''''''''''''''''''''''''\n\n.. autofunction:: roots\n.. autofunction:: leaves\n\nComputing the taxonomy depth\n''''''''''''''''''''''''''''\n\nThe taxonomy depth is the maximum depth from a root node to a leaf\nnode within synsets for a particular part of speech.\n\n.. autofunction:: taxonomy_depth\n\n\nSynset-level Functions\n----------------------\n\n.. autofunction:: hypernym_paths\n.. autofunction:: min_depth\n.. autofunction:: max_depth\n.. autofunction:: shortest_path\n.. autofunction:: common_hypernyms\n.. autofunction:: lowest_common_hypernyms\n"
  },
  {
    "path": "docs/api/wn.util.rst",
    "content": "wn.util\n=======\n\n.. automodule:: wn.util\n\n.. autofunction:: synset_id_formatter\n\n.. autoclass:: ProgressHandler\n   :members:\n\n   .. attribute:: kwargs\n\n      A dictionary storing the updateable parameters for the progress\n      handler. The keys are:\n\n      - ``message`` (:class:`str`) -- a generic message or name\n      - ``count`` (:class:`int`) -- the current progress counter\n      - ``total`` (:class:`int`) -- the expected final value of the counter\n      - ``unit`` (:class:`str`) -- the unit of measurement\n      - ``status`` (:class:`str`) -- the current status of the process\n\n.. autoclass:: ProgressBar\n   :members:\n"
  },
  {
    "path": "docs/api/wn.validate.rst",
    "content": "\nwn.validate\n===========\n\n.. automodule:: wn.validate\n\n.. autofunction:: validate\n"
  },
  {
    "path": "docs/cli.rst",
    "content": "Command Line Interface\n======================\n\nSome of Wn's functionality is exposed via the command line.\n\nGlobal Options\n--------------\n\n.. option:: -d DIR, --dir DIR\n\n   Change to use ``DIR`` as the data directory prior to invoking any\n   commands.\n\n\nSubcommands\n-----------\n\ndownload\n--------\n\nDownload and add projects to the database given one or more project\nspecifiers or URLs.\n\n.. code-block:: console\n\n   $ python -m wn download oewn:2021 omw:1.4 cili\n   $ python -m wn download https://en-word.net/static/english-wordnet-2021.xml.gz\n\n.. option:: --index FILE\n\n   Use the index at ``FILE`` to resolve project specifiers.\n\n   .. code-block:: console\n\n      $ python -m wn download --index my-index.toml mywn\n\n.. option:: --no-add\n\n   Download and cache the remote file, but don't add it to the\n   database.\n\n\ncache\n-----\n\nView the files in the download cache. The ``download`` command caches the (often\ncompressed) files to the filesystem prior to adding to Wn's database. The files\nare renamed with a hash of the URL to avoid name clashes, but this also makes it\nhard to determine what a particular file is. This command cross-references the\ndownloaded files with what is in the index. An optional project specifier\nargument can help narrow down the results.\n\n.. code-block:: console\n\n   $ python -m wn cache  # many results; abbreviated here\n   af909070c29845b952d1799551bffc302e28d2c5        own-en  1.0.0   https://github.com/own-pt/openWordnet-PT/releases/download/v1.0.0/own-en.tar.gz\n   e25af66e46775b00d689619787013e6a35e5cbf7        oewn    2025    https://en-word.net/static/english-wordnet-2025.xml.gz\n   5a26d97a0081996db4cd621638a8a9b0da09aa25        odenet  1.4     https://github.com/hdaSprachtechnologie/odenet/releases/download/v1.4/odenet-1.4.tar.xz\n   [...]\n   $ python -m wn cache \"oewn:2025*\" # narrowed results\n   e25af66e46775b00d689619787013e6a35e5cbf7        oewn    2025    https://en-word.net/static/english-wordnet-2025.xml.gz\n   0f5371187dcfe7e05f2a93ab85b4e1168859a5c2        oewn    2025+   https://en-word.net/static/english-wordnet-2025-plus.xml.gz\n\n.. option:: --full-paths-only\n\n   Only print the full path of each cache file. This can be useful when one\n   wants to pipe the results to other commands. For example, on Unix-like\n   systems, the following will delete matching cache entries:\n\n   .. code-block:: console\n\n      $ python -m wn cache --full-paths-only \"omw*:1.4\" | xargs rm\n\n\nlexicons\n--------\n\nThe ``lexicons`` subcommand lets you quickly see what is installed:\n\n.. code-block:: console\n\n   $ python -m wn lexicons\n   omw-en\t1.4\t[en]\tOMW English Wordnet based on WordNet 3.0\n   omw-sk\t1.4\t[sk]\tSlovak WordNet\n   omw-pl\t1.4\t[pl]\tplWordNet\n   omw-is\t1.4\t[is]\tIceWordNet\n   omw-zsm\t1.4\t[zsm]\tWordnet Bahasa (Malaysian)\n   omw-sl\t1.4\t[sl]\tsloWNet\n   omw-ja\t1.4\t[ja]\tJapanese Wordnet\n   ...\n\n.. option:: -l LG, --lang LG\n.. option:: --lexicon SPEC\n\n   The ``--lang`` or ``--lexicon`` option can help you narrow down\n   the results:\n\n   .. 
code-block:: console\n\n      $ python -m wn lexicons --lang en\n      oewn\t2021\t[en]\tOpen English WordNet\n      omw-en\t1.4\t[en]\tOMW English Wordnet based on WordNet 3.0\n      $ python -m wn lexicons --lexicon \"omw-*\"\n      omw-en\t1.4\t[en]\tOMW English Wordnet based on WordNet 3.0\n      omw-sk\t1.4\t[sk]\tSlovak WordNet\n      omw-pl\t1.4\t[pl]\tplWordNet\n      omw-is\t1.4\t[is]\tIceWordNet\n      omw-zsm\t1.4\t[zsm]\tWordnet Bahasa (Malaysian)\n\n\nprojects\n--------\n\nThe ``projects`` subcommand lists all known projects in Wn's\nindex. This is helpful to see what is available for downloading.\n\n.. code-block::\n\n   $ python -m wn projects\n   ic      cili    1.0     [---]   Collaborative Interlingual Index\n   ic      oewn    2025+   [en]    Open English WordNet\n   ic      oewn    2025    [en]    Open English WordNet\n   ic      oewn    2024    [en]    Open English WordNet\n   ic      oewn    2023    [en]    Open English WordNet\n   ic      oewn    2022    [en]    Open English WordNet\n   ic      oewn    2021    [en]    Open English WordNet\n   ic      ewn     2020    [en]    Open English WordNet\n   ic      ewn     2019    [en]    Open English WordNet\n   ic      odenet  1.4     [de]    Open German WordNet\n   i-      odenet  1.3     [de]    Open German WordNet\n   ic      omw     2.0     [mul]   Open Multilingual Wordnet\n   ic      omw     1.4     [mul]   Open Multilingual Wordnet\n   ...\n\n\nvalidate\n--------\n\nGiven a path to a WN-LMF XML file, check the file for structural\nproblems and print a report.\n\n.. code-block::\n\n   $ python -m wn validate english-wordnet-2021.xml\n\n.. option:: --select CHECKS\n\n   Run the checks with the given comma-separated list of check codes\n   or categories.\n\n   .. code-block::\n\n      $ python -m wn validate --select E,W201,W204 deWordNet.xml\n\n.. option:: --output-file FILE\n\n   Write the report to FILE as a JSON object instead of printing the\n   report to stdout.\n"
  },
  {
    "path": "docs/conf.py",
    "content": "# Configuration file for the Sphinx documentation builder.\n#\n# This file only contains a selection of the most common options. For a full\n# list see the documentation:\n# https://www.sphinx-doc.org/en/master/usage/configuration.html\n\n# -- Path setup --------------------------------------------------------------\n\n# If extensions (or modules to document with autodoc) are in another directory,\n# add these directories to sys.path here. If the directory is relative to the\n# documentation root, use os.path.abspath to make it absolute, like shown here.\n#\n# import os\n# import sys\n# sys.path.insert(0, os.path.abspath('.'))\n\n\n# -- Project information -----------------------------------------------------\n\nimport wn\n\nproject = \"wn\"\ncopyright = \"2020, Michael Wayne Goodman\"\nauthor = \"Michael Wayne Goodman\"\n\n# The short X.Y version\nversion = \".\".join(wn.__version__.split(\".\")[:2])\n# The full version, including alpha/beta/rc tags\nrelease = wn.__version__\n\n# -- General configuration ---------------------------------------------------\n\n# Add any Sphinx extension module names here, as strings. They can be\n# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom\n# ones.\nextensions = [\n    \"sphinx.ext.autodoc\",\n    \"sphinx.ext.intersphinx\",\n    \"sphinx.ext.coverage\",\n    # 'sphinx.ext.viewcode',\n    \"sphinx.ext.githubpages\",\n    \"sphinx.ext.napoleon\",\n    \"sphinx_copybutton\",\n]\n\n# Add any paths that contain templates here, relative to this directory.\ntemplates_path = [\"_templates\"]\n\n# List of patterns, relative to source directory, that match files and\n# directories to ignore when looking for source files.\n# This pattern also affects html_static_path and html_extra_path.\nexclude_patterns = [\"_build\", \"Thumbs.db\", \".DS_Store\"]\n\n# Global definitions\nrst_prolog = \"\"\"\n.. role:: python(code)\n   :language: python\n   :class: highlight\n\"\"\"\n\n# smartquotes = False\nsmartquotes_action = \"De\"  # D = en- and em-dash; e = ellipsis\n\n# -- Options for HTML output -------------------------------------------------\n\n# The theme to use for HTML and HTML Help pages.  See the documentation for\n# a list of builtin themes.#\n\nhtml_theme = \"furo\"\nhtml_theme_options = {\n    \"light_css_variables\": {\n        \"color-brand-primary\": \"#006699\",\n        \"color-brand-content\": \"#006699\",\n        # \"color-background\": \"#f0f0f0\",\n        # \"color-sidebar-background\": \"#ddd\",\n    },\n    \"dark_css_variables\": {\n        \"color-brand-primary\": \"#00CCFF\",\n        \"color-brand-content\": \"#00CCFF\",\n    },\n}\n\nhtml_logo = \"_static/wn-logo.svg\"\n\npygments_style = \"manni\"\npygments_dark_style = \"monokai\"\n\n# Add any paths that contain custom static files (such as style sheets) here,\n# relative to this directory. 
They are copied after the builtin static files,\n# so a file named \"default.css\" will overwrite the builtin \"default.css\".\nhtml_static_path = [\"_static\"]\nhtml_css_files = [\n    \"css/svg.css\",\n]\n\n# Don't offer to show the source of the current page\nhtml_show_sourcelink = False\n\n# -- Options for autodoc extension -------------------------------------------\n\n# autodoc_typehints = 'description'\nautodoc_typehints = \"signature\"\n# autodoc_typehints = 'none'\n\n# -- Options for intersphinx extension ---------------------------------------\n\n# Example configuration for intersphinx: refer to the Python standard library.\nintersphinx_mapping = {\n    \"python\": (\"https://docs.python.org/3\", None),\n    \"httpx\": (\"https://httpx.readthedocs.io/en/latest/\", None),\n}\n\n# -- Options for sphinx_copybutton extension ---------------------------------\n\ncopybutton_prompt_text = (\n    r\">>> \"  # regular Python prompt\n    r\"|\\.\\.\\. \"  # Python continuation prompt\n    r\"|\\$ \"  # Basic shell\n    r\"|In \\[\\d*\\]: \"  # Jupyter notebook\n)\ncopybutton_prompt_is_regexp = True\n"
  },
  {
    "path": "docs/docutils.conf",
    "content": "[restructuredtext parser]\nsyntax_highlight = short\n\n"
  },
  {
    "path": "docs/faq.rst",
    "content": "FAQ\n===\n\nIs Wn related to the NLTK's `nltk.corpus.wordnet` module?\n---------------------------------------------------------\n\nOnly in spirit. There was an effort to develop the `NLTK`_\\ 's module as\na standalone package (see https://github.com/nltk/wordnet/), but\ndevelopment had slowed. Wn has the same broad goals and a similar API\nas that standalone package, but fundamental architectural differences\ndemanded a complete rewrite, so Wn was created as a separate\nproject. With approval from the other package's maintainer, Wn\nacquired the `wn <https://pypi.org/project/wn>`_ project on PyPI and\ncan be seen as its successor.\n\nIs Wn compatible with the NLTK's module?\n----------------------------------------\n\nThe API is intentionally similar, but not exactly the same (for\ninstance see the next question), and there are differences in the ways\nthat results are retrieved, particularly for non-English wordnets. See\n:doc:`guides/nltk-migration` for more information. Also see\n:ref:`princeton-wordnet`.\n\nWhere are the ``Lemma`` objects? What are ``Word`` and ``Sense`` objects?\n-------------------------------------------------------------------------\n\nUnlike the original `WNDB`_ data format of the original WordNet, the\n`WN-LMF`_ XML format grants words (called *lexical entries* in WN-LMF\nand a :class:`~wn.Word` object in Wn) and word senses\n(:class:`~wn.Sense` in Wn) explicit, first-class status alongside\nsynsets.  While senses are essentially links between words and\nsynsets, they may contain metadata and be the source or target of\nsense relations, so in some ways they are more like nodes than edges\nwhen the wordnet is viewed as a graph. The `NLTK`_\\ 's module, using\nthe WNDB format, combines the information of a word and a sense into a\nsingle object called a ``Lemmas``. Wn also has an unrelated concept\ncalled a :meth:`~wn.Word.lemma`, but it is merely the canonical form\nof a word.\n\n.. _princeton-wordnet:\n\nWhere is the Princeton WordNet data?\n------------------------------------\n\nThe original English wordnet, named simply *WordNet* but often\nreferred to as the *Princeton WordNet* to better distinguish it from\nother projects, is specifically the data distributed by Princeton in\nthe `WNDB`_ format. The `Open Multilingual Wordnet <OMW_>`_ (OMW)\npackages an export of the WordNet data as the *OMW English Wordnet\nbased on WordNet 3.0* which is used by Wn (with the lexicon ID\n``omw-en``). It also has a similar export for WordNets 1.5, 1.6, 1.7,\n1.7.1, 2.0, 2.1, and 3.1 data (``omw-en15``, ``omw-en16``, ``omw-en17``,\n``omw-en171``, ``omw-en20``, ``omw-en21``, and ``omw-en31``,\nrespectively). All of these are highly compatible with the original\ndata and can be used as drop-in replacements.\n\nPrior to Wn version 0.9 (and, correspondingly, prior to the `OMW\ndata`_ version 1.4), the ``pwn:3.0`` and ``pwn:3.1`` English wordnets\ndistributed by OMW were incorrectly called the *Princeton WordNet*\n(for WordNet 3.0 and 3.1, respectively). From Wn version 0.9 (and from\nversion 1.4 of the OMW data), these are called the *OMW English\nWordnet based on WordNet 3.0/3.1* (``omw-en:1.4`` and\n``omw-en31:1.4``, respectively). These lexicons are intentionally\ncompatible with the original WordNet data, and the 1.4 versions are\neven more compatible than the previous ``pwn:3.0`` and ``pwn:3.1``\nlexicons, so it is strongly recommended to use them over the previous\nversions. 
Similarly, the 2.0 version of OMW is more compatible yet.\nThe data corresponding to WordNet versions 1.5 through 2.1 are only\navailable from OMW 2.0.\n\n.. _OMW data: https://github.com/omwn/omw-data\n\nWhy don't all wordnets share the same synsets?\n----------------------------------------------\n\nThe `Open Multilingual Wordnet <OMW_>`_ (OMW) contains wordnets for\nmany languages created using the *expand* methodology [VOSSEN1998]_,\nwhere non-English wordnets provide words on top of the English\nwordnet's synset structure. This allows new wordnets to be built in\nmuch less time than starting from scratch, but with a few drawbacks,\nsuch as that words cannot be added if they do not have a synset in the\nEnglish wordnet, and that it is difficult to version the wordnets\nindependently (e.g., for reproducibility of experiments involving\nwordnet data) as all are interconnected. Wn, therefore, creates new\nsynsets for each wordnet added to its database, and synsets then\nspecify which resource they belong to. Queries can specify which\nresources may be examined. Also see :doc:`guides/interlingual`.\n\nWhy does Wn's database get so big?\n----------------------------------\n\nThe *OMW English Wordnet based on WordNet 3.0* takes about 114 MiB of\ndisk space in Wn's database, which is only about 8 MiB more than it\ntakes as a `WN-LMF`_ XML file. The `NLTK`_, however, uses the obsolete\n`WNDB`_ format which is more compact, requiring only 35 MiB of disk\nspace. The difference with the Open Multilingual Wordnet 1.4 is more\nstriking: it takes about 659 MiB of disk space in the database, but\nonly 49 MiB in the NLTK. Part of the difference here is that the OMW\nfiles in the NLTK are simple tab-separated-value files listing only\nthe words added to each synset for each language. In addition, Wn\ncreates new synsets for each wordnet added (see the previous\nquestion). One more reason is that Wn creates various indexes in the\ndatabase for efficient lookup.\n\n.. _NLTK: https://www.nltk.org/\n.. _OMW: http://github.com/omwn\n.. [VOSSEN1998] Piek Vossen. 1998. *Introduction to EuroWordNet.* Computers and the Humanities, 32(2): 73--89.\n.. _Open English Wordnet 2021: https://en-word.net/\n.. _WNDB: https://wordnet.princeton.edu/documentation/wndb5wn\n.. _WN-LMF: https://globalwordnet.github.io/schemas/\n"
  },
  {
    "path": "docs/guides/basic.rst",
    "content": "Basic Usage\n===========\n\n.. seealso::\n\n   This document covers the basics of querying wordnets, filtering\n   results, and performing secondary queries on the results. For\n   adding, removing, or inspecting lexicons, see :doc:`lexicons`. For\n   more information about interlingual queries, see\n   :doc:`interlingual`.\n\nFor the most basic queries, Wn provides several module functions for\nretrieving words, senses, and synsets:\n\n>>> import wn\n>>> wn.words('pike')\n[Word('ewn-pike-n')]\n>>> wn.senses('pike')\n[Sense('ewn-pike-n-03311555-04'), Sense('ewn-pike-n-07795351-01'), Sense('ewn-pike-n-03941974-01'), Sense('ewn-pike-n-03941726-01'), Sense('ewn-pike-n-02563739-01')]\n>>> wn.synsets('pike')\n[Synset('ewn-03311555-n'), Synset('ewn-07795351-n'), Synset('ewn-03941974-n'), Synset('ewn-03941726-n'), Synset('ewn-02563739-n')]\n\nOnce you start working with multiple wordnets, these simple queries\nmay return more than desired:\n\n>>> wn.words('pike')\n[Word('ewn-pike-n'), Word('wnja-n-66614')]\n>>> wn.words('chat')\n[Word('ewn-chat-n'), Word('ewn-chat-v'), Word('frawn-lex14803'), Word('frawn-lex21897')]\n\nYou can specify which language or lexicon you wish to query:\n\n>>> wn.words('pike', lang='ja')\n[Word('wnja-n-66614')]\n>>> wn.words('chat', lexicon='frawn')\n[Word('frawn-lex14803'), Word('frawn-lex21897')]\n\nBut it might be easier to create a :class:`~wn.Wordnet` object and use\nit for queries:\n\n>>> wnja = wn.Wordnet(lang='ja')\n>>> wnja.words('pike')\n[Word('wnja-n-66614')]\n>>> frawn = wn.Wordnet(lexicon='frawn')\n>>> frawn.words('chat')\n[Word('frawn-lex14803'), Word('frawn-lex21897')]\n\nIn fact, the simple queries above implicitly create such a\n:class:`~wn.Wordnet` object, but one that includes all installed\nlexicons.\n\n\n.. _primary-queries:\n\nPrimary Queries\n---------------\n\nThe queries shown above are \"primary\" queries, meaning they are the\nfirst step in a user's interaction with a wordnet. Operations\nperformed on the resulting objects are then `secondary\nqueries`_. Primary queries optionally take several fields for\nfiltering the results, namely the word form and part of\nspeech. Synsets may also be filtered by an interlingual index (ILI).\n\nSearching for Words\n'''''''''''''''''''\n\nThe :func:`wn.words()` function returns a list of :class:`~wn.Word`\nobjects that match the given word form or part of speech:\n\n>>> wn.words('pencil')\n[Word('ewn-pencil-n'), Word('ewn-pencil-v')]\n>>> wn.words('pencil', pos='v')\n[Word('ewn-pencil-v')]\n\nCalling the function without a word form will return all words in the\ndatabase:\n\n>>> len(wn.words())\n311711\n>>> len(wn.words(pos='v'))\n29419\n>>> len(wn.words(pos='v', lexicon='ewn'))\n11595\n\nIf you know the word identifier used by a lexicon, you can retrieve\nthe word directly with the :func:`wn.word()` function. Identifiers are\nguaranteed to be unique within a single lexicon, but not across\nlexicons, so it's best to call this function from an instantiated\n:class:`~wn.Wordnet` object or with the ``lexicon`` parameter\nspecified. 
If multiple words are found when querying multiple\nlexicons, only the first is returned.\n\n>>> wn.word('ewn-pencil-n', lexicon='ewn')\nWord('ewn-pencil-n')\n\n\nSearching for Senses\n''''''''''''''''''''\n\nThe :func:`wn.senses()` and :func:`wn.sense()` functions behave\nsimilarly to :func:`wn.words()` and :func:`wn.word()`, except that\nthey return matching :class:`~wn.Sense` objects.\n\n>>> wn.senses('plow', pos='n')\n[Sense('ewn-plow-n-03973894-01')]\n>>> wn.sense('ewn-plow-v-01745745-01')\nSense('ewn-plow-v-01745745-01')\n\nSenses represent a relationship between a :class:`~wn.Word` and a\n:class:`~wn.Synset`. Seen as an edge between nodes, senses are often\ngiven less prominence than words or synsets, but they are the natural\nlocus of several interesting features such as sense relations (e.g.,\nfor derived words) and the natural level of representation for\ntranslations to other languages.\n\nSearching for Synsets\n'''''''''''''''''''''\n\nThe :func:`wn.synsets()` and :func:`wn.synset()` functions are like\nthose above but allow the ``ili`` parameter for filtering by\ninterlingual index, which is useful in interlingual queries:\n\n>>> wn.synsets('scepter')\n[Synset('ewn-14467142-n'), Synset('ewn-07282278-n')]\n>>> wn.synset('ewn-07282278-n').ili\n'i74874'\n>>> wn.synsets(ili='i74874')\n[Synset('ewn-07282278-n'), Synset('wnja-07267573-n'), Synset('frawn-07267573-n')]\n\n\nSecondary Queries\n-----------------\n\nOnce you have gotten some results from a primary query, you can\nperform operations on the :class:`~wn.Word`, :class:`~wn.Sense`, or\n:class:`~wn.Synset` objects to get at further information in the\nwordnet.\n\nExploring Words\n'''''''''''''''\n\nHere are some of the things you can do with :class:`~wn.Word` objects:\n\n>>> w = wn.words('goose')[0]\n>>> w.pos  # part of speech\n'n'\n>>> w.forms()  # other word forms (e.g., irregular inflections)\n['goose', 'geese']\n>>> w.lemma()  # canonical form\n'goose'\n>>> w.derived_words()\n[Word('ewn-gosling-n'), Word('ewn-goosy-s'), Word('ewn-goosey-s')]\n>>> w.senses()\n[Sense('ewn-goose-n-01858313-01'), Sense('ewn-goose-n-10177319-06'), Sense('ewn-goose-n-07662430-01')]\n>>> w.synsets()\n[Synset('ewn-01858313-n'), Synset('ewn-10177319-n'), Synset('ewn-07662430-n')]\n\nSince translations of a word into another language depend on the sense\nused, :meth:`Word.translate <wn.Word.translate>` returns a dictionary\nmapping each sense to words in the target language:\n\n>>> for sense, ja_words in w.translate(lang='ja').items():\n...     print(sense, ja_words)\n... \nSense('ewn-goose-n-01858313-01') [Word('wnja-n-1254'), Word('wnja-n-33090'), Word('wnja-n-38995')]\nSense('ewn-goose-n-10177319-06') []\nSense('ewn-goose-n-07662430-01') [Word('wnja-n-1254')]\n\n\nExploring Senses\n''''''''''''''''\n\nCompared to :class:`~wn.Word` and :class:`~wn.Synset` objects, there\nare relatively few operations available on :class:`~wn.Sense`\nobjects. 
Sense relations and translations, however, are important\noperations on senses.\n\n>>> s = wn.senses('dark', pos='n')[0]\n>>> s.word()    # each sense links to a single word\nWord('ewn-dark-n')\n>>> s.synset()  # each sense links to a single synset\nSynset('ewn-14007000-n')\n>>> s.get_related('antonym')\n[Sense('ewn-light-n-14006789-01')]\n>>> s.get_related('derivation')\n[Sense('ewn-dark-a-00273948-01')]\n>>> s.translate(lang='fr')  # translation returns a list of senses\n[Sense('frawn-lex52992--13983515-n')]\n>>> s.translate(lang='fr')[0].word().lemma()\n'obscurité'\n\n\nExploring Synsets\n'''''''''''''''''\n\nMany of the operations people care about happen on synsets, such as\nhierarchical relations and metrics.\n\n>>> ss = wn.synsets('hound', pos='n')[0]\n>>> ss.senses()\n[Sense('ewn-hound-n-02090203-01'), Sense('ewn-hound_dog-n-02090203-02')]\n>>> ss.words()\n[Word('ewn-hound-n'), Word('ewn-hound_dog-n')]\n>>> ss.lemmas()\n['hound', 'hound dog']\n>>> ss.definition()\n'any of several breeds of dog used for hunting typically having large drooping ears'\n>>> ss.hypernyms()\n[Synset('ewn-02089774-n')]\n>>> ss.hypernyms()[0].lemmas()\n['hunting dog']\n>>> len(ss.hyponyms())\n20\n>>> ss.hyponyms()[0].lemmas()\n['Afghan', 'Afghan hound']\n>>> ss.max_depth()\n15\n>>> ss.shortest_path(wn.synsets('dog', pos='n')[0])\n[Synset('ewn-02090203-n'), Synset('ewn-02089774-n'), Synset('ewn-02086723-n')]\n>>> ss.translate(lang='fr')  # translation returns a list of synsets\n[Synset('frawn-02087551-n')]\n>>> ss.translate(lang='fr')[0].lemmas()\n['chien', 'chien de chasse']\n\n\nFiltering by Language\n---------------------\n\nThe ``lang`` parameter of :func:`wn.words()`, :func:`wn.senses()`,\n:func:`wn.synsets()`, and :class:`~wn.Wordnet` allows a single `BCP 47\n<https://en.wikipedia.org/wiki/IETF_language_tag>`_ language\ncode. When this parameter is used, only entries in the specified\nlanguage will be returned.\n\n>>> import wn\n>>> wn.words('chat')\n[Word('ewn-chat-n'), Word('ewn-chat-v'), Word('frawn-lex14803'), Word('frawn-lex21897')]\n>>> wn.words('chat', lang='fr')\n[Word('frawn-lex14803'), Word('frawn-lex21897')]\n\nIf a language code not used by any lexicon is specified, a\n:exc:`wn.Error` is raised.\n\n\nFiltering by Lexicon\n--------------------\n\nThe ``lexicon`` parameter of :func:`wn.words()`, :func:`wn.senses()`,\n:func:`wn.synsets()`, and :class:`~wn.Wordnet` take a string of\nspace-delimited :ref:`lexicon specifiers\n<lexicon-specifiers>`. Entries in a lexicon whose ID matches one of\nthe lexicon specifiers will be returned. For these, the following\nrules are used:\n\n- A full ``id:version`` string (e.g., ``ewn:2020``) selects a specific\n  lexicon\n- Only a lexicon ``id`` (e.g., ``ewn``) selects the most recently\n  added lexicon with that ID\n- A star ``*`` may be used to match any lexicon; a star may not\n  include a version\n\n>>> wn.words('chat', lexicon='ewn:2020')\n[Word('ewn-chat-n'), Word('ewn-chat-v')]\n>>> wn.words('chat', lexicon='wnja')\n[]\n>>> wn.words('chat', lexicon='wnja frawn')\n[Word('frawn-lex14803'), Word('frawn-lex21897')]\n"
  },
  {
    "path": "docs/guides/interlingual.rst",
    "content": "Interlingual Queries\n====================\n\nThis guide explains how interlingual queries work within Wn.  To get\nstarted, you'll need at least two lexicons that use interlingual\nindices (ILIs).  For this guide, we'll use the Open English WordNet\n(``oewn:2024``), the Open German WordNet (``odenet:1.4``), also\nknown as OdeNet, and the Japanese wordnet (``omw-ja:1.4``).\n\n  >>> import wn\n  >>> wn.download('oewn:2024')\n  >>> wn.download('odenet:1.4')\n  >>> wn.download('omw-ja:1.4')\n\nWe will query these wordnets with the following :class:`~wn.Wordnet`\nobjects:\n\n  >>> en = wn.Wordnet('oewn:2024')\n  >>> de = wn.Wordnet('odenet:1.4')\n\nThe object for the Japanese wordnet will be discussed and created\nbelow, in :ref:`cross-lingual-relation-traversal`.\n\nWhat are Interlingual Indices?\n------------------------------\n\nIt is common for users of the `Princeton WordNet\n<https://wordnet.princeton.edu/>`_ to refer to synsets by their `WNDB\n<https://wordnet.princeton.edu/documentation/wndb5wn>`_ offset and type,\nbut this is problematic because the offset is a byte-offset in the\nwordnet data files and it will differ for wordnets in other languages\nand even between versions of the same wordnet. Interlingual indices\n(ILIs) address this issue by providing stable identifiers for concepts,\nwhether for a synset across versions of a wordnet or across languages.\n\nThe idea of ILIs was proposed by [Vossen99]_ and it came to fruition\nwith the release of the Collaborative Interlingual Index (CILI;\n[Bond16]_).  CILI therefore represents an instance of, and a namespace\nfor, ILIs. There could, in theory, be alternative indexes for\nparticular domains (e.g., names of people or places), but currently\nthere is only the one.\n\nAs an example, the synset for *apricot* (fruit) in WordNet 3.0 is\n``07750872-n``, but it is ``07766848-n`` in WordNet 3.1. In OdeNet\n1.4, which is not released in the WNDB format and therefore doesn't\nuse offsets at all, it is ``13235-n`` for the equivalent word\n(*Aprikose*). However, all three use the same ILI: ``i77784``.\n\nGenerally, only one synset within a wordnet will be mapped to a\nparticular ILI, but this may not always be true, nor does every synset\nnecessarily map to an ILI. Some concepts that are lexicalized in one\nlanguage may not be in another language. For example, *rice* in English\nmay refer to the rice plant, rice grain, or cooked rice, but in\nlanguages like Japanese they are distinct things (稲 *ine*, 米 *kome*,\nand 飯 *meshi* / ご飯 *gohan*, respectively).\n\nThe ``ili`` property of Synsets serves two purposes in Wn. Mainly it is\nfor encoding the ILI identifier associated with the synset, but it is\nalso used to indicate when a lexicon is proposing a new concept that is\nnot yet part of CILI. In the latter case, a WN-LMF lexicon file will\nhave the special value of ``in`` for a synset's ILI and it will provide\nan ``<ILIDefinition>`` element. In Wn, this translates to\n:attr:`wn.Synset.ili` returning :python:`None`, the same as if no ILI\nwere mapped at all. Both synsets with proposed ILIs and those with no\nILI cannot be used in interlingual queries. Proposed ILIs can be\ninspected using the :mod:`wn.ili.get_proposed` function, if you know\nhave the synset, or :mod:`wn.ili.get_all_proposed` to get all of them.\n\n\n.. [Vossen99]\n   Vossen, Piek, Wim Peters, and Julio Gonzalo.\n   \"Towards a universal index of meaning.\"\n   In Proceedings of ACL-99 workshop, Siglex-99, standardizing lexical resources, pp. 
81-90.\n   University of Maryland, 1999.\n\n.. [Bond16]\n   Bond, Francis, Piek Vossen, John Philip McCrae, and Christiane Fellbaum.\n   \"CILI: the Collaborative Interlingual Index.\"\n   In Proceedings of the 8th Global WordNet Conference (GWC), pp. 50-57. 2016.\n\nUsing Interlingual Indices\n--------------------------\n\nFor synsets that have an associated ILI, you can retrieve it via the\n:data:`wn.Synset.ili` property:\n\n  >>> apricot = en.synsets('apricot')[1]\n  >>> apricot.ili\n  'i77784'\n\nThe value is a :class:`str` ILI identifier. These may be used directly\nfor things like interlingual synset lookups:\n\n  >>> de.synsets(ili=apricot.ili)[0].lemmas()\n  ['Marille', 'Aprikose']\n\nThere may be more information about the ILI itself which you can get\nfrom the :mod:`wn.ili` module:\n\n  >>> from wn import ili\n  >>> apricot_ili = ili.get(apricot.ili)\n  >>> apricot_ili\n  ILI(id='i77784')\n\nFrom this object you can get various properties of the ILI, such as\nthe ID string, its status, and its definition, but if you have\nnot added CILI to Wn's database, it will not be very informative:\n\n  >>> apricot_ili.id\n  'i77784'\n  >>> apricot_ili.status\n  'presupposed'\n  >>> apricot_ili.definition() is None\n  True\n\nThe ``presupposed`` status means that the ILI ID is in use by a\nlexicon, but there is no other source of truth for the index. CILI can\nbe downloaded just like a lexicon:\n\n  >>> wn.download('cili:1.0')\n\nNow the status and definition should be more useful:\n\n  >>> apricot_ili.status\n  'active'\n  >>> apricot_ili.definition()\n  'downy yellow to rosy-colored fruit resembling a small peach'\n\n\nTranslating Words, Senses, and Synsets\n--------------------------------------\n\nRather than manually inserting the ILI IDs into Wn's lookup functions\nas shown above, Wn provides the :meth:`wn.Synset.translate` method to\nmake it easier:\n\n  >>> apricot.translate(lexicon='odenet:1.4')\n  [Synset('odenet-13235-n')]\n\nThe method returns a list for two reasons: first, it's not guaranteed\nthat the target lexicon has only one synset with the ILI and, second,\nyou can translate to more than one lexicon at a time.\n\n:class:`~wn.Sense` objects also have a :meth:`~wn.Sense.translate`\nmethod, returning a list of senses instead of synsets:\n\n  >>> de_senses = apricot.senses()[0].translate(lexicon='odenet:1.4')\n  >>> [s.word().lemma() for s in de_senses]\n  ['Marille', 'Aprikose']\n\n:class:`~wn.Word` have a :meth:`~wn.Word.translate` method, too, but\nit works a bit differently. Since each word may be part of multiple\nsynsets, the method returns a mapping of each word sense to the list\nof translated words:\n\n  >>> result = en.words('apricot')[0].translate(lexicon='odenet:1.4')\n  >>> for sense, de_words in result.items():\n  ...     print(sense, [w.lemma() for w in de_words])\n  ... \n  Sense('oewn-apricot__1.20.00..') []\n  Sense('oewn-apricot__1.13.00..') ['Marille', 'Aprikose']\n  Sense('oewn-apricot__1.07.00..') ['lachsrosa', 'lachsfarbig', 'in Lachs', 'lachsfarben', 'lachsrot', 'lachs']\n\nThe three senses above are for *apricot* as a tree, a fruit, and a\ncolor. OdeNet does not have a synset for apricot trees, or it has one\nnot associated with the appropriate ILI, and therefore it could not\ntranslate any words for that sense.\n\n\n.. 
_cross-lingual-relation-traversal:\n\nCross-lingual Relation Traversal\n--------------------------------\n\nILIs have a second use in Wn, which is relation traversal for wordnets\nthat depend on other lexicons, i.e., those created with the *expand*\nmethodology. These wordnets, such as many of those in the `Open\nMultilingual Wordnet <https://github.com/omwn/>`_, do not include\nsynset relations on their own as they were built using the English\nWordNet as their taxonomic scaffolding. Trying to load such a lexicon\nwhen the lexicon it requires is not added to the database presents a\nwarning to the user:\n\n  >>> ja = wn.Wordnet('omw-ja:1.4')\n  [...] WnWarning: lexicon dependencies not available: omw-en:1.4\n  >>> ja.expanded_lexicons()\n  []\n\n.. warning::\n\n   Do not rely on the presence of a warning to determine if the\n   lexicon has its expand lexicon loaded. Python's default warning\n   filter may only show the warning the first time it is\n   encountered. Instead, inspect :meth:`wn.Wordnet.expanded_lexicons`\n   to see if it is non-empty.\n\nWhen a dependency is unmet, Wn only issues a warning, not an error,\nand you can continue to use the lexicon as it is, but it won't be\nuseful for exploring relations such as hypernyms and hyponyms:\n\n  >>> anzu = ja.synsets(ili='i77784')[0]\n  >>> anzu.lemmas()\n  ['アンズ', 'アプリコット', '杏']\n  >>> anzu.hypernyms()\n  []\n\nOne way to resolve this issue is to install the lexicon it requires:\n\n  >>> wn.download('omw-en:1.4')\n  >>> ja = wn.Wordnet('omw-ja:1.4')  # no warning\n  >>> ja.expanded_lexicons()\n  [<Lexicon omw-en:1.4 [en]>]\n\nWn will detect the dependency and load ``omw-en:1.4`` as the *expand*\nlexicon for ``omw-ja:1.4`` when the former is in the database. You may\nalso specify an expand lexicon manually, even one that isn't the\nspecified dependency:\n\n  >>> ja = wn.Wordnet('omw-ja:1.4', expand='oewn:2024')  # no warning\n  >>> ja.expanded_lexicons()\n  [<Lexicon oewn:2024 [en]>]\n\nIn this case, the Open English WordNet is an actively-developed fork\nof the lexicon that ``omw-ja:1.4`` depends on, and it should contain\nall the relations, so you'll see little difference between using it\nand ``omw-en:1.4``. This works because the relations are found using\nILIs and not synset offsets. You may still prefer to use the specified\ndependency if you have strict compatibility needs, such as for\nexperiment reproducibility and/or compatibility with the `NLTK\n<https://nltk.org>`_. Using some other lexicon as the expand lexicon\nmay yield very different results. For instance, ``odenet:1.4`` is much\nsmaller than the English wordnets and has fewer relations, so it would\nnot be a good substitute for ``omw-ja:1.4``'s expand lexicon.\n\nWhen an appropriate expand lexicon is loaded, relations between\nsynsets, such as hypernyms, are more likely to be present:\n\n  >>> anzu = ja.synsets(ili='i77784')[0]  # recreate the synset object\n  >>> anzu.hypernyms()\n  [Synset('omw-ja-07705931-n')]\n  >>> anzu.hypernyms()[0].lemmas()\n  ['果物']\n  >>> anzu.hypernyms()[0].translate(lexicon='oewn:2024')[0].lemmas()\n  ['edible fruit']\n"
  },
  {
    "path": "docs/guides/lemmatization.rst",
    "content": "\nLemmatization and Normalization\n===============================\n\nWn provides two methods for expanding queries: lemmatization_ and\nnormalization_\\ . Wn also has a setting that allows `alternative forms\n<alternative-forms_>`_ stored in the database to be included in\nqueries.\n\n.. seealso::\n\n   The :mod:`wn.morphy` module is a basic English lemmatizer included\n   with Wn.\n\n.. _lemmatization:\n\nLemmatization\n-------------\n\nWhen querying a wordnet with wordforms from natural language text, it\nis important to be able to find entries for inflected forms as the\ndatabase generally contains only lemmatic forms, or *lemmas* (or\n*lemmata*, if you prefer irregular plurals).\n\n>>> import wn\n>>> en = wn.Wordnet('oewn:2021')\n>>> en.words('plurals')\n[]\n>>> en.words('plural')\n[Word('oewn-plural-a'), Word('oewn-plural-n')]\n\nLemmas are sometimes called *citation forms* or *dictionary forms* as\nthey are often used as the head words in dictionary entries. In\nNatural Language Processing (NLP), *lemmatization* is a technique\nwhere a possibly inflected word form is transformed to yield a\nlemma. In Wn, this concept is generalized somewhat to mean a\ntransformation that yields a form matching wordforms stored in the\ndatabase. For example, the English word *sparrows* is the plural\ninflection of *sparrow*, while the word *leaves* is ambiguous between\nthe plural inflection of the nouns *leaf* and *leave* and the\n3rd-person singular inflection of the verb *leave*.\n\nFor tasks where high-accuracy is needed, wrapping the wordnet queries\nwith external tools that handle tokenization, lemmatization, and\npart-of-speech tagging will likely yield the best results as this\nmethod can make use of word context. That is, something like this:\n\n.. code-block:: python\n\n   for lemma, pos in fancy_shmancy_analysis(corpus):\n       synsets = w.synsets(lemma, pos=pos)\n\nFor modest needs, however, Wn provides a way to integrate basic\nlemmatization directly into the queries.\n\nLemmatization in Wn works as follows: if a :class:`wn.Wordnet` object\nis instantiated with a *lemmatizer* argument, then queries involving\nwordforms (e.g., :meth:`wn.Wordnet.words`, :meth:`wn.Wordnet.senses`,\n:meth:`wn.Wordnet.synsets`) will first lemmatize the wordform and then\ncheck all resulting wordforms and parts of speech against the\ndatabase as successive queries.\n\nLemmatization Functions\n'''''''''''''''''''''''\n\nThe *lemmatizer* argument of :class:`wn.Wordnet` is a callable that\ntakes two string arguments: (1) the original wordform, and (2) a\npart-of-speech or :python:`None`. It returns a dictionary mapping\nparts-of-speech to sets of lemmatized wordforms. The signature is as\nfollows:\n\n.. code-block:: python\n\n   lemmatizer(s: str, pos: str | None) -> Dict[str | None, Set[str]]\n\nThe part-of-speech may be used by the function to determine which\nmorphological rules to apply. If the given part-of-speech is\n:python:`None`, then it is not specified and any rule may apply. 
A\nlemmatizer that only deinflects should not change any specified\npart-of-speech, but this is not a requirement, and a function could be\nprovided that undoes derivational morphology (e.g., *democratic* →\n*democracy*).\n\nQuerying With Lemmatization\n'''''''''''''''''''''''''''\n\nAs the needs of lemmatization differs from one language to another, Wn\ndoes not provide a lemmatizer by default, and therefore it is\nunavailable to the convenience functions :func:`wn.words`,\n:func:`wn.senses`, and :func:`wn.synsets`. A lemmatizer can be added\nto a :class:`wn.Wordnet` object. For example, using :mod:`wn.morphy`:\n\n>>> import wn\n>>> from wn.morphy import Morphy\n>>> en = wn.Wordnet('oewn:2021', lemmatizer=Morphy())\n>>> en.words('sparrows')\n[Word('oewn-sparrow-n')]\n>>> en.words('leaves')\n[Word('oewn-leave-v'), Word('oewn-leaf-n'), Word('oewn-leave-n')]\n\nQuerying Without Lemmatization\n''''''''''''''''''''''''''''''\n\nWhen lemmatization is not used, inflected terms may not return any\nresults:\n\n>>> en = wn.Wordnet('oewn:2021')\n>>> en.words('sparrows')\n[]\n\nDepending on the lexicon, there may be situations where results are\nreturned for inflected lemmas, such as when the inflected form is\nlexicalized as its own entry:\n\n>>> en.words('glasses')\n[Word('oewn-glasses-n')]\n\nOr if the lexicon lists the inflected form as an alternative form. For\nexample, the English Wordnet lists irregular inflections as\nalternative forms:\n\n>>> en.words('lemmata')\n[Word('oewn-lemma-n')]\n\nSee below for excluding alternative forms from such queries.\n\n.. _alternative-forms:\n\nAlternative Forms in the Database\n---------------------------------\n\nA lexicon may include alternative forms in addition to lemmas for each\nword, and by default these are included in queries. What exactly is\nincluded as an alternative form depends on the lexicon. The English\nWordnet, for example, adds irregular inflections (or \"exceptional\nforms\"), while the Japanese Wordnet includes the same word in multiple\northographies (original, hiragana, katakana, and two romanizations).\nFor the English Wordnet, this means that you might get basic\nlemmatization for irregular forms only:\n\n>>> en = wn.Wordnet('oewn:2021')\n>>> en.words('learnt', pos='v')\n[Word('oewn-learn-v')]\n>>> en.words('learned', pos='v')\n[]\n\nIf this is undesirable, the alternative forms can be excluded from\nqueries with the *search_all_forms* parameter:\n\n>>> en = wn.Wordnet('oewn:2021', search_all_forms=False)\n>>> en.words('learnt', pos='v')\n[]\n>>> en.words('learned', pos='v')\n[]\n\n.. _normalization:\n\nNormalization\n-------------\n\nWhile lemmatization deals with morphological variants of words,\nnormalization handles minor orthographic variants. Normalized forms,\nhowever, may be invalid as wordforms in the target language, and as\nsuch they are only used behind the scenes for query expansion and not\npresented to users. For instance, a user might attempt to look up\n*résumé* in the English wordnet, but the wordnet only contains the\nform without diacritics: *resume*. With strict string matching, the\nentry would not be found using the wordform in the query. By\nnormalizing the query word, the entry can be found. Similarly in the\nSpanish wordnet, *soñar* (to dream) and *sonar* (to ring) are two\ndifferent words. 
A user who types *soñar* likely does not want to get\nresults for *sonar*, but one who types *sonar* may be a non-Spanish\nspeaker who is unaware of the missing diacritic or does not have an\ninput method that allows them to type the diacritic, so this query\nwould return both entries by matching against the normalized forms in\nthe database. Wn handles all of these use cases.\n\nWhen a lexicon is added to the database, potentially two wordforms are\ninserted for every one in the lexicon: the original wordform and a\nnormalized form. When querying against the database, the original\nquery string is first compared with the original wordforms and, if\nnormalization is enabled, with the normalized forms in the database as\nwell. If this first attempt yields no results and if normalization is\nenabled, the query string is normalized and tried again.\n\nNormalization Functions\n'''''''''''''''''''''''\n\nThe normalized form is obtained from a *normalizer* function, passed\nas an argument to :class:`wn.Wordnet`, that takes a single string\nargument and returns a string. That is, a function with the following\nsignature:\n\n.. code-block:: python\n\n   normalizer(s: str) -> str\n\nWhile custom *normalizer* functions could be used, in practice the\nchoice is either the default normalizer or :python:`None`. The default\nnormalizer works by downcasing the string and performing NFKD_\nnormalization to remove diacritics. If the normalized form is the same\nas the original, only the original is inserted into the database.\n\n.. table:: Examples of normalization\n   :align: center\n\n   =============  ===============\n   Original Form  Normalized Form\n   =============  ===============\n   résumé         resume\n   soñar          sonar\n   San José       san jose\n   ハラペーニョ   ハラヘーニョ\n   =============  ===============\n\n.. _NFKD: https://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms\n\nQuerying With Normalization\n'''''''''''''''''''''''''''\n\nBy default, normalization is enabled when a :class:`wn.Wordnet` is\ncreated. Enabling normalization does two things: it allows queries to\ncheck the original wordform in the query against the normalized forms\nin the database and, if no results are returned in the first step, it\nallows the queried wordform to be normalized as a back-off technique.\n\n>>> en = wn.Wordnet('oewn:2021')\n>>> en.words('résumé')\n[Word('oewn-resume-n'), Word('oewn-resume-v')]\n>>> es = wn.Wordnet('omw-es:1.4')\n>>> es.words('soñar')\n[Word('omw-es-soñar-v')]\n>>> es.words('sonar')\n[Word('omw-es-sonar-v'), Word('omw-es-soñar-v')]\n\n.. note::\n\n   Users may supply a custom *normalizer* function to the\n   :class:`wn.Wordnet` object, but currently this is discouraged as\n   the result is unlikely to match normalized forms in the database\n   and there is not yet a way to customize the normalization of forms\n   added to the database.\n\nQuerying Without Normalization\n''''''''''''''''''''''''''''''\n\nNormalization can be disabled by passing :python:`None` as the\nargument of the *normalizer* parameter of :class:`wn.Wordnet`. The\nqueried wordform will not be checked against normalized forms in the\ndatabase and neither will it be normalized as a back-off technique.\n\n>>> en = wn.Wordnet('oewn:2021', normalizer=None)\n>>> en.words('résumé')\n[]\n>>> es = wn.Wordnet('omw-es:1.4', normalizer=None)\n>>> es.words('soñar')\n[Word('omw-es-soñar-v')]\n>>> es.words('sonar')\n[Word('omw-es-sonar-v')]\n\n.. 
note::\n\n   It is not possible to disable normalization for the convenience\n   functions :func:`wn.words`, :func:`wn.senses`, and\n   :func:`wn.synsets`.\n"
  },
  {
    "path": "docs/guides/lexicons.rst",
    "content": "Working with Lexicons\n=====================\n\nTerminology\n-----------\n\nIn Wn, the following terminology is used:\n\n:lexicon: An inventory of words, senses, synsets, relations, etc. that\n          share a namespace (i.e., that can refer to each other).\n:wordnet: A group of lexicons (but usually just one).\n:resource: A file containing lexicons.\n:package: A directory containing a resource and optionally some\n          metadata files.\n:collection: A directory containing packages and optionally some\n             metadata files.\n:project: A general term for a resource, package, or collection,\n          particularly pertaining to its creation, maintenance, and\n          distribution.\n\nIn general, each resource contains one lexicon. For large projects\nlike the `Open English WordNet`_, that lexicon is also a wordnet on\nits own. For a collection like the `Open Multilingual Wordnet`_, most\nlexicons do not include relations as they are instead expected to use\nthose from the OMW's included English wordnet, which is derived from\nthe `Princeton WordNet`_. As such, a wordnet for these sub-projects is\nbest thought of as the grouping of the lexicon with the lexicon\nproviding the relations.\n\n.. _Open English WordNet: https://en-word.net\n.. _Open Multilingual Wordnet: https://github.com/omwn/\n.. _Princeton WordNet: https://wordnet.princeton.edu/\n\n.. _lexicon-specifiers:\n\nLexicon and Project Specifiers\n------------------------------\n\nWn uses *lexicon specifiers* to deal with the possibility of having\nmultiple lexicons and multiple versions of lexicons loaded in the same\ndatabase. The specifiers are the joining of a lexicon's name (ID) and\nversion, delimited by ``:``. Here are the possible forms:\n\n.. code-block:: none\n\n    *           -- any/all lexicons\n    id          -- the most recently added lexicon with the given id\n    id:*        -- all lexicons with the given id\n    id:version  -- the lexicon with the given id and version\n    *:version   -- all lexicons with the given version\n\nFor example, if ``ewn:2020`` was installed followed by ``ewn:2019``,\nthen ``ewn`` would specify the ``2019`` version, ``ewn:*`` would\nspecify both versions, and ``ewn:2020`` would specify the ``2020``\nversion.\n\nThe same format is used for *project specifiers*, which refer to\nprojects as defined in Wn's index. In most cases the project specifier\nis the same as the lexicon specifier (e.g., ``ewn:2020`` refers both\nto the project to be downloaded and the lexicon that is installed),\nbut sometimes it is not. The 1.4 release of the `Open Multilingual\nWordnet`_, for instance, has the project specifier ``omw:1.4`` but it\ninstalls a number of lexicons with their own lexicon specifiers\n(``omw-zsm:1.4``, ``omw-cmn:1.4``, etc.). When only an id is given\n(e.g., ``ewn``), a project specifier gets the *first* version listed\nin the index (in the default index, conventionally, the first version\nis the latest release).\n\n.. 
_lexicon-filters:\n\nFiltering Queries with Lexicons\n-------------------------------\n\nQueries against the database will search all installed lexicons unless\nthey are filtered by ``lang`` or ``lexicon`` arguments:\n\n>>> import wn\n>>> len(wn.words())\n1538449\n>>> len(wn.words(lang=\"en\"))\n318289\n>>> len(wn.words(lexicon=\"oewn:2024\"))\n161705\n\nThe ``lexicon`` parameter can also take multiple specifiers so you can\ninclude things like lexicon extensions or to explicitly include\nmultiple lexicons:\n\n>>> len(wn.words(lexicon=\"oewn:2024 omw-en:1.4\"))\n318289\n\nIf a lexicon selected by the ``lexicon`` or ``lang`` arguments\nspecifies a dependency, the dependency is automatically added as an\n*expand* lexicon. Explicitly set :python:`expand=''` to disable this\nbehavior:\n\n>>> wn.lexicons(lexicon=\"omw-es:1.4\")[0].requires()  # omw-es requires omw-en\n{'omw-en:1.4': <Lexicon omw-en:1.4 [en]>}\n>>> es = wn.Wordnet(\"omw-es:1.4\")\n>>> es.lexicons()\n[<Lexicon omw-es:1.4 [es]>]\n>>> es.expanded_lexicons()  # omw-en automatically added\n[<Lexicon omw-en:1.4 [en]>]\n>>> es_no_en = wn.Wordnet(\"omw-es:1.4\", expand='')\n>>> es_no_en.lexicons()\n[<Lexicon omw-es:1.4 [es]>]\n>>> es_no_en.expanded_lexicons()  # no expand lexicons\n[]\n\nAlso see :ref:`cross-lingual-relation-traversal` for\nselecting expand lexicons for relations.\n\nThe objects returned by queries retain the \"lexicon configuration\"\nused, which includes the lexicons and expand lexicons. This\nconfiguration determines which lexicons are searched during secondary\nqueries. The lexicon configuration also stores a flag indicating\nwhether no lexicon filters were used at all, which triggers\n:ref:`default mode <default-mode>` secondary queries.\n\n.. _default-mode:\n\nDefault Mode Queries\n--------------------\n\nA special \"default mode\" is activated when making a module-function\nquery (:func:`wn.words`, :func:`wn.synsets`, etc.) or instantiating a\n:class:`wn.Wordnet` object with no ``lexicon`` or ``lang`` argument\n(so-named because the mode is triggered by using the default values of\n``lexicon`` and ``lang``):\n\n>>> w = wn.Wordnet()\n>>> wn.words(\"pineapple\")  # for example\n\nDefault-mode causes the following behavior:\n\n1. Primary queries search any installed lexicon\n2. Secondary queries only search the lexicon of the primary entity\n   (e.g., :meth:`Synset.words` only finds words from the same lexicon\n   as the synset). If the lexicon has any extensions or is itself an\n   extension, any extension/base lexicons are also included.\n3. If the ``expand`` argument is :python:`None` (always true for\n   module functions like :func:`wn.synsets`), all installed lexicons\n   are used as expand lexicons for relations queries.\n\n.. warning::\n\n   Default-mode queries are not reproducible as the results can change\n   as lexicons are added or removed from the database. 
For anything\n   more than a casual query, it is highly suggested to instead create\n   a :class:`wn.Wordnet` object with fully-specified ``lexicon`` and\n   ``expand`` arguments.\n\nDownloading Lexicons\n--------------------\n\nUse :py:func:`wn.download` to download lexicons from the web given\neither an indexed project specifier or the URL of a resource, package,\nor collection.\n\n>>> import wn\n>>> wn.download('odenet')  # get the latest Open German WordNet\n>>> wn.download('odenet:1.3')  # get the 1.3 version\n>>> # download from a URL\n>>> wn.download('https://github.com/omwn/omw-data/releases/download/v1.4/omw-1.4.tar.xz')\n\nThe project specifier is only used to retrieve information from Wn's\nindex. The lexicon IDs of the corresponding resource files are what is\nstored in the database.\n\nAdding Local Lexicons\n---------------------\n\nLexicons can be added from local files with :py:func:`wn.add`:\n\n>>> wn.add('~/data/omw-1.4/omw-nb/omw-nb.xml')\n\nOr with the parent directory as a package:\n\n>>> wn.add('~/data/omw-1.4/omw-nb/')\n\nOr with the grandparent directory as a collection (installing all\npackages contained by the collection):\n\n>>> wn.add('~/data/omw-1.4/')\n\nOr from a compressed archive of one of the above:\n\n>>> wn.add('~/data/omw-1.4/omw-nb/omw-nb.xml.xz')\n>>> wn.add('~/data/omw-1.4/omw-nb.tar.xz')\n>>> wn.add('~/data/omw-1.4.tar.xz')\n\nListing Installed Lexicons\n--------------------------\n\nIf you wish to see which lexicons have been added to the database,\n:py:func:`wn.lexicons()` returns the list of :py:class:`wn.Lexicon`\nobjects that describe each one.\n\n>>> for lex in wn.lexicons():\n...     print(f'{lex.id}:{lex.version}\\t{lex.label}')\n...\nomw-en:1.4\tOMW English Wordnet based on WordNet 3.0\nomw-nb:1.4\tNorwegian Wordnet (Bokmål)\nodenet:1.3\tOffenes Deutsches WordNet\newn:2020\tEnglish WordNet\newn:2019\tEnglish WordNet\n\nRemoving Lexicons\n-----------------\n\nLexicons can be removed from the database with :py:func:`wn.remove`:\n\n>>> wn.remove('omw-nb:1.4')\n\nNote that this removes a single lexicon and not a project, so if, for\ninstance, you've installed a multi-lexicon project like ``omw``, you\nwill need to remove each lexicon individually or use a star specifier:\n\n>>> wn.remove('omw-*:1.4')\n\nWN-LMF Files, Packages, and Collections\n---------------------------------------\n\nWn can handle projects with 3 levels of structure:\n\n* WN-LMF XML files\n* WN-LMF packages\n* WN-LMF collections\n\nWN-LMF XML Files\n''''''''''''''''\n\nA WN-LMF XML file is a file with a ``.xml`` extension that is valid\naccording to the `WN-LMF specification\n<https://github.com/globalwordnet/schemas/>`_.\n\nWN-LMF Packages\n'''''''''''''''\n\nIf one needs to distribute metadata or additional files along with\nWN-LMF XML file, a WN-LMF package allows them to include the files in\na directory. The directory should contain exactly one ``.xml`` file,\nwhich is the WN-LMF XML file. In addition, it may contain additional\nfiles and Wn will recognize three of them:\n\n:``LICENSE`` (``.txt`` | ``.md`` | ``.rst`` ): the full text of the license\n:``README`` (``.txt`` | ``.md`` | ``.rst`` ): the project README\n:``citation.bib``: a BibTeX file containing academic citations for the project\n\n\n.. code-block::\n\n   omw-sq/\n   ├── omw-sq.xml\n   ├── LICENSE.txt\n   └── README.md\n\nWN-LMF Collections\n''''''''''''''''''\n\nIn some cases a project may manage multiple resources and distribute\nthem as a collection. 
A collection is a directory containing\nsubdirectories which are WN-LMF packages. The collection may contain\nits own README, LICENSE, and citation files which describe the project\nas a whole.\n\n.. code-block::\n\n   omw-1.4/\n   ├── omw-sq\n   │   ├── omw-sq.xml\n   │   ├── LICENSE.txt\n   │   └── README.md\n   ├── omw-lt\n   │   ├── citation.bib\n   │   ├── LICENSE\n   │   └── omw-lt.xml\n   ├── ...\n   ├── citation.bib\n   ├── LICENSE\n   └── README.md\n"
  },
  {
    "path": "docs/guides/nltk-migration.rst",
    "content": "Migrating from the NLTK\n=======================\n\nThis guide is for users of the `NLTK <https://www.nltk.org/>`_\\ 's\n``nltk.corpus.wordnet`` module who are migrating to Wn. It is not\nguaranteed that Wn will produce the same results as the NLTK's module,\nbut with some care its behavior can be very similar.\n\nOverview\n--------\n\nOne important thing to note is that Wn will search all wordnets in the\ndatabase by default where the NLTK would only search the English.\n\n>>> from nltk.corpus import wordnet as nltk_wn\n>>> nltk_wn.synsets('chat')                 # only English\n>>> nltk_wn.synsets('chat', lang='fra')     # only French\n>>> import wn\n>>> wn.synsets('chat')                      # all wordnets\n>>> wn.synsets('chat', lang='fr')           # only French\n\nWith Wn it helps to create a :class:`wn.Wordnet` object to pre-filter\nthe results by language or lexicon.\n\n>>> en = wn.Wordnet('omw-en:1.4')\n>>> en.synsets('chat')                     # only the OMW English Wordnet\n\nEquivalent Operations\n---------------------\n\nThe following table lists equivalent API calls for the NLTK's wordnet\nmodule and Wn assuming the respective modules have been instantiated\n(in separate Python sessions) as follows:\n\nNLTK:\n\n>>> from nltk.corpus import wordnet as wn\n>>> ss = wn.synsets(\"chat\", pos=\"v\")[0]\n\nWn:\n\n>>> import wn\n>>> en = wn.Wordnet('omw-en:1.4')\n>>> ss = en.synsets(\"chat\", pos=\"v\")[0]\n\n.. default-role:: python\n\nPrimary Queries\n'''''''''''''''\n\n=========================================  ===============================================\nNLTK                                       Wn\n=========================================  ===============================================\n`wn.langs()`                               `[lex.language for lex in wn.lexicons()]`\n`wn.lemmas(\"chat\")`                        --\n--                                         `en.words(\"chat\")`\n--                                         `en.senses(\"chat\")`\n`wn.synsets(\"chat\")`                       `en.synsets(\"chat\")`\n`wn.synsets(\"chat\", pos=\"v\")`              `en.synsets(\"chat\", pos=\"v\")`\n`wn.all_synsets()`                         `en.synsets()`\n`wn.all_synsets(pos=\"v\")`                  `en.synsets(pos=\"v\")`\n=========================================  ===============================================\n\nSynsets -- Basic\n''''''''''''''''\n\n===================  =================\nNLTK                 Wn\n===================  =================\n`ss.lemmas()`        --\n--                   `ss.senses()`\n--                   `ss.words()`\n`ss.lemmas_names()`  `ss.lemmas()`\n`ss.definition()`    `ss.definition()`\n`ss.examples()`      `ss.examples()`\n`ss.pos()`           `ss.pos`\n===================  =================\n\nSynsets -- Relations\n''''''''''''''''''''\n\n==========================================  =====================================\nNLTK                                        Wn\n==========================================  =====================================\n`ss.hypernyms()`                            `ss.get_related(\"hypernym\")`\n`ss.instance_hypernyms()`                   `ss.get_related(\"instance_hypernym\")`\n`ss.hypernyms() + ss.instance_hypernyms()`  `ss.hypernyms()`\n`ss.hyponyms()`                             `ss.get_related(\"hyponym\")`\n`ss.member_holonyms()`                      `ss.get_related(\"holo_member\")`\n`ss.member_meronyms()`                      `ss.get_related(\"mero_member\")`\n`ss.closure(lambda 
x: x.hypernyms())`       `ss.closure(\"hypernym\")`\n==========================================  =====================================\n\nSynsets -- Taxonomic Structure\n''''''''''''''''''''''''''''''\n\n================================  =========================================================\nNLTK                              Wn\n================================  =========================================================\n`ss.min_depth()`                  `ss.min_depth()`\n`ss.max_depth()`                  `ss.max_depth()`\n`ss.hypernym_paths()`             `[list(reversed([ss] + p)) for p in ss.hypernym_paths()]`\n`ss.common_hypernyms(ss)`         `ss.common_hypernyms(ss)`\n`ss.lowest_common_hypernyms(ss)`  `ss.lowest_common_hypernyms(ss)`\n`ss.shortest_path_distance(ss)`   `len(ss.shortest_path(ss))`\n================================  =========================================================\n\n.. reset default role\n.. default-role::\n\n(these tables are incomplete)\n"
  },
  {
    "path": "docs/guides/wordnet.rst",
    "content": ".. raw:: html\n\n    <style>.center {margin-left:20%}</style>\n\n\nThe Structure of a Wordnet\n==========================\nA **wordnet** is an online lexicon which is organized by concepts. \n\nThe basic unit of a wordnet is the synonym set (**synset**), a group of words that all refer to the \nsame concept. Words and synsets are linked by means of conceptual-semantic relations to form the \nstructure of wordnet. \n\nWords, Senses, and Synsets\n--------------------------\nWe all know that **words** are the basic building blocks of languages, a word is built up with two parts, \nits form and its meaning, but in natural languages, the word form and word meaning are not in an elegant \none-to-one match, one word form may connect to many different meanings, so hereforth, we need **senses**, \nto work as the unit of word meanings, for example, the word *bank* has at least two senses:\n\n1. bank\\ :sup:`1`\\: financial institution, like *City Bank*;\n2. bank\\ :sup:`2`\\: sloping land, like *river bank*;\n\nSince **synsets** are group of words sharing the same concept, bank\\ :sup:`1`\\ and bank\\ :sup:`2`\\ are members of \ntwo different synsets, although they have the same word form.\n\nOn the other hand, different word forms may also convey the same concept, such as *cab* and *taxi*, \nthese word forms with the same concept are grouped together into one synset.\n\n.. raw:: html\n    :file: images/word-sense-synset.svg\n\n\n.. role:: center\n    :class: center\n\n:center:`Figure: relations between words, senses and synsets`\n\n\nSynset Relations\n----------------\nIn wordnet, synsets are linked with each other to form various kinds of relations. For example, if \nthe concept expressed by a synset is more general than a given synset, then it is in a \n*hypernym* relation with the given synset. As shown in the figure below, the synset with *car*, *auto* and *automobile* as its \nmember is the *hypernym* of the other synset with *cab*, *taxi* and *hack*. Such relation which is built on \nthe synset level is categorized as synset relations.\n\n.. raw:: html\n    :file: images/synset-synset.svg\n\n:center:`Figure: example of synset relations`\n\nSense Relations\n---------------\n\nSome relations in wordnet are also built on sense level, which can be further divided into two types, \nrelations that link sense with another sense, and relations that link sense with another synset.\n\n.. note::  In wordnet, synset relation and sense relation can both employ a particular \n    relation type, such as `domain topic <https://globalwordnet.github.io/gwadoc/#domain_topic>`_.\n\n**Sense-Sense**\n\nSense to sense relations emphasize the connections between different senses, especially when dealing \nwith morphologically related words. For example, *behavioral* is the adjective to the noun *behavior*, \nwhich is known as in the *pertainym* relation with *behavior*, however, such relation doesn't exist between \n*behavioral* and *conduct*, which is a synonym of *behavior* and is in the same synset. Here *pertainym* \nis a sense-sense relation.\n\n.. raw:: html\n    :file: images/sense-sense.svg\n\n:center:`Figure: example of sense-sense relations`\n\n**Sense-Synset**\n\nSense-synset relations connect a particular sense with a synset. 
Sense Relations\n---------------\n\nSome relations in a wordnet are also built at the sense level. These can be further divided into two\ntypes: relations that link a sense with another sense, and relations that link a sense with another synset.\n\n.. note::  In a wordnet, synset relations and sense relations can both employ a particular\n    relation type, such as `domain topic <https://globalwordnet.github.io/gwadoc/#domain_topic>`_.\n\n**Sense-Sense**\n\nSense-to-sense relations emphasize the connections between particular senses, especially when dealing\nwith morphologically related words. For example, *behavioral* is the adjective for the noun *behavior*\nand is said to be in a *pertainym* relation with *behavior*. However, no such relation exists between\n*behavioral* and *conduct*, even though *conduct* is a synonym of *behavior* and is in the same synset.\n*Pertainym* is therefore a sense-sense relation.\n\n.. raw:: html\n    :file: images/sense-sense.svg\n\n:center:`Figure: example of sense-sense relations`\n\n**Sense-Synset**\n\nSense-synset relations connect a particular sense with a synset. For example, *cursor* is a term in the\n*computer science* discipline, so in a wordnet it is in the *has domain topic* relation with the\n*computer science* synset. *Pointer*, which is in the same synset as *cursor*, is not such a term and\nthus has no such relation with the *computer science* synset.\n\n.. raw:: html\n    :file: images/sense-synset.svg\n\n:center:`Figure: example of sense-synset relations`\n\nOther Information\n-----------------\nA wordnet should be encoded in an appropriate form; two schemas are accepted:\n\n* XML schema based on the Lexical Markup Framework (LMF)\n* JSON-LD using the Lexicon Model for Ontologies\n\nThe structure of a wordnet should also contain the following information:\n\n**Definition**\n\nA definition describes a sense or synset in a wordnet. It is given in the language\nof the wordnet it comes from.\n\n**Example**\n\nAn example clarifies a sense or synset in a wordnet; users can understand the definition\nmore clearly with a given example.\n\n**Metadata**\n\nA wordnet has its own metadata, based on the `Dublin Core <https://dublincore.org/>`_, to state\nbasic information about it. The table below lists all the items in the metadata of a wordnet:\n\n+------------------+-----------+-----------+\n| Attribute        | Presence  |  Type     |\n+==================+===========+===========+\n| contributor      | Optional  |  str      |\n+------------------+-----------+-----------+\n| coverage         | Optional  |  str      |\n+------------------+-----------+-----------+\n| creator          | Optional  |  str      |\n+------------------+-----------+-----------+\n| date             | Optional  |  str      |\n+------------------+-----------+-----------+\n| description      | Optional  |  str      |\n+------------------+-----------+-----------+\n| format           | Optional  |  str      |\n+------------------+-----------+-----------+\n| identifier       | Optional  |  str      |\n+------------------+-----------+-----------+\n| publisher        | Optional  |  str      |\n+------------------+-----------+-----------+\n| relation         | Optional  |  str      |\n+------------------+-----------+-----------+\n| rights           | Optional  |  str      |\n+------------------+-----------+-----------+\n| source           | Optional  |  str      |\n+------------------+-----------+-----------+\n| subject          | Optional  |  str      |\n+------------------+-----------+-----------+\n| title            | Optional  |  str      |\n+------------------+-----------+-----------+\n| type             | Optional  |  str      |\n+------------------+-----------+-----------+\n| status           | Optional  |  str      |\n+------------------+-----------+-----------+\n| note             | Optional  |  str      |\n+------------------+-----------+-----------+\n| confidence       | Optional  |  float    |\n+------------------+-----------+-----------+"
  },
  {
    "path": "docs/index.rst",
    "content": "\nWn Documentation\n================\n\nOverview\n--------\n\nThis package provides an interface to wordnet data, from simple lookup\nqueries, to graph traversals, to more sophisticated algorithms and\nmetrics. Features include:\n\n- Support for wordnets in the\n  `WN-LMF <https://globalwordnet.github.io/schemas/>`_ format\n- A `SQLite <https://sqlite.org>`_ database backend for data\n  consistency and efficient queries\n- Accurate modeling of Words, Senses, and Synsets\n\nQuick Start\n-----------\n\n.. code-block:: console\n\n   $ pip install wn\n\n.. code-block:: python\n\n   >>> import wn\n   >>> wn.download('ewn:2020')\n   >>> wn.synsets('coffee')\n   [Synset('ewn-04979718-n'), Synset('ewn-07945591-n'), Synset('ewn-07945759-n'), Synset('ewn-12683533-n')]\n\n\nContents\n--------\n\n.. toctree::\n   :maxdepth: 2\n\n   setup.rst\n   cli.rst\n   faq.rst\n\n.. toctree::\n   :caption: Guides\n   :maxdepth: 2\n\n   guides/lexicons.rst\n   guides/basic.rst\n   guides/interlingual.rst\n   guides/wordnet.rst\n   guides/lemmatization.rst\n   guides/nltk-migration.rst\n\n.. toctree::\n   :caption: API Reference\n   :maxdepth: 1\n   :hidden:\n\n   api/wn.rst\n   api/wn.compat.rst\n   api/wn.constants.rst\n   api/wn.ic.rst\n   api/wn.ili.rst\n   api/wn.lmf.rst\n   api/wn.morphy.rst\n   api/wn.project.rst\n   api/wn.similarity.rst\n   api/wn.taxonomy.rst\n   api/wn.util.rst\n   api/wn.validate.rst\n"
  },
  {
    "path": "docs/make.bat",
    "content": "@ECHO OFF\r\n\r\npushd %~dp0\r\n\r\nREM Command file for Sphinx documentation\r\n\r\nif \"%SPHINXBUILD%\" == \"\" (\r\n\tset SPHINXBUILD=sphinx-build\r\n)\r\nset SOURCEDIR=.\r\nset BUILDDIR=_build\r\n\r\nif \"%1\" == \"\" goto help\r\n\r\n%SPHINXBUILD% >NUL 2>NUL\r\nif errorlevel 9009 (\r\n\techo.\r\n\techo.The 'sphinx-build' command was not found. Make sure you have Sphinx\r\n\techo.installed, then set the SPHINXBUILD environment variable to point\r\n\techo.to the full path of the 'sphinx-build' executable. Alternatively you\r\n\techo.may add the Sphinx directory to PATH.\r\n\techo.\r\n\techo.If you don't have Sphinx installed, grab it from\r\n\techo.http://sphinx-doc.org/\r\n\texit /b 1\r\n)\r\n\r\n%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%\r\ngoto end\r\n\r\n:help\r\n%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%\r\n\r\n:end\r\npopd\r\n"
  },
  {
    "path": "docs/requirements.txt",
    "content": "sphinx ~= 8.1\nfuro == 2024.8.6\nsphinx-copybutton == 0.5.2\n.\n\n"
  },
  {
    "path": "docs/setup.rst",
    "content": "Installation and Configuration\n==============================\n\n.. seealso::\n\n   This guide is for installing and configuring the Wn software. For\n   adding lexicons to the database, see :doc:`guides/lexicons`.\n\n\nInstalling from PyPI\n--------------------\n\nInstall the latest release from `PyPI <https://pypi.org/project/wn>`_:\n\n.. code-block:: bash\n\n   pip install wn\n\n\nThe Data Directory\n------------------\n\nBy default, Wn stores its data (such as downloaded LMF files and the\ndatabase file) in a ``.wn_data/`` directory under the user's home\ndirectory. This directory can be changed (see `Configuration`_\nbelow). Whenever Wn attempts to download a resource or access its\ndatabase, it will check for the existence of, and create if necessary,\nthis directory, the ``.wn_data/downloads/`` subdirectory, and the\n``.wn_data/wn.db`` database file. The file system will look like\nthis::\n\n    .wn_data/\n    ├── downloads\n    │   ├── ...\n    │   └── ...\n    └── wn.db\n\nThe ``...`` entries in the ``downloads/`` subdirectory represent the\nfiles of resources downloaded from the web. Their filename is a hash\nof the URL so that Wn can avoid downloading the same file twice.\n\n\nConfiguration\n-------------\n\nThe :py:data:`wn.config` object contains the paths Wn uses for local\nstorage and information about resources available on the web. To\nchange the directory Wn uses for storing data locally, modify the\n:python:`wn.config.data_directory` member:\n\n.. code-block:: python\n\n   import wn\n   wn.config.data_directory = '~/Projects/wn_data'\n\nYou can alternatively set the ``WN_DATA_DIR`` environment variable\nprior to importing Wn. On Unix-like systems, this would look like:\n\n.. code-block:: console\n\n   $ export WN_DATA_DIR=~/path/to/wn_data\n   $ python3 ...\n\nIf you are using Wn from the command line, a third option is to use\nthe ``--dir`` or ``-d`` option:\n\n.. code-block:: console\n\n   $ python3 -m wn --dir ~/path/to/wn_data download ...\n\nThere are some things to note:\n\n- The downloads directory and database path are always relative to the\n  data directory and cannot be changed directly.\n- This change only affects subsequent operations, so any data in the\n  previous location will not be moved or deleted.\n- This change only affects the current session. If you want a script\n  or application to always use the new location, it must reset the\n  data directory each time it is initialized.\n\nYou can also add project information for remote resources. First you\nadd a project, with a project ID, full name, and language code. Then\nyou create one or more versions for that project with a version ID,\nresource URL, and license information. This may be done either through\nthe :py:data:`wn.config` object's\n:py:meth:`~wn._config.WNConfig.add_project` and\n:py:meth:`~wn._config.WNConfig.add_project_version` methods, or loaded\nfrom a TOML_ file via the :py:data:`wn.config` object's\n:py:meth:`~wn._config.WNConfig.load_index` method.\n\n.. _TOML: https://toml.io\n\n.. code-block:: python\n\n   wn.config.add_project('ewn', 'English WordNet', 'en')\n   wn.config.add_project_version(\n       'ewn', '2020',\n       'https://en-word.net/static/english-wordnet-2020.xml.gz',\n       'https://creativecommons.org/licenses/by/4.0/',\n   )\n\n\nRebuilding the Database\n-----------------------\n\nNew versions of Wn may occasionally alter the database schema in a way\nthat makes an existing database incompatible with the code. 
You will\nsee an error like this (abbreviated):\n\n>>> import wn\n>>> wn.Wordnet(\"oewn:2024\")\nTraceback (most recent call last):\n  [...]\nwn.DatabaseError: Wn's schema has changed and is no longer compatible with the database.\nLexicons currently installed:\n  odenet:1.4\n  oewn:2023\n  oewn:2024\n  omw-arb:1.4\n  [...]\nRun wn.reset_database(rebuild=True) to rebuild the database.\n\nYou can then run, as directed, :func:`wn.reset_database` with\n``rebuild=True``, which will delete the database, initialize a new one,\nand attempt to re-add all the lexicons that were previously installed. You can\nalso run with ``rebuild=False`` to reinitialize the database without\nre-adding lexicons, or simply delete the database file\nfrom your filesystem. See the documentation for\n:func:`wn.reset_database` for more information.\n\n\nInstalling From Source\n----------------------\n\nIf you wish to install the code from the source repository (e.g., to\nget an unreleased feature or to contribute toward Wn's development),\nclone the repository and use `Hatch <https://hatch.pypa.io/>`_ to\nstart a virtual environment with Wn installed:\n\n.. code-block:: console\n\n   $ git clone https://github.com/goodmami/wn.git\n   $ cd wn\n   $ hatch shell\n"
  },
  {
    "path": "pyproject.toml",
    "content": "[build-system]\nrequires = [\"hatchling\"]\nbuild-backend = \"hatchling.build\"\n\n[project]\ndynamic = ['version']\n\nname = \"wn\"\ndescription = \"Wordnet interface library\"\nreadme = \"README.md\"\nrequires-python = \">=3.10\"\nlicense = {file = \"LICENSE\"}\nkeywords = [\"wordnet\", \"interlingual\", \"linguistics\", \"language\", \"library\"]\nauthors = [\n  {name = \"Michael Wayne Goodman\", email = \"1428419+goodmami@users.noreply.github.com\"}\n]\nclassifiers = [\n  \"Development Status :: 4 - Beta\",\n  \"Environment :: Console\",\n  \"Intended Audience :: Developers\",\n  \"Intended Audience :: Information Technology\",\n  \"Intended Audience :: Science/Research\",\n  \"License :: OSI Approved :: MIT License\",\n  \"Programming Language :: Python :: 3\",\n  \"Programming Language :: Python :: 3.10\",\n  \"Programming Language :: Python :: 3.11\",\n  \"Programming Language :: Python :: 3.12\",\n  \"Programming Language :: Python :: 3.13\",\n  \"Programming Language :: Python :: 3.14\",\n  \"Topic :: Scientific/Engineering :: Information Analysis\",\n  \"Topic :: Software Development :: Libraries :: Python Modules\",\n  \"Topic :: Text Processing :: Linguistic\",\n]\n\ndependencies = [\n  \"httpx\",\n  \"tomli; python_version < '3.11'\",\n]\n\n[project.optional-dependencies]\neditor = [\n  \"wn-editor\"\n]\n\n[project.urls]\nhomepage = \"https://github.com/goodmami/wn\"\ndocumentation = \"https://wn.readthedocs.io\"\nchangelog = \"https://github.com/goodmami/wn/blob/main/CHANGELOG.md\"\n\n[tool.hatch.version]\npath = \"wn/__init__.py\"\n\n[tool.hatch.build.targets.sdist]\nexclude = [\n  \"/.github\",\n]\n\n[tool.hatch.envs.hatch-test]\nextra-dependencies = [\n  \"pytest-benchmark\",\n]\n\n[tool.hatch.envs.mypy]\ndependencies = [\n  \"mypy\",\n]\n\n[tool.hatch.envs.mypy.scripts]\ncheck = \"mypy wn/\"\n\n[tool.hatch.envs.types]\ndependencies = [\n  \"wn[dev]\",\n  \"ty\",\n]\n\n[tool.hatch.envs.types.scripts]\ncheck = \"ty check {args:wn/}\"\n\n[tool.hatch.envs.docs]\ndependencies = [\n  \"wn\",\n  \"furo\",\n  \"sphinx\",\n  \"sphinx-copybutton\",\n  \"sphinx-autobuild\",\n]\n\n[tool.hatch.envs.docs.scripts]\nbuild = \"sphinx-build -M html docs docs/_build\"\nclean = \"sphinx-build -M clean docs docs/_build\"\nwatch = \"sphinx-autobuild docs docs/_build/html\"\n\n[tool.ruff]\ntarget-version = \"py310\"\nline-length = 88\n\n[tool.ruff.lint]\nselect = [\n  \"B\",      # flake8-bugbear\n  \"C4\",     # comprehensions\n  \"C90\",    # McCabe cyclomatic complexity\n  \"E\",      # pycodestyle\n  \"F\",      # Pyflakes\n  \"I\",      # isort\n  \"LOG\",    # logging\n  \"PT\",     # pytest style\n  \"RUF\",    # ruff-specific fixes\n  \"SIM\",    # simplifications\n  \"TC\",     # type checking\n  \"UP\",     # newer python features\n  \"W\",      # pycodestyle\n]\n"
  },
  {
    "path": "tests/_config_test.py",
    "content": "from pathlib import Path\n\nfrom wn._config import WNConfig\n\n\ndef test_envvar_data_dir(monkeypatch, tmp_path):\n    assert WNConfig().data_directory == Path.home() / \".wn_data\"\n    with monkeypatch.context() as mp:\n        mp.setenv(\"WN_DATA_DIR\", str(tmp_path))\n        assert WNConfig().data_directory == tmp_path\n"
  },
  {
    "path": "tests/_util_test.py",
    "content": "from wn._util import (\n    flatten,\n    format_lexicon_specifier,\n    normalize_form,\n    split_lexicon_specifier,\n    unique_list,\n)\n\n\ndef test_flatten():\n    assert flatten([]) == []\n    assert flatten([[]]) == []\n    assert flatten([[], []]) == []\n    assert flatten([[[], []], [[], []]]) == [[], [], [], []]\n    assert flatten([[1]]) == [1]\n    assert flatten([[1, 2], [3, 4]]) == [1, 2, 3, 4]\n    assert flatten([\"AB\", \"CD\"]) == [\"A\", \"B\", \"C\", \"D\"]\n\n\ndef test_unique_list():\n    assert unique_list([]) == []\n    assert unique_list([1]) == [1]\n    assert unique_list([1, 1, 1, 1, 1]) == [1]\n    assert unique_list([1, 1, 2, 2, 1]) == [1, 2]\n    assert unique_list([2, 1, 2, 2, 1]) == [2, 1]\n    assert unique_list(\"A\") == [\"A\"]\n    assert unique_list(\"AAA\") == [\"A\"]\n    assert unique_list(\"ABABA\") == [\"A\", \"B\"]\n    assert unique_list([(1, 2), (1, 2), (2, 3)]) == [(1, 2), (2, 3)]\n\n\ndef test_normalize_form():\n    assert normalize_form(\"ABC\") == \"abc\"\n    assert normalize_form(\"so\\xf1ar\") == \"sonar\"  # soñar with single ñ character\n    assert normalize_form(\"son\\u0303ar\") == \"sonar\"  # soñar with combining tilde\n    assert normalize_form(\"Weiß\") == \"weiss\"\n\n\ndef test_format_lexicon_specifier():\n    assert format_lexicon_specifier(\"\", \"\") == \":\"\n    assert format_lexicon_specifier(\"foo\", \"\") == \"foo:\"\n    assert format_lexicon_specifier(\"\", \"bar\") == \":bar\"\n    assert format_lexicon_specifier(\"foo\", \"bar\") == \"foo:bar\"\n\n\ndef test_split_lexicon_specifier():\n    assert split_lexicon_specifier(\"\") == (\"\", \"\")\n    assert split_lexicon_specifier(\":\") == (\"\", \"\")\n    assert split_lexicon_specifier(\"foo\") == (\"foo\", \"\")\n    assert split_lexicon_specifier(\"foo:\") == (\"foo\", \"\")\n    assert split_lexicon_specifier(\":bar\") == (\"\", \"bar\")\n    assert split_lexicon_specifier(\"foo:bar\") == (\"foo\", \"bar\")\n"
  },
  {
    "path": "tests/compat_sensekey_test.py",
    "content": "import pytest\n\nimport wn\nfrom wn.compat import sensekey\n\n\ndef test_unescape_oewn_sense_key():\n    def unescape(s: str) -> str:\n        return sensekey.unescape(s, flavor=\"oewn\")\n\n    assert unescape(\"\") == \"\"\n    assert unescape(\"abc\") == \"abc\"\n    assert unescape(\".\") == \".\"  # only becomes : in second part of key\n    # escape patterns\n    assert unescape(\"-ap-\") == \"'\"\n    assert unescape(\"-ex-\") == \"!\"\n    assert unescape(\"-cm-\") == \",\"\n    assert unescape(\"-cn-\") == \":\"\n    assert unescape(\"-pl-\") == \"+\"\n    assert unescape(\"-sl-\") == \"/\"\n    # adjacent escapes need their own dashes\n    assert unescape(\"-ap-ex-\") == \"'ex-\"\n    assert unescape(\"-ap--ex-\") == \"'!\"\n    # invalid escapes are unchanged\n    assert unescape(\"-foo-\") == \"-foo-\"  # not an escape sequence\n    assert unescape(\"-sp-\") == \"-sp-\"  # not valid in lemma portion\n    assert unescape(\"ap-\") == \"ap-\"  # no preceding dash\n    assert unescape(\"-ap\") == \"-ap\"  # no trailing dash\n    assert unescape(\"-AP-\") == \"-AP-\"  # case sensitivity\n    # full key, second part escapes differently\n    assert unescape(\"abc__1.23.00..\") == \"abc%1:23:00::\"\n    assert unescape(\"abc__1.23.00.foo-sp-bar.\") == \"abc%1:23:00:foo_bar:\"\n    assert unescape(\"abc__1.23.00.foo-ap-bar.\") == \"abc%1:23:00:foo-ap-bar:\"\n\n\ndef test_escape_oewn_sense_key():\n    def escape(s: str) -> str:\n        return sensekey.escape(s, flavor=\"oewn\")\n\n    assert escape(\"\") == \"\"\n    assert escape(\"abc\") == \"abc\"\n    assert escape(\".\") == \".\"  # only becomes : in second part of key\n    # escape patterns\n    assert escape(\"'\") == \"-ap-\"\n    assert escape(\"!\") == \"-ex-\"\n    assert escape(\",\") == \"-cm-\"\n    assert escape(\":\") == \"-cn-\"\n    assert escape(\"+\") == \"-pl-\"\n    assert escape(\"/\") == \"-sl-\"\n    # adjacent escapes need their own dashes\n    assert escape(\"'!\") == \"-ap--ex-\"\n    # full key, second part escapes differently\n    assert escape(\"abc%1:23:00::\") == \"abc__1.23.00..\"\n    assert escape(\"abc%1:23:00:foo_bar:\") == \"abc__1.23.00.foo-sp-bar.\"\n    assert escape(\"abc%1:23:00:foo'bar:\") == \"abc__1.23.00.foo'bar.\"\n\n\ndef test_unescape_oewn_v2_sense_key():\n    def unescape(s: str) -> str:\n        return sensekey.unescape(s, flavor=\"oewn-v2\")\n\n    assert unescape(\"\") == \"\"\n    assert unescape(\"abc\") == \"abc\"\n    assert unescape(\".\") == \".\"  # only becomes : in second part of key\n    # escape patterns\n    assert unescape(\"-apos-\") == \"'\"\n    assert unescape(\"-excl-\") == \"!\"\n    assert unescape(\"-comma-\") == \",\"\n    assert unescape(\"-colon-\") == \":\"\n    assert unescape(\"-plus-\") == \"+\"\n    assert unescape(\"-sol-\") == \"/\"\n    assert unescape(\"--\") == \"-\"\n    # adjacent escapes need their own dashes\n    assert unescape(\"-apos-excl-\") == \"'excl-\"\n    assert unescape(\"-apos--excl-\") == \"'!\"\n    # invalid escapes are unchanged\n    assert unescape(\"-foo-\") == \"-foo-\"  # not an escape sequence\n    assert unescape(\"-sp-\") == \"-sp-\"  # not valid in lemma portion\n    assert unescape(\"ap-\") == \"ap-\"  # no preceding dash\n    assert unescape(\"-ap\") == \"-ap\"  # no trailing dash\n    assert unescape(\"-AP-\") == \"-AP-\"  # case sensitivity\n    # full key, second part escapes differently\n    assert unescape(\"abc__1.23.00..\") == \"abc%1:23:00::\"\n    assert unescape(\"abc__1.23.00.foo-sp-bar.\") == 
\"abc%1:23:00:foo_bar:\"\n    assert unescape(\"abc__1.23.00.foo-ap-bar.\") == \"abc%1:23:00:foo-ap-bar:\"\n\n\ndef test_escape_oewn_v2_sense_key():\n    def escape(s: str) -> str:\n        return sensekey.escape(s, flavor=\"oewn-v2\")\n\n    assert escape(\"\") == \"\"\n    assert escape(\"abc\") == \"abc\"\n    assert escape(\".\") == \".\"  # only becomes : in second part of key\n    # escape patterns\n    assert escape(\"'\") == \"-apos-\"\n    assert escape(\"!\") == \"-excl-\"\n    assert escape(\",\") == \"-comma-\"\n    assert escape(\":\") == \"-colon-\"\n    assert escape(\"+\") == \"-plus-\"\n    assert escape(\"/\") == \"-sol-\"\n    assert escape(\"-\") == \"--\"\n    # adjacent escapes need their own dashes\n    assert escape(\"'!\") == \"-apos--excl-\"\n    # full key, second part escapes differently\n    assert escape(\"abc%1:23:00::\") == \"abc__1.23.00..\"\n    assert escape(\"abc%1:23:00:foo_bar:\") == \"abc__1.23.00.foo-sp-bar.\"\n    assert escape(\"abc%1:23:00:foo'bar:\") == \"abc__1.23.00.foo'bar.\"\n\n\n@pytest.mark.usefixtures(\"uninitialized_datadir\")\ndef test_sense_key_getter(datadir):\n    wn.add(datadir / \"sense-key-variations.xml\")\n    wn.add(datadir / \"sense-key-variations2.xml\")\n\n    get_omw_sense_key = sensekey.sense_key_getter(\"omw-en:1.4\")\n    get_oewn2024_sense_key = sensekey.sense_key_getter(\"oewn:2024\")\n    get_oewn2025_sense_key = sensekey.sense_key_getter(\"oewn:2025\")\n\n    omw_sense = wn.sense(\"omw-en--apos-s_Gravenhage-08950407-n\", lexicon=\"omw-en:1.4\")\n    oewn2024_sense = wn.sense(\"oewn--ap-s_gravenhage__1.15.00..\", lexicon=\"oewn:2024\")\n    oewn2025_sense = wn.sense(\"oewn--apos-s_gravenhage__1.15.00..\", lexicon=\"oewn:2025\")\n\n    assert get_omw_sense_key(omw_sense) == \"'s_gravenhage%1:15:00::\"\n    assert get_omw_sense_key(oewn2024_sense) is None\n    assert get_omw_sense_key(oewn2025_sense) is None\n\n    assert get_oewn2024_sense_key(omw_sense) is None\n    assert get_oewn2024_sense_key(oewn2024_sense) == \"'s_gravenhage%1:15:00::\"\n    assert get_oewn2024_sense_key(oewn2025_sense) == \"-apos-s_gravenhage%1:15:00::\"\n\n    assert get_oewn2025_sense_key(omw_sense) is None\n    assert get_oewn2025_sense_key(oewn2024_sense) == \"-ap-s_gravenhage%1:15:00::\"\n    assert get_oewn2025_sense_key(oewn2025_sense) == \"'s_gravenhage%1:15:00::\"\n\n\n@pytest.mark.usefixtures(\"uninitialized_datadir\")\ndef test_sense_getter(datadir):\n    wn.add(datadir / \"sense-key-variations.xml\")\n    wn.add(datadir / \"sense-key-variations2.xml\")\n\n    get_omw_sense = sensekey.sense_getter(\"omw-en:1.4\")\n    get_oewn2024_sense = sensekey.sense_getter(\"oewn:2024\")\n    get_oewn2025_sense = sensekey.sense_getter(\"oewn:2025\")\n\n    omw_sense = wn.sense(\"omw-en--apos-s_Gravenhage-08950407-n\", lexicon=\"omw-en:1.4\")\n    oewn2024_sense = wn.sense(\"oewn--ap-s_gravenhage__1.15.00..\", lexicon=\"oewn:2024\")\n    oewn2025_sense = wn.sense(\"oewn--apos-s_gravenhage__1.15.00..\", lexicon=\"oewn:2025\")\n\n    assert get_omw_sense(\"'s_gravenhage%1:15:00::\") == omw_sense\n    assert get_oewn2024_sense(\"'s_gravenhage%1:15:00::\") == oewn2024_sense\n    assert get_oewn2025_sense(\"'s_gravenhage%1:15:00::\") == oewn2025_sense\n"
  },
  {
    "path": "tests/conftest.py",
    "content": "import lzma\nfrom pathlib import Path\n\nimport pytest\n\nimport wn\n\n\n@pytest.fixture(scope=\"session\")\ndef datadir():\n    return Path(__file__).parent / \"data\"\n\n\n@pytest.fixture\ndef uninitialized_datadir(monkeypatch, tmp_path: Path):\n    with monkeypatch.context() as m:\n        m.setattr(wn.config, \"data_directory\", tmp_path / \"uninitialized_datadir\")\n        yield\n\n\n@pytest.fixture(scope=\"session\")\ndef empty_db(tmp_path_factory):\n    dir = tmp_path_factory.mktemp(\"wn_data_empty\")\n    with pytest.MonkeyPatch.context() as m:\n        m.setattr(wn.config, \"data_directory\", dir)\n        yield\n\n\n# We want to build these DBs once per session, but connections\n# are created once for every test.\n\n\n@pytest.fixture(scope=\"session\")\ndef mini_db_dir(datadir, tmp_path_factory):\n    dir = tmp_path_factory.mktemp(\"wn_data_mini\")\n    with pytest.MonkeyPatch.context() as m:\n        m.setattr(wn.config, \"data_directory\", dir)\n        wn.add(datadir / \"mini-lmf-1.0.xml\")\n        wn.add(datadir / \"mini-ili.tsv\")\n        wn._db.clear_connections()\n\n    return Path(dir)\n\n\n@pytest.fixture\ndef mini_lmf_compressed(datadir, tmp_path):\n    data = (datadir / \"mini-lmf-1.0.xml\").read_bytes()\n    path = tmp_path / \"temp.xml.xz\"\n    with lzma.open(path, \"w\") as f:\n        f.write(data)\n    return Path(path)\n\n\n@pytest.fixture(scope=\"session\")\ndef mini_db_1_1_dir(datadir, tmp_path_factory):\n    dir = tmp_path_factory.mktemp(\"wn_data_mini_1_1\")\n    with pytest.MonkeyPatch.context() as m:\n        m.setattr(wn.config, \"data_directory\", dir)\n        wn.add(datadir / \"mini-lmf-1.0.xml\")\n        wn.add(datadir / \"mini-lmf-1.1.xml\")\n        wn._db.clear_connections()\n\n    return Path(dir)\n\n\n@pytest.fixture(scope=\"session\")\ndef mini_db_1_4_dir(datadir, tmp_path_factory):\n    dir = tmp_path_factory.mktemp(\"wn_data_mini_1_4\")\n    with pytest.MonkeyPatch.context() as m:\n        m.setattr(wn.config, \"data_directory\", dir)\n        wn.add(datadir / \"mini-lmf-1.0.xml\")\n        wn.add(datadir / \"mini-lmf-1.4.xml\")\n        wn._db.clear_connections()\n\n    return Path(dir)\n\n\n@pytest.fixture\ndef mini_db(monkeypatch, mini_db_dir):\n    with monkeypatch.context() as m:\n        m.setattr(wn.config, \"data_directory\", mini_db_dir)\n        yield\n        wn._db.clear_connections()\n\n\n@pytest.fixture\ndef mini_db_1_1(monkeypatch, mini_db_1_1_dir):\n    with monkeypatch.context() as m:\n        m.setattr(wn.config, \"data_directory\", mini_db_1_1_dir)\n        yield\n        wn._db.clear_connections()\n\n\n@pytest.fixture\ndef mini_db_1_4(monkeypatch, mini_db_1_4_dir):\n    with monkeypatch.context() as m:\n        m.setattr(wn.config, \"data_directory\", mini_db_1_4_dir)\n        yield\n        wn._db.clear_connections()\n"
  },
  {
    "path": "tests/data/E101-0.xml",
    "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1.0.dtd\">\n<LexicalResource xmlns:dc=\"http://purl.org/dc/elements/1.1/\">\n\n<!-- duplicate ID in lexical entries -->\n\n  <Lexicon id=\"test-e101\"\n           label=\"Testing E101\"\n           language=\"en\"\n           email=\"maintainer@example.com\"\n           license=\"https://creativecommons.org/licenses/by/4.0/\"\n           version=\"1\">\n\n    <LexicalEntry id=\"test-e101-foo-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"foo\" />\n      <Sense id=\"test-e101-foo\" synset=\"test-e101-01-n\" />\n    </LexicalEntry>\n\n    <LexicalEntry id=\"test-e101-foo-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"foo2\" />\n      <Sense id=\"test-e101-foo2\" synset=\"test-e101-01-n\" />\n    </LexicalEntry>\n\n    <Synset id=\"test-e101-01-n\" ili=\"i12345\" partOfSpeech=\"n\" />\n\n  </Lexicon>\n\n</LexicalResource>\n"
  },
  {
    "path": "tests/data/E101-1.xml",
    "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1.0.dtd\">\n<LexicalResource xmlns:dc=\"http://purl.org/dc/elements/1.1/\">\n\n<!-- duplicate ID in senses -->\n\n  <Lexicon id=\"test-e101\"\n           label=\"Testing E101\"\n           language=\"en\"\n           email=\"maintainer@example.com\"\n           license=\"https://creativecommons.org/licenses/by/4.0/\"\n           version=\"1\">\n\n    <LexicalEntry id=\"test-e101-foo-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"foo\" />\n      <Sense id=\"test-e101-foo\" synset=\"test-e101-01-n\" />\n      <Sense id=\"test-e101-foo\" synset=\"test-e101-02-n\" />\n    </LexicalEntry>\n\n    <Synset id=\"test-e101-01-n\" ili=\"i12345\" partOfSpeech=\"n\" />\n    <Synset id=\"test-e101-02-n\" ili=\"i12346\" partOfSpeech=\"n\" />\n\n  </Lexicon>\n\n</LexicalResource>\n"
  },
  {
    "path": "tests/data/E101-2.xml",
    "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1.0.dtd\">\n<LexicalResource xmlns:dc=\"http://purl.org/dc/elements/1.1/\">\n\n<!-- duplicate ID in synsets -->\n\n  <Lexicon id=\"test-e101\"\n           label=\"Testing E101\"\n           language=\"en\"\n           email=\"maintainer@example.com\"\n           license=\"https://creativecommons.org/licenses/by/4.0/\"\n           version=\"1\">\n\n    <LexicalEntry id=\"test-e101-foo-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"foo\" />\n      <Sense id=\"test-e101-foo-n\" synset=\"test-e101-01-n\" />\n    </LexicalEntry>\n\n    <Synset id=\"test-e101-01-n\" ili=\"i12345\" partOfSpeech=\"n\" />\n    <Synset id=\"test-e101-01-n\" ili=\"i12346\" partOfSpeech=\"n\" />\n\n  </Lexicon>\n\n</LexicalResource>\n"
  },
  {
    "path": "tests/data/E101-3.xml",
    "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1.0.dtd\">\n<LexicalResource xmlns:dc=\"http://purl.org/dc/elements/1.1/\">\n\n<!-- duplicate ID in different entity types -->\n\n  <Lexicon id=\"test-e101\"\n           label=\"Testing E101\"\n           language=\"en\"\n           email=\"maintainer@example.com\"\n           license=\"https://creativecommons.org/licenses/by/4.0/\"\n           version=\"1\">\n\n    <LexicalEntry id=\"test-e101-foo-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"foo\" />\n      <Sense id=\"test-e101-foo-n\" synset=\"test-e101-01-n\" />\n    </LexicalEntry>\n\n    <Synset id=\"test-e101-01-n\" ili=\"i12345\" partOfSpeech=\"n\" />\n\n  </Lexicon>\n\n</LexicalResource>\n"
  },
  {
    "path": "tests/data/README.md",
    "content": "# Testing Data Directory\n\nThis directory is used to store data files used by the testing system.\n\n"
  },
  {
    "path": "tests/data/W305-0.xml",
    "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1.0.dtd\">\n<LexicalResource xmlns:dc=\"http://purl.org/dc/elements/1.1/\">\n\n<!-- blank definition in synset -->\n\n  <Lexicon id=\"test-w305\"\n           label=\"Testing W305\"\n           language=\"en\"\n           email=\"maintainer@example.com\"\n           license=\"https://creativecommons.org/licenses/by/4.0/\"\n           version=\"1\">\n\n    <LexicalEntry id=\"test-w305-foo-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"foo\" />\n      <Sense id=\"test-w305-foo-n\" synset=\"test-w305-01-n\" />\n    </LexicalEntry>\n\n    <Synset id=\"test-w305-01-n\" ili=\"i12345\" partOfSpeech=\"n\">\n      <Definition>\n        \n      </Definition>\n    </Synset>\n\n  </Lexicon>\n\n</LexicalResource>\n"
  },
  {
    "path": "tests/data/W306-0.xml",
    "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1.0.dtd\">\n<LexicalResource xmlns:dc=\"http://purl.org/dc/elements/1.1/\">\n\n<!-- blank example in synset -->\n\n  <Lexicon id=\"test-w306\"\n           label=\"Testing W306\"\n           language=\"en\"\n           email=\"maintainer@example.com\"\n           license=\"https://creativecommons.org/licenses/by/4.0/\"\n           version=\"1\">\n\n    <LexicalEntry id=\"test-w306-foo-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"foo\" />\n      <Sense id=\"test-w306-foo-n\" synset=\"test-w306-01-n\" />\n    </LexicalEntry>\n\n    <Synset id=\"test-w306-01-n\" ili=\"i12345\" partOfSpeech=\"n\">\n      <Example>\n        \n      </Example>\n    </Synset>\n\n  </Lexicon>\n\n</LexicalResource>\n"
  },
  {
    "path": "tests/data/W307-0.xml",
    "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1.0.dtd\">\n<LexicalResource xmlns:dc=\"http://purl.org/dc/elements/1.1/\">\n\n<!-- repeated definition in synset -->\n\n  <Lexicon id=\"test-w307\"\n           label=\"Testing W307\"\n           language=\"en\"\n           email=\"maintainer@example.com\"\n           license=\"https://creativecommons.org/licenses/by/4.0/\"\n           version=\"1\">\n\n    <LexicalEntry id=\"test-w307-foo-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"foo\" />\n      <Sense id=\"test-w307-foo-1-n\" synset=\"test-w307-01-n\" />\n      <Sense id=\"test-w307-foo-2-n\" synset=\"test-w307-02-n\" />\n    </LexicalEntry>\n\n    <Synset id=\"test-w307-01-n\" ili=\"i12345\" partOfSpeech=\"n\">\n      <Definition>foo</Definition>\n    </Synset>\n\n    <Synset id=\"test-w307-02-n\" ili=\"i12346\" partOfSpeech=\"n\">\n      <Definition>foo</Definition>\n    </Synset>\n\n  </Lexicon>\n\n</LexicalResource>\n"
  },
  {
    "path": "tests/data/mini-ili-with-status.tsv",
    "content": "ILI\tDefinition\tStatus\ni1\ti1 definition\tactive\ni2\t\tdeprecated\ni67447\tknowledge acquired through study or experience or instruction\tactive\n"
  },
  {
    "path": "tests/data/mini-ili.tsv",
    "content": "ILI\tDefinition\ni1\ti1 definition\ni2\ni67447\tknowledge acquired through study or experience or instruction\n"
  },
  {
    "path": "tests/data/mini-lmf-1.0.xml",
    "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1.0.dtd\">\n<!--\nThis sample document provides small lexicons in English and Spanish\nwith the following words and hypernym/derivation relations:\n\nEnglish:\n- information ⊃ (example, illustration) ⊃ sample ⊃ random sample\n- information ⊃ datum\n- random sample (second synset)\n- example ⊳ exemplify\n- illustration ⊳ illustrate\n- resignate\n\nSpanish:\n- información, ejemplo, ilustración, muestra aleatoria\n- ejemplo ⊳ ejemplificar\n- ilustración ⊳ ilustrar\n\n-->\n<LexicalResource xmlns:dc=\"http://purl.org/dc/elements/1.1/\">\n\n  <Lexicon id=\"test-en\"\n           label=\"Testing English WordNet\"\n           language=\"en\"\n           email=\"maintainer@example.com\"\n           license=\"https://creativecommons.org/licenses/by/4.0/\"\n           version=\"1\"\n           url=\"https://example.com/test-en\"\n           dc:description=\"An example lexicon for testing.\"\n           confidenceScore=\"0.9\">\n\n    <LexicalEntry id=\"test-en-information-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"information\" script=\"Latn\">\n        <Tag category=\"tag-category\">tag-text</Tag>\n      </Lemma>\n      <Sense id=\"test-en-information-n-0001-01\" synset=\"test-en-0001-n\">\n        <Count dc:source=\"some corpus\">3</Count>\n      </Sense>\n    </LexicalEntry>\n\n    <LexicalEntry id=\"test-en-example-n\" confidenceScore=\"1.0\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"example\" />\n      <Sense id=\"test-en-example-n-0002-01\" synset=\"test-en-0002-n\" >\n        <SenseRelation relType=\"derivation\" target=\"test-en-exemplify-v-0003-01\" confidenceScore=\"0.5\" />\n      </Sense>\n    </LexicalEntry>\n\n    <LexicalEntry id=\"test-en-sample-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"sample\" />\n      <Sense id=\"test-en-sample-n-0004-01\" synset=\"test-en-0004-n\" />\n    </LexicalEntry>\n\n    <LexicalEntry id=\"test-en-random_sample-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"random sample\" />\n      <Sense id=\"test-en-random_sample-n-0005-01\" synset=\"test-en-0005-n\" />\n      <Sense id=\"test-en-random_sample-n-0005-02\" synset=\"test-en-0008-n\" lexicalized=\"false\" />\n    </LexicalEntry>\n\n    <LexicalEntry id=\"test-en-exemplify-v\">\n      <Lemma partOfSpeech=\"v\" writtenForm=\"exemplify\" />\n      <Form writtenForm=\"exemplifies\" />\n      <Form writtenForm=\"exemplified\" />\n      <Form writtenForm=\"exemplifying\" />\n      <Sense id=\"test-en-exemplify-v-0003-01\" synset=\"test-en-0003-v\" >\n        <SenseRelation relType=\"derivation\" target=\"test-en-example-n-0002-01\" />\n      </Sense>\n      <SyntacticBehaviour senses=\"test-en-exemplify-v-0003-01\" subcategorizationFrame=\"Somebody ----s something\" />\n      <SyntacticBehaviour senses=\"test-en-exemplify-v-0003-01\" subcategorizationFrame=\"Something ----s something\" />\n    </LexicalEntry>\n\n    <LexicalEntry id=\"test-en-illustration-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"illustration\" />\n      <Sense id=\"test-en-illustration-n-0002-01\" synset=\"test-en-0002-n\" >\n        <SenseRelation relType=\"derivation\" target=\"test-en-illustrate-v-0003-01\" />\n      </Sense>\n    </LexicalEntry>\n\n    <LexicalEntry id=\"test-en-illustrate-v\">\n      <Lemma partOfSpeech=\"v\" writtenForm=\"illustrate\" />\n      <Sense id=\"test-en-illustrate-v-0003-01\" synset=\"test-en-0003-v\" >\n        <SenseRelation 
relType=\"derivation\" target=\"test-en-illustration-n-0002-01\" />\n        <SenseRelation relType=\"other\" target=\"test-en-illustration-n-0002-01\" dc:type=\"result\" />\n        <SenseRelation relType=\"other\" target=\"test-en-illustration-n-0002-01\" dc:type=\"event\" />\n      </Sense>\n      <SyntacticBehaviour senses=\"test-en-illustrate-v-0003-01\" subcategorizationFrame=\"Somebody ----s something\" />\n      <SyntacticBehaviour senses=\"test-en-illustrate-v-0003-01\" subcategorizationFrame=\"Something ----s something\" />\n    </LexicalEntry>\n\n    <LexicalEntry id=\"test-en-datum-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"datum\" />\n      <Form writtenForm=\"data\" />\n      <Sense id=\"test-en-datum-n-0006-01\" synset=\"test-en-0006-n\" />\n    </LexicalEntry>\n\n    <LexicalEntry id=\"test-en-resignate-v\">\n      <Lemma partOfSpeech=\"v\" writtenForm=\"resignate\" />\n      <Sense id=\"test-en-resignate-v-0007-01\" synset=\"test-en-0007-v\" />\n    </LexicalEntry>\n\n    <Synset id=\"test-en-0001-n\" ili=\"i67447\" partOfSpeech=\"n\" dc:subject=\"noun.cognition\" confidenceScore=\"1.0\">\n      <Definition sourceSense=\"test-en-information-n-0001-01\" confidenceScore=\"0.95\">something that informs</Definition>\n      <SynsetRelation relType=\"hyponym\" target=\"test-en-0002-n\" confidenceScore=\"0.8\" />\n      <SynsetRelation relType=\"hyponym\" target=\"test-en-0006-n\" confidenceScore=\"0.8\" />\n      <Example confidenceScore=\"0.7\">\"this is information\"</Example>\n    </Synset>\n\n    <Synset id=\"test-en-0002-n\" ili=\"i67469\" partOfSpeech=\"n\" dc:subject=\"noun.cognition\">\n      <Definition>something that exemplifies</Definition>\n      <SynsetRelation relType=\"hypernym\" target=\"test-en-0001-n\" />\n      <SynsetRelation relType=\"hyponym\" target=\"test-en-0004-n\" />\n      <Example>\"this is an example\"</Example>\n    </Synset>\n\n    <Synset id=\"test-en-0003-v\" ili=\"i26682\" partOfSpeech=\"v\" dc:subject=\"verb.communication\">\n      <Definition>providing an example</Definition>\n    </Synset>\n\n    <Synset id=\"test-en-0004-n\" ili=\"i67474\" partOfSpeech=\"n\" dc:subject=\"noun.cognition\">\n      <Definition>a subset of exemplars from some population</Definition>\n      <SynsetRelation relType=\"hypernym\" target=\"test-en-0002-n\" />\n      <SynsetRelation relType=\"hyponym\" target=\"test-en-0005-n\" />\n    </Synset>\n\n    <Synset id=\"test-en-0005-n\" ili=\"i67479\" partOfSpeech=\"n\" dc:subject=\"noun.cognition\">\n      <Definition>a sample randomly drawn from some population</Definition>\n      <SynsetRelation relType=\"hypernym\" target=\"test-en-0004-n\" />\n    </Synset>\n\n    <Synset id=\"test-en-0008-n\" ili=\"\" partOfSpeech=\"n\" lexicalized=\"false\" dc:subject=\"noun.cognition\">\n      <Definition>a sample that is random</Definition>\n    </Synset>\n\n    <Synset id=\"test-en-0006-n\" ili=\"i67448\" partOfSpeech=\"n\" dc:subject=\"noun.cognition\">\n      <Definition>a measured or recorded piece of information</Definition>\n      <SynsetRelation relType=\"hypernym\" target=\"test-en-0001-n\" />\n    </Synset>\n\n    <Synset id=\"test-en-0007-v\" ili=\"in\" partOfSpeech=\"v\" dc:subject=\"verb.social\">\n      <!-- should probably be a hyponym of i33760 -->\n      <ILIDefinition dc:creator=\"MM\">to fire someone while making it look like it was their idea</ILIDefinition>\n    </Synset>\n\n  </Lexicon>\n\n  <Lexicon id=\"test-es\"\n           label=\"Testing Spanish WordNet\"\n           language=\"es\"\n           
email=\"maintainer@example.com\"\n           license=\"https://creativecommons.org/licenses/by/4.0/\"\n           version=\"1\"\n           url=\"https://example.com/test-es\">\n\n    <LexicalEntry id=\"test-es-información-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"información\" />\n      <Sense id=\"test-es-información-n-0001-01\" synset=\"test-es-0001-n\" />\n    </LexicalEntry>\n\n    <LexicalEntry id=\"test-es-ejemplo-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"ejemplo\" />\n      <Sense id=\"test-es-ejemplo-n-0002-01\" synset=\"test-es-0002-n\" >\n        <SenseRelation relType=\"derivation\" target=\"test-es-ejemplificar-v-0003-01\" />\n      </Sense>\n    </LexicalEntry>\n\n    <LexicalEntry id=\"test-es-muestra_aleatoria-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"muestra aleatoria\" />\n      <Sense id=\"test-es-muestra_aleatoria-n-0005-01\" synset=\"test-es-0005-n\" />\n    </LexicalEntry>\n\n    <LexicalEntry id=\"test-es-ejemplificar-v\">\n      <Lemma partOfSpeech=\"v\" writtenForm=\"ejemplificar\" />\n      <Sense id=\"test-es-ejemplificar-v-0003-01\" synset=\"test-es-0003-v\" >\n        <SenseRelation relType=\"derivation\" target=\"test-es-ejemplo-n-0002-01\" />\n      </Sense>\n    </LexicalEntry>\n\n    <LexicalEntry id=\"test-es-ilustración-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"ilustración\" />\n      <Sense id=\"test-es-ilustración-n-0002-01\" synset=\"test-es-0002-n\" >\n        <SenseRelation relType=\"derivation\" target=\"test-es-ilustrar-v-0003-01\" />\n      </Sense>\n    </LexicalEntry>\n\n    <LexicalEntry id=\"test-es-ilustrar-v\">\n      <Lemma partOfSpeech=\"v\" writtenForm=\"ilustrar\" />\n      <Sense id=\"test-es-ilustrar-v-0003-01\" synset=\"test-es-0003-v\" >\n        <SenseRelation relType=\"derivation\" target=\"test-es-ilustración-n-0002-01\" />\n      </Sense>\n    </LexicalEntry>\n\n    <Synset id=\"test-es-0001-n\" ili=\"i67447\" partOfSpeech=\"n\" dc:subject=\"noun.cognition\">\n      <Definition>algo que informa</Definition>\n      <Example>\"este es la información\"</Example>\n    </Synset>\n\n    <Synset id=\"test-es-0002-n\" ili=\"i67469\" partOfSpeech=\"n\" dc:subject=\"noun.cognition\">\n      <Definition>algo que ejemplifica</Definition>\n      <Example>\"este es el ejemplo\"</Example>\n    </Synset>\n\n    <Synset id=\"test-es-0003-v\" ili=\"i26682\" partOfSpeech=\"v\" dc:subject=\"verb.communication\">\n      <Definition>dar un ejemplo</Definition>\n    </Synset>\n\n    <Synset id=\"test-es-0005-n\" ili=\"i67479\" partOfSpeech=\"n\" dc:subject=\"noun.cognition\">\n      <Definition>una muestra extraída aleatoriamente de alguna población</Definition>\n    </Synset>\n\n  </Lexicon>\n\n</LexicalResource>\n"
  },
  {
    "path": "tests/data/mini-lmf-1.1.xml",
    "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1.1.dtd\">\n\n<LexicalResource xmlns:dc=\"https://globalwordnet.github.io/schemas/dc/\">\n\n  <Lexicon id=\"test-ja\"\n           label=\"Testing Japanese WordNet\"\n           language=\"ja\"\n           email=\"maintainer@example.com\"\n           license=\"https://creativecommons.org/licenses/by/4.0/\"\n           version=\"1\"\n           url=\"https://example.com/test-ja\"\n           logo=\"logo.svg\">\n\n    <Requires id=\"test-en\" version=\"1\" />\n\n    <LexicalEntry id=\"test-ja-情報-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"情報\" script=\"Jpan\" />\n      <Form id=\"test-ja-情報-n-じょうほう\" writtenForm=\"じょうほう\" script=\"Hira\" />\n      <Form id=\"test-ja-情報-n-ジョウホウ\" writtenForm=\"ジョウホウ\" script=\"Kana\" />\n      <Form id=\"test-ja-情報-n-zyouhou\" writtenForm=\"zyouhou\" script=\"Latn-kunrei\" />\n      <Sense id=\"test-ja-情報-n-0001-01\" synset=\"test-ja-0001-n\" />\n    </LexicalEntry>\n\n    <LexicalEntry id=\"test-ja-例え-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"例え\">\n        <Pronunciation variety=\"standard\" notation=\"ipa\" audio=\"tatoe.wav\">tatoe</Pronunciation>\n      </Lemma>\n      <Form id=\"test-ja-例え-n-たとえ\" writtenForm=\"たとえ\" script=\"Hira\" />\n      <Form id=\"test-ja-例え-n-タトエ\" writtenForm=\"タトエ\" script=\"Kana\" />\n      <Form id=\"test-ja-例え-n-tatoe\" writtenForm=\"tatoe\" script=\"Latn-kunrei\" />\n      <Sense id=\"test-ja-例え-n-0002-01\" synset=\"test-ja-0002-n\" />\n    </LexicalEntry>\n\n    <LexicalEntry id=\"test-ja-事例-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"事例\" />\n      <Form id=\"test-ja-事例-n-じれい\" writtenForm=\"じれい\" script=\"Hira\" />\n      <Form id=\"test-ja-事例-n-ジレイ\" writtenForm=\"ジレイ\" script=\"Kana\" />\n      <Form id=\"test-ja-事例-n-zirei\" writtenForm=\"zirei\" script=\"Latn-kunrei\" />\n      <Sense id=\"test-ja-事例-n-0002-01\" synset=\"test-ja-0002-n\" />\n    </LexicalEntry>\n\n    <LexicalEntry id=\"test-ja-示す-v\">\n      <Lemma partOfSpeech=\"v\" writtenForm=\"示す\" />\n      <Form id=\"test-ja-示す-v-しめす\" writtenForm=\"しめす\" script=\"Hira\" />\n      <Form id=\"test-ja-示す-v-シメス\" writtenForm=\"シメス\" script=\"Kana\" />\n      <Form id=\"test-ja-示す-v-simesu\" writtenForm=\"simesu\" script=\"Latn-kunrei\" />\n      <Sense id=\"test-ja-示す-v-0003-01\" synset=\"test-ja-0003-v\" subcat=\"frame-1\" />\n    </LexicalEntry>\n\n    <Synset id=\"test-ja-0001-n\" ili=\"i67447\" partOfSpeech=\"n\" lexfile=\"noun.cognition\"\n            members=\"test-ja-情報-n-0001-01\" />\n\n    <Synset id=\"test-ja-0002-n\" ili=\"i67469\" partOfSpeech=\"n\" lexfile=\"noun.cognition\"\n            members=\"test-ja-事例-n-0002-01 test-ja-例え-n-0002-01\" />\n\n    <Synset id=\"test-ja-0003-v\" ili=\"i26682\" partOfSpeech=\"v\" lexfile=\"verb.communication\"\n            members=\"test-ja-示す-v-0003-01\" />\n\n    <SyntacticBehaviour id=\"frame-1\" subcategorizationFrame=\"ある人が何かを----\" />\n  </Lexicon>\n\n  <LexiconExtension id=\"test-en-ext\"\n                    label=\"Testing English Extension\"\n                    language=\"en\"\n                    email=\"maintainer@example.com\"\n                    license=\"https://creativecommons.org/licenses/by/4.0/\"\n                    version=\"1\"\n                    url=\"https://example.com/test-en-ext\">\n\n    <Extends id=\"test-en\" version=\"1\" url=\"https://example.com/test-en\" />\n\n    <!-- add sense relation -->\n    
<ExternalLexicalEntry id=\"test-en-information-n\">\n      <ExternalLemma>\n        <!-- pronunciations copied from the Open English Wordnet 2024 (CC-BY-4.0) -->\n        <Pronunciation variety=\"GB\">ˌɪnfəˈmeɪʃən</Pronunciation>\n        <Pronunciation variety=\"US\">ˌɪnfɚˈmeɪʃən</Pronunciation>\n      </ExternalLemma>\n      <ExternalSense id=\"test-en-information-n-0001-01\">\n        <SenseRelation relType=\"pertainym\" target=\"test-en-ext-info-n-0001-01\" />\n      </ExternalSense>\n    </ExternalLexicalEntry>\n\n    <!-- add a tag -->\n    <ExternalLexicalEntry id=\"test-en-exemplify-v\">\n      <ExternalLemma>\n        <Tag category=\"tense\">INF</Tag>\n      </ExternalLemma>\n    </ExternalLexicalEntry>\n\n    <!-- add a sense to an existing entry -->\n    <ExternalLexicalEntry id=\"test-en-illustrate-v\">\n      <Sense id=\"test-en-ext-illustrate-v-0008-01\" synset=\"test-en-ext-0008-v\">\n\t    <Example>\"the artist illustrated the story beautifully\"</Example>\n      </Sense>\n    </ExternalLexicalEntry>\n\n    <!-- add a new entry for an existing synset -->\n    <LexicalEntry id=\"test-en-ext-info-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"info\" />\n      <Sense id=\"test-en-ext-info-n-0001-01\" synset=\"test-en-0001-n\">\n        <SenseRelation relType=\"pertainym\" target=\"test-en-information-n-0001-01\" />\n      </Sense>\n    </LexicalEntry>\n    \n    <!-- add a new entry with a new synset -->\n    <LexicalEntry id=\"test-en-ext-fire-v\">\n      <Lemma partOfSpeech=\"v\" writtenForm=\"fire\" />\n      <Sense id=\"test-en-ext-fire-v-0009-01\" synset=\"test-en-ext-0009-v\" subcat=\"social-transitive\" />\n    </LexicalEntry>\n\n    <!-- only needed for ids -->\n    <ExternalSynset id=\"test-en-0001-n\" />\n\n    <!-- add a relation to an existing synset -->\n    <ExternalSynset id=\"test-en-0007-v\">\n      <SynsetRelation relType=\"hypernym\" target=\"test-en-ext-0009-v\" />\n    </ExternalSynset>\n\n    <Synset id=\"test-en-ext-0008-v\" ili=\"i30181\" partOfSpeech=\"v\"\n            members=\"test-en-ext-illustrate-v-0008-01\" lexfile=\"verb.creation\">\n      <Definition>depict something in a visual medium</Definition>\n    </Synset>\n\n    <Synset id=\"test-en-ext-0009-v\" ili=\"i33760\" partOfSpeech=\"v\"\n            members=\"test-en-ext-fire-v-0009-01\" lexfile=\"verb.social\">\n      <Definition>terminate employment</Definition>\n      <SynsetRelation relType=\"hyponym\" target=\"test-en-0007-v\" />\n    </Synset>\n\n    <SyntacticBehaviour id=\"social-transitive\" subcategorizationFrame=\"Somebody ----s somebody\" />\n\n  </LexiconExtension>\n\n</LexicalResource>\n"
  },
  {
    "path": "tests/data/mini-lmf-1.3.xml",
    "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1.3.dtd\">\n\n<!--\nWN-LMF 1.3 is the same as 1.1 and 1.2 except for allowing xml:space on\nnodes with text content.\n-->\n\n<LexicalResource xmlns:dc=\"https://globalwordnet.github.io/schemas/dc/\">\n\n  <Lexicon id=\"test-ws\"\n           label=\"Testing Whitespace WordNet\"\n           language=\"en\"\n           email=\"maintainer@example.com\"\n           license=\"https://creativecommons.org/licenses/by/4.0/\"\n           version=\"1\"\n           url=\"https://example.com/test-whitespace\"\n           logo=\"logo.svg\">\n\n    <LexicalEntry id=\"test-ws-foo\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"foo\" />\n      <Sense id=\"test-ws-foo-1\" synset=\"test-ws-1\" />\n    </LexicalEntry>\n\n    <LexicalEntry id=\"test-ws-bar\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"bar\" />\n      <Sense id=\"test-ws-bar-2\" synset=\"test-ws-2\" />\n    </LexicalEntry>\n\n    <LexicalEntry id=\"test-ws-baz\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"baz\" />\n      <Sense id=\"test-ws-baz-3\" synset=\"test-ws-3\" />\n    </LexicalEntry>\n\n    <Synset id=\"test-ws-1\" ili=\"\" partOfSpeech=\"n\">\n      <Definition>\n        one\n          two\n        three\n      </Definition>\n    </Synset>\n\n    <Synset id=\"test-ws-2\" ili=\"\" partOfSpeech=\"n\">\n      <Definition xml:space=\"default\">\n        one\n          two\n        three\n      </Definition>\n    </Synset>\n\n    <Synset id=\"test-ws-3\" ili=\"\" partOfSpeech=\"n\">\n      <Definition xml:space=\"preserve\">\n        one\n          two\n        three\n      </Definition>\n    </Synset>\n\n  </Lexicon>\n\n</LexicalResource>\n"
  },
  {
    "path": "tests/data/mini-lmf-1.4.xml",
    "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1.4.dtd\">\n\n<!--\nWN-LMF 1.4 has the following changes:\n- optional 'index' attribute on LexicalEntry\n- optional 'n' attribute on Sense\n- Pronunciation elements under Definition and Example\n-->\n\n<LexicalResource xmlns:dc=\"https://globalwordnet.github.io/schemas/dc/\">\n\n  <Lexicon id=\"test-1.4\"\n           label=\"Testing WN-LMF 1.4\"\n           language=\"en\"\n           email=\"maintainer@example.com\"\n           license=\"https://creativecommons.org/licenses/by/4.0/\"\n           version=\"1\">\n\n    <LexicalEntry id=\"test-1.4-Foo_Bar-n\" index=\"foo_bar\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"Foo Bar\" />\n      <Sense id=\"test-1.4-Foo_Bar-n-1\" synset=\"test-1.4-1\" n=\"3\">\n        <SenseRelation relType=\"metaphor\" target=\"test-1.4-baz-n-1\" />\n      </Sense>\n    </LexicalEntry>\n\n    <LexicalEntry id=\"test-1.4-foo_bar-n\" index=\"foo_bar\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"foo bar\" />\n      <Sense id=\"test-1.4-foo_bar-n-1\" synset=\"test-1.4-1\" n=\"2\" />\n      <Sense id=\"test-1.4-foo_bar-n-2\" synset=\"test-1.4-2\" n=\"1\" />\n    </LexicalEntry>\n\n    <!-- ommitted index defaults to writtenForm (baz) when added to db -->\n    <LexicalEntry id=\"test-1.4-baz-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"baz\" />\n      <Sense id=\"test-1.4-baz-n-1\" synset=\"test-1.4-1\">\n        <SenseRelation relType=\"has_metaphor\" target=\"test-1.4-Foo_Bar-n-1\" />\n      </Sense>\n    </LexicalEntry>\n\n    <!-- this should share the index with the one above -->\n    <LexicalEntry id=\"test-1.4-BAZ-n\" index=\"baz\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"BAZ\" />\n      <Sense id=\"test-1.4-BAZ-n-1\" synset=\"test-1.4-1\" n=\"2\" />\n    </LexicalEntry>\n\n    <!-- this one does not share the index -->\n    <LexicalEntry id=\"test-1.4-Baz-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"Baz\" />\n      <Sense id=\"test-1.4-Baz-n-1\" synset=\"test-1.4-1\" n=\"2\" />\n      <!-- omitted 'n' defaults to position (2) when added to db -->\n      <Sense id=\"test-1.4-Baz-n-2\" synset=\"test-1.4-2\" />\n    </LexicalEntry>\n\n    <!-- indexes are shared only in the same part of speech -->\n    <LexicalEntry id=\"test-1.4-baz-v\" index=\"baz\">\n      <Lemma partOfSpeech=\"v\" writtenForm=\"baz\" />\n      <Sense id=\"test-1.4-baz-v-1\" synset=\"test-1.4-3\" n=\"1\" />\n    </LexicalEntry>\n\n    <Synset id=\"test-1.4-1\" ili=\"\" partOfSpeech=\"n\" members=\"test-1.4-Foo_Bar-n-1 test-1.4-foo_bar-n-1 test-1.4-baz-n-1 test-1.4-BAZ-n-1 test-1.4-Baz-n-1\" />\n\n    <Synset id=\"test-1.4-2\" ili=\"\" partOfSpeech=\"n\" members=\"test-1.4-foo_bar-n-2 test-1.4-Baz-n-2\" />\n\n    <Synset id=\"test-1.4-3\" ili=\"\" partOfSpeech=\"v\" members=\"test-1.4-baz-v-1\" />\n\n  </Lexicon>\n\n  <LexiconExtension id=\"test-ext-1.4\"\n                    label=\"Testing WN-LMF 1.4 extensions\"\n                    language=\"zxx\"\n                    email=\"maintainer@example.com\"\n                    license=\"https://creativecommons.org/licenses/by/4.0/\"\n                    version=\"1\">\n    <!-- WN-LMF 1.4 changes the 'id' attribute to 'ref' in Requires and Extends -->\n    <Extends ref=\"test-en\" version=\"1\" url=\"https://example.com/test-en\" />\n  </LexiconExtension>\n\n</LexicalResource>\n"
  },
  {
    "path": "tests/data/sense-key-variations.xml",
    "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1.1.dtd\">\n<LexicalResource xmlns:dc=\"https://globalwordnet.github.io/schemas/dc/\">\n\n  <Lexicon id=\"omw-en\"\n           label=\"OMW English Wordnet based on WordNet-3.0 sample\"\n           language=\"en\"\n           email=\"maintainer@example.com\"\n           license=\"https://wordnet.princeton.edu/license-and-commercial-use\"\n           version=\"1.4\"\n           url=\"https://github.com/omwn/omw-data\"\n           citation=\"Christiane Fellbaum (1998, ed.) *WordNet: An Electronic Lexical Database*. MIT Press.\">\n    <LexicalEntry id=\"omw-en--apos-s_Gravenhage-n\">\n      <Lemma writtenForm=\"'s Gravenhage\" partOfSpeech=\"n\" />\n      <Sense id=\"omw-en--apos-s_Gravenhage-08950407-n\" synset=\"omw-en-08950407-n\" dc:identifier=\"'s_gravenhage%1:15:00::\" />\n    </LexicalEntry>\n    <Synset id=\"omw-en-08950407-n\" ili=\"\" />\n  </Lexicon>\n\n  <Lexicon id=\"oewn\"\n           label=\"Open Engish Wordnet sample\"\n           language=\"en\"\n           email=\"maintainer@example.com\"\n           license=\"https://creativecommons.org/licenses/by/4.0\"\n           version=\"2024\"\n           url=\"https://github.com/globalwordnet/english-wordnet\">\n    <LexicalEntry id=\"oewn--ap-s_Gravenhage-n\">\n      <Lemma writtenForm=\"&apos;s Gravenhage\" partOfSpeech=\"n\"/>\n      <Sense id=\"oewn--ap-s_gravenhage__1.15.00..\" synset=\"oewn-08970180-n\"/>\n    </LexicalEntry>\n    <Synset id=\"oewn-08970180-n\" ili=\"\" />\n  </Lexicon>\n\n</LexicalResource>"
  },
  {
    "path": "tests/data/sense-key-variations2.xml",
    "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1.1.dtd\">\n<LexicalResource xmlns:dc=\"https://globalwordnet.github.io/schemas/dc/\">\n\n  <Lexicon id=\"oewn\"\n           label=\"Open Engish Wordnet sample\"\n           language=\"en\"\n           email=\"maintainer@example.com\"\n           license=\"https://creativecommons.org/licenses/by/4.0\"\n           version=\"2025\"\n           url=\"https://github.com/globalwordnet/english-wordnet\">\n    <LexicalEntry id=\"oewn--apos-s_Gravenhage-n\">\n      <Lemma writtenForm=\"&apos;s Gravenhage\" partOfSpeech=\"n\"/>\n      <Sense id=\"oewn--apos-s_gravenhage__1.15.00..\" synset=\"oewn-08970180-n\"/>\n    </LexicalEntry>\n    <Synset id=\"oewn-08970180-n\" ili=\"\" />\n  </Lexicon>\n\n</LexicalResource>"
  },
  {
    "path": "tests/data/sense-member-order.xml",
    "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1.1.dtd\">\n<LexicalResource xmlns:dc=\"http://purl.org/dc/elements/1.1/\">\n\n<!-- duplicate ID in synsets -->\n\n  <Lexicon id=\"test\"\n           label=\"Testing Sense Member Orders\"\n           language=\"en\"\n           email=\"maintainer@example.com\"\n           license=\"https://creativecommons.org/licenses/by/4.0/\"\n           version=\"1\">\n\n    <LexicalEntry id=\"test-foo-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"foo\" />\n      <Sense id=\"test-01-foo-n\" synset=\"test-01-n\" />\n      <Sense id=\"test-02-foo-n\" synset=\"test-02-n\" />\n    </LexicalEntry>\n\n    <LexicalEntry id=\"test-bar-n\">\n      <Lemma partOfSpeech=\"n\" writtenForm=\"bar\" />\n      <Sense id=\"test-02-bar-n\" synset=\"test-02-n\" />\n      <Sense id=\"test-01-bar-n\" synset=\"test-01-n\" />\n    </LexicalEntry>\n\n    <!-- sense IDs as members -->\n    <Synset id=\"test-01-n\" ili=\"i12345\" partOfSpeech=\"n\" members=\"test-01-bar-n test-01-foo-n\"/>\n    <!-- word IDs as members -->\n    <Synset id=\"test-02-n\" ili=\"i12346\" partOfSpeech=\"n\" members=\"test-bar-n test-foo-n\" />\n\n  </Lexicon>\n\n</LexicalResource>\n"
  },
  {
    "path": "tests/data/test-package/LICENSE",
    "content": "Test License\n"
  },
  {
    "path": "tests/data/test-package/README.md",
    "content": "# Test README\n"
  },
  {
    "path": "tests/data/test-package/citation.bib",
    "content": "% test bib\n"
  },
  {
    "path": "tests/data/test-package/test-wn.xml",
    "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1.3.dtd\">\n\n<LexicalResource>\n\n</LexicalResource>\n"
  },
  {
    "path": "tests/db_test.py",
    "content": "import sqlite3\nimport threading\n\nimport pytest\n\nimport wn\nfrom wn import lmf\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_schema_compatibility():\n    conn = sqlite3.connect(str(wn.config.database_path))\n    schema_hash = wn._db.schema_hash(conn)\n    assert schema_hash in wn._db.COMPATIBLE_SCHEMA_HASHES\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_db_multithreading():\n    \"\"\"\n    See https://github.com/goodmami/wn/issues/86\n    Thanks: @fushinari\n    \"\"\"\n\n    class WNThread:\n        w = None\n\n        def __init__(self):\n            w_thread = threading.Thread(target=self.set_w)\n            w_thread.start()\n            w_thread.join()\n            self.w.synsets()\n\n        def set_w(self):\n            if self.w is None:\n                self.w = wn.Wordnet()\n\n    # close the connections by resetting the pool\n    wn._db.pool = {}\n    with pytest.raises(sqlite3.ProgrammingError):\n        WNThread()\n    wn._db.pool = {}\n    wn.config.allow_multithreading = True\n    WNThread()  # no error\n    wn.config.allow_multithreading = False\n    wn._db.pool = {}\n\n\ndef test_remove_extension(datadir, tmp_path):\n    old_data_dir = wn.config.data_directory\n    wn.config.data_directory = tmp_path / \"wn_data_1_1_trigger\"\n    wn.add(datadir / \"mini-lmf-1.0.xml\")\n    wn.add(datadir / \"mini-lmf-1.1.xml\")\n    assert len(wn.lexicons()) == 4\n    wn.remove(\"test-en-ext\")\n    assert len(wn.lexicons()) == 3\n    wn.remove(\"test-ja\")\n    assert len(wn.lexicons()) == 2\n    wn.add(datadir / \"mini-lmf-1.1.xml\")\n    assert len(wn.lexicons()) == 4\n    wn.remove(\"test-en\")\n    assert {lex.id for lex in wn.lexicons()} == {\"test-es\", \"test-ja\"}\n    wn.config.data_directory = old_data_dir\n    # close any open DB connections before teardown\n    for conn in wn._db.pool.values():\n        conn.close()\n\n\ndef test_add_lexical_resource(datadir, tmp_path):\n    old_data_dir = wn.config.data_directory\n    wn.config.data_directory = tmp_path / \"wn_data_add_lexical_resource\"\n    wn.add_lexical_resource(lmf.load(datadir / \"mini-lmf-1.0.xml\"))\n    assert len(wn.lexicons()) == 2\n    wn.add_lexical_resource(lmf.load(datadir / \"mini-lmf-1.1.xml\"))\n    assert len(wn.lexicons()) == 4\n    wn.config.data_directory = old_data_dir\n    # close any open DB connections before teardown\n    for conn in wn._db.pool.values():\n        conn.close()\n\n\n@pytest.mark.usefixtures(\"empty_db\")\ndef test_reset_database(datadir):\n    wn.add(datadir / \"mini-lmf-1.0.xml\")\n    assert {lex.specifier() for lex in wn.lexicons()} == {\"test-en:1\", \"test-es:1\"}\n    wn.reset_database(rebuild=False)  # cannot rebuild from unindexed local files\n    assert wn.lexicons() == []\n"
  },
  {
    "path": "tests/export_test.py",
    "content": "from xml.etree import ElementTree as ET\n\nimport pytest\n\nimport wn\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_export(datadir, tmp_path):\n    tmpdir = tmp_path / \"test_export\"\n    tmpdir.mkdir()\n    tmppath = tmpdir / \"mini_lmf_export.xml\"\n    lexicons = wn.lexicons(lexicon=\"test-en test-es\")\n    wn.export(lexicons, tmppath, version=\"1.0\")\n\n    # remove comments, indentation, etc.\n    orig = ET.canonicalize(from_file=datadir / \"mini-lmf-1.0.xml\", strip_text=True)\n    temp = ET.canonicalize(from_file=tmppath, strip_text=True)\n    # additional transformation to help with debugging\n    orig = orig.replace(\"<\", \"\\n<\")\n    temp = temp.replace(\"<\", \"\\n<\")\n    assert orig == temp\n\n\n@pytest.mark.usefixtures(\"mini_db_1_1\")\ndef test_export_1_1(datadir, tmp_path):\n    tmpdir = tmp_path / \"test_export_1_1\"\n    tmpdir.mkdir()\n    tmppath = tmpdir / \"mini_lmf_export_1_1.xml\"\n    lexicons = wn.lexicons(lexicon=\"test-ja test-en-ext\")\n    wn.export(lexicons, tmppath, version=\"1.1\")\n\n    # remove comments, indentation, etc.\n    orig = ET.canonicalize(from_file=datadir / \"mini-lmf-1.1.xml\", strip_text=True)\n    temp = ET.canonicalize(from_file=tmppath, strip_text=True)\n    # additional transformation to help with debugging\n    orig = orig.replace(\"<\", \"\\n<\")\n    temp = temp.replace(\"<\", \"\\n<\")\n    assert orig == temp\n\n    # fails when exporting to WN-LMF 1.0\n    with pytest.raises(wn.Error):\n        wn.export(lexicons, tmppath, version=\"1.0\")\n\n\n@pytest.mark.usefixtures(\"mini_db_1_4\")\ndef test_export_1_4(datadir, tmp_path):\n    tmpdir = tmp_path / \"test_export_1_4\"\n    tmpdir.mkdir()\n    tmppath = tmpdir / \"mini_lmf_export_1_4.xml\"\n    lexicons = wn.lexicons(lexicon=\"test-1.4 test-ext-1.4\")\n    wn.export(lexicons, tmppath, version=\"1.4\")\n\n    # remove comments, indentation, etc.\n    orig = ET.canonicalize(from_file=datadir / \"mini-lmf-1.4.xml\", strip_text=True)\n    temp = ET.canonicalize(from_file=tmppath, strip_text=True)\n    # additional transformation to help with debugging\n    orig = orig.replace(\"<\", \"\\n<\")\n    temp = temp.replace(\"<\", \"\\n<\")\n    assert orig == temp\n"
  },
  {
    "path": "tests/ic_test.py",
    "content": "from math import log\n\nimport pytest\n\nimport wn\nimport wn.ic\nfrom wn.constants import ADJ, ADV, NOUN, VERB\nfrom wn.util import synset_id_formatter\n\nsynset_id = {\n    \"information\": \"test-en-0001-n\",\n    \"illustration_example\": \"test-en-0002-n\",\n    \"sample\": \"test-en-0004-n\",\n    \"random_sample\": \"test-en-0005-n\",\n    \"random_sample2\": \"test-en-0008-n\",  # no hypernyms\n    \"datum\": \"test-en-0006-n\",\n    \"illustrate_exemplify\": \"test-en-0003-v\",\n    \"resignate\": \"test-en-0007-v\",\n}\n\n\nwords = [\n    \"For\",\n    \"example\",\n    \":\",\n    \"random sample\",\n    \".\",\n    \"This\",\n    \"will\",\n    \"illustrate\",\n    \"and\",\n    \"exemplify\",\n    \".\",\n    \"A\",\n    \"sample\",\n    \"of\",\n    \"data\",\n    \".\",\n]\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_compute_nodistribute_nosmoothing():\n    w = wn.Wordnet(\"test-en:1\")\n    assert wn.ic.compute(words, w, distribute_weight=False, smoothing=0) == {\n        NOUN: {\n            synset_id[\"information\"]: 4.0,\n            synset_id[\"illustration_example\"]: 3.0,\n            synset_id[\"sample\"]: 2.0,\n            synset_id[\"random_sample\"]: 1.0,\n            synset_id[\"random_sample2\"]: 1.0,\n            synset_id[\"datum\"]: 1.0,\n            None: 5.0,\n        },\n        VERB: {\n            synset_id[\"illustrate_exemplify\"]: 2.0,\n            synset_id[\"resignate\"]: 0.0,\n            None: 2.0,\n        },\n        ADJ: {None: 0.0},\n        ADV: {None: 0.0},\n    }\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_compute_nodistribute_smoothing():\n    w = wn.Wordnet(\"test-en:1\")\n    assert wn.ic.compute(words, w, distribute_weight=False, smoothing=1.0) == {\n        NOUN: {\n            synset_id[\"information\"]: 5.0,\n            synset_id[\"illustration_example\"]: 4.0,\n            synset_id[\"sample\"]: 3.0,\n            synset_id[\"random_sample\"]: 2.0,\n            synset_id[\"random_sample2\"]: 2.0,\n            synset_id[\"datum\"]: 2.0,\n            None: 6.0,\n        },\n        VERB: {\n            synset_id[\"illustrate_exemplify\"]: 3.0,\n            synset_id[\"resignate\"]: 1.0,\n            None: 3.0,\n        },\n        ADJ: {None: 1.0},\n        ADV: {None: 1.0},\n    }\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_compute_distribute_smoothing():\n    w = wn.Wordnet(\"test-en:1\")\n    assert wn.ic.compute(words, w, distribute_weight=True, smoothing=1.0) == {\n        NOUN: {\n            synset_id[\"information\"]: 4.5,\n            synset_id[\"illustration_example\"]: 3.5,\n            synset_id[\"sample\"]: 2.5,\n            synset_id[\"random_sample\"]: 1.5,\n            synset_id[\"random_sample2\"]: 1.5,\n            synset_id[\"datum\"]: 2.0,\n            None: 5.0,\n        },\n        VERB: {\n            synset_id[\"illustrate_exemplify\"]: 3.0,\n            synset_id[\"resignate\"]: 1.0,\n            None: 3.0,\n        },\n        ADJ: {None: 1.0},\n        ADV: {None: 1.0},\n    }\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_load(tmp_path):\n    w = wn.Wordnet(\"test-en:1\")\n    icpath = tmp_path / \"foo.dat\"\n    icpath.write_text(\n        \"wnver:1234567890AbCdEf\\n\"\n        \"1n 4.0 ROOT\\n\"\n        \"2n 3.0\\n\"\n        \"4n 2.0\\n\"\n        \"5n 1.0\\n\"\n        \"8n 1.0 ROOT\\n\"\n        \"6n 1.0\\n\"\n        \"3v 2.0 ROOT\\n\"\n        \"7v 0.0 ROOT\\n\"\n    )\n\n    get_synset_id = synset_id_formatter(\"test-en-{offset:04}-{pos}\")\n    
assert wn.ic.load(icpath, w, get_synset_id=get_synset_id) == wn.ic.compute(\n        words, w, distribute_weight=False, smoothing=0.0\n    )\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_information_content():\n    w = wn.Wordnet(\"test-en:1\")\n    ic = wn.ic.compute(words, w)\n    info = w.synsets(\"information\")[0]\n    samp = w.synsets(\"sample\")[0]\n    # info is a root but not the only one, so its IC is not 0.0\n    assert wn.ic.information_content(info, ic) == -log(ic[\"n\"][info.id] / ic[\"n\"][None])\n    assert wn.ic.information_content(samp, ic) == -log(ic[\"n\"][samp.id] / ic[\"n\"][None])\n"
  },
  {
    "path": "tests/ili_test.py",
    "content": "from pathlib import Path\n\nimport pytest\n\nimport wn\nfrom wn import ili\n\nI67447_DEFN = \"knowledge acquired through study or experience or instruction\"\n\n\ndef test_is_ili_tsv(datadir: Path) -> None:\n    assert ili.is_ili_tsv(datadir / \"mini-ili.tsv\")\n    assert ili.is_ili_tsv(datadir / \"mini-ili-with-status.tsv\")\n    assert not ili.is_ili_tsv(datadir / \"mini-lmf-1.0.xml\")\n    assert not ili.is_ili_tsv(datadir / \"does-not-exist\")\n\n\ndef test_load_tsv(datadir: Path) -> None:\n    assert list(ili.load_tsv(datadir / \"mini-ili.tsv\")) == [\n        {\"ili\": \"i1\", \"definition\": \"i1 definition\"},\n        {\"ili\": \"i2\", \"definition\": \"\"},\n        {\"ili\": \"i67447\", \"definition\": I67447_DEFN},\n    ]\n    assert list(ili.load_tsv(datadir / \"mini-ili-with-status.tsv\")) == [\n        {\"ili\": \"i1\", \"definition\": \"i1 definition\", \"status\": \"active\"},\n        {\"ili\": \"i2\", \"definition\": \"\", \"status\": \"deprecated\"},\n        {\"ili\": \"i67447\", \"definition\": I67447_DEFN, \"status\": \"active\"},\n    ]\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_get() -> None:\n    # present in ili file, not in lexicon\n    i = ili.get(\"i1\")\n    assert i.id == \"i1\"\n    assert i.status == ili.ILIStatus.ACTIVE\n    assert i.definition() == \"i1 definition\"\n    defn = i.definition(data=True)\n    assert defn.text == \"i1 definition\"\n    assert defn.metadata() == {}\n    assert defn.confidence() == 1.0\n    # present in lexicon, not in ili file\n    i = ili.get(\"i67469\")\n    assert i.id == \"i67469\"\n    assert i.status == ili.ILIStatus.PRESUPPOSED\n    assert i.definition() is None\n    assert i.definition(data=True) is None\n    # present in ili file and lexicon\n    i = ili.get(\"i67447\")\n    assert i.id == \"i67447\"\n    assert i.status == ili.ILIStatus.ACTIVE\n    assert i.definition() == I67447_DEFN\n    defn = i.definition(data=True)\n    assert defn.text == I67447_DEFN\n    assert defn.metadata() == {}\n    assert defn.confidence() == 1.0\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_get_proposed() -> None:\n    proposed_defn = \"to fire someone while making it look like it was their idea\"\n    # synset with proposed ili\n    ss = wn.synset(\"test-en-0007-v\", lexicon=\"test-en\")\n    i = ili.get_proposed(ss)\n    assert i is not None\n    assert i.id is None\n    assert i.synset() == ss\n    assert i.status == ili.ILIStatus.PROPOSED\n    assert i.lexicon() == ss.lexicon()\n    assert i.definition() == proposed_defn\n    defn = i.definition(data=True)\n    assert defn.text == proposed_defn\n    assert defn.metadata() == {\"creator\": \"MM\"}\n    assert defn.confidence() == 0.9  # inherited from lexicon\n\n    # synset without proposed ili\n    ss = wn.synset(\"test-en-0006-n\", lexicon=\"test-en\")\n    assert ili.get_proposed(ss) is None\n"
  },
  {
    "path": "tests/lmf_test.py",
    "content": "from xml.etree import ElementTree as ET\n\nfrom wn import lmf\n\n\ndef test_is_lmf(datadir):\n    assert lmf.is_lmf(datadir / \"mini-lmf-1.0.xml\")\n    assert lmf.is_lmf(str(datadir / \"mini-lmf-1.0.xml\"))\n    assert not lmf.is_lmf(datadir / \"README.md\")\n    assert not lmf.is_lmf(datadir / \"missing.xml\")\n    assert lmf.is_lmf(datadir / \"mini-lmf-1.1.xml\")\n\n\ndef test_scan_lexicons(datadir):\n    assert lmf.scan_lexicons(datadir / \"mini-lmf-1.0.xml\") == [\n        {\n            \"id\": \"test-en\",\n            \"version\": \"1\",\n            \"label\": \"Testing English WordNet\",\n            \"extends\": None,\n        },\n        {\n            \"id\": \"test-es\",\n            \"version\": \"1\",\n            \"label\": \"Testing Spanish WordNet\",\n            \"extends\": None,\n        },\n    ]\n\n    assert lmf.scan_lexicons(datadir / \"mini-lmf-1.1.xml\") == [\n        {\n            \"id\": \"test-ja\",\n            \"version\": \"1\",\n            \"label\": \"Testing Japanese WordNet\",\n            \"extends\": None,\n        },\n        {\n            \"id\": \"test-en-ext\",\n            \"version\": \"1\",\n            \"label\": \"Testing English Extension\",\n            \"extends\": {\n                \"id\": \"test-en\",\n                \"version\": \"1\",\n            },\n        },\n    ]\n\n\ndef test_load_1_0(datadir):\n    resource = lmf.load(datadir / \"mini-lmf-1.0.xml\")\n    lexicons = resource[\"lexicons\"]\n    assert len(lexicons) == 2\n    lexicon = lexicons[0]\n\n    assert lexicon[\"id\"] == \"test-en\"\n    assert lexicon[\"label\"] == \"Testing English WordNet\"\n    assert lexicon[\"language\"] == \"en\"\n    assert lexicon[\"email\"] == \"maintainer@example.com\"\n    assert lexicon[\"license\"] == \"https://creativecommons.org/licenses/by/4.0/\"\n    assert lexicon[\"version\"] == \"1\"\n    assert lexicon[\"url\"] == \"https://example.com/test-en\"\n\n    assert len(lexicon[\"entries\"]) == 9\n    le = lexicon[\"entries\"][0]\n    assert le[\"id\"] == \"test-en-information-n\"\n\n    assert le[\"lemma\"][\"writtenForm\"] == \"information\"\n    assert le[\"lemma\"][\"partOfSpeech\"] == \"n\"\n    assert le[\"lemma\"][\"script\"] == \"Latn\"\n    assert len(le[\"lemma\"][\"tags\"]) == 1\n\n    assert len(le.get(\"forms\", [])) == 0\n\n    assert len(le[\"senses\"]) == 1\n    sense = le[\"senses\"][0]\n    assert sense[\"id\"] == \"test-en-information-n-0001-01\"\n    assert sense[\"synset\"] == \"test-en-0001-n\"\n    assert len(sense.get(\"relations\", [])) == 0\n    # assert sense[\"relations\"][0][\"target\"] == \"test-en-exemplify-v-01023137-01\"\n    # assert sense[\"relations\"][0][\"type\"] == \"derivation\"\n\n    assert len(lexicon.get(\"frames\", [])) == 0  # frames are on lexical entry\n    assert len(lexicon[\"entries\"][6][\"frames\"]) == 2\n    frames = lexicon[\"entries\"][6][\"frames\"]\n    assert frames[0][\"subcategorizationFrame\"] == \"Somebody ----s something\"\n    assert frames[0][\"senses\"] == [\"test-en-illustrate-v-0003-01\"]\n\n    assert len(lexicon[\"synsets\"]) == 8\n\n    assert lexicons[1][\"id\"] == \"test-es\"\n\n\ndef test_load_1_1(datadir):\n    resource = lmf.load(datadir / \"mini-lmf-1.1.xml\")\n    lexicons = resource[\"lexicons\"]\n    assert len(lexicons) == 2\n    lexicon = lexicons[0]\n    assert lexicon[\"id\"] == \"test-ja\"\n    assert lexicon[\"version\"] == \"1\"\n    # assert lexicon.logo == \"logo.svg\"\n    assert lexicon.get(\"requires\") == [{\"id\": 
\"test-en\", \"version\": \"1\"}]\n\n    lexicon = lexicons[1]\n    assert lexicon[\"id\"] == \"test-en-ext\"\n    assert lexicon.get(\"extends\") == {\n        \"id\": \"test-en\",\n        \"url\": \"https://example.com/test-en\",\n        \"version\": \"1\",\n    }\n\n\ndef test_load_1_3(datadir):\n    resource = lmf.load(datadir / \"mini-lmf-1.3.xml\")\n    lexicons = resource[\"lexicons\"]\n    assert len(lexicons) == 1\n    lexicon = lexicons[0]\n    synsets = lexicon[\"synsets\"]\n    assert synsets[0][\"definitions\"][0][\"text\"] == \"one two three\"\n    assert synsets[1][\"definitions\"][0][\"text\"] == \"one two three\"\n    assert (\n        synsets[2][\"definitions\"][0][\"text\"]\n        == \"\"\"\n        one\n          two\n        three\n      \"\"\"\n    )\n\n\ndef test_load_1_4(datadir):\n    resource = lmf.load(datadir / \"mini-lmf-1.4.xml\")\n    lexicons = resource[\"lexicons\"]\n    assert len(lexicons) == 2\n    lexicon = lexicons[0]\n    assert lexicon[\"entries\"][0].get(\"index\") == \"foo_bar\"\n    assert lexicon[\"entries\"][1].get(\"index\") == \"foo_bar\"\n    assert lexicon[\"entries\"][2].get(\"index\") is None\n    assert lexicon[\"entries\"][3].get(\"index\") == \"baz\"\n    assert lexicon[\"entries\"][4].get(\"index\") is None\n    assert lexicon[\"entries\"][5].get(\"index\") == \"baz\"\n\n    assert lexicon[\"entries\"][0][\"senses\"][0].get(\"n\") == 3\n    assert lexicon[\"entries\"][1][\"senses\"][0].get(\"n\") == 2\n    assert lexicon[\"entries\"][1][\"senses\"][1].get(\"n\") == 1\n    assert lexicon[\"entries\"][2][\"senses\"][0].get(\"n\") is None\n    assert lexicon[\"entries\"][3][\"senses\"][0].get(\"n\") == 2\n    assert lexicon[\"entries\"][4][\"senses\"][0].get(\"n\") == 2\n    assert lexicon[\"entries\"][4][\"senses\"][1].get(\"n\") is None\n    assert lexicon[\"entries\"][5][\"senses\"][0].get(\"n\") == 1\n\n    extension = lexicons[1]\n    assert extension[\"id\"] == \"test-ext-1.4\"\n    assert extension.get(\"extends\") == {\n        \"id\": \"test-en\",\n        \"version\": \"1\",\n        \"url\": \"https://example.com/test-en\",\n    }\n\n\ndef test_dump(datadir, tmp_path):\n    tmpdir = tmp_path / \"test_dump\"\n    tmpdir.mkdir()\n    tmppath = tmpdir / \"mini_lmf_dump.xml\"\n\n    def assert_xml_equal(mini_lmf, dump_lmf):\n        orig = ET.canonicalize(from_file=mini_lmf, strip_text=True)\n        temp = ET.canonicalize(from_file=dump_lmf, strip_text=True)\n        # additional transformation to help with debugging\n        orig = orig.replace(\"<\", \"\\n<\")\n        temp = temp.replace(\"<\", \"\\n<\")\n        assert orig == temp\n\n    lmf.dump(lmf.load(datadir / \"mini-lmf-1.0.xml\"), tmppath)\n    assert_xml_equal(datadir / \"mini-lmf-1.0.xml\", tmppath)\n\n    lmf.dump(lmf.load(datadir / \"mini-lmf-1.1.xml\"), tmppath)\n    assert_xml_equal(datadir / \"mini-lmf-1.1.xml\", tmppath)\n\n    lmf.dump(lmf.load(datadir / \"mini-lmf-1.4.xml\"), tmppath)\n    assert_xml_equal(datadir / \"mini-lmf-1.4.xml\", tmppath)\n"
  },
  {
    "path": "tests/morphy_test.py",
    "content": "import pytest\n\nimport wn\nfrom wn import morphy\n\n\ndef test_morphy_uninitialized():\n    # An unintialized Morphy isn't very bright, but it starts up\n    # fast. It relies on the database to filter bad items.\n    m = morphy.Morphy()\n    assert m(\"example\", \"n\") == {\"n\": {\"example\"}}\n    assert m(\"examples\", \"n\") == {\"n\": {\"examples\", \"example\"}}\n    assert m(\"examples\", \"v\") == {\"v\": {\"examples\", \"example\", \"exampl\"}}\n    assert m(\"exemplifying\", \"n\") == {\"n\": {\"exemplifying\"}}\n    assert m(\"exemplifying\", \"v\") == {\"v\": {\"exemplifying\", \"exemplify\", \"exemplifye\"}}\n    assert m(\"data\", \"n\") == {\"n\": {\"data\"}}\n    assert m(\"datums\", \"n\") == {\"n\": {\"datums\", \"datum\"}}  # expected false positive\n    assert m(\"examples\", None) == {\n        None: {\"examples\"},\n        \"n\": {\"example\"},\n        \"v\": {\"example\", \"exampl\"},\n    }\n    assert m(\"exemplifying\", None) == {\n        None: {\"exemplifying\"},\n        \"v\": {\"exemplify\", \"exemplifye\"},\n    }\n    assert m(\"data\", None) == {None: {\"data\"}}\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_morphy_initialized():\n    w = wn.Wordnet(\"test-en:1\")\n    m = morphy.Morphy(wordnet=w)\n    assert m(\"example\", \"n\") == {\"n\": {\"example\"}}\n    assert m(\"examples\", \"n\") == {\"n\": {\"example\"}}\n    assert m(\"examples\", \"v\") == {}\n    assert m(\"exemplifying\", \"n\") == {}\n    assert m(\"exemplifying\", \"v\") == {\"v\": {\"exemplify\"}}\n    assert m(\"data\", \"n\") == {\"n\": {\"datum\"}}\n    assert m(\"datums\", \"n\") == {\"n\": {\"datum\"}}  # expected false positive\n    assert m(\"examples\", None) == {\"n\": {\"example\"}}\n    assert m(\"exemplifying\", None) == {\"v\": {\"exemplify\"}}\n    assert m(\"data\", None) == {\"n\": {\"datum\"}}\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_issue_154():\n    # https://github.com/goodmami/wn/issues/154\n    w = wn.Wordnet(\"test-en:1\")\n    assert w.words(\"exemplifies\") == [w.word(\"test-en-exemplify-v\")]\n    assert w.words(\"samples\") == []\n    w = wn.Wordnet(\"test-en:1\", lemmatizer=morphy.Morphy())\n    assert w.words(\"exemplifies\") == [w.word(\"test-en-exemplify-v\")]\n    assert w.words(\"samples\") == [w.word(\"test-en-sample-n\")]\n"
  },
  {
    "path": "tests/primary_query_test.py",
    "content": "import pytest\n\nimport wn\n\n\n@pytest.mark.usefixtures(\"uninitialized_datadir\")\ndef test_lexicons_uninitialized():\n    assert len(wn.lexicons()) == 0\n\n\n@pytest.mark.usefixtures(\"empty_db\")\ndef test_lexicons_empty():\n    assert len(wn.lexicons()) == 0\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_lexicons_mini():\n    assert len(wn.lexicons()) == 2\n    assert all(isinstance(lex, wn.Lexicon) for lex in wn.lexicons())\n\n    results = wn.lexicons(lang=\"en\")\n    assert len(results) == 1\n    assert results[0].language == \"en\"\n    results = wn.lexicons(lang=\"es\")\n    assert len(results) == 1\n    assert results[0].language == \"es\"\n\n    assert len(wn.lexicons(lexicon=\"*\")) == 2\n    assert len(wn.lexicons(lexicon=\"*:1\")) == 2\n    assert len(wn.lexicons(lexicon=\"test-*\")) == 2\n    assert len(wn.lexicons(lexicon=\"*-en\")) == 1\n    results = wn.lexicons(lexicon=\"test-en\")\n    assert len(results) == 1\n    assert results[0].language == \"en\"\n    results = wn.lexicons(lexicon=\"test-en:1\")\n    assert len(results) == 1\n    assert results[0].language == \"en\"\n    results = wn.lexicons(lexicon=\"test-en:*\")\n    assert len(results) == 1\n    assert results[0].language == \"en\"\n\n    assert wn.lexicons(lexicon=\"test-en\")[0].specifier() == \"test-en:1\"\n    assert wn.lexicons(lexicon=\"test-es\")[0].specifier() == \"test-es:1\"\n\n    assert wn.lexicons(lexicon=\"test-en\")[0].requires() == {}\n    assert wn.lexicons(lexicon=\"test-es\")[0].requires() == {}\n\n    lex = wn.lexicons(lexicon=\"test-en\")[0]  # hashability\n    assert {lex: \"foo\"}[lex] == \"foo\"\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_lexicons_unknown():\n    results = wn.lexicons(lang=\"unk\")\n    assert len(results) == 0\n    results = wn.lexicons(lexicon=\"test-unk\")\n    assert len(results) == 0\n\n\n@pytest.mark.usefixtures(\"empty_db\")\ndef test_words_empty():\n    assert len(wn.words()) == 0\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_words_mini():\n    assert len(wn.words()) == 15\n    assert all(isinstance(w, wn.Word) for w in wn.words())\n\n    words = wn.words(\"information\")  # search lemma\n    assert len(words) == 1\n    assert words[0].lemma() == \"information\"\n\n    lemma = words[0].lemma(data=True)\n    assert lemma.value == \"information\"\n    assert lemma.script == \"Latn\"\n    assert lemma.tags() == [wn.Tag(\"tag-text\", \"tag-category\")]\n\n    words = wn.words(\"exemplifies\")  # search secondary form\n    assert len(words) == 1\n    assert words[0].lemma() == \"exemplify\"\n\n    assert len(wn.words(pos=\"n\")) == 10\n    assert all(w.pos == \"n\" for w in wn.words(pos=\"n\"))\n    assert len(wn.words(pos=\"v\")) == 5\n    assert len(wn.words(pos=\"q\")) == 0  # fake pos\n\n    assert len(wn.words(lang=\"en\")) == 9\n    assert len(wn.words(lang=\"es\")) == 6\n\n    assert len(wn.words(lexicon=\"test-en\")) == 9\n    assert len(wn.words(lexicon=\"test-es\")) == 6\n\n    assert len(wn.words(lang=\"en\", lexicon=\"test-en\")) == 9\n    assert len(wn.words(pos=\"v\", lang=\"en\")) == 3\n    assert len(wn.words(\"information\", lang=\"en\")) == 1\n    assert len(wn.words(\"information\", lang=\"es\")) == 0\n\n    with pytest.raises(wn.Error):\n        wn.words(lang=\"unk\")\n    with pytest.raises(wn.Error):\n        wn.words(lexicon=\"test-unk\")\n\n\n@pytest.mark.usefixtures(\"empty_db\")\ndef test_lemmas_empty():\n    assert len(wn.lemmas()) == 0\n\n\n@pytest.mark.usefixtures(\"mini_db_1_4\")\ndef 
test_lemmas_mini_1_4():\n    wordnet = wn.Wordnet(lexicon=\"test-1.4\")\n    all_lemmas = wordnet.lemmas()\n    assert len(all_lemmas) == 5\n    assert all(isinstance(lemma, str) for lemma in all_lemmas)\n    assert all_lemmas == [\"Foo Bar\", \"foo bar\", \"baz\", \"BAZ\", \"Baz\"]\n\n    # data=True should return Form objects and should not dedup\n    lemmas_with_data = wordnet.lemmas(data=True)\n    assert len(lemmas_with_data) == 6  # includes duplicate 'baz'\n    assert all(isinstance(lemma, wn.Form) for lemma in lemmas_with_data)\n    assert [f.value for f in lemmas_with_data] == [\n        \"Foo Bar\",\n        \"foo bar\",\n        \"baz\",\n        \"BAZ\",\n        \"Baz\",\n        \"baz\",\n    ]\n\n    # Test deduplication\n    baz_lemmas = wordnet.lemmas(\"baz\", data=False)\n    assert baz_lemmas == [\"baz\", \"BAZ\", \"Baz\"]\n\n    # With data=True, no dedup\n    baz_forms = wordnet.lemmas(\"baz\", data=True)\n    assert [f.value for f in baz_forms] == [\"baz\", \"BAZ\", \"Baz\", \"baz\"]\n\n    # Filter by POS\n    assert len(wordnet.lemmas(pos=\"n\")) == 5  # Foo Bar, foo bar, baz, BAZ, Baz\n    assert len(wordnet.lemmas(pos=\"v\")) == 1  # baz\n    assert len(wordnet.lemmas(pos=\"q\")) == 0  # fake pos\n\n    # Verify lemmas() returns same results as words() + .lemma()\n    words = wordnet.words()\n    lemmas_from_words = [w.lemma() for w in words]\n    lemmas_direct = wordnet.lemmas()\n    assert set(lemmas_from_words) == set(lemmas_direct)\n\n    # Test wn module function to wordnet instance method\n    assert wn.lemmas(lexicon=\"test-1.4\") == wordnet.lemmas()\n    assert wn.lemmas(data=True, lexicon=\"test-1.4\") == wordnet.lemmas(data=True)\n\n    with pytest.raises(wn.Error):\n        wn.lemmas(lang=\"unk\")\n    with pytest.raises(wn.Error):\n        wn.lemmas(lexicon=\"test-unk\")\n\n\n@pytest.mark.usefixtures(\"empty_db\")\ndef test_word_empty():\n    with pytest.raises(wn.Error):\n        assert wn.word(\"test-es-información-n\")\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_word_mini():\n    assert wn.word(\"test-es-información-n\")\n    assert wn.word(\"test-es-información-n\", lang=\"es\")\n    assert wn.word(\"test-es-información-n\", lexicon=\"test-es\")\n    with pytest.raises(wn.Error):\n        assert wn.word(\"test-es-información-n\", lang=\"en\")\n    with pytest.raises(wn.Error):\n        assert wn.word(\"test-es-información-n\", lexicon=\"test-en\")\n    with pytest.raises(wn.Error):\n        assert wn.word(\"test-es-información-n\", lang=\"unk\")\n    with pytest.raises(wn.Error):\n        assert wn.word(\"test-es-información-n\", lexicon=\"test-unk\")\n\n\n@pytest.mark.usefixtures(\"empty_db\")\ndef test_senses_empty():\n    assert len(wn.senses()) == 0\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_senses_mini():\n    assert len(wn.senses()) == 16\n    assert all(isinstance(s, wn.Sense) for s in wn.senses())\n\n    senses = wn.senses(\"information\")  # search lemma\n    assert len(senses) == 1\n    assert senses[0].word().lemma() == \"information\"\n    assert senses[0].counts() == [3]\n\n    senses = wn.senses(\"exemplifies\")  # search secondary form\n    assert len(senses) == 1\n    assert senses[0].word().lemma() == \"exemplify\"\n    assert senses[0].word().lemma() in {\"exemplify\"}\n    assert \"exemplify\" in {senses[0].word().lemma()}\n\n    assert len(wn.senses(pos=\"n\")) == 11\n    assert len(wn.senses(pos=\"v\")) == 5\n    assert len(wn.senses(pos=\"q\")) == 0  # fake pos\n\n    assert len(wn.senses(lang=\"en\")) == 
10\n    assert len(wn.senses(lang=\"es\")) == 6\n\n    assert len(wn.senses(lexicon=\"test-en\")) == 10\n    assert len(wn.senses(lexicon=\"test-es\")) == 6\n\n    assert len(wn.senses(lang=\"en\", lexicon=\"test-en\")) == 10\n    assert len(wn.senses(pos=\"v\", lang=\"en\")) == 3\n    assert len(wn.senses(\"information\", lang=\"en\")) == 1\n    assert len(wn.senses(\"information\", lang=\"es\")) == 0\n\n    with pytest.raises(wn.Error):\n        wn.senses(lang=\"unk\")\n    with pytest.raises(wn.Error):\n        wn.senses(lexicon=\"test-unk\")\n\n\n@pytest.mark.usefixtures(\"empty_db\")\ndef test_sense_empty():\n    with pytest.raises(wn.Error):\n        assert wn.sense(\"test-es-información-n-0001-01\")\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_sense_mini():\n    assert wn.sense(\"test-es-información-n-0001-01\")\n    assert wn.sense(\"test-es-información-n-0001-01\", lang=\"es\")\n    assert wn.sense(\"test-es-información-n-0001-01\", lexicon=\"test-es\")\n    with pytest.raises(wn.Error):\n        assert wn.sense(\"test-es-información-n-0001-01\", lang=\"en\")\n    with pytest.raises(wn.Error):\n        assert wn.sense(\"test-es-información-n-0001-01\", lexicon=\"test-en\")\n    with pytest.raises(wn.Error):\n        assert wn.sense(\"test-es-información-n-0001-01\", lang=\"unk\")\n    with pytest.raises(wn.Error):\n        assert wn.sense(\"test-es-información-n-0001-01\", lexicon=\"test-unk\")\n\n\n@pytest.mark.usefixtures(\"empty_db\")\ndef test_synsets_empty():\n    assert len(wn.synsets()) == 0\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_synsets_mini():\n    assert len(wn.synsets()) == 12\n    assert all(isinstance(ss, wn.Synset) for ss in wn.synsets())\n\n    synsets = wn.synsets(\"information\")  # search lemma\n    assert len(synsets) == 1\n    assert \"information\" in synsets[0].lemmas()\n\n    synsets = wn.synsets(\"exemplifies\")  # search secondary form\n    assert len(synsets) == 1\n    assert \"exemplify\" in synsets[0].lemmas()\n\n    assert len(wn.synsets(pos=\"n\")) == 9\n    assert len(wn.synsets(pos=\"v\")) == 3\n    assert len(wn.synsets(pos=\"q\")) == 0  # fake pos\n\n    assert len(wn.synsets(ili=\"i67469\")) == 2\n    assert len(wn.synsets(ili=\"i67468\")) == 0\n\n    assert len(wn.synsets(lang=\"en\")) == 8\n    assert len(wn.synsets(lang=\"es\")) == 4\n\n    assert len(wn.synsets(lexicon=\"test-en\")) == 8\n    assert len(wn.synsets(lexicon=\"test-es\")) == 4\n\n    assert len(wn.synsets(lang=\"en\", lexicon=\"test-en\")) == 8\n    assert len(wn.synsets(pos=\"v\", lang=\"en\")) == 2\n    assert len(wn.synsets(\"information\", lang=\"en\")) == 1\n    assert len(wn.synsets(\"information\", lang=\"es\")) == 0\n    assert len(wn.synsets(ili=\"i67469\", lang=\"es\")) == 1\n\n    with pytest.raises(wn.Error):\n        wn.synsets(lang=\"unk\")\n    with pytest.raises(wn.Error):\n        wn.synsets(lexicon=\"test-unk\")\n\n\n@pytest.mark.usefixtures(\"empty_db\")\ndef test_synset_empty():\n    with pytest.raises(wn.Error):\n        assert wn.synset(\"test-es-0001-n\")\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_synset_mini():\n    assert wn.synset(\"test-es-0001-n\")\n    assert wn.synset(\"test-es-0001-n\", lang=\"es\")\n    assert wn.synset(\"test-es-0001-n\", lexicon=\"test-es\")\n    with pytest.raises(wn.Error):\n        assert wn.synset(\"test-es-0001-n\", lang=\"en\")\n    with pytest.raises(wn.Error):\n        assert wn.synset(\"test-es-0001-n\", lexicon=\"test-en\")\n    with pytest.raises(wn.Error):\n        assert 
wn.synset(\"test-es-0001-n\", lang=\"unk\")\n    with pytest.raises(wn.Error):\n        assert wn.synset(\"test-es-0001-n\", lexicon=\"test-unk\")\n\n\n@pytest.mark.usefixtures(\"mini_db_1_1\")\ndef test_mini_1_1():\n    assert len(wn.lexicons()) == 4\n    assert len(wn.lexicons(lang=\"en\")) == 2\n    assert len(wn.lexicons(lang=\"ja\")) == 1\n    assert wn.lexicons(lang=\"ja\")[0].logo == \"logo.svg\"\n\n    w = wn.Wordnet(lang=\"en\")\n    assert len(w.lexicons()) == 2\n    assert len(w.expanded_lexicons()) == 0\n    assert len(w.word(\"test-en-exemplify-v\").lemma(data=True).tags()) == 1\n\n    w = wn.Wordnet(lang=\"ja\")\n    assert len(w.lexicons()) == 1\n    assert len(w.expanded_lexicons()) == 1\n    assert len(w.synsets(\"例え\")[0].hypernyms()) == 1\n    assert w.synsets(\"例え\")[0].lexfile() == \"noun.cognition\"\n    assert len(w.word(\"test-ja-例え-n\").lemma(data=True).pronunciations()) == 1\n    assert w.word(\"test-ja-例え-n\").forms(data=True)[1].id == \"test-ja-例え-n-たとえ\"\n    p = w.word(\"test-ja-例え-n\").lemma(data=True).pronunciations()[0]\n    assert p.value == \"tatoe\"\n    assert p.variety == \"standard\"\n    assert p.notation == \"ipa\"\n    assert p.phonemic\n    assert p.audio == \"tatoe.wav\"\n\n    w = wn.Wordnet(lang=\"ja\", expand=\"\")\n    assert len(w.lexicons()) == 1\n    assert len(w.expanded_lexicons()) == 0\n    assert len(w.synsets(\"例え\")[0].hypernyms()) == 0\n\n    w = wn.Wordnet(lexicon=\"test-en test-en-ext\")\n    assert len(w.lexicons()) == 2\n    assert len(w.expanded_lexicons()) == 0\n    assert len(w.synsets(\"fire\")[0].hyponyms()) == 1\n\n\n@pytest.mark.usefixtures(\"mini_db_1_1\")\ndef test_mini_1_1_lexicons():\n    lex = wn.lexicons(lexicon=\"test-en\")[0]\n    assert lex.specifier() == \"test-en:1\"\n    assert not lex.requires()\n    assert lex.extends() is None\n    assert len(lex.extensions()) == 1\n    assert lex.extensions()[0].specifier() == \"test-en-ext:1\"\n\n    lex = wn.lexicons(lexicon=\"test-es\")[0]\n    assert lex.specifier() == \"test-es:1\"\n    assert not lex.requires()\n    assert lex.extends() is None\n    assert len(lex.extensions()) == 0\n\n    lex = wn.lexicons(lexicon=\"test-en-ext\")[0]\n    assert lex.specifier() == \"test-en-ext:1\"\n    assert not lex.requires()\n    assert lex.extends() is not None\n    assert lex.extends().specifier() == \"test-en:1\"\n    assert len(lex.extensions()) == 0\n\n    lex = wn.lexicons(lexicon=\"test-ja\")[0]\n    assert lex.specifier() == \"test-ja:1\"\n    assert \"test-en:1\" in lex.requires()\n    assert lex.extends() is None\n    assert len(lex.extensions()) == 0\n\n\n@pytest.mark.usefixtures(\"mini_db_1_4\")\ndef test_mini_1_4():\n    w = wn.Wordnet(\"test-1.4:1\", normalizer=None)\n    # even without a normalizer, entries sharing an index are matched\n    assert len(w.words(\"Foo Bar\")) == 2\n    assert len(w.words(\"foo bar\")) == 2\n    # if the index is missing, the lemma is used; normalization doesn't happen\n    assert len(w.words(\"baz\")) == 3\n    assert len(w.words(\"Baz\")) == 1\n    # sense order follows values of 'n'\n    assert [s.id for s in w.senses(\"foo bar\")] == [\n        \"test-1.4-foo_bar-n-2\",\n        \"test-1.4-foo_bar-n-1\",\n        \"test-1.4-Foo_Bar-n-1\",\n    ]\n    assert [s.id for s in w.senses(\"baz\")] == [\n        \"test-1.4-baz-n-1\",\n        \"test-1.4-BAZ-n-1\",\n        \"test-1.4-baz-v-1\",\n    ]\n    assert [s.id for s in w.senses(\"baz\", pos=\"v\")] == [\n        \"test-1.4-baz-v-1\",\n    ]\n    # order is undecided when implicit 
or explicit values of n are overlapping\n    assert {s.id for s in w.senses(\"Baz\")} == {\n        \"test-1.4-Baz-n-1\",\n        \"test-1.4-Baz-n-2\",\n    }\n    # synset order also follows index\n    assert [ss.id for ss in w.synsets(\"foo bar\")] == [\n        \"test-1.4-2\",\n        \"test-1.4-1\",\n    ]\n"
  },
  {
    "path": "tests/project_test.py",
    "content": "from wn import project\n\n\ndef test_is_package_directory(datadir):\n    assert project.is_package_directory(datadir / \"test-package\")\n    assert not project.is_package_directory(datadir)\n\n\ndef test_is_collection_directory(datadir):\n    # not really, but it is a directory containing a package\n    assert project.is_collection_directory(datadir)\n    assert not project.is_collection_directory(datadir / \"test-package\")\n\n\ndef test_get_project(datadir):\n    proj = project.get_project(path=datadir / \"test-package\")\n    assert proj.type == \"wordnet\"\n    assert proj.resource_file() == datadir / \"test-package\" / \"test-wn.xml\"\n    assert proj.readme() == datadir / \"test-package\" / \"README.md\"\n    assert proj.license() == datadir / \"test-package\" / \"LICENSE\"\n    assert proj.citation() == datadir / \"test-package\" / \"citation.bib\"\n\n    proj = project.get_project(path=datadir / \"mini-lmf-1.0.xml\")\n    assert proj.type == \"wordnet\"\n    assert proj.resource_file() == datadir / \"mini-lmf-1.0.xml\"\n    assert proj.readme() is None\n    assert proj.license() is None\n    assert proj.citation() is None\n\n\ndef test_iterpackages(datadir):\n    # for now, collection.packages() does not return contained resource files\n    pkg_names = {pkg.resource_file().name for pkg in project.iterpackages(datadir)}\n    assert \"mini-lmf-1.0.xml\" not in pkg_names\n    assert \"test-wn.xml\" in pkg_names\n\n    # explicitly giving a resource file path works, though\n    pkg_names = {\n        pkg.resource_file().name\n        for pkg in project.iterpackages(datadir / \"mini-lmf-1.0.xml\")\n    }\n    assert \"mini-lmf-1.0.xml\" in pkg_names\n    assert \"test-wn.xml\" not in pkg_names\n\n\ndef test_compressed_iterpackages(mini_lmf_compressed):\n    for pkg in project.iterpackages(mini_lmf_compressed):\n        assert pkg.type == \"wordnet\"\n        assert pkg.resource_file().exists()\n    # ensure cleanup of temporary data\n    assert not pkg.resource_file().exists()\n    # ensure original file not deleted\n    assert mini_lmf_compressed.exists()\n"
  },
  {
    "path": "tests/relations_test.py",
    "content": "import pytest\n\nimport wn\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_word_derived_words():\n    assert len(wn.word(\"test-en-example-n\").derived_words()) == 1\n    assert len(wn.word(\"test-es-ejemplo-n\").derived_words()) == 1\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_synset_hypernyms():\n    assert wn.synset(\"test-en-0002-n\").hypernyms() == [wn.synset(\"test-en-0001-n\")]\n    assert wn.synset(\"test-en-0001-n\").hypernyms() == []\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_synset_hypernyms_expand_default():\n    assert wn.synset(\"test-es-0002-n\").hypernyms() == [wn.synset(\"test-es-0001-n\")]\n    assert wn.synset(\"test-es-0001-n\").hypernyms() == []\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_synset_hypernyms_expand_empty():\n    w = wn.Wordnet(lang=\"es\", expand=\"\")\n    assert w.synset(\"test-es-0002-n\").hypernyms() == []\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_synset_hypernyms_expand_specified():\n    w = wn.Wordnet(lang=\"es\", expand=\"test-en\")\n    assert w.synset(\"test-es-0002-n\").hypernyms() == [w.synset(\"test-es-0001-n\")]\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_synset_relations():\n    w = wn.Wordnet(lang=\"en\")\n    assert w.synset(\"test-en-0002-n\").relations() == {\n        \"hypernym\": [w.synset(\"test-en-0001-n\")],\n        \"hyponym\": [w.synset(\"test-en-0004-n\")],\n    }\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_sense_get_related():\n    w = wn.Wordnet(\"test-en\")\n    assert w.sense(\"test-en-example-n-0002-01\").get_related() == [\n        w.sense(\"test-en-exemplify-v-0003-01\")\n    ]\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_sense_relations():\n    w = wn.Wordnet(\"test-en\")\n    assert w.sense(\"test-en-example-n-0002-01\").relations() == {\n        \"derivation\": [w.sense(\"test-en-exemplify-v-0003-01\")]\n    }\n\n\n@pytest.mark.usefixtures(\"mini_db_1_1\")\ndef test_extension_relations():\n    # default mode\n    assert wn.synset(\"test-en-0007-v\").hypernyms() == [wn.synset(\"test-en-ext-0009-v\")]\n    assert wn.synset(\"test-en-ext-0009-v\").hyponyms() == [wn.synset(\"test-en-0007-v\")]\n    assert wn.sense(\"test-en-information-n-0001-01\").get_related(\"pertainym\") == [\n        wn.sense(\"test-en-ext-info-n-0001-01\")\n    ]\n    assert wn.sense(\"test-en-ext-info-n-0001-01\").get_related(\"pertainym\") == [\n        wn.sense(\"test-en-information-n-0001-01\")\n    ]\n\n    # restricted to base\n    w = wn.Wordnet(lexicon=\"test-en\")\n    assert w.synset(\"test-en-0007-v\").hypernyms() == []\n    assert w.sense(\"test-en-information-n-0001-01\").get_related(\"pertainym\") == []\n\n    # base and extension\n    w = wn.Wordnet(lexicon=\"test-en test-en-ext\")\n    assert w.synset(\"test-en-0007-v\").hypernyms() == [w.synset(\"test-en-ext-0009-v\")]\n    assert w.synset(\"test-en-ext-0009-v\").hyponyms() == [w.synset(\"test-en-0007-v\")]\n    assert w.sense(\"test-en-information-n-0001-01\").get_related(\"pertainym\") == [\n        w.sense(\"test-en-ext-info-n-0001-01\")\n    ]\n    assert w.sense(\"test-en-ext-info-n-0001-01\").get_related(\"pertainym\") == [\n        w.sense(\"test-en-information-n-0001-01\")\n    ]\n\n    # restricted to extension\n    w = wn.Wordnet(lexicon=\"test-en-ext\")\n    assert w.synset(\"test-en-ext-0009-v\").hyponyms() == []\n    assert w.sense(\"test-en-ext-info-n-0001-01\").get_related(\"pertainym\") == []\n\n\n@pytest.mark.usefixtures(\"mini_db_1_1\")\ndef test_sense_synset_issue_168():\n    # 
https://github.com/goodmami/wn/issues/168\n    ja = wn.Wordnet(lexicon=\"test-ja\", expand=\"\")\n    assert ja.synset(\"test-ja-0001-n\").get_related() == []\n    assert ja.sense(\"test-ja-情報-n-0001-01\").synset().get_related() == []\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_synset_relations_issue_169():\n    # https://github.com/goodmami/wn/issues/169\n    en = wn.Wordnet(\"test-en\")\n    assert list(en.synset(\"test-en-0001-n\").relations(\"hyponym\")) == [\"hyponym\"]\n    es = wn.Wordnet(\"test-es\", expand=\"test-en\")\n    assert list(es.synset(\"test-es-0001-n\").relations(\"hyponym\")) == [\"hyponym\"]\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_synset_relations_issue_177():\n    # https://github.com/goodmami/wn/issues/177\n    assert \"hyponym\" in wn.synset(\"test-es-0001-n\").relations()\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_sense_relation_data_true():\n    en = wn.Wordnet(\"test-en\")\n    assert en.sense(\"test-en-information-n-0001-01\").relations(data=True) == {}\n    relmap = en.sense(\"test-en-illustrate-v-0003-01\").relations(data=True)\n    # only sense-sense relations by default\n    assert len(relmap) == 3\n    assert all(isinstance(tgt, wn.Sense) for tgt in relmap.values())\n    assert {rel.name for rel in relmap} == {\"derivation\", \"other\"}\n    assert {rel.target_id for rel in relmap} == {\"test-en-illustration-n-0002-01\"}\n    # sense relations targets should always have same ids as resolved targets\n    assert all(rel.target_id == tgt.id for rel, tgt in relmap.items())\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_synset_relations_data_true():\n    en = wn.Wordnet(\"test-en\")\n    assert en.synset(\"test-en-0003-v\").relations(data=True) == {}\n    relmap = en.synset(\"test-en-0002-n\").relations(data=True)\n    assert len(relmap) == 2\n    assert {rel.name for rel in relmap} == {\"hypernym\", \"hyponym\"}\n    assert {rel.target_id for rel in relmap} == {\"test-en-0001-n\", \"test-en-0004-n\"}\n    # synset relation targets have same ids as resolved targets in same lexicon\n    assert all(rel.target_id == tgt.id for rel, tgt in relmap.items())\n    assert all(rel.lexicon().id == \"test-en\" for rel in relmap)\n\n    # interlingual synset relation targets show original target ids\n    es = wn.Wordnet(\"test-es\", expand=\"test-en\")\n    relmap = es.synset(\"test-es-0002-n\").relations(data=True)\n    assert len(relmap) == 2\n    assert {rel.name for rel in relmap} == {\"hypernym\", \"hyponym\"}\n    assert {rel.target_id for rel in relmap} == {\"test-en-0001-n\", \"test-en-0004-n\"}\n    assert all(rel.target_id != tgt.id for rel, tgt in relmap.items())\n    assert all(rel.lexicon().id == \"test-en\" for rel in relmap)\n"
  },
  {
    "path": "tests/secondary_query_test.py",
    "content": "import pytest\n\nimport wn\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_word_senses():\n    assert len(wn.word(\"test-en-information-n\").senses()) == 1\n    assert len(wn.word(\"test-es-información-n\").senses()) == 1\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_word_synsets():\n    assert len(wn.word(\"test-en-information-n\").synsets()) == 1\n    assert len(wn.word(\"test-es-información-n\").synsets()) == 1\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_word_translate():\n    assert len(wn.word(\"test-en-example-n\").translate(lang=\"es\")) == 1\n    assert len(wn.word(\"test-es-ejemplo-n\").translate(lang=\"en\")) == 1\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_word_translate_issue_316():\n    # https://github.com/goodmami/wn/issues/316\n    es = wn.Wordnet(\"test-es\")\n    es_w = es.word(\"test-es-información-n\")\n    translations = es_w.translate(lexicon=\"test-en\")\n    assert len(translations) == 1\n    assert next(iter(translations.values()))[0].forms() == [\"information\"]\n\n\n@pytest.mark.usefixtures(\"mini_db_1_1\")\ndef test_word_lemma_tags():\n    en = wn.Wordnet(\"test-en\")\n    assert en.word(\"test-en-exemplify-v\").lemma(data=True).tags() == []\n    ext = wn.Wordnet(\"test-en test-en-ext\")\n    assert ext.word(\"test-en-exemplify-v\").lemma(data=True).tags() == [\n        wn.Tag(tag=\"INF\", category=\"tense\")\n    ]\n\n\n@pytest.mark.usefixtures(\"mini_db_1_1\")\ndef test_word_lemma_pronunciations():\n    en = wn.Wordnet(\"test-en\")\n    assert en.word(\"test-en-information-n\").lemma(data=True).pronunciations() == []\n    ext = wn.Wordnet(\"test-en test-en-ext\")\n    assert ext.word(\"test-en-information-n\").lemma(data=True).pronunciations() == [\n        wn.Pronunciation(value=\"ˌɪnfəˈmeɪʃən\", variety=\"GB\"),  # noqa: RUF001\n        wn.Pronunciation(value=\"ˌɪnfɚˈmeɪʃən\", variety=\"US\"),  # noqa: RUF001\n    ]\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_sense_word():\n    assert wn.sense(\"test-en-information-n-0001-01\").word() == wn.word(\n        \"test-en-information-n\"\n    )\n    assert wn.sense(\"test-es-información-n-0001-01\").word() == wn.word(\n        \"test-es-información-n\"\n    )\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_sense_synset():\n    assert wn.sense(\"test-en-information-n-0001-01\").synset() == wn.synset(\n        \"test-en-0001-n\"\n    )\n    assert wn.sense(\"test-es-información-n-0001-01\").synset() == wn.synset(\n        \"test-es-0001-n\"\n    )\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_sense_issue_157():\n    # https://github.com/goodmami/wn/issues/157\n    sense = wn.sense(\"test-en-information-n-0001-01\")\n    # This test uses non-public members, which is not ideal, but there\n    # is currently no better alternative.\n    assert sense._lexconf is sense.word()._lexconf\n    assert sense._lexconf is sense.synset()._lexconf\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_sense_examples():\n    assert wn.sense(\"test-en-information-n-0001-01\").examples() == []\n    assert wn.sense(\"test-es-información-n-0001-01\").examples() == []\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_sense_counts():\n    assert wn.sense(\"test-en-information-n-0001-01\").counts() == [3]\n    counts = wn.sense(\"test-en-information-n-0001-01\").counts(data=True)\n    assert counts[0].value == 3\n    assert counts[0].lexicon().specifier() == \"test-en:1\"\n    assert wn.sense(\"test-es-información-n-0001-01\").counts() == 
[]\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_sense_lexicalized():\n    assert wn.sense(\"test-en-information-n-0001-01\").lexicalized()\n    assert wn.sense(\"test-es-información-n-0001-01\").lexicalized()\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_sense_frames():\n    assert wn.sense(\"test-en-illustrate-v-0003-01\").frames() == [\n        \"Somebody ----s something\",\n        \"Something ----s something\",\n    ]\n    assert wn.sense(\"test-es-ilustrar-v-0003-01\").frames() == []\n\n\n@pytest.mark.usefixtures(\"mini_db_1_1\")\ndef test_sense_frames_issue_156():\n    # https://github.com/goodmami/wn/issues/156\n    assert wn.sense(\"test-ja-示す-v-0003-01\").frames() == [\n        \"ある人が何かを----\",\n    ]\n    assert wn.sense(\"test-ja-事例-n-0002-01\").frames() == []\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_sense_translate():\n    assert len(wn.sense(\"test-en-information-n-0001-01\").translate(lang=\"es\")) == 1\n    assert len(wn.sense(\"test-es-información-n-0001-01\").translate(lang=\"en\")) == 1\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_sense_translate_issue_316():\n    # https://github.com/goodmami/wn/issues/316\n    es = wn.Wordnet(\"test-es\")\n    es_s = es.sense(\"test-es-información-n-0001-01\")\n    assert es_s.translate(lexicon=\"test-en\")[0].counts() == [3]\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_synset_senses():\n    assert len(wn.synset(\"test-en-0003-v\").senses()) == 2\n    assert len(wn.synset(\"test-es-0003-v\").senses()) == 2\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_synset_words():\n    assert len(wn.synset(\"test-en-0003-v\").words()) == 2\n    assert len(wn.synset(\"test-es-0003-v\").words()) == 2\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_synset_lemmas():\n    assert wn.synset(\"test-en-0003-v\").lemmas() == [\"exemplify\", \"illustrate\"]\n    assert wn.synset(\"test-es-0003-v\").lemmas() == [\"ejemplificar\", \"ilustrar\"]\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_synset_ili():\n    # Synset ILIs are now just strings; see ili_test.py for wn.ili tests\n    assert isinstance(wn.synset(\"test-en-0001-n\").ili, str)\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_synset_definition():\n    assert wn.synset(\"test-en-0001-n\").definition() == \"something that informs\"\n    defn = wn.synset(\"test-en-0001-n\").definition(data=True)\n    assert defn.source_sense_id == \"test-en-information-n-0001-01\"\n    assert defn.lexicon().specifier() == \"test-en:1\"\n    assert wn.synset(\"test-es-0001-n\").definition() == \"algo que informa\"\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_synset_definitions():\n    assert wn.synset(\"test-en-0001-n\").definitions() == [\"something that informs\"]\n    defns = wn.synset(\"test-en-0001-n\").definitions(data=True)\n    assert defns[0].source_sense_id == \"test-en-information-n-0001-01\"\n    assert wn.synset(\"test-es-0001-n\").definitions() == [\"algo que informa\"]\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_synset_examples():\n    assert wn.synset(\"test-en-0001-n\").examples() == ['\"this is information\"']\n    ex = wn.synset(\"test-en-0001-n\").examples(data=True)[0]\n    assert ex.text == '\"this is information\"'\n    assert ex.lexicon().specifier() == \"test-en:1\"\n    assert wn.synset(\"test-es-0001-n\").examples() == ['\"este es la información\"']\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_synset_lexicalized():\n    assert wn.synset(\"test-en-0001-n\").lexicalized()\n    assert 
wn.synset(\"test-es-0001-n\").lexicalized()\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_synset_translate():\n    assert len(wn.synset(\"test-en-0001-n\").translate(lang=\"es\")) == 1\n    assert len(wn.synset(\"test-es-0001-n\").translate(lang=\"en\")) == 1\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_synset_translate_issue_316():\n    # https://github.com/goodmami/wn/issues/316\n    es = wn.Wordnet(\"test-es\")\n    en = wn.Wordnet(\"test-en\")\n    es_ss = es.synset(\"test-es-0001-n\")\n    en_ss = en.synset(\"test-en-0001-n\")\n    assert es_ss.translate(lexicon=\"test-en\")[0].definition() == en_ss.definition()\n    assert en_ss.translate(lexicon=\"test-es\")[0].definition() == es_ss.definition()\n\n\n@pytest.mark.usefixtures(\"uninitialized_datadir\")\ndef test_word_sense_order(datadir):\n    wn.add(datadir / \"sense-member-order.xml\")\n    assert [s.id for s in wn.word(\"test-foo-n\").senses()] == [\n        \"test-01-foo-n\",\n        \"test-02-foo-n\",\n    ]\n    assert [s.id for s in wn.word(\"test-bar-n\").senses()] == [\n        \"test-02-bar-n\",\n        \"test-01-bar-n\",\n    ]\n\n\n@pytest.mark.usefixtures(\"uninitialized_datadir\")\ndef test_synset_member_order(datadir):\n    wn.add(datadir / \"sense-member-order.xml\")\n    assert [s.id for s in wn.synset(\"test-01-n\").senses()] == [\n        \"test-01-bar-n\",\n        \"test-01-foo-n\",\n    ]\n    assert [s.id for s in wn.synset(\"test-02-n\").senses()] == [\n        \"test-02-bar-n\",\n        \"test-02-foo-n\",\n    ]\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_confidence():\n    # default for unmarked lexicon is 1.0\n    assert wn.lexicons(lexicon=\"test-es\")[0].confidence() == 1.0\n    # explicitly set lexicon confidence becomes the default for sub-elements\n    assert wn.lexicons(lexicon=\"test-en\")[0].confidence() == 0.9\n    assert wn.word(\"test-en-information-n\").confidence() == 0.9\n    assert wn.sense(\"test-en-information-n-0001-01\").confidence() == 0.9\n    assert (\n        wn.sense(\"test-en-information-n-0001-01\").counts(data=True)[0].confidence()\n    ) == 0.9\n    assert (\n        wn.sense(\"test-en-exemplify-v-0003-01\")\n        .relations(data=True)\n        .popitem()[0]\n        .confidence()\n    ) == 0.9\n    # explicit value overrides default\n    assert wn.word(\"test-en-example-n\").confidence() == 1.0\n    assert (\n        wn.sense(\"test-en-example-n-0002-01\")\n        .relations(data=True)\n        .popitem()[0]\n        .confidence()\n    ) == 0.5\n    # values on parents don't override default on children\n    assert wn.sense(\"test-en-example-n-0002-01\").confidence() == 0.9\n    # check values on other elements\n    assert wn.synset(\"test-en-0001-n\").confidence() == 1.0\n    assert wn.synset(\"test-en-0001-n\").definition(data=True).confidence() == 0.95\n    assert (\n        wn.synset(\"test-en-0001-n\").relations(data=True).popitem()[0].confidence()\n    ) == 0.8\n    assert wn.synset(\"test-en-0001-n\").examples(data=True)[0].confidence() == 0.7\n"
  },
  {
    "path": "tests/similarity_test.py",
    "content": "from math import log\n\nimport pytest\n\nimport wn\nfrom wn import similarity as sim\nfrom wn.ic import information_content as infocont\nfrom wn.taxonomy import taxonomy_depth\n\n\ndef get_synsets(w):\n    return {\n        \"information\": w.synset(\"test-en-0001-n\"),\n        \"example\": w.synset(\"test-en-0002-n\"),\n        \"sample\": w.synset(\"test-en-0004-n\"),\n        \"random sample\": w.synset(\"test-en-0005-n\"),\n        \"random sample2\": w.synset(\"test-en-0008-n\"),\n        \"datum\": w.synset(\"test-en-0006-n\"),\n        \"exemplify\": w.synset(\"test-en-0003-v\"),\n    }\n\n\n# some fake information content; computed using:\n#   words = ['example', 'example', 'sample', 'random sample', 'illustrate']\n#   ic = compute(words, wn.Wordnet('test-en'), distribute_weight=False)\n\nic = {\n    \"n\": {\n        \"test-en-0001-n\": 5.0,  # information\n        \"test-en-0002-n\": 5.0,  # example, illustration\n        \"test-en-0004-n\": 3.0,  # sample\n        \"test-en-0005-n\": 2.0,  # random sample\n        \"test-en-0008-n\": 2.0,  # random sample 2\n        \"test-en-0006-n\": 1.0,  # datum\n        None: 6.0,\n    },\n    \"v\": {\n        \"test-en-0003-v\": 2.0,  # exemplify, illustrate\n        \"test-en-0007-v\": 1.0,  # resignate\n        None: 2.0,\n    },\n    \"a\": {None: 1.0},\n    \"r\": {None: 1.0},\n}\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_path():\n    ss = get_synsets(wn.Wordnet(\"test-en\"))\n    assert sim.path(ss[\"information\"], ss[\"information\"]) == 1 / 1\n    assert sim.path(ss[\"information\"], ss[\"example\"]) == 1 / 2\n    assert sim.path(ss[\"information\"], ss[\"sample\"]) == 1 / 3\n    assert sim.path(ss[\"information\"], ss[\"random sample\"]) == 1 / 4\n    assert sim.path(ss[\"random sample\"], ss[\"datum\"]) == 1 / 5\n    assert sim.path(ss[\"random sample2\"], ss[\"datum\"]) == 0\n    assert sim.path(ss[\"random sample2\"], ss[\"datum\"], simulate_root=True) == 1 / 4\n    assert (\n        sim.path(ss[\"random sample\"], ss[\"random sample2\"], simulate_root=True) == 1 / 6\n    )\n    with pytest.raises(wn.Error):\n        sim.path(ss[\"example\"], ss[\"exemplify\"])\n    with pytest.raises(wn.Error):\n        sim.wup(ss[\"example\"], ss[\"exemplify\"], simulate_root=True)\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_wup():\n    ss = get_synsets(wn.Wordnet(\"test-en\"))\n    assert sim.wup(ss[\"information\"], ss[\"information\"]) == (2 * 1) / (0 + 0 + 2 * 1)\n    assert sim.wup(ss[\"information\"], ss[\"example\"]) == (2 * 1) / (0 + 1 + 2 * 1)\n    assert sim.wup(ss[\"information\"], ss[\"sample\"]) == (2 * 1) / (0 + 2 + 2 * 1)\n    assert sim.wup(ss[\"information\"], ss[\"random sample\"]) == (2 * 1) / (0 + 3 + 2 * 1)\n    assert sim.wup(ss[\"random sample\"], ss[\"datum\"]) == (2 * 1) / (3 + 1 + 2 * 1)\n    with pytest.raises(wn.Error):\n        assert sim.wup(ss[\"random sample2\"], ss[\"datum\"])\n    assert sim.wup(ss[\"random sample2\"], ss[\"datum\"], simulate_root=True) == (2 * 1) / (\n        1 + 2 + 2 * 1\n    )\n    assert sim.wup(ss[\"random sample\"], ss[\"random sample2\"], simulate_root=True) == (\n        2 * 1\n    ) / (4 + 1 + 2 * 1)\n    with pytest.raises(wn.Error):\n        sim.wup(ss[\"example\"], ss[\"exemplify\"])\n    with pytest.raises(wn.Error):\n        sim.wup(ss[\"example\"], ss[\"exemplify\"], simulate_root=True)\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_lch():\n    w = wn.Wordnet(\"test-en\")\n    ss = get_synsets(w)\n    d_n = taxonomy_depth(w, \"n\")\n  
  assert sim.lch(ss[\"information\"], ss[\"information\"], d_n) == -log(\n        (0 + 1) / (2 * d_n)\n    )\n    assert sim.lch(ss[\"information\"], ss[\"example\"], d_n) == -log((1 + 1) / (2 * d_n))\n    assert sim.lch(ss[\"information\"], ss[\"sample\"], d_n) == -log((2 + 1) / (2 * d_n))\n    assert sim.lch(ss[\"information\"], ss[\"random sample\"], d_n) == -log(\n        (3 + 1) / (2 * d_n)\n    )\n    assert sim.lch(ss[\"random sample\"], ss[\"datum\"], d_n) == -log((4 + 1) / (2 * d_n))\n    with pytest.raises(wn.Error):\n        assert sim.lch(ss[\"random sample2\"], ss[\"datum\"], d_n)\n    assert sim.lch(ss[\"random sample2\"], ss[\"datum\"], d_n, simulate_root=True) == -log(\n        (3 + 1) / (2 * d_n)\n    )\n    assert sim.lch(\n        ss[\"random sample\"], ss[\"random sample2\"], d_n, simulate_root=True\n    ) == -log((5 + 1) / (2 * d_n))\n    with pytest.raises(wn.Error):\n        sim.lch(ss[\"example\"], ss[\"exemplify\"], d_n)\n    with pytest.raises(wn.Error):\n        sim.lch(ss[\"example\"], ss[\"exemplify\"], d_n, simulate_root=True)\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_res():\n    w = wn.Wordnet(\"test-en\")\n    ss = get_synsets(w)\n    assert sim.res(ss[\"information\"], ss[\"information\"], ic) == infocont(\n        ss[\"information\"], ic\n    )\n    assert sim.res(ss[\"information\"], ss[\"example\"], ic) == infocont(\n        ss[\"information\"], ic\n    )\n    assert sim.res(ss[\"information\"], ss[\"sample\"], ic) == infocont(\n        ss[\"information\"], ic\n    )\n    assert sim.res(ss[\"information\"], ss[\"random sample\"], ic) == infocont(\n        ss[\"information\"], ic\n    )\n    assert sim.res(ss[\"random sample\"], ss[\"datum\"], ic) == infocont(\n        ss[\"information\"], ic\n    )\n    with pytest.raises(wn.Error):\n        sim.res(ss[\"random sample2\"], ss[\"datum\"], ic)\n    with pytest.raises(wn.Error):\n        sim.res(ss[\"example\"], ss[\"exemplify\"], ic)\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_jcn():\n    w = wn.Wordnet(\"test-en\")\n    ss = get_synsets(w)\n    info_ic = infocont(ss[\"information\"], ic)\n    assert sim.jcn(ss[\"information\"], ss[\"information\"], ic) == float(\"inf\")\n    assert sim.jcn(ss[\"information\"], ss[\"example\"], ic) == float(\"inf\")\n    assert sim.jcn(ss[\"information\"], ss[\"sample\"], ic) == 1 / (\n        (info_ic + infocont(ss[\"sample\"], ic)) - 2 * info_ic\n    )\n    assert sim.jcn(ss[\"information\"], ss[\"random sample\"], ic) == 1 / (\n        (info_ic + infocont(ss[\"random sample\"], ic)) - 2 * info_ic\n    )\n    assert sim.jcn(ss[\"random sample\"], ss[\"datum\"], ic) == 1 / (\n        (infocont(ss[\"random sample\"], ic) + infocont(ss[\"datum\"], ic)) - 2 * info_ic\n    )\n    with pytest.raises(wn.Error):\n        sim.jcn(ss[\"random sample2\"], ss[\"datum\"], ic)\n    with pytest.raises(wn.Error):\n        sim.jcn(ss[\"example\"], ss[\"exemplify\"], ic)\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_lin():\n    w = wn.Wordnet(\"test-en\")\n    ss = get_synsets(w)\n    info_ic = infocont(ss[\"information\"], ic)\n    assert sim.lin(ss[\"information\"], ss[\"information\"], ic) == 1.0\n    assert sim.lin(ss[\"information\"], ss[\"example\"], ic) == 1.0\n    assert sim.lin(ss[\"information\"], ss[\"sample\"], ic) == (2 * info_ic) / (\n        info_ic + infocont(ss[\"sample\"], ic)\n    )\n    assert sim.lin(ss[\"information\"], ss[\"random sample\"], ic) == (2 * info_ic) / (\n        info_ic + infocont(ss[\"random sample\"], ic)\n    )\n    assert 
sim.lin(ss[\"random sample\"], ss[\"datum\"], ic) == (\n        (2 * info_ic) / (infocont(ss[\"random sample\"], ic) + infocont(ss[\"datum\"], ic))\n    )\n    with pytest.raises(wn.Error):\n        sim.lin(ss[\"random sample2\"], ss[\"datum\"], ic)\n    with pytest.raises(wn.Error):\n        sim.lin(ss[\"example\"], ss[\"exemplify\"], ic)\n"
  },
  {
    "path": "tests/taxonomy_test.py",
    "content": "import pytest\n\nimport wn\nfrom wn.taxonomy import (\n    hypernym_paths,\n    leaves,\n    max_depth,\n    min_depth,\n    roots,\n    shortest_path,\n    taxonomy_depth,\n)\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_roots():\n    en = wn.Wordnet(\"test-en\")\n    assert set(roots(en, pos=\"n\")) == {\n        en.synset(\"test-en-0001-n\"),\n        en.synset(\"test-en-0008-n\"),\n    }\n    assert set(roots(en, pos=\"v\")) == {\n        en.synset(\"test-en-0003-v\"),\n        en.synset(\"test-en-0007-v\"),\n    }\n    assert roots(en, pos=\"a\") == []\n    assert set(roots(en)) == set(roots(en, pos=\"n\") + roots(en, pos=\"v\"))\n\n    # with no expand relations and no relation of its own, every\n    # synset looks like a root\n    es = wn.Wordnet(\"test-es\")\n    assert set(roots(es, pos=\"n\")) == {\n        es.synset(\"test-es-0001-n\"),\n        es.synset(\"test-es-0002-n\"),\n        es.synset(\"test-es-0005-n\"),\n    }\n\n    es = wn.Wordnet(\"test-es\", expand=\"test-en\")\n    assert roots(es, pos=\"n\") == [es.synset(\"test-es-0001-n\")]\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_leaves():\n    en = wn.Wordnet(\"test-en\")\n    assert set(leaves(en, pos=\"n\")) == {\n        en.synset(\"test-en-0005-n\"),\n        en.synset(\"test-en-0006-n\"),\n        en.synset(\"test-en-0008-n\"),\n    }\n    assert set(leaves(en, pos=\"v\")) == {\n        en.synset(\"test-en-0003-v\"),\n        en.synset(\"test-en-0007-v\"),\n    }\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_taxonomy_depth():\n    en = wn.Wordnet(\"test-en\")\n    assert taxonomy_depth(en, pos=\"n\") == 3\n    assert taxonomy_depth(en, pos=\"v\") == 0\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_hypernym_paths():\n    information = wn.synsets(\"information\")[0]\n    example = wn.synsets(\"example\")[0]\n    sample = wn.synsets(\"sample\")[0]\n    random_sample = wn.synsets(\"random sample\")[0]\n    assert hypernym_paths(information) == []\n    assert hypernym_paths(example) == [[information]]\n    assert hypernym_paths(sample) == [[example, information]]\n    assert hypernym_paths(random_sample) == [[sample, example, information]]\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_interlingual_hypernym_paths():\n    información = wn.synsets(\"información\")[0]\n    ejemplo = wn.synsets(\"ejemplo\")[0]\n    sample = wn.synsets(\"sample\", lexicon=\"test-en:1\")[0]\n    inferred = wn.Synset.empty(\"*INFERRED*\", ili=sample.ili, _lexicon=\"test-es:1\")\n    muestra_aleatoria = wn.synsets(\"muestra aleatoria\")[0]\n    assert hypernym_paths(información) == []\n    assert hypernym_paths(ejemplo) == [[información]]\n    assert hypernym_paths(muestra_aleatoria) == [[inferred, ejemplo, información]]\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_shortest_path():\n    information = wn.synsets(\"information\")[0]\n    example = wn.synsets(\"example\")[0]\n    sample = wn.synsets(\"sample\")[0]\n    random_sample = wn.synsets(\"random sample\")[0]\n    datum = wn.synsets(\"datum\")[0]\n    exemplify = wn.synsets(\"exemplify\")[0]\n    inferred_root = wn.Synset.empty(\"*ROOT*\", _lexicon=\"test-en:1\")\n    assert shortest_path(information, information) == []\n    assert shortest_path(information, datum) == [datum]\n    assert shortest_path(information, sample) == [example, sample]\n    assert shortest_path(sample, information) == [example, information]\n    assert shortest_path(random_sample, datum) == [sample, example, information, datum]\n    with 
pytest.raises(wn.Error):\n        shortest_path(example, exemplify)\n    assert shortest_path(example, exemplify, simulate_root=True) == [\n        information,\n        inferred_root,\n        exemplify,\n    ]\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_min_depth():\n    assert min_depth(wn.synsets(\"information\")[0]) == 0\n    assert min_depth(wn.synsets(\"example\")[0]) == 1\n    assert min_depth(wn.synsets(\"sample\")[0]) == 2\n    assert min_depth(wn.synsets(\"random sample\")[0]) == 3\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_max_depth():\n    assert max_depth(wn.synsets(\"information\")[0]) == 0\n    assert max_depth(wn.synsets(\"example\")[0]) == 1\n    assert max_depth(wn.synsets(\"sample\")[0]) == 2\n    assert max_depth(wn.synsets(\"random sample\")[0]) == 3\n"
  },
  {
    "path": "tests/util_test.py",
    "content": "from wn import util\n\n\ndef test_synset_id_formatter():\n    f = util.synset_id_formatter\n    assert f()(prefix=\"xyz\", offset=123, pos=\"n\") == \"xyz-00000123-n\"\n    assert f(prefix=\"xyz\")(offset=123, pos=\"n\") == \"xyz-00000123-n\"\n    assert f(prefix=\"xyz\", pos=\"n\")(offset=123) == \"xyz-00000123-n\"\n    assert f(\"abc-{offset}-{pos}\")(offset=1, pos=\"v\") == \"abc-1-v\"\n"
  },
  {
    "path": "tests/validate_test.py",
    "content": "import pytest\n\nfrom wn import lmf\nfrom wn.validate import validate\n\ntests = [\n    (\"E101\", 0),\n    (\"E101\", 1),\n    (\"E101\", 2),\n    (\"E101\", 3),\n    (\"W305\", 0),\n    (\"W306\", 0),\n    (\"W307\", 0),\n]\ntest_ids = [f\"{code}-{i}\" for code, i in tests]\n\n\n@pytest.mark.parametrize((\"code\", \"i\"), tests, ids=test_ids)\ndef test_validate(datadir, code: str, i: int) -> None:\n    path = datadir / f\"{code}-{i}.xml\"\n    lex = lmf.load(path, progress_handler=None)[\"lexicons\"][0]\n    report = validate(lex, select=[code], progress_handler=None)\n    print(report)\n    assert len(report[code][\"items\"]) > 0\n"
  },
  {
    "path": "tests/wordnet_test.py",
    "content": "from pathlib import Path\n\nimport pytest\n\nimport wn\n\n\n@pytest.mark.usefixtures(\"mini_db_1_1\")\ndef test_wordnet_lexicons():\n    en = wn.Wordnet(\"test-en\")\n    assert len(en.lexicons()) == 1\n    assert len(en.expanded_lexicons()) == 0\n\n    en1 = wn.Wordnet(\"test-en:1\")\n    assert en.lexicons() == en1.lexicons()\n    assert en.expanded_lexicons() == en1.expanded_lexicons()\n\n    en2 = wn.Wordnet(lang=\"en\")\n    assert len(en2.lexicons()) == 2\n    assert len(en2.expanded_lexicons()) == 0\n\n    es = wn.Wordnet(\"test-es\")\n    assert len(es.lexicons()) == 1\n    assert len(es.expanded_lexicons()) == 0\n\n    es2 = wn.Wordnet(\"test-es\", expand=\"test-en\")\n    assert len(es2.lexicons()) == 1\n    assert len(es2.expanded_lexicons()) == 1\n\n    ja = wn.Wordnet(\"test-ja\")\n    assert len(ja.lexicons()) == 1\n    assert len(ja.expanded_lexicons()) == 1\n\n    ja2 = wn.Wordnet(\"test-ja\", expand=\"\")\n    assert len(ja2.lexicons()) == 1\n    assert len(ja2.expanded_lexicons()) == 0\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_wordnet_normalize():\n    es = wn.Wordnet(\"test-es\")\n    assert es.words(\"Informacion\") == es.words(\"información\")\n    assert es.words(\"ínfórmácíón\") == es.words(\"información\")\n    es = wn.Wordnet(\"test-es\", normalizer=None)\n    assert es.words(\"informacion\") == []\n    assert es.words(\"Información\") == []\n\n    # The following doesn't necessarily work because any non-None\n    # normalizer causes the normalized form column to be tested with\n    # the original form\n    # es = wn.Wordnet('test-es', normalizer=str.lower)\n    # assert es.words('informacion') == []\n    # assert es.words('Información') == es.words('información')\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_wordnet_lemmatize():\n    # default lemmatizer compares alternative forms\n    en = wn.Wordnet(\"test-en\")\n    assert en.words(\"examples\") == []\n    assert en.words(\"exemplifying\") == en.words(\"exemplify\")\n    assert en.words(\"data\") == en.words(\"datum\")\n\n    en = wn.Wordnet(\"test-en\", search_all_forms=False)\n    assert en.words(\"examples\") == []\n    assert en.words(\"exemplifying\") == []\n    assert en.words(\"data\") == []\n\n    def morphy_lite(form, pos):\n        result = {pos: {form}}\n        if pos in (\"n\", None) and form.endswith(\"s\"):\n            result.setdefault(\"n\", set()).add(form[:-1])\n        return result\n\n    en = wn.Wordnet(\"test-en\", lemmatizer=morphy_lite, search_all_forms=False)\n    assert en.words(\"examples\", pos=\"n\") == en.words(\"example\")\n    assert en.words(\"examples\") == en.words(\"example\")\n    assert en.words(\"exemplifying\") == []\n    assert en.words(\"data\") == []\n\n    en = wn.Wordnet(\"test-en\", lemmatizer=morphy_lite, search_all_forms=True)\n    assert en.words(\"data\") == en.words(\"datum\")\n    assert en.words(\"exemplifying\") == en.words(\"exemplify\")\n\n\ndef test_portable_entities_issue_226(monkeypatch, tmp_path, datadir):\n    dir = tmp_path / \"wn_issue_226\"\n    with monkeypatch.context() as m:\n        m.setattr(wn.config, \"data_directory\", Path(dir))\n        wn.add(datadir / \"mini-lmf-1.0.xml\")\n        en = wn.Wordnet(\"test-en\")\n        info1 = en.synsets(\"information\")[0]\n        wn.remove(\"test-en\")\n        wn.add(datadir / \"mini-lmf-1.0.xml\")\n        info2 = en.synsets(\"information\")[0]  # en Wordnet object still works\n        assert info1 == info2  # synsets are equivalent\n        
wn._db.clear_connections()\n"
  },
  {
    "path": "wn/__init__.py",
    "content": "\"\"\"\nWordnet Interface.\n\"\"\"\n\n__all__ = (\n    \"ConfigurationError\",\n    \"Count\",\n    \"DatabaseError\",\n    \"Definition\",\n    \"Error\",\n    \"Example\",\n    \"Form\",\n    \"Lexicon\",\n    \"ProjectError\",\n    \"Pronunciation\",\n    \"Relation\",\n    \"Sense\",\n    \"Synset\",\n    \"Tag\",\n    \"WnWarning\",\n    \"Word\",\n    \"Wordnet\",\n    \"__version__\",\n    \"add\",\n    \"add_lexical_resource\",\n    \"download\",\n    \"export\",\n    \"lemmas\",\n    \"lexicons\",\n    \"projects\",\n    \"remove\",\n    \"reset_database\",\n    \"sense\",\n    \"senses\",\n    \"synset\",\n    \"synsets\",\n    \"word\",\n    \"words\",\n)\n__version__ = \"1.1.0\"\n\nfrom wn._add import add, add_lexical_resource, remove\nfrom wn._config import config  # noqa: F401\nfrom wn._core import (\n    Count,\n    Definition,\n    Example,\n    Form,\n    Pronunciation,\n    Relation,\n    Sense,\n    Synset,\n    Tag,\n    Word,\n)\nfrom wn._download import download\nfrom wn._exceptions import (\n    ConfigurationError,\n    DatabaseError,\n    Error,\n    ProjectError,\n    WnWarning,\n)\nfrom wn._export import export\nfrom wn._lexicon import Lexicon\nfrom wn._module_functions import (\n    lemmas,\n    lexicons,\n    projects,\n    reset_database,\n    sense,\n    senses,\n    synset,\n    synsets,\n    word,\n    words,\n)\nfrom wn._wordnet import Wordnet\n"
  },
  {
    "path": "wn/__main__.py",
    "content": "import argparse\nimport json\nimport logging\nimport sys\nfrom pathlib import Path\n\nimport wn\nfrom wn import lmf\nfrom wn._util import format_lexicon_specifier\nfrom wn.project import iterpackages\nfrom wn.validate import validate\n\n\ndef _download(args: argparse.Namespace) -> None:\n    if args.index:\n        wn.config.load_index(args.index)\n    for target in args.target:\n        wn.download(target, add=args.add)\n\n\ndef _cache(args: argparse.Namespace) -> None:\n    cache_entries = wn.config.list_cache_entries(args.ARG)\n    if args.full_paths_only:\n        for cache_entry in cache_entries:\n            print(str(cache_entry[\"path\"]))\n    else:\n        for cache_entry in cache_entries:\n            print(\n                \"\\t\".join(\n                    [\n                        str(cache_entry[\"path\"].name),\n                        cache_entry[\"id\"] or \"?\",\n                        cache_entry[\"version\"] or \"?\",\n                        cache_entry[\"url\"] or \"?\",\n                    ]\n                )\n            )\n\n\ndef _lexicons(args: argparse.Namespace) -> None:\n    for lex in wn.lexicons(lang=args.lang, lexicon=args.lexicon):\n        print(\"\\t\".join((lex.id, lex.version, f\"[{lex.language}]\", lex.label)))\n\n\ndef _projects(args: argparse.Namespace) -> None:\n    for info in wn.projects():\n        key = \"i\"\n        key += \"c\" if info[\"cache\"] else \"-\"\n        # key += 'a' if False else '-'  # TODO: check if project is added to db\n        print(\n            \"\\t\".join(\n                (\n                    key,\n                    info[\"id\"],\n                    info[\"version\"],\n                    f\"[{info['language'] or '---'}]\",\n                    info[\"label\"] or \"---\",\n                )\n            )\n        )\n\n\ndef _validate(args: argparse.Namespace) -> None:\n    all_valid = True\n    selectseq = [check.strip() for check in args.select.split(\",\")]\n    for package in iterpackages(args.FILE):\n        resource = lmf.load(package.resource_file())\n        for lexicon in resource[\"lexicons\"]:\n            spec = format_lexicon_specifier(lexicon[\"id\"], lexicon[\"version\"])\n            print(f\"{spec:<20}\", end=\"\")\n            report = validate(lexicon, select=selectseq)\n            if not any(check.get(\"items\", []) for check in report.values()):\n                print(\"passed\")\n            else:\n                print(\"failed\")\n                all_valid = False\n                # clean up report\n                for code in list(report):\n                    if not report[code].get(\"items\"):\n                        del report[code]\n                if args.output_file:\n                    with open(args.output_file, \"w\") as outfile:\n                        json.dump(report, outfile, indent=2)\n                else:\n                    for _code, check in report.items():\n                        if not check[\"items\"]:\n                            continue\n                        print(f\"  {check['message']}\")\n                        for id, context in check[\"items\"].items():\n                            print(f\"    {id}: {context}\" if context else f\"    {id}\")\n\n    sys.exit(0 if all_valid else 1)\n\n\ndef _path_type(arg):\n    return Path(arg)\n\n\ndef _file_path_type(arg):\n    path = Path(arg)\n    if not path.is_file():\n        raise argparse.ArgumentTypeError(f\"cannot file file: {arg}\")\n    return path\n\n\nparser = 
argparse.ArgumentParser(\n    prog=\"python3 -m wn\",\n    description=\"Manage Wn's wordnet data from the command line.\",\n)\nparser.add_argument(\"-V\", \"--version\", action=\"version\", version=f\"Wn {wn.__version__}\")\nparser.add_argument(\n    \"-v\",\n    \"--verbose\",\n    action=\"count\",\n    dest=\"verbosity\",\n    default=0,\n    help=\"increase verbosity (can repeat: -vv, -vvv)\",\n)\nparser.add_argument(\n    \"-d\",\n    \"--dir\",\n    type=_path_type,\n    help=\"data directory for Wn's database and cache\",\n)\nparser.set_defaults(func=lambda _: parser.print_help())\nsub_parsers = parser.add_subparsers(title=\"subcommands\")\n\n\nparser_download = sub_parsers.add_parser(\n    \"download\",\n    description=\"Download wordnets and add them to Wn's database.\",\n    help=\"download wordnets\",\n)\nparser_download.add_argument(\"target\", nargs=\"+\", help=\"project specifiers or URLs\")\nparser_download.add_argument(\n    \"--index\", type=_file_path_type, help=\"project index to use for downloading\"\n)\nparser_download.add_argument(\n    \"--no-add\",\n    action=\"store_false\",\n    dest=\"add\",\n    help=\"download and cache without adding to the database\",\n)\nparser_download.set_defaults(func=_download)\n\n\nparser_cache = sub_parsers.add_parser(\n    \"cache\",\n    description=\"View Wn's download cache.\",\n    help=\"view the download cache\",\n)\nparser_cache.add_argument(\n    \"ARG\",\n    help=\"project specifier or URL\",\n    nargs=\"?\",\n    default=\"*\",\n)\nparser_cache.add_argument(\n    \"--full-paths-only\",\n    action=\"store_true\",\n    help=\"print the full paths of cache entries without other data\",\n)\nparser_cache.set_defaults(func=_cache)\n\nparser_lexicons = sub_parsers.add_parser(\n    \"lexicons\",\n    description=\"Display a list of installed lexicons.\",\n    help=\"list installed lexicons\",\n)\nparser_lexicons.add_argument(\"-l\", \"--lang\", help=\"BCP 47 language code\")\nparser_lexicons.add_argument(\"--lexicon\", help=\"lexicon specifiers\")\nparser_lexicons.set_defaults(func=_lexicons)\n\n\nparser_projects = sub_parsers.add_parser(\n    \"projects\",\n    description=(\n        \"Display a list of known projects. The first column shows the \"\n        \"status for a project (i=indexed, c=cached).\"\n    ),\n    help=\"list known projects\",\n)\nparser_projects.set_defaults(func=_projects)\n\n\nparser_validate = sub_parsers.add_parser(\n    \"validate\",\n    description=(\"Validate a WN-LMF lexicon\"),\n    help=\"validate a lexicon\",\n)\nparser_validate.add_argument(\n    \"FILE\", type=_file_path_type, help=\"WN-LMF (XML) lexicon file to validate\"\n)\nparser_validate.add_argument(\n    \"--select\",\n    metavar=\"CHECKS\",\n    default=\"E,W\",\n    help=\"comma-separated list of checks to run (default: E,W)\",\n)\nparser_validate.add_argument(\n    \"--output-file\", metavar=\"FILE\", help=\"write report to a JSON file\"\n)\nparser_validate.set_defaults(func=_validate)\n\n\nargs = parser.parse_args()\n\nlogging.basicConfig(level=logging.ERROR - (min(args.verbosity, 3) * 10))\n\nif args.dir:\n    wn.config.data_directory = args.dir\n\nargs.func(args)\n"
  },
  {
    "path": "wn/_add.py",
    "content": "\"\"\"\nAdding and removing lexicons to/from the database.\n\"\"\"\n\nimport logging\nimport sqlite3\nfrom collections.abc import Iterable, Iterator, Sequence\nfrom itertools import islice\nfrom pathlib import Path\nfrom typing import TypeVar, cast\n\nfrom wn import ili as _ili\nfrom wn import lmf\nfrom wn._config import ResourceType, config\nfrom wn._db import connect\nfrom wn._exceptions import Error\nfrom wn._queries import (\n    get_lexicon_extensions,\n    resolve_lexicon_specifiers,\n)\nfrom wn._types import AnyPath\nfrom wn._util import format_lexicon_specifier, normalize_form\nfrom wn.project import iterpackages\nfrom wn.util import ProgressBar, ProgressHandler\n\nlog = logging.getLogger(\"wn\")\n\n\nBATCH_SIZE = 1000\nDEFAULT_MEMBER_RANK = 127  # synset member rank when not specified by 'members'\n\nENTRY_QUERY = \"\"\"\n    SELECT e.rowid\n      FROM entries AS e\n     WHERE e.id = ?\n       AND e.lexicon_rowid = ?\n\"\"\"\n# forms don't have reliable ids, so also consider rank; this depends\n# on each form having a unique rank, and this doesn't work for lexicon\n# extensions\nFORM_QUERY = \"\"\"\n    SELECT f.rowid\n      FROM forms AS f\n      JOIN entries AS e ON f.entry_rowid = e.rowid\n     WHERE e.id = ?\n       AND e.lexicon_rowid = ?\n       AND (f.id = ? OR f.rank = ?)\n\"\"\"\nSENSE_QUERY = \"\"\"\n    SELECT s.rowid\n      FROM senses AS s\n     WHERE s.id = ?\n       AND s.lexicon_rowid = ?\n\"\"\"\nSYNSET_QUERY = \"\"\"\n    SELECT ss.rowid\n      FROM synsets AS ss\n     WHERE ss.id = ?\n       AND ss.lexicon_rowid = ?\n\"\"\"\nRELTYPE_QUERY = \"\"\"\n    SELECT rt.rowid\n      FROM relation_types AS rt\n     WHERE rt.type = ?\n\"\"\"\nILISTAT_QUERY = \"\"\"\n    SELECT ist.rowid\n      FROM ili_statuses AS ist\n     WHERE ist.status = ?\n\"\"\"\nLEXFILE_QUERY = \"\"\"\n    SELECT lf.rowid\n      FROM lexfiles AS lf\n     WHERE lf.name = ?\n\"\"\"\n\n_AnyLexicon = lmf.Lexicon | lmf.LexiconExtension\n_AnyEntry = lmf.LexicalEntry | lmf.ExternalLexicalEntry\n_AnyLemma = lmf.Lemma | lmf.ExternalLemma\n_AnyForm = lmf.Form | lmf.ExternalForm\n_AnySense = lmf.Sense | lmf.ExternalSense\n_AnySynset = lmf.Synset | lmf.ExternalSynset\n\n\ndef add(\n    source: AnyPath,\n    progress_handler: type[ProgressHandler] | None = ProgressBar,\n) -> None:\n    \"\"\"Add the LMF or ILI file at *source* to the database.\n\n    The file at *source* may be gzip-compressed or plain text file.\n\n    >>> wn.add(\"english-wordnet-2020.xml\")\n    Added ewn:2020 (English WordNet)\n\n    The *progress_handler* parameter takes a subclass of\n    :class:`wn.util.ProgressHandler`. 
An instance of the class will be\n    created, used, and closed by this function.\n    \"\"\"\n    if progress_handler is None:\n        progress_handler = ProgressHandler\n    progress = progress_handler(message=\"Database\")\n\n    log.info(\"adding project to database\")\n    log.info(\"  database: %s\", config.database_path)\n    log.info(\"  project file: %s\", source)\n\n    try:\n        for package in iterpackages(source):\n            match package.type:\n                case ResourceType.WORDNET:\n                    _add_lmf(package.resource_file(), progress, progress_handler)\n                case ResourceType.ILI:\n                    _add_ili(package.resource_file(), progress)\n                case _:\n                    raise Error(f\"unknown package type: {package.type}\")\n    finally:\n        progress.close()\n\n\ndef _add_lmf(\n    source: Path,\n    progress: ProgressHandler,\n    progress_handler: type[ProgressHandler],\n) -> None:\n    # abort if lexicons in *source* are already added\n    progress.flash(f\"Checking {source!s}\")\n    infos = lmf.scan_lexicons(source)\n    if not infos:\n        progress.flash(f\"{source}: No lexicons found\")\n        return\n\n    skipmap = _precheck(infos, progress)\n    if all(skipmap.values()):\n        return  # nothing to do\n\n    # all clear, try to add them\n    progress.flash(f\"Reading {source!s}\")\n    resource = lmf.load(source, progress_handler)\n    _add_lexical_resource(resource, skipmap, progress)\n\n\ndef add_lexical_resource(\n    resource: lmf.LexicalResource,\n    progress_handler: type[ProgressHandler] | None = ProgressBar,\n) -> None:\n    \"\"\"Add the lexical resource *resource* to the database.\n\n    The *resource* argument is an in-memory lexical resource as from\n    :func:`wn.lmf.load` and not a file on disk.\n\n    >>> resource = wn.lmf.load(\"english-wordnet-2020.xml\")\n    >>> wn.add_lexical_resource(resource)\n    Added ewn:2020 (English WordNet)\n\n    The *progress_handler* parameter takes a subclass of\n    :class:`wn.util.ProgressHandler`. 
An instance of the class will be\n    created, used, and closed by this function.\n    \"\"\"\n    if progress_handler is None:\n        progress_handler = ProgressHandler\n    progress = progress_handler(message=\"Database\")\n\n    try:\n        progress.flash(\"Checking resource\")\n        if not resource[\"lexicons\"]:\n            progress.flash(\"No lexicons found\")\n            return\n\n        skipmap = _precheck(resource[\"lexicons\"], progress)\n        if all(skipmap.values()):\n            return  # nothing to do\n\n        _add_lexical_resource(resource, skipmap, progress)\n\n    finally:\n        progress.close()\n\n\ndef _add_lexical_resource(\n    resource: lmf.LexicalResource,\n    skipmap: dict[str, bool],\n    progress: ProgressHandler,\n) -> None:\n    with connect() as conn:\n        cur = conn.cursor()\n        # these two settings increase the risk of database corruption\n        # if the system crashes during a write, but they should also\n        # make inserts much faster\n        cur.execute(\"PRAGMA synchronous = OFF\")\n        cur.execute(\"PRAGMA journal_mode = MEMORY\")\n\n        for lexicon in resource[\"lexicons\"]:\n            spec = format_lexicon_specifier(lexicon[\"id\"], lexicon[\"version\"])\n            if skipmap[spec]:\n                continue  # _precheck() says this should be skipped\n\n            progress.flash(\"Updating lookup tables\")\n            _update_lookup_tables(lexicon, cur)\n\n            progress.set(count=0, total=_sum_counts(lexicon))\n            synsets: Sequence[_AnySynset] = _synsets(lexicon)\n            entries: Sequence[_AnyEntry] = _entries(lexicon)\n            synbhrs: Sequence[lmf.SyntacticBehaviour] = _collect_frames(lexicon)\n\n            lexid, baseid = _insert_lexicon(lexicon, cur, progress)\n\n            lexidmap = _build_lexid_map(lexicon, lexid, baseid)\n\n            _insert_synsets(synsets, lexid, cur, progress)\n            _insert_entries(entries, lexid, cur, progress)\n            _insert_index(entries, lexid, cur, progress)\n            _insert_forms(entries, lexid, lexidmap, cur, progress)\n            _insert_pronunciations(entries, lexid, lexidmap, cur, progress)\n            _insert_tags(entries, lexid, lexidmap, cur, progress)\n            _insert_senses(entries, synsets, lexid, lexidmap, cur, progress)\n            _insert_adjpositions(entries, lexid, lexidmap, cur, progress)\n            _insert_counts(entries, lexid, lexidmap, cur, progress)\n            _insert_syntactic_behaviours(synbhrs, lexid, lexidmap, cur, progress)\n\n            _insert_synset_relations(synsets, lexid, lexidmap, cur, progress)\n            _insert_sense_relations(lexicon, lexid, lexidmap, cur, progress)\n\n            _insert_synset_definitions(synsets, lexid, lexidmap, cur, progress)\n            _insert_examples(\n                [sense for e in entries for sense in _senses(e)],\n                lexid,\n                lexidmap,\n                \"sense_examples\",\n                cur,\n                progress,\n            )\n            _insert_examples(synsets, lexid, lexidmap, \"synset_examples\", cur, progress)\n\n            progress.set(status=\"\")  # clear type string\n            progress.flash(f\"Added {spec} ({lexicon['label']})\\n\")\n\n\ndef _precheck(\n    infos: Sequence[lmf.ScanInfo | lmf.Lexicon | lmf.LexiconExtension],\n    progress: ProgressHandler,\n) -> dict[str, bool]:\n    skipmap: dict[str, bool] = {}\n    lexqry = \"SELECT * FROM lexicons WHERE id = :id AND version = :version\"\n    
with connect() as conn:\n        cur = conn.cursor()\n        for info in infos:\n            key = format_lexicon_specifier(info[\"id\"], info[\"version\"])\n\n            base: lmf.LexiconSpecifier | None = info.get(\"extends\")  # type: ignore\n\n            skipmap[key] = False\n            reason = \"\"\n\n            # can't have two lexicons with the same specifier in the db\n            if cur.execute(lexqry, info).fetchone():\n                skipmap[key] = True\n                reason = \"already added\"\n\n            # can't have an extension without the base\n            elif base and cur.execute(lexqry, base).fetchone() is None:\n                skipmap[key] = True\n                base_key = format_lexicon_specifier(base[\"id\"], base[\"version\"])\n                reason = f\"base lexicon ({base_key}) not available\"\n\n            if reason:\n                progress.flash(f\"Skipping {key} ({info['label']}); {reason}\\n\")\n\n    return skipmap\n\n\ndef _sum_counts(lex: _AnyLexicon) -> int:\n    ents = _entries(lex)\n    locs = _local_entries(ents)\n    lems = [e[\"lemma\"] for e in locs if e.get(\"lemma\")]\n    frms = [f for e in ents for f in _forms(e)]\n    sens = [s for e in ents for s in _senses(e)]\n    syns = _synsets(lex)\n    return sum(\n        [\n            # index (every entry must be processed; not all use index)\n            len(ents),\n            # lexical entries\n            len(ents),\n            len(lems),\n            sum(len(lem.get(\"pronunciations\", [])) for lem in lems),\n            sum(len(lem.get(\"tags\", [])) for lem in lems),\n            len(frms),\n            sum(len(frm.get(\"pronunciations\", [])) for frm in frms),\n            sum(len(frm.get(\"tags\", [])) for frm in frms),\n            # senses\n            len(sens),\n            sum(len(sen.get(\"relations\", [])) for sen in sens),\n            sum(len(sen.get(\"examples\", [])) for sen in sens),\n            sum(len(sen.get(\"counts\", [])) for sen in sens),\n            # synsets\n            len(syns),\n            sum(len(syn.get(\"definitions\", [])) for syn in syns),\n            sum(len(syn.get(\"relations\", [])) for syn in syns),\n            sum(len(syn.get(\"examples\", [])) for syn in syns),\n            # syntactic behaviours\n            sum(len(ent.get(\"frames\", [])) for ent in locs),\n            len(lex.get(\"frames\", [])),\n        ]\n    )\n\n\ndef _update_lookup_tables(lexicon: _AnyLexicon, cur: sqlite3.Cursor) -> None:\n    reltypes = {\n        rel[\"relType\"] for ss in _synsets(lexicon) for rel in ss.get(\"relations\", [])\n    }\n    reltypes.update(\n        rel[\"relType\"]\n        for e in _entries(lexicon)\n        for s in _senses(e)\n        for rel in s.get(\"relations\", [])\n    )\n    cur.executemany(\n        \"INSERT OR IGNORE INTO relation_types VALUES (null,?)\",\n        [(rt,) for rt in sorted(reltypes)],\n    )\n    lexfiles: set[str] = {\n        ss.get(\"lexfile\", \"\")\n        for ss in _local_synsets(_synsets(lexicon))\n        if ss.get(\"lexfile\")\n    }\n    cur.executemany(\n        \"INSERT OR IGNORE INTO lexfiles VALUES (null,?)\",\n        [(lf,) for lf in sorted(lexfiles)],\n    )\n\n\ndef _insert_lexicon(\n    lexicon: _AnyLexicon, cur: sqlite3.Cursor, progress: ProgressHandler\n) -> tuple[int, int]:\n    progress.set(status=\"Lexicon Info\")\n    cur.execute(\n        \"INSERT INTO lexicons VALUES (null,?,?,?,?,?,?,?,?,?,?,?,?)\",\n        (\n            f\"{lexicon['id']}:{lexicon['version']}\",\n            
lexicon[\"id\"],\n            lexicon[\"label\"],\n            lexicon[\"language\"],\n            lexicon[\"email\"],\n            lexicon[\"license\"],\n            lexicon[\"version\"],\n            lexicon.get(\"url\"),\n            lexicon.get(\"citation\"),\n            lexicon.get(\"logo\"),\n            lexicon.get(\"meta\"),\n            False,\n        ),\n    )\n    lexid = cur.lastrowid\n\n    if not isinstance(lexid, int):\n        raise Error(\"failed to insert lexicon\")\n\n    query = \"\"\"\n        UPDATE lexicon_dependencies\n           SET provider_rowid = ?\n         WHERE provider_id = ? AND provider_version = ?\n    \"\"\"\n    cur.execute(query, (lexid, lexicon[\"id\"], lexicon[\"version\"]))\n\n    query = \"\"\"\n        INSERT INTO {table}\n        VALUES (:lid,\n                :id,\n                :version,\n                :url,\n                (SELECT rowid FROM lexicons WHERE id=:id AND version=:version))\n    \"\"\"\n    params = []\n    for dep in lexicon.get(\"requires\", []):\n        param_dict = dict(dep)\n        param_dict.setdefault(\"url\", None)\n        param_dict[\"lid\"] = lexid\n        params.append(param_dict)\n    if params:\n        cur.executemany(query.format(table=\"lexicon_dependencies\"), params)\n\n    if lexicon.get(\"extends\"):\n        lexicon = cast(\"lmf.LexiconExtension\", lexicon)\n        param_dict = dict(lexicon[\"extends\"])\n        param_dict.setdefault(\"url\", None)\n        param_dict[\"lid\"] = lexid\n        cur.execute(query.format(table=\"lexicon_extensions\"), param_dict)\n        baseid = cur.execute(\n            \"SELECT rowid FROM lexicons WHERE id=? AND version=?\",\n            (param_dict[\"id\"], param_dict[\"version\"]),\n        ).fetchone()[0]\n    else:\n        baseid = lexid\n\n    return lexid, baseid\n\n\n_LexIdMap = dict[str, int]\n\n\ndef _build_lexid_map(lexicon: _AnyLexicon, lexid: int, extid: int) -> _LexIdMap:\n    \"\"\"Build a mapping of entity IDs to extended lexicon rowid.\"\"\"\n    lexidmap: _LexIdMap = {}\n    if lexid != extid:\n        lexidmap.update((e[\"id\"], extid) for e in _entries(lexicon) if _is_external(e))\n        lexidmap.update(\n            (s[\"id\"], extid)\n            for e in _entries(lexicon)\n            for s in _senses(e)\n            if _is_external(s)\n        )\n        lexidmap.update(\n            (ss[\"id\"], extid) for ss in _synsets(lexicon) if _is_external(ss)\n        )\n    return lexidmap\n\n\nT = TypeVar(\"T\")\n\n\ndef _batch(sequence: Iterable[T]) -> Iterator[list[T]]:\n    it = iter(sequence)\n    batch = list(islice(it, 0, BATCH_SIZE))\n    while len(batch):\n        yield batch\n        batch = list(islice(it, 0, BATCH_SIZE))\n\n\ndef _insert_synsets(\n    synsets: Sequence[_AnySynset],\n    lexid: int,\n    cur: sqlite3.Cursor,\n    progress: ProgressHandler,\n) -> None:\n    progress.set(status=\"Synsets\")\n    # synsets\n    ss_query = f\"\"\"\n        INSERT INTO synsets\n        VALUES (null,?,?,(SELECT rowid FROM ilis WHERE id=?),?,({LEXFILE_QUERY}),?)\n    \"\"\"\n    # presupposed ILIs\n    pre_ili_query = f\"\"\"\n        INSERT OR IGNORE INTO ilis\n        VALUES (null,?,({ILISTAT_QUERY}),?,?)\n    \"\"\"\n    # proposed ILIs\n    pro_ili_query = \"\"\"\n        INSERT INTO proposed_ilis\n        VALUES (null,\n               (SELECT ss.rowid\n                  FROM synsets AS ss\n                 WHERE ss.id=? 
AND lexicon_rowid=?),\n               ?,\n               ?)\n    \"\"\"\n\n    for batch in _batch(_local_synsets(synsets)):\n        # first add presupposed ILIs\n        pre_ili_data = []\n        for ss in batch:\n            ili = ss[\"ili\"]\n            if ili and ili != \"in\":\n                defn = ss.get(\"ili_definition\")  # normally null\n                text = defn[\"text\"] if defn else None\n                meta = defn.get(\"meta\") if defn else None\n                pre_ili_data.append((ili, \"presupposed\", text, meta))\n        cur.executemany(pre_ili_query, pre_ili_data)\n\n        # then add synsets\n        ss_data = (\n            (\n                ss[\"id\"],\n                lexid,\n                ss[\"ili\"] if ss[\"ili\"] and ss[\"ili\"] != \"in\" else None,\n                ss.get(\"partOfSpeech\"),\n                ss.get(\"lexfile\"),\n                ss.get(\"meta\"),\n            )\n            for ss in batch\n        )\n        cur.executemany(ss_query, ss_data)\n\n        # finally add proposed ILIs\n        pro_ili_data = []\n        for ss in batch:\n            ili = ss[\"ili\"]\n            if ili == \"in\":\n                defn = ss.get(\"ili_definition\")\n                text = defn[\"text\"] if defn else None\n                meta = defn.get(\"meta\") if defn else None\n                pro_ili_data.append((ss[\"id\"], lexid, text, meta))\n        cur.executemany(pro_ili_query, pro_ili_data)\n\n        progress.update(len(batch))\n\n    # only store when lexicalized=False\n    unlexicalized_data = [\n        (synset[\"id\"], lexid)\n        for synset in _local_synsets(synsets)\n        if not synset.get(\"lexicalized\", True)\n    ]\n    query = f\"\"\"\n        INSERT INTO unlexicalized_synsets (synset_rowid) {SYNSET_QUERY}\n    \"\"\"\n    cur.executemany(query, unlexicalized_data)\n\n\ndef _insert_synset_definitions(\n    synsets: Sequence[_AnySynset],\n    lexid: int,\n    lexidmap: _LexIdMap,\n    cur: sqlite3.Cursor,\n    progress: ProgressHandler,\n) -> None:\n    progress.set(status=\"Definitions\")\n    query = f\"\"\"\n        INSERT INTO definitions\n        VALUES (null,?,({SYNSET_QUERY}),?,?,({SENSE_QUERY}),?)\n    \"\"\"\n    for batch in _batch(synsets):\n        data = [\n            (\n                lexid,\n                synset[\"id\"],\n                lexidmap.get(synset[\"id\"], lexid),\n                definition[\"text\"],\n                definition.get(\"language\"),\n                definition.get(\"sourceSense\"),\n                lexidmap.get(definition.get(\"sourceSense\", \"\"), lexid),\n                definition.get(\"meta\"),\n            )\n            for synset in batch\n            for definition in synset.get(\"definitions\", [])\n        ]\n        cur.executemany(query, data)\n        progress.update(len(data))\n\n\ndef _insert_synset_relations(\n    synsets: Sequence[_AnySynset],\n    lexid: int,\n    lexidmap: _LexIdMap,\n    cur: sqlite3.Cursor,\n    progress: ProgressHandler,\n) -> None:\n    progress.set(status=\"Synset Relations\")\n    query = f\"\"\"\n        INSERT INTO synset_relations\n        VALUES (null,?,({SYNSET_QUERY}),({SYNSET_QUERY}),({RELTYPE_QUERY}),?)\n    \"\"\"\n    for batch in _batch(synsets):\n        data = [\n            (\n                lexid,\n                synset[\"id\"],\n                lexidmap.get(synset[\"id\"], lexid),\n                relation[\"target\"],\n                lexidmap.get(relation[\"target\"], lexid),\n                relation[\"relType\"],\n    
            relation.get(\"meta\"),\n            )\n            for synset in batch\n            for relation in synset.get(\"relations\", [])\n        ]\n        cur.executemany(query, data)\n        progress.update(len(data))\n\n\ndef _insert_entries(\n    entries: Sequence[_AnyEntry],\n    lexid: int,\n    cur: sqlite3.Cursor,\n    progress: ProgressHandler,\n) -> None:\n    progress.set(status=\"Words\")\n    query = \"INSERT INTO entries VALUES (null,?,?,?,?)\"\n    for batch in _batch(_local_entries(entries)):\n        data = (\n            (entry[\"id\"], lexid, entry[\"lemma\"][\"partOfSpeech\"], entry.get(\"meta\"))\n            for entry in batch\n        )\n        cur.executemany(query, data)\n        progress.update(len(batch))\n\n\ndef _insert_index(\n    entries: Sequence[_AnyEntry],\n    lexid: int,\n    cur: sqlite3.Cursor,\n    progress: ProgressHandler,\n) -> None:\n    progress.set(status=\"Index\")\n    query = f\"INSERT INTO entry_index VALUES (({ENTRY_QUERY}),?)\"\n    for batch in _batch(_local_entries(entries)):\n        data = (\n            (\n                entry[\"id\"],\n                lexid,\n                entry[\"index\"],\n            )\n            for entry in batch\n            if entry.get(\"index\")\n        )\n        cur.executemany(query, data)\n        progress.update(len(batch))\n\n\ndef _insert_forms(\n    entries: Sequence[_AnyEntry],\n    lexid: int,\n    lexidmap: _LexIdMap,\n    cur: sqlite3.Cursor,\n    progress: ProgressHandler,\n) -> None:\n    progress.set(status=\"Word Forms\")\n    query = f\"INSERT INTO forms VALUES (null,?,?,({ENTRY_QUERY}),?,?,?,?)\"\n    for batch in _batch(entries):\n        forms: list[\n            tuple[str | None, int, str, int, str, str | None, str | None, int]\n        ] = []\n        for entry in batch:\n            eid = entry[\"id\"]\n            lid = lexidmap.get(eid, lexid)\n            if not _is_external(entry):\n                entry = cast(\"lmf.LexicalEntry\", entry)\n                written_form = entry[\"lemma\"][\"writtenForm\"]\n                norm = normalize_form(written_form)\n                forms.append(\n                    (\n                        None,\n                        lexid,\n                        eid,\n                        lid,\n                        written_form,\n                        norm if norm != written_form else None,\n                        entry[\"lemma\"].get(\"script\"),\n                        0,\n                    )\n                )\n            for i, form in enumerate(_forms(entry), 1):\n                if _is_external(form):\n                    continue\n                form = cast(\"lmf.Form\", form)\n                written_form = form[\"writtenForm\"]\n                norm = normalize_form(written_form)\n                forms.append(\n                    (\n                        form.get(\"id\"),\n                        lexid,\n                        eid,\n                        lid,\n                        written_form,\n                        norm if norm != written_form else None,\n                        form.get(\"script\"),\n                        i,\n                    )\n                )\n        cur.executemany(query, forms)\n        progress.update(len(forms))\n\n\ndef _insert_pronunciations(\n    entries: Sequence[_AnyEntry],\n    lexid: int,\n    lexidmap: _LexIdMap,\n    cur: sqlite3.Cursor,\n    progress: ProgressHandler,\n) -> None:\n    progress.set(status=\"Pronunciations\")\n    query = f\"INSERT INTO 
pronunciations VALUES (({FORM_QUERY}),?,?,?,?,?,?)\"\n    for batch in _batch(entries):\n        prons: list[\n            tuple[\n                # FORM_QUERY args\n                str,  # entry id\n                int,  # entry lexid\n                str | None,  # optional form id\n                int,  # rank\n                # pronunciation fields\n                int,  # pronunciation lexid\n                str,  # text\n                str | None,  # variety\n                str | None,  # notation\n                bool,  # phonemic\n                str | None,  # audio\n            ]\n        ] = []\n        for entry in batch:\n            eid = entry[\"id\"]\n            lid = lexidmap.get(eid, lexid)\n            if lemma := entry.get(\"lemma\"):\n                for p in lemma.get(\"pronunciations\", []):\n                    prons.append(\n                        (\n                            eid,\n                            lid,\n                            None,\n                            0,\n                            lexid,\n                            p[\"text\"],\n                            p.get(\"variety\"),\n                            p.get(\"notation\"),\n                            p.get(\"phonemic\", True),\n                            p.get(\"audio\"),\n                        )\n                    )\n            for i, form in enumerate(_forms(entry), 1):\n                # rank is not valid in FORM_QUERY for external forms\n                rank = -1 if _is_external(form) else i\n                for p in form.get(\"pronunciations\", []):\n                    prons.append(\n                        (\n                            eid,\n                            lid,\n                            form.get(\"id\"),\n                            rank,\n                            lexid,\n                            p[\"text\"],\n                            p.get(\"variety\"),\n                            p.get(\"notation\"),\n                            p.get(\"phonemic\", True),\n                            p.get(\"audio\"),\n                        )\n                    )\n        cur.executemany(query, prons)\n        progress.update(len(prons))\n\n\ndef _insert_tags(\n    entries: Sequence[_AnyEntry],\n    lexid: int,\n    lexidmap: _LexIdMap,\n    cur: sqlite3.Cursor,\n    progress: ProgressHandler,\n) -> None:\n    progress.set(status=\"Word Form Tags\")\n    query = f\"INSERT INTO tags VALUES (({FORM_QUERY}),?,?,?)\"\n    for batch in _batch(entries):\n        tags: list[tuple[str, int, str | None, int, int, str, str]] = []\n        for entry in batch:\n            eid = entry[\"id\"]\n            lid = lexidmap.get(eid, lexid)\n            if lemma := entry.get(\"lemma\"):\n                for tag in lemma.get(\"tags\", []):\n                    tags.append(\n                        (\n                            eid,\n                            lid,\n                            None,\n                            0,\n                            lexid,\n                            tag[\"text\"],\n                            tag[\"category\"],\n                        )\n                    )\n            for i, form in enumerate(_forms(entry), 1):\n                # rank is not valid in FORM_QUERY for external forms\n                rank = -1 if _is_external(form) else i\n                for tag in form.get(\"tags\", []):\n                    tags.append(\n                        (\n                            eid,\n                            lid,\n   
                         form.get(\"id\"),\n                            rank,\n                            lexid,\n                            tag[\"text\"],\n                            tag[\"category\"],\n                        )\n                    )\n        cur.executemany(query, tags)\n        progress.update(len(tags))\n\n\ndef _insert_senses(\n    entries: Sequence[_AnyEntry],\n    synsets: Sequence[_AnySynset],\n    lexid: int,\n    lexidmap: _LexIdMap,\n    cur: sqlite3.Cursor,\n    progress: ProgressHandler,\n) -> None:\n    progress.set(status=\"Senses\")\n    ssrank = {\n        (ss[\"id\"], _id): i\n        for ss in _local_synsets(synsets)\n        for i, _id in enumerate(ss.get(\"members\", []))\n    }\n    query = f\"\"\"\n        INSERT INTO senses\n        VALUES (null,\n                ?,\n                ?,\n                ({ENTRY_QUERY}),\n                ?,\n                ({SYNSET_QUERY}),\n                ?,\n                ?)\n    \"\"\"\n    for batch in _batch(entries):\n        data = [\n            (\n                sense[\"id\"],\n                lexid,\n                entry[\"id\"],\n                lexidmap.get(entry[\"id\"], lexid),\n                sense.get(\"n\", i),\n                sense[\"synset\"],\n                lexidmap.get(sense[\"synset\"], lexid),\n                # members can be sense or entry IDs\n                ssrank.get(\n                    (sense[\"synset\"], sense[\"id\"]),\n                    ssrank.get((sense[\"synset\"], entry[\"id\"]), DEFAULT_MEMBER_RANK),\n                ),\n                sense.get(\"meta\"),\n            )\n            for entry in batch\n            for i, sense in enumerate(_local_senses(_senses(entry)), 1)\n        ]\n        cur.executemany(query, data)\n        progress.update(len(data))\n\n    # only store when lexicalized=False\n    unlexicalized_data = [\n        (sense[\"id\"], lexid)\n        for entry in entries\n        for sense in _local_senses(_senses(entry))\n        if not sense.get(\"lexicalized\", True)\n    ]\n    query = f\"\"\"\n        INSERT INTO unlexicalized_senses (sense_rowid) {SENSE_QUERY}\n    \"\"\"\n    cur.executemany(query, unlexicalized_data)\n\n\ndef _insert_adjpositions(\n    entries: Sequence[_AnyEntry],\n    lexid: int,\n    lexidmap: _LexIdMap,\n    cur: sqlite3.Cursor,\n    progress: ProgressHandler,\n):\n    progress.set(status=\"Sense Adjpositions\")\n    data = [\n        (s[\"id\"], lexidmap.get(s[\"id\"], lexid), s[\"adjposition\"])\n        for e in entries\n        for s in _local_senses(_senses(e))\n        if s.get(\"adjposition\")\n    ]\n    query = f\"INSERT INTO adjpositions VALUES (({SENSE_QUERY}),?)\"\n    cur.executemany(query, data)\n\n\ndef _insert_counts(\n    entries: Sequence[_AnyEntry],\n    lexid: int,\n    lexidmap: _LexIdMap,\n    cur: sqlite3.Cursor,\n    progress: ProgressHandler,\n) -> None:\n    progress.set(status=\"Counts\")\n    data = [\n        (\n            lexid,\n            sense[\"id\"],\n            lexidmap.get(sense[\"id\"], lexid),\n            count[\"value\"],\n            count.get(\"meta\"),\n        )\n        for entry in entries\n        for sense in _senses(entry)\n        for count in sense.get(\"counts\", [])\n    ]\n    query = f\"INSERT INTO counts VALUES (null,?,({SENSE_QUERY}),?,?)\"\n    cur.executemany(query, data)\n    progress.update(len(data))\n\n\ndef _collect_frames(lexicon: _AnyLexicon) -> list[lmf.SyntacticBehaviour]:\n    # WN-LMF 1.0 syntactic behaviours are on lexical entries, and in\n    
# WN-LMF 1.1 they are at the lexicon level with IDs. This\n    # function normalizes the two variants.\n\n    # IDs are not required and frame strings must be unique in a\n    # lexicon, so lookup syntactic behaviours by the frame string\n    synbhrs: dict[str, lmf.SyntacticBehaviour] = {\n        frame[\"subcategorizationFrame\"]: lmf.SyntacticBehaviour(\n            id=frame[\"id\"],\n            subcategorizationFrame=frame[\"subcategorizationFrame\"],\n            senses=frame.get(\"senses\", []),\n        )\n        for frame in lexicon.get(\"frames\", [])\n    }\n    # all relevant senses are collected into the 'senses' key\n    id_senses_map = {sb[\"id\"]: sb[\"senses\"] for sb in synbhrs.values() if sb.get(\"id\")}\n    for entry in _entries(lexicon):\n        # for WN-LMF 1.1\n        for sense in _local_senses(_senses(entry)):\n            for sbid in sense.get(\"subcat\", []):\n                id_senses_map[sbid].append(sense[\"id\"])\n        # for WN-LMF 1.0\n        if _is_external(entry) or not entry.get(\"frames\"):\n            continue\n        entry = cast(\"lmf.LexicalEntry\", entry)\n        all_senses = [s[\"id\"] for s in _senses(entry)]\n        for frame in entry.get(\"frames\", []):\n            subcat_frame = frame[\"subcategorizationFrame\"]\n            if subcat_frame not in synbhrs:\n                synbhrs[subcat_frame] = lmf.SyntacticBehaviour(\n                    subcategorizationFrame=subcat_frame,\n                    senses=[],\n                )\n            senses = frame.get(\"senses\", []) or all_senses\n            synbhrs[subcat_frame][\"senses\"].extend(senses)\n    return list(synbhrs.values())\n\n\ndef _insert_syntactic_behaviours(\n    synbhrs: Sequence[lmf.SyntacticBehaviour],\n    lexid: int,\n    lexidmap: _LexIdMap,\n    cur: sqlite3.Cursor,\n    progress: ProgressHandler,\n) -> None:\n    progress.set(status=\"Syntactic Behaviours\")\n\n    query = \"INSERT INTO syntactic_behaviours VALUES (null,?,?,?)\"\n    sbdata = [\n        (sb.get(\"id\") or None, lexid, sb[\"subcategorizationFrame\"]) for sb in synbhrs\n    ]\n    cur.executemany(query, sbdata)\n\n    # syntactic behaviours don't have a required ID; index on frame\n    framemap: dict[str, list[str]] = {\n        sb[\"subcategorizationFrame\"]: sb.get(\"senses\", []) for sb in synbhrs\n    }\n    query = f\"\"\"\n        INSERT INTO syntactic_behaviour_senses\n        VALUES ((SELECT rowid\n                   FROM syntactic_behaviours\n                  WHERE lexicon_rowid=? 
AND frame=?),\n                ({SENSE_QUERY}))\n    \"\"\"\n    sbsdata = [\n        (lexid, frame, sid, lexidmap.get(sid, lexid))\n        for frame in framemap\n        for sid in framemap[frame]\n    ]\n    cur.executemany(query, sbsdata)\n\n    progress.update(len(synbhrs))\n\n\ndef _insert_sense_relations(\n    lexicon: _AnyLexicon,\n    lexid: int,\n    lexidmap: _LexIdMap,\n    cur: sqlite3.Cursor,\n    progress: ProgressHandler,\n) -> None:\n    progress.set(status=\"Sense Relations\")\n    # need to separate relations into those targeting senses vs synsets\n    synset_ids = {ss[\"id\"] for ss in _synsets(lexicon)}\n    sense_ids = {s[\"id\"] for e in _entries(lexicon) for s in _senses(e)}\n    s_s_rels = []\n    s_ss_rels = []\n    for entry in _entries(lexicon):\n        for sense in _senses(entry):\n            slid = lexidmap.get(sense[\"id\"], lexid)\n            for relation in sense.get(\"relations\", []):\n                target_id = relation[\"target\"]\n                tlid = lexidmap.get(target_id, lexid)\n                if target_id in sense_ids:\n                    s_s_rels.append((sense[\"id\"], slid, tlid, relation))\n                elif target_id in synset_ids:\n                    s_ss_rels.append((sense[\"id\"], slid, tlid, relation))\n                else:\n                    raise Error(\n                        f\"relation target is not a known sense or synset: {target_id}\"\n                    )\n    hyperparams = [\n        (\"sense_relations\", SENSE_QUERY, s_s_rels),\n        (\"sense_synset_relations\", SYNSET_QUERY, s_ss_rels),\n    ]\n    for table, target_query, rels in hyperparams:\n        query = f\"\"\"\n            INSERT INTO {table}\n            VALUES (null,?,({SENSE_QUERY}),({target_query}),({RELTYPE_QUERY}),?)\n        \"\"\"\n        for batch in _batch(rels):\n            data = [\n                (\n                    lexid,\n                    sense_id,\n                    slid,\n                    relation[\"target\"],\n                    tlid,\n                    relation[\"relType\"],\n                    relation.get(\"meta\"),\n                )\n                for sense_id, slid, tlid, relation in batch\n            ]\n            cur.executemany(query, data)\n            progress.update(len(data))\n\n\ndef _insert_examples(\n    objs: Sequence[lmf.Sense | lmf.ExternalSense | lmf.Synset | lmf.ExternalSynset],\n    lexid: int,\n    lexidmap: _LexIdMap,\n    table: str,\n    cur: sqlite3.Cursor,\n    progress: ProgressHandler,\n) -> None:\n    progress.set(status=\"Examples\")\n    if table == \"sense_examples\":\n        query = f\"INSERT INTO {table} VALUES (null,?,({SENSE_QUERY}),?,?,?)\"\n    else:\n        query = f\"INSERT INTO {table} VALUES (null,?,({SYNSET_QUERY}),?,?,?)\"\n    for batch in _batch(objs):\n        data = [\n            (\n                lexid,\n                obj[\"id\"],\n                lexidmap.get(obj[\"id\"], lexid),\n                example[\"text\"],\n                example.get(\"language\"),\n                example.get(\"meta\"),\n            )\n            for obj in batch\n            for example in obj.get(\"examples\", [])\n        ]\n        # be careful of SQL injection here\n        cur.executemany(query, data)\n        progress.update(len(data))\n\n\ndef _add_ili(\n    source: Path,\n    progress: ProgressHandler,\n) -> None:\n    query = f\"\"\"\n        INSERT INTO ilis\n        VALUES (null,?,({ILISTAT_QUERY}),?,null)\n            ON CONFLICT(id) DO\n               UPDATE SET 
status_rowid=excluded.status_rowid,\n                          definition=excluded.definition\n    \"\"\"\n    with connect() as conn:\n        cur = conn.cursor()\n\n        progress.flash(f\"Reading ILI file: {source!s}\")\n        ili = list(_ili.load_tsv(source))\n\n        progress.flash(\"Updating ILI Status Names\")\n        statuses = {info.get(\"status\", \"active\") for info in ili}\n        cur.executemany(\n            \"INSERT OR IGNORE INTO ili_statuses VALUES (null,?)\",\n            [(stat,) for stat in sorted(statuses)],\n        )\n\n        progress.set(count=0, total=len(ili), status=\"ILI\")\n        for batch in _batch(ili):\n            data = [\n                (info[\"ili\"], info.get(\"status\", \"active\"), info.get(\"definition\"))\n                for info in batch\n            ]\n            cur.executemany(query, data)\n            progress.update(len(data))\n\n\ndef remove(lexicon: str, progress_handler: type[ProgressHandler] = ProgressBar) -> None:\n    \"\"\"Remove lexicon(s) from the database.\n\n    The *lexicon* argument is a :ref:`lexicon specifier\n    <lexicon-specifiers>`. Note that this removes a lexicon and not a\n    project, so the lexicons of projects containing multiple lexicons\n    will need to be removed individually or, if applicable, a star\n    specifier.\n\n    The *progress_handler* parameter takes a subclass of\n    :class:`wn.util.ProgressHandler`. An instance of the class will be\n    created, used, and closed by this function.\n\n    >>> wn.remove(\"ewn:2019\")  # removes a single lexicon\n    >>> wn.remove(\"*:1.3+omw\")  # removes all lexicons with version 1.3+omw\n\n    \"\"\"\n    if progress_handler is None:\n        progress_handler = ProgressHandler\n    progress = progress_handler(message=\"Removing\", unit=\"\\be5 operations\")\n\n    conn = connect()\n    conn.set_progress_handler(progress.update, 100000)\n    try:\n        for lexspec in resolve_lexicon_specifiers(lexicon=lexicon):\n            extensions = get_lexicon_extensions(lexspec)\n\n            with conn:\n                for ext_spec in reversed(extensions):\n                    progress.set(status=f\"{ext_spec} (extension)\")\n                    conn.execute(\n                        \"DELETE FROM lexicons WHERE specifier = ?\",\n                        (ext_spec,),\n                    )\n                    progress.flash(f\"Removed {ext_spec}\\n\")\n\n                extra = f\" (and {len(extensions)} extension(s))\" if extensions else \"\"\n                progress.set(status=f\"{lexspec}\", count=0)\n                conn.execute(\n                    \"DELETE FROM lexicons WHERE specifier = ?\",\n                    (lexspec,),\n                )\n                progress.flash(f\"Removed {lexspec}{extra}\\n\")\n\n    finally:\n        progress.close()\n        conn.set_progress_handler(None, 0)\n\n\ndef _entries(lex: _AnyLexicon) -> Sequence[_AnyEntry]:\n    return lex.get(\"entries\", [])\n\n\ndef _forms(e: _AnyEntry) -> Sequence[_AnyForm]:\n    return e.get(\"forms\", [])\n\n\ndef _senses(e: _AnyEntry) -> Sequence[_AnySense]:\n    return e.get(\"senses\", [])\n\n\ndef _synsets(lex: _AnyLexicon) -> Sequence[_AnySynset]:\n    return lex.get(\"synsets\", [])\n\n\ndef _is_external(x: _AnyForm | _AnyLemma | _AnyEntry | _AnySense | _AnySynset) -> bool:\n    return x.get(\"external\", False) is True\n\n\ndef _local_synsets(synsets: Sequence[_AnySynset]) -> Iterator[lmf.Synset]:\n    for ss in synsets:\n        if _is_external(ss):\n            continue\n     
   yield cast(\"lmf.Synset\", ss)\n\n\ndef _local_entries(entries: Sequence[_AnyEntry]) -> Iterator[lmf.LexicalEntry]:\n    for e in entries:\n        if _is_external(e):\n            continue\n        yield cast(\"lmf.LexicalEntry\", e)\n\n\ndef _local_senses(senses: Sequence[_AnySense]) -> Iterator[lmf.Sense]:\n    for s in senses:\n        if _is_external(s):\n            continue\n        yield cast(\"lmf.Sense\", s)\n"
  },
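Both `add()` and `remove()` in the module above take a `ProgressHandler` *class* rather than an instance and only ever call its `set()`, `update()`, `flash()`, and `close()` methods. As a rough illustration, here is a sketch of a quiet handler that logs the one-line status messages instead of drawing a progress bar; `QuietHandler` is hypothetical, and the `flash()` signature is assumed from how this module calls it.

```python
# Sketch of a custom progress handler for wn.add()/wn.remove().
# QuietHandler is hypothetical; flash() is assumed to receive a single
# status string, as the calls in wn/_add.py suggest.
import logging

import wn
from wn.util import ProgressHandler

log = logging.getLogger("wn.quiet")


class QuietHandler(ProgressHandler):
    """Skip the progress bar but keep the flash messages."""

    def flash(self, message: str = "") -> None:
        log.info(message)


wn.add("english-wordnet-2020.xml", progress_handler=QuietHandler)
wn.remove("ewn:2020", progress_handler=QuietHandler)
```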
  {
    "path": "wn/_config.py",
    "content": "\"\"\"\nLocal configuration settings.\n\"\"\"\n\nimport os\nfrom collections.abc import Sequence\nfrom enum import Enum\nfrom fnmatch import fnmatch\nfrom importlib.resources import as_file, files\nfrom pathlib import Path\nfrom typing import Any, TypedDict\n\ntry:\n    # python_version >= 3.11\n    import tomllib  # type: ignore\nexcept ImportError:\n    import tomli as tomllib  # type: ignore\n\nfrom wn._exceptions import ConfigurationError, ProjectError\nfrom wn._types import AnyPath\nfrom wn._util import (\n    format_lexicon_specifier,\n    is_str_key_dict,\n    short_hash,\n    split_lexicon_specifier,\n)\n\n# The index file is a project file of Wn\nwith as_file(files(\"wn\") / \"index.toml\") as index_file:\n    INDEX_FILE_PATH = index_file\n# The directory where downloaded and added data will be stored.\nDEFAULT_DATA_DIR = Path.home() / \".wn_data\"\nDATABASE_FILENAME = \"wn.db\"\n\n\nclass ResourceType(str, Enum):\n    WORDNET = \"wordnet\"\n    ILI = \"ili\"\n\n\nclass VersionInfo(TypedDict):\n    resource_urls: list[str]\n    license: str | None\n    error: str | None\n\n\nclass ProjectInfo(TypedDict):\n    type: ResourceType\n    label: str | None\n    language: str | None\n    license: str | None\n    error: str | None\n    versions: dict[str, VersionInfo]\n\n\nclass ResolvedProjectInfo(TypedDict):\n    id: str\n    version: str\n    type: ResourceType\n    label: str | None\n    language: str | None\n    license: str | None\n    resource_urls: list[str]\n    cache: Path | None\n\n\nclass CacheEntry(TypedDict):\n    path: Path\n    id: str | None\n    version: str | None\n    url: str | None\n\n\nclass WNConfig:\n    _projects: dict[str, ProjectInfo]\n\n    def __init__(self):\n        self._data_directory = Path(os.getenv(\"WN_DATA_DIR\", default=DEFAULT_DATA_DIR))\n        self._projects = {}\n        self._dbpath = self._data_directory / DATABASE_FILENAME\n        self.allow_multithreading = False\n\n    @property\n    def data_directory(self) -> Path:\n        \"\"\"The file system directory where Wn's data is stored.\n\n        Assign a new path to change where the database and downloads\n        are stored.\n\n        >>> wn.config.data_directory = \"~/.cache/wn\"\n        >>> wn.config.database_path\n        PosixPath('/home/username/.cache/wn/wn.db')\n        >>> wn.config.downloads_directory\n        PosixPath('/home/username/.cache/wn/downloads')\n\n        \"\"\"\n        dir = self._data_directory\n        dir.mkdir(exist_ok=True)\n        return dir\n\n    @data_directory.setter\n    def data_directory(self, path: AnyPath) -> None:\n        dir = Path(path).expanduser()\n        if dir.exists() and not dir.is_dir():\n            raise ConfigurationError(f\"path exists and is not a directory: {dir}\")\n        self._data_directory = dir\n        self._dbpath = dir / DATABASE_FILENAME\n\n    @property\n    def database_path(self) -> Path:\n        \"\"\"The path to the database file.\n\n        The database path is derived from :attr:`data_directory` and\n        cannot be changed directly.\n\n        \"\"\"\n        return self._dbpath\n\n    @property\n    def downloads_directory(self) -> Path:\n        \"\"\"The file system directory where downloads are cached.\n\n        The downloads directory is derived from :attr:`data_directory`\n        and cannot be changed directly.\n\n        \"\"\"\n        dir = self.data_directory / \"downloads\"\n        dir.mkdir(exist_ok=True)\n        return dir\n\n    @property\n    def index(self) -> dict[str, 
ProjectInfo]:\n        \"\"\"The project index.\"\"\"\n        return self._projects\n\n    def add_project(\n        self,\n        id: str,\n        type: ResourceType = ResourceType.WORDNET,\n        label: str | None = None,\n        language: str | None = None,\n        license: str | None = None,\n        error: str | None = None,\n    ) -> None:\n        \"\"\"Add a new wordnet project to the index.\n\n        Arguments:\n            id: short identifier of the project\n            type: project type (default 'wordnet')\n            label: full name of the project\n            language: `BCP 47`_ language code of the resource\n            license: link or name of the project's default license\n            error: if set, the error message to use when the project\n              is accessed\n\n        .. _BCP 47: https://en.wikipedia.org/wiki/IETF_language_tag\n        \"\"\"\n        if id in self._projects:\n            raise ValueError(f\"project already added: {id}\")\n        self._projects[id] = ProjectInfo(\n            type=ResourceType(type),\n            label=label,\n            language=language,\n            license=license,\n            error=error,\n            versions={},\n        )\n\n    def add_project_version(\n        self,\n        id: str,\n        version: str,\n        url: str | None = None,\n        error: str | None = None,\n        license: str | None = None,\n    ) -> None:\n        \"\"\"Add a new resource version for a project.\n\n        Exactly one of *url* or *error* must be specified.\n\n        Arguments:\n            id: short identifier of the project\n            version: version string of the resource\n            url: space-separated list of web addresses for the resource\n            license: link or name of the resource's license; if not\n              given, the project's default license will be used.\n            error: if set, the error message to use when the project\n              is accessed\n\n        \"\"\"\n        if url and error:\n            spec = format_lexicon_specifier(id, version)\n            raise ConfigurationError(f\"{spec} specifies both url and redirect\")\n\n        version_data = VersionInfo(\n            resource_urls=url.split() if (url and not error) else [],\n            license=license,\n            error=error,\n        )\n        project = self._projects[id]\n        project[\"versions\"][version] = version_data\n\n    def get_project_info(self, arg: str) -> ResolvedProjectInfo:\n        \"\"\"Return information about an indexed project version.\n\n        If the project has been downloaded and cached, the ``\"cache\"``\n        key will point to the path of the cached file, otherwise its\n        value is ``None``.\n\n        Arguments:\n            arg: a project specifier\n\n        Example:\n\n            >>> info = wn.config.get_project_info(\"oewn:2021\")\n            >>> info[\"label\"]\n            'Open English WordNet'\n\n        \"\"\"\n        id, version = split_lexicon_specifier(arg)\n        if id not in self._projects:\n            raise ProjectError(f\"no such project id: {id}\")\n        project: ProjectInfo = self._projects[id]\n        if project[\"error\"]:\n            raise ProjectError(project[\"error\"])\n\n        versions: dict = project[\"versions\"]\n        if not version or version == \"*\":\n            version = next(iter(versions), \"\")\n        if not version:\n            raise ProjectError(f\"no versions available for {id}\")\n        elif version not in versions:\n       
     raise ProjectError(f\"no such version: {version!r} ({id})\")\n        info = versions[version]\n        if info[\"error\"]:\n            raise ProjectError(info[\"error\"])\n\n        urls = info.get(\"resource_urls\", [])\n\n        return ResolvedProjectInfo(\n            id=id,\n            version=version,\n            type=project[\"type\"],\n            label=project[\"label\"],\n            language=project[\"language\"],\n            license=info.get(\"license\", project.get(\"license\")),\n            resource_urls=urls,\n            cache=_get_cache_path_for_urls(self, urls),\n        )\n\n    def get_cache_path(self, url: str) -> Path:\n        \"\"\"Return the path for caching *url*.\n\n        Note that this is just a path operation and does\n        not signify that the file exists in the file system.\n\n        \"\"\"\n        filename = short_hash(url)\n        return self.downloads_directory / filename\n\n    def list_cache_entries(self, arg: str = \"*\") -> list[CacheEntry]:\n        \"\"\"Return a list of cached resources.\n\n        Use *arg* as a pattern to match project specifiers. It\n        defaults to `\"*\"` to select all cached entries.\n\n        Each entry on the list is a dictionary with the keys:\n        * `\"path\"` -- the path of the cached file\n        * `\"id\"`  -- the ID of the cached resource\n        * `\"version\"` -- the version of the cached resource\n        * `\"url\"` -- the URL of the cached resource\n\n        Note that cached files are stored with a hash of their URL as\n        the filename and that it is not feasible to recover the URL\n        from the hash alone. Therefore, for lexicons downloaded with a\n        URL that does not appear in the index, the ID, version, and URL\n        and will be :python:`None` instead.\n        \"\"\"\n        arg = arg.strip()\n        cache_map = _cache_map(self)\n        entries: list[CacheEntry] = []\n        for cache_path in self.downloads_directory.iterdir():\n            if cache_path in cache_map:\n                id, version, url = cache_map[cache_path]\n                specifier = format_lexicon_specifier(id, version)\n                if not (fnmatch(specifier, arg) or url == arg):\n                    continue\n                entries.append(\n                    CacheEntry(path=cache_path, id=id, version=version, url=url)\n                )\n            elif arg in (\"*\", \"*:*\"):\n                entries.append(\n                    CacheEntry(path=cache_path, id=None, version=None, url=None)\n                )\n        return entries\n\n    def update(self, data: dict[str, Any]) -> None:\n        \"\"\"Update the configuration with items in *data*.\n\n        Items are only inserted or replaced, not deleted. 
If a project\n        index is provided in the ``\"index\"`` key, then either the\n        project must not already be indexed or any project fields\n        (label, language, or license) that are specified must be equal\n        to the indexed project.\n\n        \"\"\"\n        if datadir := data.get(\"data_directory\"):\n            if not isinstance(datadir, (str, Path)):\n                raise ConfigurationError(\n                    \"data_directory must be a str or Path, \"\n                    f\"not {type(datadir).__name__}\"\n                )\n            self.data_directory = datadir\n        if index := data.get(\"index\", {}):\n            if not is_str_key_dict(index):\n                raise ConfigurationError(\"index must be a dict with str keys\")\n            self._update_index(index)\n\n    def _update_index(self, index: dict[str, Any]) -> None:\n        for id, project in index.items():\n            if not is_str_key_dict(project):\n                raise ConfigurationError(f\"invalid project: {project}\")\n            if id in self._projects:\n                # validate that they are the same\n                _project = self._projects[id]\n                for attr in (\"label\", \"language\", \"license\"):\n                    if attr in project and project[attr] != _project[attr]:\n                        raise ConfigurationError(f\"{attr} mismatch for {id}\")\n            else:\n                self.add_project(\n                    id,\n                    type=project.get(\"type\", ResourceType.WORDNET),\n                    label=project.get(\"label\"),\n                    language=project.get(\"language\"),\n                    license=project.get(\"license\"),\n                    error=project.get(\"error\"),\n                )\n            for version, info in project.get(\"versions\", {}).items():\n                if info.get(\"url\") and project.get(\"error\"):\n                    spec = format_lexicon_specifier(id, version)\n                    raise ConfigurationError(f\"{spec} url specified with default error\")\n                self.add_project_version(\n                    id,\n                    version,\n                    url=info.get(\"url\"),\n                    license=info.get(\"license\"),\n                    error=info.get(\"error\"),\n                )\n\n    def load_index(self, path: AnyPath) -> None:\n        \"\"\"Load and update with the project index at *path*.\n\n        The project index is a TOML_ file containing project and\n        version information. For example:\n\n        .. code-block:: toml\n\n           [ewn]\n             label = \"Open English WordNet\"\n             language = \"en\"\n             license = \"https://creativecommons.org/licenses/by/4.0/\"\n             [ewn.versions.2019]\n               url = \"https://en-word.net/static/english-wordnet-2019.xml.gz\"\n             [ewn.versions.2020]\n               url = \"https://en-word.net/static/english-wordnet-2020.xml.gz\"\n\n        .. 
_TOML: https://toml.io\n\n        \"\"\"\n        path = Path(path).expanduser()\n        with path.open(\"rb\") as indexfile:\n            try:\n                index = tomllib.load(indexfile)\n            except tomllib.TOMLDecodeError as exc:\n                raise ConfigurationError(\"malformed index file\") from exc\n            if not is_str_key_dict(index):\n                raise ConfigurationError(\"invalid index file\")\n        self.update({\"index\": index})\n\n\ndef _get_cache_path_for_urls(\n    config: WNConfig,\n    urls: Sequence[str],\n) -> Path | None:\n    for url in urls:\n        path = config.get_cache_path(url)\n        if path.is_file():\n            return path\n    return None\n\n\ndef _cache_map(config: WNConfig) -> dict[Path, tuple[str, str, str]]:\n    \"\"\"Return a dict of cache hashes to resource info tuples.\n\n    Each tuple contains the id, version, and URL of the indexed\n    resource. The hash is based on the URL and the tuple only contains\n    information from the index. They do not indicate whether the\n    resource has been cached.\n    \"\"\"\n    return {\n        config.get_cache_path(url): (id, version, url)\n        for id, p_info in config.index.items()\n        for version, v_info in p_info[\"versions\"].items()\n        for url in v_info[\"resource_urls\"]\n    }\n\n\nconfig = WNConfig()\nconfig.load_index(INDEX_FILE_PATH)\n"
  },
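The configuration object defined above is exposed as the process-wide singleton `wn.config`, and custom project indexes are plain TOML files in the shape documented in `load_index()`. Below is a minimal sketch of pointing Wn at a private index and inspecting the result; the `my-index.toml` path and the `mywn` project id are hypothetical.

```python
# Sketch: register a hypothetical private index and query it.
# "my-index.toml" and the "mywn" project follow the TOML layout shown
# in the WNConfig.load_index() docstring above.
import wn

wn.config.data_directory = "~/.wn_data"
wn.config.load_index("my-index.toml")

info = wn.config.get_project_info("mywn:1.0")
print(info["label"], info["resource_urls"])

# Downloads are cached under the derived downloads directory:
for entry in wn.config.list_cache_entries():
    print(entry["path"], entry["id"], entry["version"])
```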
  {
    "path": "wn/_core.py",
    "content": "from __future__ import annotations\n\nimport enum\nfrom dataclasses import dataclass, field\nfrom typing import TYPE_CHECKING, Literal, TypeVar, overload\n\nfrom wn import taxonomy\nfrom wn._lexicon import (\n    LexiconConfiguration,\n    LexiconElement,\n    LexiconElementWithMetadata,\n)\nfrom wn._queries import Pronunciation as PronunciationTuple\nfrom wn._queries import Tag as TagTuple\nfrom wn._queries import (\n    find_entries,\n    find_synsets,\n    get_adjposition,\n    get_definitions,\n    get_entry_forms,\n    get_entry_senses,\n    get_examples,\n    get_expanded_synset_relations,\n    get_lexfile,\n    get_lexicalized,\n    get_lexicon_extension_bases,\n    get_lexicon_extensions,\n    get_metadata,\n    get_sense_counts,\n    get_sense_relations,\n    get_sense_synset_relations,\n    get_synset_members,\n    get_synset_relations,\n    get_synsets_for_ilis,\n    get_syntactic_behaviours,\n    resolve_lexicon_specifiers,\n)\nfrom wn._util import unique_list\n\nif TYPE_CHECKING:\n    from collections.abc import Iterator, Sequence\n\n    from wn._metadata import Metadata\n\n\n_INFERRED_SYNSET = \"*INFERRED*\"\n\n\nclass _EntityType(str, enum.Enum):\n    \"\"\"Identifies the database table of an entity.\"\"\"\n\n    LEXICONS = \"lexicons\"\n    ENTRIES = \"entries\"\n    SENSES = \"senses\"\n    SYNSETS = \"synsets\"\n    SENSE_RELATIONS = \"sense_relations\"\n    SENSE_SYNSET_RELATIONS = \"sense_synset_relations\"\n    SYNSET_RELATIONS = \"synset_relations\"\n    UNSET = \"\"\n\n\n_EMPTY_LEXCONFIG = LexiconConfiguration(\n    lexicons=(),\n    expands=(),\n    default_mode=False,\n)\n\n\nclass _LexiconDataElement(LexiconElementWithMetadata):\n    \"\"\"Base class for Words, Senses, and Synsets.\n\n    These elements always have a required ID and are used as the\n    starting point of secondary queries, so they also store the\n    configuration of lexicons used in the original query.\n    \"\"\"\n\n    __slots__ = \"_lexconf\", \"id\"\n\n    id: str\n    _lexconf: LexiconConfiguration\n\n    def __init__(\n        self,\n        id: str,\n        _lexicon: str = \"\",\n        _lexconf: LexiconConfiguration = _EMPTY_LEXCONFIG,\n    ) -> None:\n        self.id = id\n        self._lexicon = _lexicon\n        self._lexconf = _lexconf\n\n    def __eq__(self, other) -> bool:\n        if isinstance(other, type(self)) or isinstance(self, type(other)):\n            return self.id == other.id and self._lexicon == other._lexicon\n        return NotImplemented\n\n    def __hash__(self) -> int:\n        return hash((self.id, self._lexicon))\n\n    def _get_lexicons(self) -> tuple[str, ...]:\n        if self._lexconf.default_mode:\n            return (\n                self._lexicon,\n                *get_lexicon_extension_bases(self._lexicon),\n                *get_lexicon_extensions(self._lexicon),\n            )\n        else:\n            return self._lexconf.lexicons\n\n\n@dataclass(frozen=True, slots=True)\nclass Pronunciation(LexiconElement):\n    \"\"\"A class for word form pronunciations.\"\"\"\n\n    __module__ = \"wn\"\n\n    value: str\n    variety: str | None = None\n    notation: str | None = None\n    phonemic: bool = True\n    audio: str | None = None\n    _lexicon: str = field(default=\"\", repr=False, compare=False)\n\n\n@dataclass(frozen=True, slots=True)\nclass Tag(LexiconElement):\n    \"\"\"A general-purpose tag class for word forms.\"\"\"\n\n    __module__ = \"wn\"\n\n    tag: str\n    category: str\n    _lexicon: str = field(default=\"\", repr=False, 
compare=False)\n\n\n@dataclass(frozen=True, slots=True)\nclass Form(LexiconElement):\n    \"\"\"A word-form.\"\"\"\n\n    __module__ = \"wn\"\n\n    value: str\n    id: str | None = field(default=None, repr=False, compare=False)\n    script: str | None = field(default=None, repr=False)\n    _lexicon: str = field(default=\"\", repr=False, compare=False)\n    _pronunciations: tuple[Pronunciation, ...] = field(\n        default_factory=tuple, repr=False, compare=False\n    )\n    _tags: tuple[Tag, ...] = field(default_factory=tuple, repr=False, compare=False)\n\n    def pronunciations(self) -> list[Pronunciation]:\n        return list(self._pronunciations)\n\n    def tags(self) -> list[Tag]:\n        return list(self._tags)\n\n\ndef _make_form(\n    form: str,\n    id: str | None,\n    script: str | None,\n    lexicon: str,\n    prons: list[PronunciationTuple],\n    tags: list[TagTuple],\n) -> Form:\n    return Form(\n        form,\n        id=id,\n        script=script,\n        _lexicon=lexicon,\n        _pronunciations=tuple(Pronunciation(*data) for data in prons),\n        _tags=tuple(Tag(*data) for data in tags),\n    )\n\n\nclass Word(_LexiconDataElement):\n    \"\"\"A class for words (also called lexical entries) in a wordnet.\"\"\"\n\n    __slots__ = (\"pos\",)\n    __module__ = \"wn\"\n\n    _ENTITY_TYPE = _EntityType.ENTRIES\n\n    pos: str\n\n    def __init__(\n        self,\n        id: str,\n        pos: str,\n        _lexicon: str = \"\",\n        _lexconf: LexiconConfiguration = _EMPTY_LEXCONFIG,\n    ):\n        super().__init__(id=id, _lexicon=_lexicon, _lexconf=_lexconf)\n        self.pos = pos\n\n    def __repr__(self) -> str:\n        return f\"Word({self.id!r})\"\n\n    @overload\n    def lemma(self, *, data: Literal[False] = False) -> str: ...\n    @overload\n    def lemma(self, *, data: Literal[True] = True) -> Form: ...\n\n    # fallback for non-literal bool argument\n    @overload\n    def lemma(self, *, data: bool) -> str | Form: ...\n\n    def lemma(self, *, data: bool = False) -> str | Form:\n        \"\"\"Return the canonical form of the word.\n\n        If the *data* argument is :python:`False` (the default), the\n        lemma is returned as a :class:`str` type. If it is\n        :python:`True`, a :class:`wn.Form` object is used instead.\n\n        Example:\n\n            >>> wn.words(\"wolves\")[0].lemma()\n            'wolf'\n            >>> wn.words(\"wolves\")[0].lemma(data=True)\n            Form(value='wolf')\n\n        \"\"\"\n        lexicons = self._get_lexicons()\n        lemma_data = next(get_entry_forms(self.id, lexicons))\n        if data:\n            return _make_form(*lemma_data)\n        else:\n            return lemma_data[0]\n\n    @overload\n    def forms(self, *, data: Literal[False] = False) -> list[str]: ...\n    @overload\n    def forms(self, *, data: Literal[True] = True) -> list[Form]: ...\n\n    # fallback for non-literal bool argument\n    @overload\n    def forms(self, *, data: bool) -> list[str] | list[Form]: ...\n\n    def forms(self, *, data: bool = False) -> list[str] | list[Form]:\n        \"\"\"Return the list of all encoded forms of the word.\n\n        If the *data* argument is :python:`False` (the default), the\n        forms are returned as :class:`str` types. 
If it is\n        :python:`True`, :class:`wn.Form` objects are used instead.\n\n        Example:\n\n            >>> wn.words(\"wolf\")[0].forms()\n            ['wolf', 'wolves']\n            >>> wn.words(\"wolf\")[0].forms(data=True)\n            [Form(value='wolf'), Form(value='wolves')]\n\n        \"\"\"\n        lexicons = self._get_lexicons()\n        form_data = list(get_entry_forms(self.id, lexicons))\n        if data:\n            return [_make_form(*data) for data in form_data]\n        else:\n            return [form for form, *_ in form_data]\n\n    def senses(self) -> list[Sense]:\n        \"\"\"Return the list of senses of the word.\n\n        Example:\n\n            >>> wn.words(\"zygoma\")[0].senses()\n            [Sense('ewn-zygoma-n-05292350-01')]\n\n        \"\"\"\n        lexicons = self._get_lexicons()\n        iterable = get_entry_senses(self.id, lexicons)\n        return [Sense(*sense_data, _lexconf=self._lexconf) for sense_data in iterable]\n\n    def metadata(self) -> Metadata:\n        \"\"\"Return the word's metadata.\"\"\"\n        return get_metadata(self.id, self._lexicon, \"entries\")\n\n    def synsets(self) -> list[Synset]:\n        \"\"\"Return the list of synsets of the word.\n\n        Example:\n\n            >>> wn.words(\"addendum\")[0].synsets()\n            [Synset('ewn-06411274-n')]\n\n        \"\"\"\n        return [sense.synset() for sense in self.senses()]\n\n    def derived_words(self) -> list[Word]:\n        \"\"\"Return the list of words linked through derivations on the senses.\n\n        Example:\n\n            >>> wn.words(\"magical\")[0].derived_words()\n            [Word('ewn-magic-n'), Word('ewn-magic-n')]\n\n        \"\"\"\n        return [\n            derived_sense.word()\n            for sense in self.senses()\n            for derived_sense in sense.get_related(\"derivation\")\n        ]\n\n    def translate(\n        self,\n        lexicon: str | None = None,\n        *,\n        lang: str | None = None,\n    ) -> dict[Sense, list[Word]]:\n        \"\"\"Return a mapping of word senses to lists of translated words.\n\n        Arguments:\n            lexicon: lexicon specifier of translated words\n            lang: BCP-47 language code of translated words\n\n        Example:\n\n            >>> w = wn.words(\"water bottle\", pos=\"n\")[0]\n            >>> for sense, words in w.translate(lang=\"ja\").items():\n            ...     
print(sense, [jw.lemma() for jw in words])\n            Sense('ewn-water_bottle-n-04564934-01') ['水筒']\n\n        \"\"\"\n        result = {}\n        for sense in self.senses():\n            result[sense] = [\n                t_sense.word()\n                for t_sense in sense.translate(lang=lang, lexicon=lexicon)\n            ]\n        return result\n\n\nclass Relation(LexiconElementWithMetadata):\n    \"\"\"A class to model relations between senses or synsets.\"\"\"\n\n    __slots__ = \"_lexicon\", \"_metadata\", \"name\", \"source_id\", \"target_id\"\n    __module__ = \"wn\"\n\n    name: str\n    source_id: str\n    target_id: str\n    _metadata: Metadata | None\n\n    def __init__(\n        self,\n        name: str,\n        source_id: str,\n        target_id: str,\n        lexicon: str,\n        *,\n        metadata: Metadata | None = None,\n    ):\n        self.name = name\n        self.source_id = source_id\n        self.target_id = target_id\n        self._lexicon = lexicon\n        self._metadata = metadata\n\n    def __repr__(self) -> str:\n        return (\n            self.__class__.__name__\n            + f\"({self.name!r}, {self.source_id!r}, {self.target_id!r})\"\n        )\n\n    def __eq__(self, other) -> bool:\n        if not isinstance(other, Relation):\n            return NotImplemented\n        return (\n            self.name == other.name\n            and self.source_id == other.source_id\n            and self.target_id == other.target_id\n            and self._lexicon == other._lexicon\n            and self.subtype == other.subtype\n        )\n\n    def __hash__(self) -> int:\n        datum = self.name, self.source_id, self.target_id, self._lexicon, self.subtype\n        return hash(datum)\n\n    @property\n    def subtype(self) -> str | None:\n        \"\"\"\n        The value of the ``dc:type`` metadata.\n\n        If ``dc:type`` is not specified in the metadata, ``None`` is\n        returned instead.\n        \"\"\"\n        return self.metadata().get(\"type\")\n\n\nT = TypeVar(\"T\", bound=\"_Relatable\")\n\n\nclass _Relatable(_LexiconDataElement):\n    @overload\n    def relations(\n        self: T, *args: str, data: Literal[False] = False\n    ) -> dict[str, list[T]]: ...\n    @overload\n    def relations(\n        self: T, *args: str, data: Literal[True] = True\n    ) -> dict[Relation, T]: ...\n\n    # fallback for non-literal bool argument\n    @overload\n    def relations(\n        self: T, *args: str, data: bool = False\n    ) -> dict[str, list[T]] | dict[Relation, T]: ...\n\n    def relations(\n        self: T, *args: str, data: bool = False\n    ) -> dict[str, list[T]] | dict[Relation, T]:\n        raise NotImplementedError\n\n    def get_related(self: T, *args: str) -> list[T]:\n        raise NotImplementedError\n\n    def closure(self: T, *args: str) -> Iterator[T]:\n        visited = set()\n        queue = self.get_related(*args)\n        while queue:\n            relatable = queue.pop(0)\n            if relatable.id not in visited:\n                visited.add(relatable.id)\n                yield relatable\n                queue.extend(relatable.get_related(*args))\n\n    def relation_paths(self: T, *args: str, end: T | None = None) -> Iterator[list[T]]:\n        agenda: list[tuple[list[T], set[T]]] = [\n            ([target], {self, target})\n            for target in self.get_related(*args)\n            if target != self  # avoid self loops?\n        ]\n        while agenda:\n            path, visited = agenda.pop()\n            if end is not 
None and path[-1] == end:\n                yield path\n            else:\n                related = [\n                    target\n                    for target in path[-1].get_related(*args)\n                    if target not in visited\n                ]\n                if related:\n                    for synset in reversed(related):\n                        new_path = [*path, synset]\n                        new_visited = visited | {synset}\n                        agenda.append((new_path, new_visited))\n                elif end is None:\n                    yield path\n\n\n@dataclass(frozen=True, slots=True)\nclass Example(LexiconElementWithMetadata):\n    \"\"\"Class for modeling Sense and Synset examples.\"\"\"\n\n    __module__ = \"wn\"\n\n    text: str\n    language: str | None = None\n    _lexicon: str = \"\"\n    _metadata: Metadata | None = field(default=None, repr=False, compare=False)\n\n    def metadata(self) -> Metadata:\n        \"\"\"Return the example's metadata.\"\"\"\n        return self._metadata if self._metadata is not None else {}\n\n\n@dataclass(frozen=True, slots=True)\nclass Definition(LexiconElementWithMetadata):\n    \"\"\"Class for modeling Synset definitions.\"\"\"\n\n    __module__ = \"wn\"\n\n    text: str\n    language: str | None = None\n    source_sense_id: str | None = field(default=None, compare=False)\n    _lexicon: str = \"\"\n    _metadata: Metadata | None = field(default=None, compare=False, repr=False)\n\n    def metadata(self) -> Metadata:\n        \"\"\"Return the definition's metadata.\"\"\"\n        return self._metadata if self._metadata is not None else {}\n\n\nclass Synset(_Relatable):\n    \"\"\"Class for modeling wordnet synsets.\"\"\"\n\n    __slots__ = \"_ili\", \"pos\"\n    __module__ = \"wn\"\n\n    _ENTITY_TYPE = _EntityType.SYNSETS\n\n    pos: str\n    _ili: str | None\n\n    def __init__(\n        self,\n        id: str,\n        pos: str,\n        ili: str | None = None,\n        _lexicon: str = \"\",\n        _lexconf: LexiconConfiguration = _EMPTY_LEXCONFIG,\n    ):\n        super().__init__(id=id, _lexicon=_lexicon, _lexconf=_lexconf)\n        self.pos = pos\n        self._ili = ili\n\n    @classmethod\n    def empty(\n        cls,\n        id: str,\n        ili: str | None = None,\n        _lexicon: str = \"\",\n        _lexconf: LexiconConfiguration = _EMPTY_LEXCONFIG,\n    ):\n        return cls(id, pos=\"\", ili=ili, _lexicon=_lexicon, _lexconf=_lexconf)\n\n    def __eq__(self, other) -> bool:\n        # include ili in the hash so inferred synsets don't hash the same\n        if isinstance(other, Synset):\n            return (\n                self.id == other.id\n                and self._ili == other._ili\n                and self._lexicon == other._lexicon\n            )\n        return NotImplemented\n\n    def __hash__(self) -> int:\n        return hash((self.id, self._ili, self._lexicon))\n\n    def __repr__(self) -> str:\n        return f\"Synset({self.id!r})\"\n\n    @property\n    def ili(self) -> str | None:\n        return self._ili\n\n    @overload\n    def definition(self, *, data: Literal[False] = False) -> str | None: ...\n    @overload\n    def definition(self, *, data: Literal[True] = True) -> Definition | None: ...\n\n    # fallback for non-literal bool argument\n    @overload\n    def definition(self, *, data: bool) -> str | Definition | None: ...\n\n    def definition(self, *, data: bool = False) -> str | Definition | None:\n        \"\"\"Return the first definition found for the synset.\n\n        If 
the *data* argument is :python:`False` (the default), the\n        definition is returned as a :class:`str` type. If it is\n        :python:`True`, a :class:`wn.Definition` object is used instead.\n\n        Example:\n\n            >>> wn.synsets(\"cartwheel\", pos=\"n\")[0].definition()\n            'a wheel that has wooden spokes and a metal rim'\n            >>> wn.synsets(\"cartwheel\", pos=\"n\")[0].definition(data=True)\n            Definition(text='a wheel that has wooden spokes and a metal rim',\n              language=None, source_sense_id=None)\n\n        \"\"\"\n        lexicons = self._get_lexicons()\n        if defns := get_definitions(self.id, lexicons):\n            text, lang, sense_id, lex, meta = defns[0]\n            if data:\n                return Definition(\n                    text,\n                    language=lang,\n                    source_sense_id=sense_id,\n                    _lexicon=lex,\n                    _metadata=meta,\n                )\n            else:\n                return text\n        return None\n\n    @overload\n    def definitions(self, *, data: Literal[False] = False) -> list[str]: ...\n    @overload\n    def definitions(self, *, data: Literal[True] = True) -> list[Definition]: ...\n\n    # fallback for non-literal bool argument\n    @overload\n    def definitions(self, *, data: bool) -> list[str] | list[Definition]: ...\n\n    def definitions(self, *, data: bool = False) -> list[str] | list[Definition]:\n        \"\"\"Return the list of definitions for the synset.\n\n        If the *data* argument is :python:`False` (the default), the\n        definitions are returned as :class:`str` objects. If it is\n        :python:`True`, :class:`wn.Definition` objects are used instead.\n\n        Example:\n\n            >>> wn.synsets(\"tea\", pos=\"n\")[0].definitions()\n            ['a beverage made by steeping tea leaves in water']\n            >>> wn.synsets(\"tea\", pos=\"n\")[0].definitions(data=True)\n            [Definition(text='a beverage made by steeping tea leaves in water',\n              language=None, source_sense_id=None)]\n\n        \"\"\"\n        lexicons = self._get_lexicons()\n        defns = get_definitions(self.id, lexicons)\n        if data:\n            return [\n                Definition(\n                    text,\n                    language=lang,\n                    source_sense_id=sid,\n                    _lexicon=lex,\n                    _metadata=meta,\n                )\n                for text, lang, sid, lex, meta in defns\n            ]\n        else:\n            return [text for text, *_ in defns]\n\n    @overload\n    def examples(self, *, data: Literal[False] = False) -> list[str]: ...\n    @overload\n    def examples(self, *, data: Literal[True] = True) -> list[Example]: ...\n\n    # fallback for non-literal bool argument\n    @overload\n    def examples(self, *, data: bool) -> list[str] | list[Example]: ...\n\n    def examples(self, *, data: bool = False) -> list[str] | list[Example]:\n        \"\"\"Return the list of examples for the synset.\n\n        If the *data* argument is :python:`False` (the default), the\n        examples are returned as :class:`str` types. 
If it is\n        :python:`True`, :class:`wn.Example` objects are used instead.\n\n        Example:\n\n            >>> wn.synsets(\"orbital\", pos=\"a\")[0].examples()\n            ['\"orbital revolution\"', '\"orbital velocity\"']\n\n        \"\"\"\n        lexicons = self._get_lexicons()\n        exs = get_examples(self.id, \"synsets\", lexicons)\n        if data:\n            return [\n                Example(text, language=lang, _lexicon=lex, _metadata=meta)\n                for text, lang, lex, meta in exs\n            ]\n        else:\n            return [text for text, *_ in exs]\n\n    def senses(self) -> list[Sense]:\n        \"\"\"Return the list of sense members of the synset.\n\n        Example:\n\n            >>> wn.synsets(\"umbrella\", pos=\"n\")[0].senses()\n            [Sense('ewn-umbrella-n-04514450-01')]\n\n        \"\"\"\n        lexicons = self._get_lexicons()\n        iterable = get_synset_members(self.id, lexicons)\n        return [Sense(*sense_data, _lexconf=self._lexconf) for sense_data in iterable]\n\n    def lexicalized(self) -> bool:\n        \"\"\"Return True if the synset is lexicalized.\"\"\"\n        return get_lexicalized(self.id, self._lexicon, \"synsets\")\n\n    def lexfile(self) -> str | None:\n        \"\"\"Return the lexicographer file name for this synset, if any.\"\"\"\n        return get_lexfile(self.id, self._lexicon)\n\n    def metadata(self) -> Metadata:\n        \"\"\"Return the synset's metadata.\"\"\"\n        return get_metadata(self.id, self._lexicon, \"synsets\")\n\n    def words(self) -> list[Word]:\n        \"\"\"Return the list of words linked by the synset's senses.\n\n        Example:\n\n            >>> wn.synsets(\"exclusive\", pos=\"n\")[0].words()\n            [Word('ewn-scoop-n'), Word('ewn-exclusive-n')]\n\n        \"\"\"\n        return [sense.word() for sense in self.senses()]\n\n    @overload\n    def lemmas(self, *, data: Literal[False] = False) -> list[str]: ...\n    @overload\n    def lemmas(self, *, data: Literal[True] = True) -> list[Form]: ...\n\n    # fallback for non-literal bool argument\n    @overload\n    def lemmas(self, *, data: bool) -> list[str] | list[Form]: ...\n\n    def lemmas(self, *, data: bool = False) -> list[str] | list[Form]:\n        \"\"\"Return the list of lemmas of words for the synset.\n\n        If the *data* argument is :python:`False` (the default), the\n        lemmas are returned as :class:`str` types. 
If it is\n        :python:`True`, :class:`wn.Form` objects are used instead.\n\n        Example:\n\n            >>> wn.synsets(\"exclusive\", pos=\"n\")[0].lemmas()\n            ['scoop', 'exclusive']\n            >>> wn.synsets(\"exclusive\", pos=\"n\")[0].lemmas(data=True)\n            [Form(value='scoop'), Form(value='exclusive')]\n\n        \"\"\"\n        # exploded instead of data=data due to mypy issue\n        # https://github.com/python/mypy/issues/14764\n        if data:\n            return [w.lemma(data=True) for w in self.words()]\n        else:\n            return [w.lemma(data=False) for w in self.words()]\n\n    @overload\n    def relations(\n        self, *args: str, data: Literal[False] = False\n    ) -> dict[str, list[Synset]]: ...\n    @overload\n    def relations(\n        self, *args: str, data: Literal[True] = True\n    ) -> dict[Relation, Synset]: ...\n\n    # fallback for non-literal bool argument\n    @overload\n    def relations(\n        self, *args: str, data: bool = False\n    ) -> dict[str, list[Synset]] | dict[Relation, Synset]: ...\n\n    def relations(\n        self, *args: str, data: bool = False\n    ) -> dict[str, list[Synset]] | dict[Relation, Synset]:\n        \"\"\"Return a mapping of synset relations.\n\n        One or more relation names may be given as positional\n        arguments to restrict the relations returned. If no such\n        arguments are given, all relations starting from the synset\n        are returned.\n\n        If the *data* argument is :python:`False` (default), the\n        returned object maps from the relation name (a :class:`str`)\n        to a list of :class:`Synset` objects. If *data* is\n        :python:`True`, it instead maps from a :class:`Relation` to\n        a single :class:`Synset`.\n\n        See :meth:`get_related` for getting a flat list of related\n        synsets.\n\n        Example:\n\n            >>> button_rels = wn.synsets(\"button\")[0].relations()\n            >>> for relname, sslist in button_rels.items():\n            ...     print(relname, [ss.lemmas() for ss in sslist])\n            hypernym [['fixing', 'holdfast', 'fastener', 'fastening']]\n            hyponym [['coat button'], ['shirt button']]\n\n        \"\"\"\n        if data:\n            return dict(self._iter_relations(*args))\n        else:\n            # inner dict is used as an order-preserving set\n            relmap: dict[str, dict[Synset, bool]] = {}\n            for relation, synset in self._iter_relations(*args):\n                relmap.setdefault(relation.name, {})[synset] = True\n            # now convert inner dicts to lists\n            return {relname: list(ss_dict) for relname, ss_dict in relmap.items()}\n\n    def get_related(self, *args: str) -> list[Synset]:\n        \"\"\"Return the list of related synsets.\n\n        One or more relation names may be given as positional\n        arguments to restrict the relations returned. If no such\n        arguments are given, all relations starting from the synset\n        are returned.\n\n        This method does not preserve the relation names that lead to\n        the related synsets. 
For a mapping of relation names to\n        related synsets, see :meth:`relations`.\n\n        Example:\n\n            >>> fulcrum = wn.synsets(\"fulcrum\")[0]\n            >>> [ss.lemmas() for ss in fulcrum.get_related()]\n            [['pin', 'pivot'], ['lever']]\n        \"\"\"\n        return unique_list(synset for _, synset in self._iter_relations(*args))\n\n    def _iter_relations(self, *args: str) -> Iterator[tuple[Relation, Synset]]:\n        # first get relations from the current lexicon(s)\n        yield from self._iter_local_relations(args)\n        # then attempt to expand via ILI\n        if self._ili is not None and self._lexconf.expands:\n            yield from self._iter_expanded_relations(args)\n\n    def _iter_local_relations(\n        self,\n        args: Sequence[str],\n    ) -> Iterator[tuple[Relation, Synset]]:\n        _lexconf = self._lexconf\n        lexicons = self._get_lexicons()\n        iterable = get_synset_relations(\n            self.id, self._lexicon, args, lexicons, lexicons\n        )\n        for relname, rellex, metadata, _, ssid, pos, ili, tgtlex in iterable:\n            synset_rel = Relation(relname, self.id, ssid, rellex, metadata=metadata)\n            synset = Synset(\n                ssid,\n                pos,\n                ili,\n                _lexicon=tgtlex,\n                _lexconf=_lexconf,\n            )\n            yield synset_rel, synset\n\n    def _iter_expanded_relations(\n        self,\n        args: Sequence[str],\n    ) -> Iterator[tuple[Relation, Synset]]:\n        assert self._ili is not None, \"cannot get expanded relations without an ILI\"\n        _lexconf = self._lexconf\n        lexicons = self._get_lexicons()\n\n        iterable = get_expanded_synset_relations(self._ili, args, _lexconf.expands)\n        for relname, lexicon, metadata, srcid, ssid, _, ili, *_ in iterable:\n            if ili is None:\n                continue\n            synset_rel = Relation(relname, srcid, ssid, lexicon, metadata=metadata)\n            local_ss_rows = list(get_synsets_for_ilis([ili], lexicons=lexicons))\n\n            if local_ss_rows:\n                for row in local_ss_rows:\n                    yield synset_rel, Synset(*row, _lexconf=_lexconf)\n            else:\n                synset = Synset.empty(\n                    id=_INFERRED_SYNSET,\n                    ili=ili,\n                    _lexicon=self._lexicon,\n                    _lexconf=_lexconf,\n                )\n                yield synset_rel, synset\n\n    def hypernym_paths(self, simulate_root: bool = False) -> list[list[Synset]]:\n        \"\"\"Return the list of hypernym paths to a root synset.\"\"\"\n        return taxonomy.hypernym_paths(self, simulate_root=simulate_root)\n\n    def min_depth(self, simulate_root: bool = False) -> int:\n        \"\"\"Return the minimum taxonomy depth of the synset.\"\"\"\n        return taxonomy.min_depth(self, simulate_root=simulate_root)\n\n    def max_depth(self, simulate_root: bool = False) -> int:\n        \"\"\"Return the maximum taxonomy depth of the synset.\"\"\"\n        return taxonomy.max_depth(self, simulate_root=simulate_root)\n\n    def shortest_path(self, other: Synset, simulate_root: bool = False) -> list[Synset]:\n        \"\"\"Return the shortest path from the synset to the *other* synset.\"\"\"\n        return taxonomy.shortest_path(self, other, simulate_root=simulate_root)\n\n    def common_hypernyms(\n        self, other: Synset, simulate_root: bool = False\n    ) -> list[Synset]:\n        \"\"\"Return 
the common hypernyms for the current and *other* synsets.\"\"\"\n        return taxonomy.common_hypernyms(self, other, simulate_root=simulate_root)\n\n    def lowest_common_hypernyms(\n        self, other: Synset, simulate_root: bool = False\n    ) -> list[Synset]:\n        \"\"\"Return the common hypernyms furthest from the root.\"\"\"\n        return taxonomy.lowest_common_hypernyms(\n            self, other, simulate_root=simulate_root\n        )\n\n    def holonyms(self) -> list[Synset]:\n        \"\"\"Return the list of synsets related by any holonym relation.\n\n        Any of the following relations are traversed: ``holonym``,\n        ``holo_location``, ``holo_member``, ``holo_part``,\n        ``holo_portion``, ``holo_substance``.\n\n        \"\"\"\n        return self.get_related(\n            \"holonym\",\n            \"holo_location\",\n            \"holo_member\",\n            \"holo_part\",\n            \"holo_portion\",\n            \"holo_substance\",\n        )\n\n    def meronyms(self) -> list[Synset]:\n        \"\"\"Return the list of synsets related by any meronym relation.\n\n        Any of the following relations are traversed: ``meronym``,\n        ``mero_location``, ``mero_member``, ``mero_part``,\n        ``mero_portion``, ``mero_substance``.\n\n        \"\"\"\n        return self.get_related(\n            \"meronym\",\n            \"mero_location\",\n            \"mero_member\",\n            \"mero_part\",\n            \"mero_portion\",\n            \"mero_substance\",\n        )\n\n    def hypernyms(self) -> list[Synset]:\n        \"\"\"Return the list of synsets related by any hypernym relation.\n\n        Both the ``hypernym`` and ``instance_hypernym`` relations are\n        traversed.\n\n        \"\"\"\n        return self.get_related(\"hypernym\", \"instance_hypernym\")\n\n    def hyponyms(self) -> list[Synset]:\n        \"\"\"Return the list of synsets related by any hyponym relation.\n\n        Both the ``hyponym`` and ``instance_hyponym`` relations are\n        traversed.\n\n        \"\"\"\n        return self.get_related(\"hyponym\", \"instance_hyponym\")\n\n    def translate(\n        self, lexicon: str | None = None, *, lang: str | None = None\n    ) -> list[Synset]:\n        \"\"\"Return a list of translated synsets.\n\n        Arguments:\n            lexicon: lexicon specifier of translated synsets\n            lang: BCP-47 language code of translated synsets\n\n        Example:\n\n            >>> es = wn.synsets(\"araña\", lang=\"es\")[0]\n            >>> en = es.translate(lexicon=\"ewn\")[0]\n            >>> en.lemmas()\n            ['spider']\n\n        \"\"\"\n        ili = self._ili\n        if not ili:\n            return []\n        lexicons = resolve_lexicon_specifiers(lexicon=(lexicon or \"*\"), lang=lang)\n        lexconf = LexiconConfiguration(\n            lexicons=tuple(lexicons), expands=(), default_mode=False\n        )\n        return [\n            Synset(*data, _lexconf=lexconf)\n            for data in get_synsets_for_ilis((ili,), lexicons)\n        ]\n\n\n@dataclass(frozen=True, slots=True)\nclass Count(LexiconElementWithMetadata):\n    \"\"\"A count of sense occurrences in some corpus.\"\"\"\n\n    __module__ = \"wn\"\n\n    value: int\n    _lexicon: str = \"\"\n    _metadata: Metadata | None = field(default=None, repr=False, compare=False)\n\n\nclass Sense(_Relatable):\n    \"\"\"Class for modeling wordnet senses.\"\"\"\n\n    __slots__ = \"_entry_id\", \"_synset_id\"\n    __module__ = \"wn\"\n\n    _ENTITY_TYPE = 
_EntityType.SENSES\n\n    def __init__(\n        self,\n        id: str,\n        entry_id: str,\n        synset_id: str,\n        _lexicon: str = \"\",\n        _lexconf: LexiconConfiguration = _EMPTY_LEXCONFIG,\n    ):\n        super().__init__(id=id, _lexicon=_lexicon, _lexconf=_lexconf)\n        self._entry_id = entry_id\n        self._synset_id = synset_id\n\n    def __repr__(self) -> str:\n        return f\"Sense({self.id!r})\"\n\n    def word(self) -> Word:\n        \"\"\"Return the word of the sense.\n\n        Example:\n\n            >>> wn.senses(\"spigot\")[0].word()\n            Word('pwn-spigot-n')\n\n        \"\"\"\n        lexicons = self._get_lexicons()\n        id, pos, lex = next(find_entries(id=self._entry_id, lexicons=lexicons))\n        return Word(id, pos, _lexicon=lex, _lexconf=self._lexconf)\n\n    def synset(self) -> Synset:\n        \"\"\"Return the synset of the sense.\n\n        Example:\n\n            >>> wn.senses(\"spigot\")[0].synset()\n            Synset('pwn-03325088-n')\n\n        \"\"\"\n        lexicons = self._get_lexicons()\n        id, pos, ili, lex = next(find_synsets(id=self._synset_id, lexicons=lexicons))\n        return Synset(id, pos, ili=ili, _lexicon=lex, _lexconf=self._lexconf)\n\n    @overload\n    def examples(self, *, data: Literal[False] = False) -> list[str]: ...\n    @overload\n    def examples(self, *, data: Literal[True] = True) -> list[Example]: ...\n\n    # fallback for non-literal bool argument\n    @overload\n    def examples(self, *, data: bool) -> list[str] | list[Example]: ...\n\n    def examples(self, *, data: bool = False) -> list[str] | list[Example]:\n        \"\"\"Return the list of examples for the sense.\n\n        If the *data* argument is :python:`False` (the default), the\n        examples are returned as :class:`str` types. If it is\n        :python:`True`, :class:`wn.Example` objects are used instead.\n        \"\"\"\n        lexicons = self._get_lexicons()\n        exs = get_examples(self.id, \"senses\", lexicons)\n        if data:\n            return [\n                Example(text, language=lang, _lexicon=lex, _metadata=meta)\n                for text, lang, lex, meta in exs\n            ]\n        else:\n            return [text for text, *_ in exs]\n\n    def lexicalized(self) -> bool:\n        \"\"\"Return True if the sense is lexicalized.\"\"\"\n        return get_lexicalized(self.id, self._lexicon, \"senses\")\n\n    def adjposition(self) -> str | None:\n        \"\"\"Return the adjective position of the sense.\n\n        Values include :python:`\"a\"` (attributive), :python:`\"p\"`\n        (predicative), and :python:`\"ip\"` (immediate\n        postnominal). Note that this is only relevant for adjectival\n        senses. 
Senses for other parts of speech, or for adjectives\n        that are not annotated with this feature, will return\n        ``None``.\n\n        \"\"\"\n        return get_adjposition(self.id, self._lexicon)\n\n    def frames(self) -> list[str]:\n        \"\"\"Return the list of subcategorization frames for the sense.\"\"\"\n        lexicons = self._get_lexicons()\n        return get_syntactic_behaviours(self.id, lexicons)\n\n    @overload\n    def counts(self, *, data: Literal[False] = False) -> list[int]: ...\n    @overload\n    def counts(self, *, data: Literal[True] = True) -> list[Count]: ...\n\n    # fallback for non-literal bool argument\n    @overload\n    def counts(self, *, data: bool) -> list[int] | list[Count]: ...\n\n    def counts(self, *, data: bool = False) -> list[int] | list[Count]:\n        \"\"\"Return the corpus counts stored for this sense.\"\"\"\n        lexicons = self._get_lexicons()\n        count_data = list(get_sense_counts(self.id, lexicons))\n        if data:\n            return [\n                Count(value, _lexicon=lex, _metadata=metadata)\n                for value, lex, metadata in count_data\n            ]\n        else:\n            return [value for value, *_ in count_data]\n\n    def metadata(self) -> Metadata:\n        \"\"\"Return the sense's metadata.\"\"\"\n        return get_metadata(self.id, self._lexicon, \"senses\")\n\n    @overload\n    def relations(\n        self, *args: str, data: Literal[False] = False\n    ) -> dict[str, list[Sense]]: ...\n    @overload\n    def relations(\n        self, *args: str, data: Literal[True] = True\n    ) -> dict[Relation, Sense]: ...\n\n    # fallback for non-literal bool argument\n    @overload\n    def relations(\n        self, *args: str, data: bool = False\n    ) -> dict[str, list[Sense]] | dict[Relation, Sense]: ...\n\n    def relations(\n        self, *args: str, data: bool = False\n    ) -> dict[str, list[Sense]] | dict[Relation, Sense]:\n        \"\"\"Return a mapping of relation names to lists of senses.\n\n        One or more relation names may be given as positional\n        arguments to restrict the relations returned. If no such\n        arguments are given, all relations starting from the sense\n        are returned.\n\n        If the *data* argument is :python:`False` (default), the\n        returned object maps from the relation name (a :class:`str`)\n        to a list of :class:`Sense` objects. 
If *data* is\n        :python:`True`, it instead maps from a :class:`Relation` to\n        a single :class:`Sense`.\n\n        See :meth:`get_related` for getting a flat list of related\n        senses.\n\n        \"\"\"\n        if data:\n            return dict(self._iter_sense_relations(*args))\n        else:\n            # inner dict is used as an order-preserving set\n            relmap: dict[str, dict[Sense, bool]] = {}\n            for relation, sense in self._iter_sense_relations(*args):\n                relmap.setdefault(relation.name, {})[sense] = True\n            # now convert inner dicts to lists\n            return {relname: list(s_dict) for relname, s_dict in relmap.items()}\n\n    @overload\n    def synset_relations(\n        self, *args: str, data: Literal[False] = False\n    ) -> dict[str, list[Synset]]: ...\n    @overload\n    def synset_relations(\n        self, *args: str, data: Literal[True] = True\n    ) -> dict[Relation, Synset]: ...\n\n    # fallback for non-literal bool argument\n    @overload\n    def synset_relations(\n        self, *args: str, data: bool = False\n    ) -> dict[str, list[Synset]] | dict[Relation, Synset]: ...\n\n    def synset_relations(\n        self, *args: str, data: bool = False\n    ) -> dict[str, list[Synset]] | dict[Relation, Synset]:\n        \"\"\"Return a mapping of relation names to lists of synsets.\n\n        One or more relation names may be given as positional\n        arguments to restrict the relations returned. If no such\n        arguments are given, all relations starting from the sense\n        are returned.\n\n        If the *data* argument is :python:`False` (default), the\n        returned object maps from the relation name (a :class:`str`)\n        to a list of :class:`Synset` objects. If *data* is\n        :python:`True`, it instead maps from a :class:`Relation` to\n        a single :class:`Synset`.\n\n        See :meth:`get_related_synsets` for getting a flat list of\n        related synsets.\n\n        \"\"\"\n        if data:\n            return dict(self._iter_sense_synset_relations(*args))\n        else:\n            # inner dict is used as an order-preserving set\n            relmap: dict[str, dict[Synset, bool]] = {}\n            for relation, synset in self._iter_sense_synset_relations(*args):\n                relmap.setdefault(relation.name, {})[synset] = True\n            # now convert inner dicts to lists\n            return {relname: list(ss_dict) for relname, ss_dict in relmap.items()}\n\n    def get_related(self, *args: str) -> list[Sense]:\n        \"\"\"Return a list of related senses.\n\n        One or more relation types should be passed as arguments which\n        determine the kind of relations returned.\n\n        Example:\n\n            >>> physics = wn.senses(\"physics\", lexicon=\"ewn\")[0]\n            >>> for sense in physics.get_related(\"has_domain_topic\"):\n            ...     
print(sense.word().lemma())\n            coherent\n            chaotic\n            incoherent\n\n        \"\"\"\n        return unique_list(sense for _, sense in self._iter_sense_relations(*args))\n\n    def get_related_synsets(self, *args: str) -> list[Synset]:\n        \"\"\"Return a list of related synsets.\"\"\"\n        return unique_list(\n            synset for _, synset in self._iter_sense_synset_relations(*args)\n        )\n\n    def _iter_sense_relations(self, *args: str) -> Iterator[tuple[Relation, Sense]]:\n        lexicons = self._get_lexicons()\n        iterable = get_sense_relations(self.id, args, lexicons, lexicons)\n        for relname, lexicon, metadata, sid, eid, ssid, lexid in iterable:\n            relation = Relation(relname, self.id, sid, lexicon, metadata=metadata)\n            sense = Sense(sid, eid, ssid, lexid, _lexconf=self._lexconf)\n            yield relation, sense\n\n    def _iter_sense_synset_relations(\n        self,\n        *args: str,\n    ) -> Iterator[tuple[Relation, Synset]]:\n        lexicons = self._get_lexicons()\n        iterable = get_sense_synset_relations(self.id, args, lexicons, lexicons)\n        for relname, lexicon, metadata, _, ssid, pos, ili, lexid in iterable:\n            relation = Relation(relname, self.id, ssid, lexicon, metadata=metadata)\n            synset = Synset(ssid, pos, ili, lexid, _lexconf=self._lexconf)\n            yield relation, synset\n\n    def translate(\n        self, lexicon: str | None = None, *, lang: str | None = None\n    ) -> list[Sense]:\n        \"\"\"Return a list of translated senses.\n\n        Arguments:\n            lexicon: lexicon specifier of translated senses\n            lang: BCP-47 language code of translated senses\n\n        Example:\n\n            >>> en = wn.senses(\"petiole\", lang=\"en\")[0]\n            >>> pt = en.translate(lang=\"pt\")[0]\n            >>> pt.word().lemma()\n            'pecíolo'\n\n        \"\"\"\n        synset = self.synset()\n        return [\n            t_sense\n            for t_synset in synset.translate(lang=lang, lexicon=lexicon)\n            for t_sense in t_synset.senses()\n        ]\n"
  },
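The `Word`, `Sense`, and `Synset` classes above are the primary query entities, and the `data=` keyword on methods like `lemma()`, `forms()`, and `definition()` switches between plain values and structured objects. A minimal usage sketch, assuming an English wordnet lexicon has already been downloaded and added; the specific lemmas and relation targets shown are illustrative:

```python
# Minimal sketch of the entity API defined in this module; assumes an
# English wordnet lexicon is installed, and the specific lemmas and
# relation targets are illustrative.
import wn

word = wn.words("wolf")[0]             # -> wn.Word
print(word.lemma())                    # canonical form as a str
print(word.lemma(data=True))           # same form as a wn.Form object

synset = word.synsets()[0]             # -> wn.Synset
print(synset.definition())             # first definition as a str
print(synset.relations())              # {relation name: [Synset, ...]}
print(synset.get_related("hypernym"))  # flat list for selected relations

# closure() follows a relation transitively, e.g. up the hypernym chain
for ancestor in synset.closure("hypernym"):
    print(ancestor.lemmas())

# translations go through the ILI and need a second lexicon installed
print(word.translate(lang="ja"))       # {Sense: [Word, ...]}
```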
  {
    "path": "wn/_db.py",
    "content": "\"\"\"\nStorage back-end interface.\n\"\"\"\n\nimport json\nimport logging\nimport sqlite3\nfrom importlib import resources\nfrom pathlib import Path\n\nfrom wn._config import config\nfrom wn._exceptions import DatabaseError\nfrom wn._types import AnyPath\nfrom wn._util import format_lexicon_specifier, short_hash\n\nlogger = logging.getLogger(\"wn\")\n\n\n# Module Constants\n\nDEBUG = False\n\n# This stores hashes of the schema to check for version differences.\n# When the schema changes, the hash will change. If the new hash is\n# not added here, the 'test_schema_compatibility' test will fail. It\n# is the developer's responsibility to only add compatible schema\n# hashes here. If the schema change is not backwards-compatible, then\n# clear all old hashes and only put the latest hash here. A hash can\n# be generated like this:\n#\n# >>> import sqlite3\n# >>> import wn\n# >>> conn = sqlite3.connect(wn.config.database_path)\n# >>> wn._db.schema_hash(conn)\n#\nCOMPATIBLE_SCHEMA_HASHES = {\n    \"f439c9bd27f809f64ee42896fb0fc20c5d00fd99\",\n}\n\n\n# Optional metadata is stored as a JSON string\n\n\ndef _adapt_dict(d: dict) -> bytes:\n    return json.dumps(d).encode(\"utf-8\")\n\n\ndef _convert_dict(s: bytes) -> dict:\n    return json.loads(s)\n\n\ndef _convert_boolean(s: bytes) -> bool:\n    return bool(int(s))\n\n\nsqlite3.register_adapter(dict, _adapt_dict)\nsqlite3.register_converter(\"meta\", _convert_dict)\nsqlite3.register_converter(\"boolean\", _convert_boolean)\n\n\n# The pool is a cache of open connections. Unless the database path is\n# changed, there should only be zero or one.\npool: dict[AnyPath, sqlite3.Connection] = {}\n\n\n# The connect() function should be used for all connections\n\n\ndef connect(check_schema: bool = True) -> sqlite3.Connection:\n    dbpath = config.database_path\n    if dbpath not in pool:\n        if not config.data_directory.exists():\n            config.data_directory.mkdir(parents=True, exist_ok=True)\n        initialized = dbpath.is_file()\n        conn = sqlite3.connect(\n            str(dbpath),\n            detect_types=sqlite3.PARSE_DECLTYPES,\n            check_same_thread=not config.allow_multithreading,\n        )\n        # foreign key support needs to be enabled for each connection\n        conn.execute(\"PRAGMA foreign_keys = ON\")\n        if DEBUG:\n            conn.set_trace_callback(print)\n        if not initialized:\n            logger.info(\"initializing database: %s\", dbpath)\n            _init_db(conn)\n        if check_schema:\n            _check_schema_compatibility(conn, dbpath)\n\n        pool[dbpath] = conn\n    return pool[dbpath]\n\n\ndef _init_db(conn: sqlite3.Connection) -> None:\n    schema = (resources.files(\"wn\") / \"schema.sql\").read_text()\n    conn.executescript(schema)\n    with conn:\n        conn.executemany(\n            \"INSERT INTO ili_statuses VALUES (null,?)\",\n            [(\"presupposed\",), (\"proposed\",)],\n        )\n\n\ndef _check_schema_compatibility(conn: sqlite3.Connection, dbpath: Path) -> None:\n    hash = schema_hash(conn)\n\n    # if the hash is known, then we're all good here\n    if hash in COMPATIBLE_SCHEMA_HASHES:\n        return\n\n    logger.debug(\"current schema hash:\\n  %s\", hash)\n    logger.debug(\n        \"compatible schema hashes:\\n  %s\", \"\\n  \".join(COMPATIBLE_SCHEMA_HASHES)\n    )\n    # otherwise, try to raise a helpful error message\n    msg = \"Wn's schema has changed and is no longer compatible with the database.\"\n    try:\n        specs = 
list_lexicons_safe(conn)\n    except DatabaseError as exc:\n        raise DatabaseError(msg) from exc\n    if specs:\n        installed = \"\\n  \".join(specs)\n        msg += (\n            f\"\\nLexicons currently installed:\\n  {installed}\"\n            \"\\nRun wn.reset_database(rebuild=True) to rebuild the database.\"\n        )\n    else:\n        msg += (\n            \"\\nNo lexicons are currently installed.\"\n            \"\\nRun wn.reset_database() to re-initialize the database.\"\n        )\n    raise DatabaseError(msg)\n\n\ndef list_lexicons_safe(conn: sqlite3.Connection | None = None) -> list[str]:\n    \"\"\"Return the list of lexicon specifiers for added lexicons.\"\"\"\n    if conn is None:\n        conn = connect(check_schema=False)\n    try:\n        specs = conn.execute(\"SELECT id, version FROM lexicons\").fetchall()\n    except sqlite3.OperationalError as exc:\n        raise DatabaseError(\"could not list lexicons\") from exc\n    return [format_lexicon_specifier(id, ver) for id, ver in specs]\n\n\ndef schema_hash(conn: sqlite3.Connection) -> str:\n    query = \"\"\"\n        SELECT sql\n          FROM sqlite_schema\n         WHERE NOT sql ISNULL\n           AND name NOT LIKE 'sqlite_stat%'\n         ORDER BY sql ASC\n    \"\"\"\n    schema = \"\\n\\n\".join(row[0] for row in conn.execute(query))\n    return short_hash(schema)\n\n\ndef clear_connections() -> None:\n    \"\"\"Close and delete any open database connections.\"\"\"\n    for path in list(pool):\n        pool[path].close()\n        del pool[path]\n"
  },
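The schema-compatibility check in `wn/_db.py` compares a hash of the live database schema against `COMPATIBLE_SCHEMA_HASHES`. The sketch below restates the recipe given in that module's comment; the printed hash depends on the installed schema and is illustrative:

```python
# Compute the schema hash of the current database, as described in the
# COMPATIBLE_SCHEMA_HASHES comment; the output value is illustrative.
import sqlite3

import wn
import wn._db

conn = sqlite3.connect(wn.config.database_path)
print(wn._db.schema_hash(conn))  # add to COMPATIBLE_SCHEMA_HASHES if compatible
conn.close()

# connections opened by Wn itself are pooled; close them explicitly when
# the database file needs to be replaced or rebuilt
wn._db.clear_connections()
```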
  {
    "path": "wn/_download.py",
    "content": "import logging\nfrom collections.abc import Sequence\nfrom pathlib import Path\n\nimport httpx\n\nfrom wn._add import add as add_to_db\nfrom wn._config import config\nfrom wn._exceptions import Error\nfrom wn._util import is_url\nfrom wn.util import ProgressBar, ProgressHandler\n\nCHUNK_SIZE = 8 * 1024  # how many KB to read at a time\nTIMEOUT = 10  # number of seconds to wait for a server response\n\n\nlogger = logging.getLogger(\"wn\")\n\n\ndef download(\n    project_or_url: str,\n    add: bool = True,\n    progress_handler: type[ProgressHandler] | None = ProgressBar,\n) -> Path:\n    \"\"\"Download the resource specified by *project_or_url*.\n\n    First the URL of the resource is determined and then, depending on\n    the parameters, the resource is downloaded and added to the\n    database.  The function then returns the path of the cached file.\n\n    If *project_or_url* starts with `'http://'` or `'https://'`, then\n    it is taken to be the URL for the resource. Otherwise,\n    *project_or_url* is taken as a :ref:`project specifier\n    <lexicon-specifiers>` and the URL is taken from a matching entry\n    in Wn's project index. If no project matches the specifier,\n    :exc:`wn.Error` is raised.\n\n    If the URL has been downloaded and cached before, the cached file\n    is used. Otherwise the URL is retrieved and stored in the cache.\n\n    If the *add* paramter is ``True`` (default), the downloaded\n    resource is added to the database.\n\n    >>> wn.download(\"ewn:2020\")\n    Added ewn:2020 (English WordNet)\n\n    The *progress_handler* parameter takes a subclass of\n    :class:`wn.util.ProgressHandler`. An instance of the class will be\n    created, used, and closed by this function.\n\n    \"\"\"\n    if progress_handler is None:\n        progress_handler = ProgressHandler\n    progress = progress_handler(message=\"Download\", unit=\" bytes\")\n\n    cache_path, urls = _get_cache_path_and_urls(project_or_url)\n\n    try:\n        if cache_path and cache_path.exists():\n            progress.flash(f\"Cached file found: {cache_path!s}\")\n            path = cache_path\n        elif urls:\n            path = _download(urls, progress)\n        else:\n            raise Error(\"no urls to download\")\n    finally:\n        progress.close()\n\n    if add:\n        try:\n            add_to_db(path, progress_handler=progress_handler)\n        except Error as exc:\n            raise Error(\n                f\"could not add downloaded file: {path}\\n  You might try \"\n                \"deleting the cached file and trying the download again.\"\n            ) from exc\n\n    return path\n\n\ndef _get_cache_path_and_urls(project_or_url: str) -> tuple[Path | None, list[str]]:\n    if is_url(project_or_url):\n        return config.get_cache_path(project_or_url), [project_or_url]\n    else:\n        info = config.get_project_info(project_or_url)\n        return info.get(\"cache\"), info[\"resource_urls\"]\n\n\ndef _download(urls: Sequence[str], progress: ProgressHandler) -> Path:\n    client = httpx.Client(timeout=TIMEOUT, follow_redirects=True)\n    try:\n        for i, url in enumerate(urls, 1):\n            path = config.get_cache_path(url)\n            logger.info(\"download url: %s\", url)\n            logger.info(\"download cache path: %s\", path)\n            try:\n                with open(path, \"wb\") as f:\n                    progress.set(status=\"Requesting\", count=0)\n                    with client.stream(\"GET\", url) as response:\n                        
response.raise_for_status()\n                        total = int(response.headers.get(\"Content-Length\", 0))\n                        count = response.num_bytes_downloaded\n                        progress.set(count=count, total=total, status=\"Receiving\")\n                        for chunk in response.iter_bytes(chunk_size=CHUNK_SIZE):\n                            if chunk:\n                                f.write(chunk)\n                            progress.update(response.num_bytes_downloaded - count)\n                            count = response.num_bytes_downloaded\n                        progress.set(status=\"Complete\")\n            except httpx.RequestError as exc:\n                path.unlink(missing_ok=True)\n                last_count = progress.kwargs[\"count\"]\n                if i == len(urls):\n                    raise Error(f\"download failed at {last_count} bytes\") from exc\n                else:\n                    logger.info(\n                        \"download failed at %d bytes; trying next url\", last_count\n                    )\n            else:\n                break  # success\n\n    except KeyboardInterrupt as exc:\n        path.unlink(missing_ok=True)\n        last_count = progress.kwargs[\"count\"]\n        raise Error(f\"download cancelled at {last_count} bytes\") from exc\n    except Exception:\n        path.unlink(missing_ok=True)\n        raise\n    finally:\n        client.close()\n\n    return path\n"
  },
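`download()` resolves a project specifier or URL, caches the result, and optionally adds it to the database in one step. A sketch of the two-step variant, downloading first and adding later; the `ewn:2020` specifier is the one used in the docstring above, and the top-level `wn.add()` wrapper around `wn._add.add()` is assumed:

```python
# Fetch a resource without adding it, then add it to the database later.
# "ewn:2020" is the specifier from the docstring above; any indexed
# project specifier or a direct http(s) URL should also work.
import wn

path = wn.download("ewn:2020", add=False, progress_handler=None)
print(path)   # path of the cached WN-LMF file
wn.add(path)  # second step: load it into the database
```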
  {
    "path": "wn/_exceptions.py",
    "content": "class Error(Exception):\n    \"\"\"Generic error class for invalid wordnet operations.\"\"\"\n\n    # reset the module so the user sees the public name\n    __module__ = \"wn\"\n\n\nclass DatabaseError(Error):\n    \"\"\"Error class for issues with the database.\"\"\"\n\n    __module__ = \"wn\"\n\n\nclass ConfigurationError(Error):\n    \"\"\"Raised on invalid configurations.\"\"\"\n\n    __module__ = \"wn\"\n\n\nclass ProjectError(Error):\n    \"\"\"Raised when a project is not found or on errors defined in the index.\"\"\"\n\n    __module__ = \"wn\"\n\n\nclass WnWarning(Warning):\n    \"\"\"Generic warning class for dubious wordnet operations.\"\"\"\n\n    # reset the module so the user sees the public name\n    __module__ = \"wn\"\n"
  },
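Each specific exception above derives from `Error`, and the `__module__` overrides suggest they are re-exported under the top-level `wn` package (an assumption in the sketch below), so a single `except wn.Error` clause covers all of them; the project specifier here is deliberately bogus:

```python
# DatabaseError, ConfigurationError, and ProjectError all subclass
# wn.Error, so the broad clause below catches whatever the narrow one
# does not; assumes the classes are re-exported as wn.DatabaseError, etc.
import wn

try:
    wn.download("no-such-project:0.0")
except wn.DatabaseError as exc:
    print(f"database problem: {exc}")
except wn.Error as exc:
    print(f"wordnet operation failed: {exc}")
```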
  {
    "path": "wn/_export.py",
    "content": "from collections.abc import Iterator, Sequence\nfrom typing import Literal, NamedTuple, overload\n\nfrom wn import lmf\nfrom wn._exceptions import Error\nfrom wn._lexicon import Lexicon\nfrom wn._queries import (\n    Form,\n    Pronunciation,\n    Sense,\n    Tag,\n    find_entries,\n    find_proposed_ilis,\n    find_senses,\n    find_synsets,\n    find_syntactic_behaviours,\n    get_adjposition,\n    get_definitions,\n    get_entry_forms,\n    get_entry_index,\n    get_entry_senses,\n    get_examples,\n    get_lexfile,\n    get_lexicalized,\n    get_lexicon_dependencies,\n    get_metadata,\n    get_proposed_ili_metadata,\n    get_relation_targets,\n    get_sense_counts,\n    get_sense_n,\n    get_sense_relations,\n    get_sense_synset_relations,\n    get_synset_members,\n    get_synset_relations,\n)\nfrom wn._types import AnyPath, VersionInfo\nfrom wn._util import split_lexicon_specifier, version_info\n\nPROPOSED_ILI_ID = \"in\"  # special case for proposed ILIs\n\n\ndef export(\n    lexicons: Sequence[Lexicon], destination: AnyPath, version: str = \"1.4\"\n) -> None:\n    \"\"\"Export lexicons from the database to a WN-LMF file.\n\n    More than one lexicon may be exported in the same file, subject to\n    these conditions:\n\n    - identifiers on wordnet entities must be unique in all lexicons\n    - lexicons extensions may not be exported with their dependents\n\n    >>> w = wn.Wordnet(lexicon=\"omw-cmn:1.4 omw-zsm:1.4\")\n    >>> wn.export(w.lexicons(), \"cmn-zsm.xml\")\n\n    Args:\n        lexicons: sequence of :class:`wn.Lexicon` objects\n        destination: path to the destination file\n        version: LMF version string\n\n    \"\"\"\n    _precheck(lexicons)\n    exporter = _LMFExporter(version)\n    resource: lmf.LexicalResource = {\n        \"lmf_version\": version,\n        \"lexicons\": [exporter.export(lexicon) for lexicon in lexicons],\n    }\n    lmf.dump(resource, destination)\n\n\ndef _precheck(lexicons: Sequence[Lexicon]) -> None:\n    all_ids: set[str] = set()\n    for lex in lexicons:\n        lexspecs = (lex.specifier(),)\n        idset = {lex.id}\n        idset.update(row[0] for row in find_entries(lexicons=lexspecs))\n        idset.update(row[0] for row in find_senses(lexicons=lexspecs))\n        idset.update(row[0] for row in find_synsets(lexicons=lexspecs))\n        # TODO: syntactic behaviours\n        if all_ids.intersection(idset):\n            raise Error(\"cannot export: non-unique identifiers in lexicons\")\n        all_ids |= idset\n\n\n_SBMap = dict[str, list[tuple[str, str]]]\n\n\nclass _LexSpecs(NamedTuple):\n    primary: str  # lexicon or lexicon extension being exported\n    base: str  # base lexicon (when primary is an extension)\n\n\nclass _LMFExporter:\n    version: VersionInfo\n    # ids: set[str]\n    # The following are reset for each lexicon that is exported\n    lexspecs: _LexSpecs\n    sbmap: _SBMap\n    external_sense_ids: set[str]  # necessary external senses\n    external_synset_ids: set[str]  # necessary external synsets\n\n    def __init__(self, version: str) -> None:\n        if version not in lmf.SUPPORTED_VERSIONS:\n            raise Error(f\"WN-LMF version not supported: {version}\")\n        self.version = version_info(version)\n        self.lexspecs = _LexSpecs(\"\", \"\")\n        self.sbmap = {}\n        self.external_sense_ids = set()\n        self.external_synset_ids = set()\n\n    def export(self, lexicon: Lexicon) -> lmf.Lexicon | lmf.LexiconExtension:\n        base = lexicon.extends()\n\n        
self.lexspecs = _LexSpecs(lexicon.specifier(), base.specifier() if base else \"\")\n        self.sbmap = _build_sbmap(self.lexspecs)\n\n        if base is None:\n            return self._lexicon(lexicon)\n        else:\n            self.external_sense_ids = _get_external_sense_ids(self.lexspecs)\n            self.external_synset_ids = _get_external_synset_ids(self.lexspecs)\n            return self._lexicon_extension(lexicon, base)\n\n    def _lexicon(self, lexicon: Lexicon) -> lmf.Lexicon:\n        lex = lmf.Lexicon(\n            id=lexicon.id,\n            label=lexicon.label,\n            language=lexicon.language,\n            email=lexicon.email,\n            license=lexicon.license,\n            version=lexicon.version,\n            url=lexicon.url or \"\",\n            citation=lexicon.citation or \"\",\n            entries=list(self._entries(False)),\n            synsets=list(self._synsets(False)),\n            meta=lexicon.metadata(),\n        )\n        if self.version >= (1, 1):\n            lex[\"logo\"] = lexicon.logo or \"\"\n            lex[\"requires\"] = self._requires()\n            lex[\"frames\"] = self._syntactic_behaviours_1_1()\n\n        return lex\n\n    def _requires(self) -> list[lmf.Dependency]:\n        dependencies: list[lmf.Dependency] = []\n        for specifier, url, _ in get_lexicon_dependencies(self.lexspecs.primary):\n            id, version = split_lexicon_specifier(specifier)\n            dependencies.append(self._dependency(id, version, url))\n        return dependencies\n\n    def _dependency(self, id: str, version: str, url: str | None) -> lmf.Dependency:\n        return lmf.Dependency(id=id, version=version, url=url)\n\n    @overload\n    def _entries(\n        self, extension: Literal[True]\n    ) -> Iterator[lmf.LexicalEntry | lmf.ExternalLexicalEntry]: ...\n\n    @overload\n    def _entries(self, extension: Literal[False]) -> Iterator[lmf.LexicalEntry]: ...\n\n    def _entries(\n        self, extension: Literal[True, False]\n    ) -> Iterator[lmf.LexicalEntry | lmf.ExternalLexicalEntry]:\n        lexspec = self.lexspecs.primary\n        lexicons = self.lexspecs if extension else (lexspec,)\n        for id, pos, lex in find_entries(lexicons=lexicons):\n            if lex == lexspec:\n                yield self._entry(id, pos)\n            elif extension and (entry := self._ext_entry(id)):\n                yield entry\n\n    def _entry(self, id: str, pos: str) -> lmf.LexicalEntry:\n        lexspec = self.lexspecs.primary\n        lemma, forms = _get_entry_forms(id, self.lexspecs)\n        index = get_entry_index(id, lexspec)\n        entry = lmf.LexicalEntry(\n            id=id,\n            lemma=self._lemma(lemma, pos),\n            forms=[self._form(form) for form in forms],\n            index=index or \"\",\n            senses=list(self._senses(id, index, False)),\n            meta=self._metadata(id, \"entries\"),\n        )\n        if self.version < (1, 1):\n            # cleanup 1.1+ features\n            entry[\"lemma\"].pop(\"pronunciations\", None)\n            for form in entry[\"forms\"]:\n                form.pop(\"pronunciations\", None)\n            # 1.0 has syntactic behaviours on each entry\n            entry[\"frames\"] = self._syntactic_behaviours_1_0(entry)\n        if self.version < (1, 4) and index:\n            entry.pop(\"index\", None)\n        return entry\n\n    def _lemma(self, form: Form, pos: str) -> lmf.Lemma:\n        return lmf.Lemma(\n            writtenForm=form[0],\n            partOfSpeech=pos,\n            
script=(form[2] or \"\"),\n            pronunciations=self._pronunciations(form[4]),\n            tags=self._tags(form[5]),\n        )\n\n    def _form(self, form: Form) -> lmf.Form:\n        return lmf.Form(\n            writtenForm=form[0],\n            id=form[1] or \"\",\n            script=form[2] or \"\",\n            pronunciations=self._pronunciations(form[4]),\n            tags=self._tags(form[5]),\n        )\n\n    def _pronunciations(self, prons: list[Pronunciation]) -> list[lmf.Pronunciation]:\n        lexspec = self.lexspecs.primary\n        return [\n            lmf.Pronunciation(\n                text=text,\n                variety=variety or \"\",\n                notation=notation or \"\",\n                phonemic=phonemic,\n                audio=audio or \"\",\n            )\n            for text, variety, notation, phonemic, audio, lex in prons\n            if lex == lexspec\n        ]\n\n    def _tags(self, tags: list[Tag]) -> list[lmf.Tag]:\n        lexspec = self.lexspecs.primary\n        return [\n            lmf.Tag(text=text, category=category)\n            for text, category, lex in tags\n            if lex == lexspec\n        ]\n\n    @overload\n    def _senses(\n        self, id: str, index: str | None, extension: Literal[True]\n    ) -> Iterator[lmf.Sense | lmf.ExternalSense]: ...\n\n    @overload\n    def _senses(\n        self, id: str, index: str | None, extension: Literal[False]\n    ) -> Iterator[lmf.Sense]: ...\n\n    def _senses(\n        self, id: str, index: str | None, extension: Literal[True, False]\n    ) -> Iterator[lmf.Sense | lmf.ExternalSense]:\n        lexspec = self.lexspecs.primary\n        lexicons = self.lexspecs if extension else (lexspec,)\n        for i, sense in enumerate(get_entry_senses(id, lexicons, False), 1):\n            sid, _, _, lex = sense\n            if lex == lexspec:\n                yield self._sense(sense, index, i)\n            elif extension and (ext_sense := self._ext_sense(sid)):\n                yield ext_sense\n\n    def _sense(self, sense: Sense, index: str | None, i: int) -> lmf.Sense:\n        id, _, synset_id, lexspec = sense\n        lmf_sense = lmf.Sense(\n            id=id,\n            synset=synset_id,\n            n=_get_sense_n(id, lexspec, index, i),\n            relations=self._sense_relations(id),\n            examples=self._examples(id, \"senses\"),\n            counts=self._counts(id),\n            meta=self._metadata(id, \"senses\"),\n            lexicalized=get_lexicalized(id, lexspec, \"senses\"),\n            adjposition=get_adjposition(id, lexspec) or \"\",\n        )\n        if self.version >= (1, 1) and id in self.sbmap:\n            lmf_sense[\"subcat\"] = sorted(sbid for sbid, _ in self.sbmap[id])\n        return lmf_sense\n\n    def _sense_relations(self, sense_id: str) -> list[lmf.Relation]:\n        # only get relations defined for the primary lexicon, but the\n        # relation target can be from a base lexicon\n        lexicons = (self.lexspecs.primary,)\n        relations: list[lmf.Relation] = [\n            lmf.Relation(target=id, relType=type, meta=metadata)\n            for type, _, metadata, id, *_ in get_sense_relations(\n                sense_id, \"*\", lexicons, self.lexspecs\n            )\n        ]\n        relations.extend(\n            lmf.Relation(target=id, relType=type, meta=metadata)\n            for type, _, metadata, _, id, *_ in get_sense_synset_relations(\n                sense_id, \"*\", lexicons, self.lexspecs\n            )\n        )\n        return 
relations\n\n    def _examples(self, id: str, table: str) -> list[lmf.Example]:\n        lexicons = (self.lexspecs.primary,)  # only for the lexicon being exported\n        return [\n            lmf.Example(text=text, language=language, meta=metadata)\n            for text, language, _, metadata in get_examples(id, table, lexicons)\n        ]\n\n    def _counts(self, sense_id: str) -> list[lmf.Count]:\n        lexicons = (self.lexspecs.primary,)  # only for the lexicon being exported\n        return [\n            lmf.Count(value=val, meta=metadata)\n            for val, _, metadata in get_sense_counts(sense_id, lexicons)\n        ]\n\n    @overload\n    def _synsets(\n        self, extension: Literal[True]\n    ) -> Iterator[lmf.Synset | lmf.ExternalSynset]: ...\n\n    @overload\n    def _synsets(self, extension: Literal[False]) -> Iterator[lmf.Synset]: ...\n\n    def _synsets(\n        self, extension: Literal[True, False]\n    ) -> Iterator[lmf.Synset | lmf.ExternalSynset]:\n        lexspec = self.lexspecs.primary\n        lexicons = self.lexspecs if extension else (lexspec,)\n        for id, pos, ili, lex in find_synsets(lexicons=lexicons):\n            if lex == lexspec:\n                yield self._synset(id, pos, ili)\n            elif extension and (ext_synset := self._ext_synset(id)):\n                yield ext_synset\n\n    def _synset(self, id: str, pos: str, ili: str) -> lmf.Synset:\n        lexspec = self.lexspecs.primary\n        lexicons = (lexspec,)\n        ilidef = self._ili_definition(id)\n        if ilidef and not ili:\n            ili = PROPOSED_ILI_ID\n        ss = lmf.Synset(\n            id=id,\n            ili=ili or \"\",\n            partOfSpeech=pos,\n            definitions=self._definitions(id),\n            relations=self._synset_relations(id, lexspec),\n            examples=self._examples(id, \"synsets\"),\n            lexicalized=get_lexicalized(id, lexspec, \"synsets\"),\n            lexfile=get_lexfile(id, lexspec) or \"\",\n            meta=self._metadata(id, \"synsets\"),\n        )\n        if ilidef:\n            ss[\"ili_definition\"] = ilidef\n        if self.version >= (1, 1):\n            ss[\"members\"] = [row[0] for row in get_synset_members(id, lexicons)]\n        return ss\n\n    def _definitions(self, synset_id: str) -> list[lmf.Definition]:\n        lexicons = (self.lexspecs.primary,)  # only for the lexicon being exported\n        return [\n            lmf.Definition(\n                text=text,\n                language=language,\n                sourceSense=sense_id,\n                meta=metadata,\n            )\n            for text, language, sense_id, _, metadata in get_definitions(\n                synset_id, lexicons\n            )\n        ]\n\n    def _ili_definition(self, synset: str) -> lmf.ILIDefinition | None:\n        lexicons = (self.lexspecs.primary,)  # only for the lexicon being exported\n        _, lexspec, defn, _ = next(\n            find_proposed_ilis(synset_id=synset, lexicons=lexicons),\n            (None, None, None, None),\n        )\n        ilidef: lmf.ILIDefinition | None = None\n        if defn:\n            meta = None\n            if lexspec is not None:\n                meta = get_proposed_ili_metadata(synset, lexspec)\n            ilidef = lmf.ILIDefinition(text=defn, meta=meta)\n        return ilidef\n\n    def _synset_relations(\n        self, synset_id: str, synset_lexicon: str\n    ) -> list[lmf.Relation]:\n        # only get relations defined for the primary lexicon, but the\n        # relation target 
can be from a base lexicon\n        lexicons = (self.lexspecs.primary,)\n        return [\n            lmf.Relation(target=id, relType=type, meta=metadata)\n            for type, _, metadata, _, id, *_ in get_synset_relations(\n                synset_id, synset_lexicon, \"*\", lexicons, self.lexspecs\n            )\n        ]\n\n    def _syntactic_behaviours_1_0(\n        self,\n        entry: lmf.LexicalEntry,\n    ) -> list[lmf.SyntacticBehaviour]:\n        frames: list[lmf.SyntacticBehaviour] = []\n        sense_ids = {s[\"id\"] for s in entry.get(\"senses\", [])}\n        sbs: dict[str, set[str]] = {}\n        for sid in sense_ids:\n            for _, subcat_frame in self.sbmap.get(sid, []):\n                sbs.setdefault(subcat_frame, set()).add(sid)\n        for subcat_frame, sids in sbs.items():\n            frame: lmf.SyntacticBehaviour = {\n                \"subcategorizationFrame\": subcat_frame,\n                \"senses\": sorted(sids),\n            }\n            frames.append(frame)\n        return frames\n\n    def _syntactic_behaviours_1_1(self) -> list[lmf.SyntacticBehaviour]:\n        lexicons = (self.lexspecs.primary,)  # only for the lexicon being exported\n        return [\n            lmf.SyntacticBehaviour(id=id or \"\", subcategorizationFrame=frame)\n            for id, frame, _ in find_syntactic_behaviours(lexicons=lexicons)\n        ]\n\n    def _metadata(self, id: str, table: str) -> lmf.Metadata:\n        return get_metadata(id, self.lexspecs.primary, table)\n\n    ### Lexicon Extensions ###################################################\n\n    def _lexicon_extension(\n        self, lexicon: Lexicon, base: Lexicon\n    ) -> lmf.LexiconExtension:\n        lexspec = self.lexspecs.primary\n        if self.version < (1, 1):\n            raise Error(\n                f\"cannot export lexicon extension {lexspec} with WN-LMF version < 1.1\"\n            )\n        lex = lmf.LexiconExtension(\n            id=lexicon.id,\n            label=lexicon.label,\n            language=lexicon.language,\n            email=lexicon.email,\n            license=lexicon.license,\n            version=lexicon.version,\n            url=lexicon.url or \"\",\n            citation=lexicon.citation or \"\",\n            logo=lexicon.logo or \"\",\n            extends=self._dependency(base.id, base.version, base.url),\n            requires=self._requires(),\n            entries=list(self._entries(True)),\n            synsets=list(self._synsets(True)),\n            frames=self._syntactic_behaviours_1_1(),\n            meta=lexicon.metadata(),\n        )\n        return lex\n\n    def _ext_entry(self, id: str) -> lmf.ExternalLexicalEntry | None:\n        lexspec = self.lexspecs.primary\n        lemma, forms = _get_entry_forms(id, self.lexspecs)\n        index = get_entry_index(id, lexspec)\n        ext_lemma = self._ext_lemma(lemma)\n        ext_forms = self._ext_forms(forms)\n        ext_senses = list(self._senses(id, index, True))\n        if ext_lemma or ext_forms or ext_senses:\n            return lmf.ExternalLexicalEntry(\n                external=True,\n                id=id,\n                lemma=ext_lemma,\n                forms=ext_forms,\n                senses=ext_senses,\n            )\n        return None\n\n    def _ext_lemma(self, lemma: Form) -> lmf.ExternalLemma | None:\n        _, _, _, _, pronunciations, tags = lemma\n        ext_prons = self._pronunciations(pronunciations)\n        ext_tags = self._tags(tags)\n        if ext_prons or ext_tags:\n            return 
lmf.ExternalLemma(\n                external=True,\n                pronunciations=ext_prons,\n                tags=ext_tags,\n            )\n        return None\n\n    def _ext_forms(self, forms: list[Form]) -> list[lmf.Form | lmf.ExternalForm]:\n        lexspec = self.lexspecs.primary\n        ext_forms: list[lmf.Form | lmf.ExternalForm] = []\n        for form in forms:\n            if form[3] == lexspec:\n                ext_forms.append(self._form(form))\n            elif ext_form := self._ext_form(form):\n                ext_forms.append(ext_form)\n        return ext_forms\n\n    def _ext_form(self, form: Form) -> lmf.ExternalForm | None:\n        value, id, _, _, prons, tags = form\n        ext_prons = self._pronunciations(prons)\n        ext_tags = self._tags(tags)\n        if ext_prons or ext_tags:\n            if not id:\n                raise Error(f\"cannot export external form {value!r} without an id\")\n            return lmf.ExternalForm(\n                external=True,\n                id=id,\n                pronunciations=ext_prons,\n                tags=ext_tags,\n            )\n        return None\n\n    def _ext_sense(self, id: str) -> lmf.ExternalSense | None:\n        ext_relations = self._sense_relations(id)\n        ext_examples = self._examples(id, \"senses\")\n        ext_counts = self._counts(id)\n        if ext_relations or ext_examples or ext_counts or id in self.external_sense_ids:\n            return lmf.ExternalSense(\n                external=True,\n                id=id,\n                relations=ext_relations,\n                examples=ext_examples,\n                counts=ext_counts,\n            )\n        return None\n\n    def _ext_synset(self, id: str) -> lmf.ExternalSynset | None:\n        ext_definitions = self._definitions(id)\n        ext_relations = self._synset_relations(id, self.lexspecs.base)\n        ext_examples = self._examples(id, \"synsets\")\n        if (\n            ext_definitions\n            or ext_relations\n            or ext_examples\n            or id in self.external_synset_ids\n        ):\n            return lmf.ExternalSynset(\n                external=True,\n                id=id,\n                definitions=ext_definitions,\n                relations=ext_relations,\n                examples=ext_examples,\n            )\n        return None\n\n\n### Helper Functions #########################################################\n\n\ndef _build_sbmap(lexicons: Sequence[str]) -> _SBMap:\n    # WN-LMF 1.0 lexicons put syntactic behaviours on lexical entries\n    # WN-LMF 1.1 lexicons use a 'subcat' IDREFS attribute\n    sbmap: _SBMap = {}\n    for sbid, frame, sids in find_syntactic_behaviours(lexicons=lexicons):\n        for sid in sids:\n            sbmap.setdefault(sid, []).append((sbid, frame))\n    return sbmap\n\n\ndef _get_entry_forms(id: str, lexicons: Sequence[str]) -> tuple[Form, list[Form]]:\n    all_forms: list[Form] = list(get_entry_forms(id, lexicons))\n    # the first result is always the lemma\n    return all_forms[0], all_forms[1:]\n\n\ndef _get_sense_n(id: str, lexspec: str, index: str | None, i: int) -> int:\n    \"\"\"Get the n rank value for a sense.\n\n    The n value is only informative if it is non-None and different\n    from the expected rank i. 
If an index is used, always return a\n    non-None value of n, even if it is the expected rank.\n    \"\"\"\n    n = get_sense_n(id, lexspec)\n    if n is not None and (index is not None or n != i):\n        return n\n    return 0\n\n\ndef _get_external_sense_ids(lexspecs: _LexSpecs) -> set[str]:\n    \"\"\"Get ids of external senses needed for an extension.\"\"\"\n    return get_relation_targets(\n        \"sense_relations\", \"senses\", (lexspecs.primary,), lexspecs\n    )\n\n\ndef _get_external_synset_ids(lexspecs: _LexSpecs) -> set[str]:\n    \"\"\"Get ids of external synsets needed for an extension.\"\"\"\n    return (\n        get_relation_targets(\n            \"synset_relations\", \"synsets\", (lexspecs.primary,), lexspecs\n        )\n        | get_relation_targets(\n            \"sense_synset_relations\", \"synsets\", (lexspecs.primary,), lexspecs\n        )\n        | {\n            sense[2]\n            for sense in find_senses(lexicons=lexspecs)\n            if sense[3] != lexspecs.base\n        }\n    )\n"
  },
  {
    "path": "wn/_lexicon.py",
    "content": "from __future__ import annotations\n\nfrom dataclasses import dataclass, field\nfrom typing import TYPE_CHECKING, NamedTuple, Protocol, TypeVar\n\nfrom wn._metadata import HasMetadata\nfrom wn._queries import (\n    find_entries,\n    find_ilis,\n    find_senses,\n    find_synsets,\n    get_lexicon,\n    get_lexicon_dependencies,\n    get_lexicon_extension_bases,\n    get_lexicon_extensions,\n    get_modified,\n)\n\nif TYPE_CHECKING:\n    from collections.abc import Callable, Sequence\n\n    from wn._metadata import Metadata\n\nDEFAULT_CONFIDENCE = 1.0\n\n\nSelf = TypeVar(\"Self\", bound=\"Lexicon\")  # typing.Self, python_version>=3.11\n\n\n@dataclass(repr=False, eq=True, frozen=True, slots=True)\nclass Lexicon(HasMetadata):\n    \"\"\"A class representing a wordnet lexicon.\"\"\"\n\n    __module__ = \"wn\"\n\n    _specifier: str\n    id: str\n    label: str\n    language: str\n    email: str\n    license: str\n    version: str\n    url: str | None = None\n    citation: str | None = None\n    logo: str | None = None\n    _metadata: Metadata | None = field(default=None, hash=False)\n\n    @classmethod\n    def from_specifier(cls: type[Self], specifier: str) -> Self:\n        data = get_lexicon(specifier)\n        spec, id, label, lang, email, license, version, url, citation, logo, meta = data\n        return cls(\n            spec,\n            id,\n            label,\n            lang,\n            email,\n            license,\n            version,\n            url=url,\n            citation=citation,\n            logo=logo,\n            _metadata=meta,\n        )\n\n    def __repr__(self):\n        return f\"<Lexicon {self._specifier} [{self.language}]>\"\n\n    def specifier(self) -> str:\n        \"\"\"Return the *id:version* lexicon specifier.\"\"\"\n        return self._specifier\n\n    def confidence(self) -> float:\n        \"\"\"Return the confidence score of the lexicon.\n\n        If the lexicon does not specify a confidence score, it defaults to 1.0.\n        \"\"\"\n        return float(self.metadata().get(\"confidenceScore\", DEFAULT_CONFIDENCE))\n\n    def modified(self) -> bool:\n        \"\"\"Return True if the lexicon has local modifications.\"\"\"\n        return get_modified(self._specifier)\n\n    def requires(self) -> dict[str, Lexicon | None]:\n        \"\"\"Return the lexicon dependencies.\"\"\"\n        return {\n            spec: (None if added is None else Lexicon.from_specifier(spec))\n            for spec, _, added in get_lexicon_dependencies(self._specifier)\n        }\n\n    def extends(self) -> Lexicon | None:\n        \"\"\"Return the lexicon this lexicon extends, if any.\n\n        If this lexicon is not an extension, return None.\n        \"\"\"\n        bases = get_lexicon_extension_bases(self._specifier, depth=1)\n        if bases:\n            return Lexicon.from_specifier(bases[0])\n        return None\n\n    def extensions(self, depth: int = 1) -> list[Lexicon]:\n        \"\"\"Return the list of lexicons extending this one.\n\n        By default, only direct extensions are included. This is\n        controlled by the *depth* parameter, which if you view\n        extensions as children in a tree where the current lexicon is\n        the root, *depth=1* are the immediate extensions. 
Increasing\n        this number gets extensions of extensions, or setting it to a\n        negative number gets all \"descendant\" extensions.\n\n        \"\"\"\n        return [\n            Lexicon.from_specifier(spec)\n            for spec in get_lexicon_extensions(self._specifier, depth=depth)\n        ]\n\n    def describe(self, full: bool = True) -> str:\n        \"\"\"Return a formatted string describing the lexicon.\n\n        The *full* argument (default: :python:`True`) may be set to\n        :python:`False` to omit word and sense counts.\n\n        Also see: :meth:`Wordnet.describe`\n\n        \"\"\"\n        lexspecs = (self.specifier(),)\n        substrings: list[str] = [\n            f\"{self._specifier}\",\n            f\"  Label  : {self.label}\",\n            f\"  URL    : {self.url}\",\n            f\"  License: {self.license}\",\n        ]\n        if full:\n            substrings.extend(\n                [\n                    f\"  Words  : {_desc_counts(find_entries, lexspecs)}\",\n                    f\"  Senses : {sum(1 for _ in find_senses(lexicons=lexspecs))}\",\n                ]\n            )\n        substrings.extend(\n            [\n                f\"  Synsets: {_desc_counts(find_synsets, lexspecs)}\",\n                f\"  ILIs   : {sum(1 for _ in find_ilis(lexicons=lexspecs)):>6}\",\n            ]\n        )\n        return \"\\n\".join(substrings)\n\n\ndef _desc_counts(query: Callable, lexspecs: Sequence[str]) -> str:\n    count: dict[str, int] = {}\n    for _, pos, *_ in query(lexicons=lexspecs):\n        if pos not in count:\n            count[pos] = 1\n        else:\n            count[pos] += 1\n    subcounts = \", \".join(f\"{pos}: {count[pos]}\" for pos in sorted(count))\n    return f\"{sum(count.values()):>6} ({subcounts})\"\n\n\nclass LexiconElement(Protocol):\n    \"\"\"Protocol for elements defined within a lexicon.\"\"\"\n\n    _lexicon: str  # source lexicon specifier\n\n    def lexicon(self) -> Lexicon:\n        \"\"\"Return the lexicon containing the element.\"\"\"\n        return Lexicon.from_specifier(self._lexicon)\n\n\nclass LexiconElementWithMetadata(LexiconElement, HasMetadata, Protocol):\n    \"\"\"Protocol for lexicon elements with metadata.\"\"\"\n\n    def confidence(self) -> float:\n        \"\"\"Return the confidence score of the element.\n\n        If the element does not have an explicit confidence score, the\n        value defaults to that of the lexicon containing the element.\n        \"\"\"\n        c = self.metadata().get(\"confidenceScore\")\n        if c is None:\n            c = self.lexicon().confidence()\n        return float(c)\n\n\nclass LexiconConfiguration(NamedTuple):\n    lexicons: tuple[str, ...]\n    expands: tuple[str, ...]\n    default_mode: bool\n"
  },
  {
    "path": "wn/_metadata.py",
    "content": "from typing import Protocol, TypedDict\n\n\nclass Metadata(TypedDict, total=False):\n    # For these, see https://globalwordnet.github.io/schemas/dc/\n    contributor: str\n    coverage: str\n    creator: str\n    date: str\n    description: str\n    format: str\n    identifier: str\n    publisher: str\n    relation: str\n    rights: str\n    source: str\n    subject: str\n    title: str\n    type: str\n    # Additional WN-LMF metadata\n    status: str\n    note: str\n    confidenceScore: float\n\n\nclass HasMetadata(Protocol):\n    @property\n    def _metadata(self) -> Metadata | None:\n        return None\n\n    def metadata(self) -> Metadata:\n        \"\"\"Return the associated metadata.\"\"\"\n        return self._metadata if self._metadata is not None else Metadata()\n\n    def confidence(self) -> float:\n        \"\"\"Return the confidence score.\n\n        If the confidenceScore metadata is available, return it. If not,\n        use a default confidence value.\n        \"\"\"\n        ...\n"
  },
  {
    "path": "wn/_module_functions.py",
    "content": "from typing import Literal, overload\n\nfrom wn._config import ResolvedProjectInfo, config\nfrom wn._core import Form, Sense, Synset, Word\nfrom wn._db import clear_connections, connect, list_lexicons_safe\nfrom wn._download import download\nfrom wn._exceptions import Error\nfrom wn._lexicon import Lexicon\nfrom wn._util import format_lexicon_specifier\nfrom wn._wordnet import Wordnet\n\n\ndef projects() -> list[ResolvedProjectInfo]:\n    \"\"\"Return the list of indexed projects.\n\n    This returns the same dictionaries of information as\n    :meth:`wn.config.get_project_info\n    <wn._config.WNConfig.get_project_info>`, but for all indexed\n    projects.\n\n    Example:\n\n        >>> infos = wn.projects()\n        >>> len(infos)\n        36\n        >>> infos[0][\"label\"]\n        'Open English WordNet'\n\n    \"\"\"\n    index = config.index\n    return [\n        config.get_project_info(format_lexicon_specifier(project_id, version))\n        for project_id, project_info in index.items()\n        for version in project_info.get(\"versions\", [])\n        if not project_info[\"versions\"][version][\"error\"]\n    ]\n\n\ndef lexicons(*, lexicon: str | None = \"*\", lang: str | None = None) -> list[Lexicon]:\n    \"\"\"Return the lexicons matching a language or lexicon specifier.\n\n    Example:\n\n        >>> wn.lexicons(lang=\"en\")\n        [<Lexicon ewn:2020 [en]>, <Lexicon omw-en:1.4 [en]>]\n\n    \"\"\"\n    try:\n        w = Wordnet(lang=lang, lexicon=lexicon or \"*\")\n    except Error:\n        return []\n    else:\n        return w.lexicons()\n\n\ndef reset_database(rebuild: bool = False) -> None:\n    \"\"\"Delete and recreate the database file.\n\n    If *rebuild* is :python:`True`, Wn will attempt to add all lexicons\n    that are added in the existing database. Note that this will only\n    attempt to add indexed projects via their lexicon specifiers, (using\n    :python:`wn.download(specifier)`) regardless of how they were\n    originally added, and will not attempt to add resources from\n    unindexed URLs or local files (unless those local files are cached\n    versions of indexed resources).\n\n    This function is useful when database schema changes necessitate a\n    rebuild or when testing requires a clean database.\n\n    .. warning::\n       This will completely delete the database and all added resources.\n       It does not delete the download cache. Using ``rebuild=True``\n       does not re-add non-lexicon resources like CILI files or\n       unindexed resources, so you will need to add those manually.\n    \"\"\"\n    specs = list_lexicons_safe()\n    clear_connections()\n    config.database_path.unlink(missing_ok=True)\n    connect()\n    if rebuild:\n        for spec in specs:\n            download(spec)\n    clear_connections()\n\n\ndef word(id: str, *, lexicon: str | None = None, lang: str | None = None) -> Word:\n    \"\"\"Return the word with *id* in *lexicon*.\n\n    This will create a :class:`Wordnet` object using the *lang* and\n    *lexicon* arguments. 
The *id* argument is then passed to the\n    :meth:`Wordnet.word` method.\n\n    >>> wn.word(\"ewn-cell-n\")\n    Word('ewn-cell-n')\n\n    \"\"\"\n    return Wordnet(lang=lang, lexicon=lexicon).word(id)\n\n\ndef words(\n    form: str | None = None,\n    pos: str | None = None,\n    *,\n    lexicon: str | None = None,\n    lang: str | None = None,\n) -> list[Word]:\n    \"\"\"Return the list of matching words.\n\n    This will create a :class:`Wordnet` object using the *lang* and\n    *lexicon* arguments. The remaining arguments are passed to the\n    :meth:`Wordnet.words` method.\n\n    >>> len(wn.words())\n    282902\n    >>> len(wn.words(pos=\"v\"))\n    34592\n    >>> wn.words(form=\"scurry\")\n    [Word('ewn-scurry-n'), Word('ewn-scurry-v')]\n\n    \"\"\"\n    return Wordnet(lang=lang, lexicon=lexicon).words(form=form, pos=pos)\n\n\n@overload\ndef lemmas(\n    form: str | None = None,\n    pos: str | None = None,\n    *,\n    data: Literal[False] = False,\n    lexicon: str | None = None,\n    lang: str | None = None,\n) -> list[str]: ...\n\n\n@overload\ndef lemmas(\n    form: str | None = None,\n    pos: str | None = None,\n    *,\n    data: Literal[True] = True,\n    lexicon: str | None = None,\n    lang: str | None = None,\n) -> list[Form]: ...\n\n\n@overload\ndef lemmas(\n    form: str | None = None,\n    pos: str | None = None,\n    *,\n    data: bool,\n    lexicon: str | None = None,\n    lang: str | None = None,\n) -> list[str] | list[Form]: ...\n\n\ndef lemmas(\n    form: str | None = None,\n    pos: str | None = None,\n    *,\n    data: bool = False,\n    lexicon: str | None = None,\n    lang: str | None = None,\n) -> list[str] | list[Form]:\n    \"\"\"Return the list of lemmas for matching words.\n\n    This will create a :class:`Wordnet` object using the *lang* and\n    *lexicon* arguments. The remaining arguments are passed to the\n    :meth:`Wordnet.lemmas` method.\n\n    If the *data* argument is :python:`False` (the default), the\n    lemmas are returned as :class:`str` types. If it is\n    :python:`True`, :class:`wn.Form` objects are used instead.\n\n    >>> wn.lemmas(\"wolves\")\n    ['wolf']\n    >>> wn.lemmas(\"wolves\", data=True)\n    [Form(value='wolf')]\n    >>> len(wn.lemmas(pos=\"v\"))\n    11617\n\n    \"\"\"\n    return Wordnet(lang=lang, lexicon=lexicon).lemmas(form=form, pos=pos, data=data)\n\n\ndef synset(id: str, *, lexicon: str | None = None, lang: str | None = None) -> Synset:\n    \"\"\"Return the synset with *id* in *lexicon*.\n\n    This will create a :class:`Wordnet` object using the *lang* and\n    *lexicon* arguments. The *id* argument is then passed to the\n    :meth:`Wordnet.synset` method.\n\n    >>> wn.synset(\"ewn-03311152-n\")\n    Synset('ewn-03311152-n')\n\n    \"\"\"\n    return Wordnet(lang=lang, lexicon=lexicon).synset(id=id)\n\n\ndef synsets(\n    form: str | None = None,\n    pos: str | None = None,\n    ili: str | None = None,\n    *,\n    lexicon: str | None = None,\n    lang: str | None = None,\n) -> list[Synset]:\n    \"\"\"Return the list of matching synsets.\n\n    This will create a :class:`Wordnet` object using the *lang* and\n    *lexicon* arguments. 
The remaining arguments are passed to the\n    :meth:`Wordnet.synsets` method.\n\n    >>> len(wn.synsets(\"couch\"))\n    4\n    >>> wn.synsets(\"couch\", pos=\"v\")\n    [Synset('ewn-00983308-v')]\n\n    \"\"\"\n    return Wordnet(lang=lang, lexicon=lexicon).synsets(form=form, pos=pos, ili=ili)\n\n\ndef senses(\n    form: str | None = None,\n    pos: str | None = None,\n    *,\n    lexicon: str | None = None,\n    lang: str | None = None,\n) -> list[Sense]:\n    \"\"\"Return the list of matching senses.\n\n    This will create a :class:`Wordnet` object using the *lang* and\n    *lexicon* arguments. The remaining arguments are passed to the\n    :meth:`Wordnet.senses` method.\n\n    >>> len(wn.senses(\"twig\"))\n    3\n    >>> wn.senses(\"twig\", pos=\"n\")\n    [Sense('ewn-twig-n-13184889-02')]\n\n    \"\"\"\n    return Wordnet(lang=lang, lexicon=lexicon).senses(form=form, pos=pos)\n\n\ndef sense(id: str, *, lexicon: str | None = None, lang: str | None = None) -> Sense:\n    \"\"\"Return the sense with *id* in *lexicon*.\n\n    This will create a :class:`Wordnet` object using the *lang* and\n    *lexicon* arguments. The *id* argument is then passed to the\n    :meth:`Wordnet.sense` method.\n\n    >>> wn.sense(\"ewn-flutter-v-01903884-02\")\n    Sense('ewn-flutter-v-01903884-02')\n\n    \"\"\"\n    return Wordnet(lang=lang, lexicon=lexicon).sense(id=id)\n"
  },
  {
    "path": "wn/_queries.py",
    "content": "\"\"\"\nDatabase retrieval queries.\n\"\"\"\n\nimport itertools\nfrom collections.abc import Collection, Iterator, Sequence\nfrom typing import cast\n\nfrom wn._db import connect\nfrom wn._exceptions import Error\nfrom wn._metadata import Metadata\n\n# Local Types\n\nPronunciation = tuple[\n    str,  # value\n    str | None,  # variety\n    str | None,  # notation\n    bool,  # phonemic\n    str | None,  # audio\n    str,  # lexicon specifier\n]\nTag = tuple[str, str, str]  # tag, category, lexicon specifier\nForm = tuple[\n    str,  # form\n    str | None,  # id\n    str | None,  # script\n    str,  # lexicon\n    list[Pronunciation],  # pronunciations\n    list[Tag],  # tags\n]\n_Word = tuple[\n    str,  # id\n    str,  # pos\n    str,  # lexicon specifier\n]\n_Synset = tuple[\n    str,  # id\n    str,  # pos\n    str,  # ili\n    str,  # lexicon specifier\n]\n_Synset_Relation = tuple[\n    str,  # rel_name\n    str,  # lexicon\n    Metadata,  # metadata\n    str,  # srcid\n    str,  # _Synset...\n    str,\n    str,\n    str,\n]\n_Definition = tuple[\n    str,  # text\n    str,  # language\n    str,  # sourceSense\n    str,  # lexicon\n    Metadata | None,  # metadata\n]\n_Example = tuple[\n    str,  # text\n    str,  # language\n    str,  # lexicon\n    Metadata | None,  # metadata\n]\nSense = tuple[\n    str,  # id\n    str,  # entry_id\n    str,  # synset_id\n    str,  # lexicon specifier\n]\n_Sense_Relation = tuple[\n    str,  # rel_name\n    str,  # lexicon\n    Metadata,  # metadata\n    str,  # Sense...\n    str,\n    str,\n    str,\n]\n_Count = tuple[int, str, Metadata]  # count, lexicon, metadata\n_SyntacticBehaviour = tuple[\n    str,  # id\n    str,  # frame\n    list[str],  # sense ids\n]\n_ExistingILI = tuple[\n    str,  # id\n    str,  # status\n    str | None,  # definition\n    Metadata,\n]\n_ProposedILI = tuple[\n    str,  # synset id\n    str,  # lexicon\n    str,  # definition\n    Metadata,\n]\n_Lexicon = tuple[\n    str,  # specifier\n    str,  # id\n    str,  # label\n    str,  # language\n    str,  # email\n    str,  # license\n    str,  # version\n    str,  # url\n    str,  # citation\n    str,  # logo\n    Metadata | None,  # metadata\n]\n\n\ndef resolve_lexicon_specifiers(\n    lexicon: str,\n    lang: str | None = None,\n) -> list[str]:\n    cur = connect().cursor()\n    specifiers: list[str] = []\n    for specifier in lexicon.split():\n        limit = \"-1\" if \"*\" in lexicon else \"1\"\n        if \":\" not in specifier:\n            specifier += \":*\"\n        query = f\"\"\"\n            SELECT DISTINCT specifier\n              FROM lexicons\n             WHERE specifier GLOB :specifier\n               AND (:language ISNULL OR language = :language)\n             LIMIT {limit}\n        \"\"\"\n        params = {\"specifier\": specifier, \"language\": lang}\n        specifiers.extend(row[0] for row in cur.execute(query, params))\n    # only raise an error when the query specifies something\n    if not specifiers and (lexicon != \"*\" or lang is not None):\n        raise Error(f\"no lexicon found with lang={lang!r} and lexicon={lexicon!r}\")\n    return specifiers\n\n\ndef get_lexicon(lexicon: str) -> _Lexicon:\n    query = \"\"\"\n        SELECT DISTINCT specifier, id, label, language, email, license,\n                        version, url, citation, logo, metadata\n        FROM lexicons\n        WHERE specifier = ?\n    \"\"\"\n    row: _Lexicon | None = connect().execute(query, (lexicon,)).fetchone()\n    if row is None:\n        raise 
LookupError(lexicon)  # should we have a WnLookupError?\n    return row\n\n\ndef get_modified(lexicon: str) -> bool:\n    query = \"SELECT modified FROM lexicons WHERE specifier = ?\"\n    return connect().execute(query, (lexicon,)).fetchone()[0]\n\n\ndef get_lexicon_dependencies(lexicon: str) -> list[tuple[str, str, bool]]:\n    query = \"\"\"\n        SELECT provider_id || \":\" || provider_version, provider_url, provider_rowid\n          FROM lexicon_dependencies\n          JOIN lexicons AS lex ON lex.rowid = dependent_rowid\n         WHERE lex.specifier = ?\n    \"\"\"\n    return [\n        (spec, url, rowid is not None)\n        for spec, url, rowid in connect().execute(query, (lexicon,))\n    ]\n\n\ndef get_lexicon_extension_bases(lexicon: str, depth: int = -1) -> list[str]:\n    query = \"\"\"\n          WITH RECURSIVE ext(x, d) AS\n               (SELECT base_rowid, 1\n                  FROM lexicon_extensions\n                  JOIN lexicons AS lex ON lex.rowid = extension_rowid\n                 WHERE lex.specifier = :specifier\n                 UNION SELECT base_rowid, d+1\n                         FROM lexicon_extensions\n                         JOIN ext ON extension_rowid = x)\n        SELECT baselex.specifier\n          FROM ext\n          JOIN lexicons AS baselex ON baselex.rowid = ext.x\n         WHERE :depth < 0 OR d <= :depth\n         ORDER BY d\n    \"\"\"\n    rows = connect().execute(query, {\"specifier\": lexicon, \"depth\": depth})\n    return [row[0] for row in rows]\n\n\ndef get_lexicon_extensions(lexicon: str, depth: int = -1) -> list[str]:\n    query = \"\"\"\n          WITH RECURSIVE ext(x, d) AS\n               (SELECT extension_rowid, 1\n                  FROM lexicon_extensions\n                  JOIN lexicons AS lex ON lex.rowid = base_rowid\n                 WHERE lex.specifier = :specifier\n                 UNION SELECT extension_rowid, d+1\n                         FROM lexicon_extensions\n                         JOIN ext ON base_rowid = x)\n        SELECT extlex.specifier\n          FROM ext\n          JOIN lexicons AS extlex ON extlex.rowid = ext.x\n         WHERE :depth < 0 OR d <= :depth\n         ORDER BY d\n    \"\"\"\n    rows = connect().execute(query, {\"specifier\": lexicon, \"depth\": depth})\n    return [row[0] for row in rows]\n\n\ndef get_ili(id: str) -> _ExistingILI | None:\n    query = \"\"\"\n        SELECT i.id, ist.status, i.definition, i.metadata\n          FROM ilis AS i\n          JOIN ili_statuses AS ist ON i.status_rowid = ist.rowid\n         WHERE i.id = ?\n         LIMIT 1\n    \"\"\"\n    return connect().execute(query, (id,)).fetchone()\n\n\ndef find_ilis(\n    status: str | None = None,\n    lexicons: Sequence[str] = (),\n) -> Iterator[_ExistingILI]:\n    query = \"\"\"\n        SELECT DISTINCT i.id, ist.status, i.definition, i.metadata\n          FROM ilis AS i\n          JOIN ili_statuses AS ist ON i.status_rowid = ist.rowid\n    \"\"\"\n    conditions: list[str] = []\n    params: list = []\n    if status:\n        conditions.append(\"ist.status = ?\")\n        params.append(status)\n    if lexicons:\n        # this runs much faster than just adding a condition\n        query = \"\"\"\n        SELECT DISTINCT i.id, ist.status, i.definition, i.metadata\n          FROM lexicons as lex\n          JOIN synsets AS ss ON ss.lexicon_rowid = lex.rowid\n          JOIN ilis AS i ON i.rowid = ss.ili_rowid\n          JOIN ili_statuses AS ist ON i.status_rowid = ist.rowid\n        \"\"\"\n        conditions.append(f\"lex.specifier IN 
({_qs(lexicons)})\")\n        params.extend(lexicons)\n\n    if conditions:\n        query += \" WHERE \" + \"\\n           AND \".join(conditions)\n\n    yield from connect().execute(query, params)\n\n\ndef find_proposed_ilis(\n    synset_id: str | None = None,\n    lexicons: Sequence[str] = (),\n) -> Iterator[_ProposedILI]:\n    query = \"\"\"\n    SELECT ss.id, lex.specifier, pi.definition, pi.metadata\n      FROM proposed_ilis AS pi\n      JOIN synsets AS ss ON ss.rowid = synset_rowid\n      JOIN lexicons AS lex ON lex.rowid = ss.lexicon_rowid\n    \"\"\"\n    conditions: list[str] = []\n    params: list = []\n    if synset_id is not None:\n        conditions.append(\"ss.id = ?\")\n        params.append(synset_id)\n    if lexicons:\n        conditions.append(f\"lex.specifier IN ({_qs(lexicons)})\")\n        params.extend(lexicons)\n    if conditions:\n        query += \" WHERE \" + \"\\n           AND \".join(conditions)\n    yield from connect().execute(query, params)\n\n\ndef find_entries(\n    id: str | None = None,\n    forms: Sequence[str] = (),\n    pos: str | None = None,\n    lexicons: Sequence[str] = (),\n    normalized: bool = False,\n    search_all_forms: bool = False,\n) -> Iterator[_Word]:\n    conn = connect()\n    cte, cteparams, conditions, condparams = _build_entry_conditions(\n        forms, pos, lexicons, normalized, search_all_forms\n    )\n\n    if id:\n        conditions.insert(0, \"e.id = ?\")\n        condparams.insert(0, id)\n\n    condition = \"\"\n    if conditions:\n        condition = \"WHERE \" + \"\\n           AND \".join(conditions)\n\n    query = f\"\"\"\n          {cte}\n        SELECT DISTINCT e.id, e.pos, lex.specifier\n          FROM entries AS e\n          JOIN lexicons AS lex ON lex.rowid = e.lexicon_rowid\n         {condition}\n         ORDER BY e.rowid ASC\n    \"\"\"\n\n    rows: Iterator[_Word] = conn.execute(query, cteparams + condparams)\n    yield from rows\n\n\ndef _load_lemmas_with_details(\n    conn,\n    cte: str,\n    cteparams: list,\n    conditions: list[str],\n    condparams: list,\n    with_lexicons: bool,\n) -> Iterator[Form]:\n    \"\"\"Load lemmas with pronunciations and tags (full details).\"\"\"\n    plex_cond = \"AND plex.specifier IN lexspecs\" if with_lexicons else \"\"\n    tlex_cond = \"AND tlex.specifier IN lexspecs\" if with_lexicons else \"\"\n    condition = \"\"\n    if conditions:\n        condition = \"AND \" + \"\\n           AND \".join(conditions)\n    query = f\"\"\"\n          {cte}\n        SELECT DISTINCT f.rowid, f.form, f.id, f.script, lex.specifier,\n               p.value, p.variety, p.notation, p.phonemic, p.audio, plex.specifier,\n               t.tag, t.category, tlex.specifier\n          FROM forms AS f\n          JOIN entries AS e ON e.rowid = f.entry_rowid\n          JOIN lexicons AS lex ON lex.rowid = e.lexicon_rowid\n          LEFT JOIN pronunciations AS p ON p.form_rowid = f.rowid\n          LEFT JOIN lexicons AS plex ON plex.rowid = p.lexicon_rowid {plex_cond}\n          LEFT JOIN tags AS t ON t.form_rowid = f.rowid\n          LEFT JOIN lexicons AS tlex ON tlex.rowid = t.lexicon_rowid {tlex_cond}\n         WHERE f.rank = 0\n         {condition}\n         ORDER BY f.rowid ASC\n    \"\"\"\n\n    # Group results by form_rowid and process pronunciations/tags\n    forms_dict: dict[\n        int, tuple[str, str | None, str | None, str, list[Pronunciation], list[Tag]]\n    ] = {}\n\n    for row in conn.execute(query, cteparams + condparams):\n        form_rowid, form, form_id, script, lexicon = 
row[0:5]\n        pron_data = row[5:11]\n        tag_data = row[11:14]\n\n        if form_rowid not in forms_dict:\n            forms_dict[form_rowid] = (form, form_id, script, lexicon, [], [])\n\n        # Add pronunciation if present\n        if pron_data[0] is not None:  # value\n            pron = cast(\"Pronunciation\", pron_data)\n            if pron not in forms_dict[form_rowid][4]:\n                forms_dict[form_rowid][4].append(pron)\n\n        # Add tag if present\n        if tag_data[0] is not None:  # tag\n            tag = cast(\"Tag\", tag_data)\n            if tag not in forms_dict[form_rowid][5]:\n                forms_dict[form_rowid][5].append(tag)\n\n    # Yield forms in order\n    yield from forms_dict.values()\n\n\ndef find_lemmas(\n    forms: Sequence[str] = (),\n    pos: str | None = None,\n    lexicons: Sequence[str] = (),\n    normalized: bool = False,\n    search_all_forms: bool = False,\n    load_details: bool = False,\n) -> Iterator[Form]:\n    \"\"\"Find lemmas matching the given criteria.\n\n    Returns form data for the lemma of each matching entry.\n    If load_details is False, pronunciations and tags are not loaded.\n    \"\"\"\n    conn = connect()\n    cte, cteparams, conditions, condparams = _build_entry_conditions(\n        forms, pos, lexicons, normalized, search_all_forms\n    )\n\n    if not load_details:\n        # Fast path: don't load pronunciations and tags\n        condition = \"\"\n        if conditions:\n            condition = \"AND \" + \"\\n           AND \".join(conditions)\n        query = f\"\"\"\n              {cte}\n            SELECT f.form, f.id, f.script, lex.specifier\n              FROM forms AS f\n              JOIN entries AS e ON e.rowid = f.entry_rowid\n              JOIN lexicons AS lex ON lex.rowid = e.lexicon_rowid\n             WHERE f.rank = 0\n             {condition}\n             ORDER BY f.rowid ASC\n        \"\"\"\n        for row in conn.execute(query, cteparams + condparams):\n            form, form_id, script, lexicon = row\n            yield (form, form_id, script, lexicon, [], [])\n    else:\n        # Full path: load pronunciations and tags\n        yield from _load_lemmas_with_details(\n            conn, cte, cteparams, conditions, condparams, bool(lexicons)\n        )\n\n\ndef find_senses(\n    id: str | None = None,\n    forms: Sequence[str] = (),\n    pos: str | None = None,\n    lexicons: Sequence[str] = (),\n    normalized: bool = False,\n    search_all_forms: bool = False,\n) -> Iterator[Sense]:\n    conn = connect()\n    ctes: list[str] = []\n    params: list = []\n    conditions = []\n    order = \"s.rowid\"\n    if id:\n        conditions.append(\"s.id = ?\")\n        params.append(id)\n    if forms:\n        ctes, subquery = _query_forms(forms, normalized, search_all_forms)\n        conditions.append(f\"s.entry_rowid IN {subquery}\")\n        params.extend(forms)\n        order = \"s.lexicon_rowid, e.pos, s.entry_rank\"\n    if pos:\n        conditions.append(\"e.pos = ?\")\n        params.append(pos)\n    if lexicons:\n        conditions.append(f\"slex.specifier IN ({_qs(lexicons)})\")\n        params.extend(lexicons)\n\n    cte = \"\"\n    if ctes:\n        cte = \"WITH \" + \",\\n         \".join(ctes)\n\n    condition = \"\"\n    if conditions:\n        condition = \"WHERE \" + \"\\n           AND \".join(conditions)\n\n    query = f\"\"\"\n          {cte}\n        SELECT DISTINCT s.id, e.id, ss.id, slex.specifier\n          FROM senses AS s\n          JOIN entries AS e ON e.rowid = 
s.entry_rowid\n          JOIN synsets AS ss ON ss.rowid = s.synset_rowid\n          JOIN lexicons AS slex ON slex.rowid = s.lexicon_rowid\n         {condition}\n         ORDER BY {order} ASC\n    \"\"\"\n\n    rows: Iterator[Sense] = conn.execute(query, params)\n    yield from rows\n\n\ndef find_synsets(\n    id: str | None = None,\n    forms: Sequence[str] = (),\n    pos: str | None = None,\n    ili: str | None = None,\n    lexicons: Sequence[str] = (),\n    normalized: bool = False,\n    search_all_forms: bool = False,\n) -> Iterator[_Synset]:\n    conn = connect()\n    ctes: list[str] = []\n    join = \"\"\n    conditions = []\n    order = \"ss.rowid\"\n    params: list = []\n    if id:\n        conditions.append(\"ss.id = ?\")\n        params.append(id)\n    if forms:\n        ctes, subquery = _query_forms(forms, normalized, search_all_forms)\n        join = f\"\"\"\\\n          JOIN (SELECT _s.entry_rowid, _s.synset_rowid, _s.entry_rank\n                  FROM senses AS _s\n                 WHERE _s.entry_rowid IN {subquery}\n               ) AS s\n            ON s.synset_rowid = ss.rowid\n        \"\"\".strip()\n        params.extend(forms)\n        order = \"ss.lexicon_rowid, ss.pos, s.entry_rank\"\n    if pos:\n        conditions.append(\"ss.pos = ?\")\n        params.append(pos)\n    if ili:\n        conditions.append(\n            \"ss.ili_rowid IN (SELECT ilis.rowid FROM ilis WHERE ilis.id = ?)\"\n        )\n        params.append(ili)\n    if lexicons:\n        conditions.append(f\"sslex.specifier IN ({_qs(lexicons)})\")\n        params.extend(lexicons)\n\n    cte = \"\"\n    if ctes:\n        cte = \"WITH \" + \",\\n         \".join(ctes)\n\n    condition = \"\"\n    if conditions:\n        condition = \"WHERE \" + \"\\n           AND \".join(conditions)\n\n    query = f\"\"\"\n          {cte}\n        SELECT DISTINCT ss.id, ss.pos,\n                        (SELECT ilis.id FROM ilis WHERE ilis.rowid=ss.ili_rowid),\n                        sslex.specifier\n          FROM synsets AS ss\n          JOIN lexicons AS sslex ON sslex.rowid = ss.lexicon_rowid\n          {join}\n         {condition}\n         ORDER BY {order} ASC\n    \"\"\"\n\n    rows: Iterator[_Synset] = conn.execute(query, params)\n    yield from rows\n\n\ndef get_entry_forms(id: str, lexicons: Sequence[str]) -> Iterator[Form]:\n    form_query = f\"\"\"\n          WITH lexspecs(s) AS (VALUES {_vs(lexicons)})\n        SELECT f.rowid, f.form, f.id, f.script, lex.specifier\n          FROM forms AS f\n          JOIN entries AS e ON e.rowid = entry_rowid\n          JOIN lexicons AS lex ON lex.rowid = e.lexicon_rowid\n         WHERE e.id = ?\n           AND lex.specifier IN lexspecs\n         ORDER BY f.rank\n    \"\"\"\n    pron_query = f\"\"\"\n          WITH lexspecs(s) AS (VALUES {_vs(lexicons)})\n        SELECT p.value, p.variety, p.notation, p.phonemic, p.audio, lex.specifier\n          FROM pronunciations AS p\n          JOIN lexicons AS lex ON lex.rowid = p.lexicon_rowid\n         WHERE form_rowid = ?\n           AND lex.specifier IN lexspecs\n    \"\"\"\n    tag_query = f\"\"\"\n          WITH lexspecs(s) AS (VALUES {_vs(lexicons)})\n        SELECT t.tag, t.category, lex.specifier\n          FROM tags AS t\n          JOIN lexicons AS lex ON lex.rowid = t.lexicon_rowid\n         WHERE form_rowid = ?\n           AND lex.specifier IN lexspecs\n    \"\"\"\n\n    cur = connect().cursor()\n    for row in cur.execute(form_query, (*lexicons, id)).fetchall():\n        params = (*lexicons, row[0])\n        prons: 
list[Pronunciation] = cur.execute(pron_query, params).fetchall()\n        tags: list[Tag] = cur.execute(tag_query, params).fetchall()\n        yield (*row[1:], prons, tags)\n\n\ndef get_synsets_for_ilis(\n    ilis: Collection[str],\n    lexicons: Sequence[str],\n) -> Iterator[_Synset]:\n    conn = connect()\n    query = f\"\"\"\n        SELECT DISTINCT ss.id, ss.pos, ili.id, sslex.specifier\n          FROM synsets as ss\n          JOIN ilis as ili ON ss.ili_rowid = ili.rowid\n          JOIN lexicons AS sslex ON sslex.rowid = ss.lexicon_rowid\n         WHERE ili.id IN ({_qs(ilis)})\n           AND sslex.specifier IN ({_qs(lexicons)})\n    \"\"\"\n    params = *ilis, *lexicons\n    result_rows: Iterator[_Synset] = conn.execute(query, params)\n    yield from result_rows\n\n\ndef get_synset_relations(\n    synset_id: str,\n    synset_lexicon: str,\n    relation_types: Collection[str],\n    lexicons: Sequence[str],\n    target_lexicons: Sequence[str],\n) -> Iterator[_Synset_Relation]:\n    conn = connect()\n    params: list = []\n    constraint = \"\"\n    if relation_types and \"*\" not in relation_types:\n        constraint = f\"WHERE type IN ({_qs(relation_types)})\"\n        params.extend(relation_types)\n    params.extend(lexicons)\n    params.extend(target_lexicons)\n    params.append(synset_id)\n    params.append(synset_lexicon)\n    query = f\"\"\"\n        WITH\n          reltypes(rowid) AS\n            (SELECT rowid FROM relation_types {constraint}),\n          lexrowids(rowid) AS\n            (SELECT rowid FROM lexicons\n              WHERE specifier IN ({_vs(lexicons)})),\n          tgtlexrowids(rowid) AS\n            (SELECT rowid FROM lexicons\n              WHERE specifier IN ({_vs(target_lexicons)})),\n          srcsynset(rowid) AS\n            (SELECT ss.rowid\n               FROM synsets AS ss\n               JOIN lexicons AS lex ON lex.rowid = ss.lexicon_rowid\n              WHERE ss.id = ?\n                AND lex.specifier = ?),\n          matchingrels(rowid) AS\n            (SELECT srel.rowid\n               FROM synset_relations AS srel\n              WHERE srel.source_rowid IN srcsynset\n                AND srel.lexicon_rowid IN lexrowids\n                AND srel.type_rowid IN reltypes)\n        SELECT DISTINCT rt.type, lex.specifier, srel.metadata,\n                        src.id, tgt.id, tgt.pos, tgtili.id, tgtlex.specifier\n          FROM matchingrels AS mr\n          JOIN synset_relations AS srel ON srel.rowid=mr.rowid\n          JOIN relation_types AS rt ON rt.rowid=srel.type_rowid\n          JOIN synsets AS src ON src.rowid = srel.source_rowid\n          JOIN synsets AS tgt ON tgt.rowid = srel.target_rowid\n          JOIN lexicons AS lex ON lex.rowid = srel.lexicon_rowid\n          JOIN lexicons AS tgtlex ON tgtlex.rowid = tgt.lexicon_rowid\n          LEFT JOIN ilis AS tgtili ON tgtili.rowid = tgt.ili_rowid  -- might be null\n         WHERE tgt.lexicon_rowid IN tgtlexrowids  -- ensure target is included\n    \"\"\"\n    result_rows: Iterator[_Synset_Relation] = conn.execute(query, params)\n    yield from result_rows\n\n\ndef get_expanded_synset_relations(\n    ili_id: str,\n    relation_types: Collection[str],\n    expands: Sequence[str],\n) -> Iterator[_Synset_Relation]:\n    conn = connect()\n    params: list = []\n    constraint = \"\"\n    if relation_types and \"*\" not in relation_types:\n        constraint = f\"WHERE type IN ({_qs(relation_types)})\"\n        params.extend(relation_types)\n    params.extend(expands)\n    params.append(ili_id)\n    query = 
f\"\"\"\n        WITH\n          reltypes(rowid) AS\n            (SELECT rowid FROM relation_types {constraint}),\n          lexrowids(rowid) AS\n            (SELECT rowid FROM lexicons WHERE specifier IN ({_vs(expands)})),\n          srcsynset(rowid) AS\n            (SELECT ss.rowid\n               FROM synsets AS ss\n               JOIN ilis ON ilis.rowid = ss.ili_rowid\n              WHERE ilis.id = ?\n                AND ss.lexicon_rowid IN lexrowids),\n          matchingrels(rowid) AS\n            (SELECT srel.rowid\n               FROM synset_relations AS srel\n              WHERE srel.source_rowid IN srcsynset\n                AND srel.lexicon_rowid IN lexrowids\n                AND srel.type_rowid IN reltypes)\n        SELECT DISTINCT rt.type, lex.specifier, srel.metadata,\n                        src.id, tgt.id, tgt.pos, tgtili.id, tgtlex.specifier\n          FROM matchingrels AS mr\n          JOIN synset_relations AS srel ON srel.rowid=mr.rowid\n          JOIN relation_types AS rt ON rt.rowid=srel.type_rowid\n          JOIN synsets AS src ON src.rowid = srel.source_rowid\n          JOIN synsets AS tgt ON tgt.rowid = srel.target_rowid\n          JOIN ilis AS tgtili ON tgtili.rowid = tgt.ili_rowid\n          JOIN lexicons AS lex ON lex.rowid = srel.lexicon_rowid\n          JOIN lexicons AS tgtlex ON tgtlex.rowid = tgt.lexicon_rowid\n    \"\"\"\n    result_rows: Iterator[_Synset_Relation] = conn.execute(query, params)\n    yield from result_rows\n\n\ndef get_definitions(\n    synset_id: str,\n    lexicons: Sequence[str],\n) -> list[_Definition]:\n    conn = connect()\n    query = f\"\"\"\n        SELECT d.definition,\n               d.language,\n               (SELECT s.id FROM senses AS s WHERE s.rowid=d.sense_rowid),\n               lex.specifier,\n               d.metadata\n          FROM definitions AS d\n          JOIN synsets AS ss ON ss.rowid = d.synset_rowid\n          JOIN lexicons AS lex ON lex.rowid = d.lexicon_rowid\n         WHERE ss.id = ?\n           AND lex.specifier IN ({_qs(lexicons)})\n    \"\"\"\n    return conn.execute(query, (synset_id, *lexicons)).fetchall()\n\n\n_SANITIZED_EXAMPLE_PREFIXES = {\n    \"senses\": \"sense\",\n    \"synsets\": \"synset\",\n}\n\n\ndef get_examples(\n    id: str,\n    table: str,\n    lexicons: Sequence[str],\n) -> list[_Example]:\n    conn = connect()\n    prefix = _SANITIZED_EXAMPLE_PREFIXES.get(table)\n    if prefix is None:\n        raise Error(f\"'{table}' does not have examples\")\n    query = f\"\"\"\n        SELECT ex.example, ex.language, lex.specifier, ex.metadata\n          FROM {prefix}_examples AS ex\n          JOIN {table} AS tbl ON tbl.rowid = ex.{prefix}_rowid\n          JOIN lexicons AS lex ON lex.rowid = ex.lexicon_rowid\n         WHERE tbl.id = ?\n           AND lex.specifier IN ({_qs(lexicons)})\n    \"\"\"\n    return conn.execute(query, (id, *lexicons)).fetchall()\n\n\ndef find_syntactic_behaviours(\n    id: str | None = None,\n    lexicons: Sequence[str] = (),\n) -> Iterator[_SyntacticBehaviour]:\n    conn = connect()\n    query = \"\"\"\n        SELECT sb.id, sb.frame, s.id\n          FROM syntactic_behaviours AS sb\n          JOIN syntactic_behaviour_senses AS sbs\n            ON sbs.syntactic_behaviour_rowid = sb.rowid\n          JOIN senses AS s\n            ON s.rowid = sbs.sense_rowid\n          JOIN lexicons AS lex ON lex.rowid = sb.lexicon_rowid\n    \"\"\"\n    conditions: list[str] = []\n    params: list = []\n    if id:\n        conditions.append(\"sb.id = ?\")\n        params.append(id)\n    if 
lexicons:\n        conditions.append(f\"lex.specifier IN ({_qs(lexicons)})\")\n        params.extend(lexicons)\n    if conditions:\n        query += \"\\n WHERE \" + \"\\n   AND \".join(conditions)\n    rows: Iterator[tuple[str, str, str]] = conn.execute(query, params)\n    for key, group in itertools.groupby(rows, lambda row: row[0:2]):\n        id, frame = cast(\"tuple[str, str]\", key)\n        sense_ids = [row[2] for row in group]\n        yield id, frame, sense_ids\n\n\ndef get_syntactic_behaviours(\n    sense_id: str,\n    lexicons: Sequence[str],\n) -> list[str]:\n    conn = connect()\n    query = f\"\"\"\n        SELECT sb.frame\n          FROM syntactic_behaviours AS sb\n          JOIN syntactic_behaviour_senses AS sbs\n            ON sbs.syntactic_behaviour_rowid = sb.rowid\n          JOIN senses AS s ON s.rowid = sbs.sense_rowid\n          JOIN lexicons AS lex ON lex.rowid = sb.lexicon_rowid\n         WHERE s.id = ?\n           AND lex.specifier IN ({_qs(lexicons)})\n    \"\"\"\n    return [row[0] for row in conn.execute(query, (sense_id, *lexicons))]\n\n\ndef _get_senses(\n    id: str, sourcetype: str, lexicons: Sequence[str], order_by_rank: bool = True\n) -> Iterator[Sense]:\n    conn = connect()\n    match sourcetype:\n        case \"entry\":\n            sourcealias = \"e\"\n        case \"synset\":\n            sourcealias = \"ss\"\n        case _:\n            raise Error(f\"invalid sense source type: {sourcetype}\")\n    order_col = f\"{sourcetype}_rank\" if order_by_rank else \"rowid\"\n    query = f\"\"\"\n        SELECT s.id, e.id, ss.id, slex.specifier\n          FROM senses AS s\n          JOIN entries AS e\n            ON e.rowid = s.entry_rowid\n          JOIN synsets AS ss\n            ON ss.rowid = s.synset_rowid\n          JOIN lexicons AS slex\n            ON slex.rowid = s.lexicon_rowid\n         WHERE {sourcealias}.id = ?\n           AND slex.specifier IN ({_qs(lexicons)})\n         ORDER BY s.{order_col}\n    \"\"\"\n    return conn.execute(query, (id, *lexicons))\n\n\ndef get_entry_senses(\n    sense_id: str, lexicons: Sequence[str], order_by_rank: bool = True\n) -> Iterator[Sense]:\n    yield from _get_senses(sense_id, \"entry\", lexicons, order_by_rank)\n\n\ndef get_synset_members(\n    synset_id: str, lexicons: Sequence[str], order_by_rank: bool = True\n) -> Iterator[Sense]:\n    yield from _get_senses(synset_id, \"synset\", lexicons, order_by_rank)\n\n\ndef get_sense_relations(\n    sense_id: str,\n    relation_types: Collection[str],\n    lexicons: Sequence[str],\n    target_lexicons: Sequence[str],\n) -> Iterator[_Sense_Relation]:\n    params: list = []\n    constraint = \"\"\n    if relation_types and \"*\" not in relation_types:\n        constraint = f\"WHERE type IN ({_qs(relation_types)})\"\n        params.extend(relation_types)\n    params.extend(lexicons)\n    params.extend(target_lexicons)\n    params.append(sense_id)\n    query = f\"\"\"\n        WITH\n          rt(rowid, type) AS\n            (SELECT rowid, type FROM relation_types {constraint}),\n          lexrowids(rowid) AS\n            (SELECT rowid FROM lexicons WHERE specifier IN ({_vs(lexicons)})),\n          tgtlexrowids(rowid) AS\n            (SELECT rowid FROM lexicons WHERE specifier IN ({_vs(target_lexicons)}))\n        SELECT DISTINCT rel.type, rel.lexicon, rel.metadata,\n                        s.id, e.id, ss.id, slex.specifier\n          FROM (SELECT rt.type,\n                       lex.specifier AS lexicon,\n                       srel.metadata AS metadata,\n                   
    target_rowid\n                  FROM sense_relations AS srel\n                  JOIN rt ON srel.type_rowid = rt.rowid\n                  JOIN lexicons AS lex ON srel.lexicon_rowid = lex.rowid\n                  JOIN senses AS s ON s.rowid = srel.source_rowid\n                 WHERE s.id = ?\n                   AND srel.lexicon_rowid IN lexrowids\n               ) AS rel\n          JOIN senses AS s\n            ON s.rowid = rel.target_rowid\n           AND s.lexicon_rowid IN tgtlexrowids\n          JOIN lexicons AS slex\n            ON slex.rowid = s.lexicon_rowid\n          JOIN entries AS e\n            ON e.rowid = s.entry_rowid\n          JOIN synsets AS ss\n            ON ss.rowid = s.synset_rowid\n    \"\"\"\n    rows: Iterator[_Sense_Relation] = connect().execute(query, params)\n    yield from rows\n\n\ndef get_sense_synset_relations(\n    sense_id: str,\n    relation_types: Collection[str],\n    lexicons: Sequence[str],\n    target_lexicons: Sequence[str],\n) -> Iterator[_Synset_Relation]:\n    params: list = []\n    constraint = \"\"\n    if \"*\" not in relation_types:\n        constraint = f\"WHERE type IN ({_qs(relation_types)})\"\n        params.extend(relation_types)\n    params.extend(lexicons)\n    params.extend(target_lexicons)\n    params.append(sense_id)\n    query = f\"\"\"\n        WITH\n          rt(rowid, type) AS\n            (SELECT rowid, type FROM relation_types {constraint}),\n          lexrowids(rowid) AS\n            (SELECT rowid FROM lexicons WHERE specifier IN ({_vs(lexicons)})),\n          tgtlexrowids(rowid) AS\n            (SELECT rowid FROM lexicons WHERE specifier IN ({_vs(target_lexicons)}))\n        SELECT DISTINCT rel.type, rel.lexicon, rel.metadata,\n                        rel.source_rowid, tgt.id, tgt.pos,\n                        (SELECT ilis.id FROM ilis WHERE ilis.rowid = tgt.ili_rowid),\n                        tgtlex.specifier\n          FROM (SELECT rt.type,\n                       lex.specifier AS lexicon,\n                       srel.metadata AS metadata,\n                       source_rowid,\n                       target_rowid\n                  FROM sense_synset_relations AS srel\n                  JOIN rt ON srel.type_rowid = rt.rowid\n                  JOIN lexicons AS lex ON srel.lexicon_rowid = lex.rowid\n                  JOIN senses AS s ON s.rowid = srel.source_rowid\n                 WHERE s.id = ?\n                   AND srel.lexicon_rowid IN lexrowids\n               ) AS rel\n          JOIN synsets AS tgt\n            ON tgt.rowid = rel.target_rowid\n           AND tgt.lexicon_rowid IN tgtlexrowids\n          JOIN lexicons AS tgtlex\n            ON tgtlex.rowid = tgt.lexicon_rowid\n    \"\"\"\n    rows: Iterator[_Synset_Relation] = connect().execute(query, params)\n    yield from rows\n\n\ndef get_relation_targets(\n    rel_table: str,\n    tgt_table: str,\n    lexicons: Sequence[str],\n    target_lexicons: Sequence[str],\n) -> set[str]:\n    if rel_table not in {\n        \"sense_relations\",\n        \"sense_synset_relations\",\n        \"synset_relations\",\n    }:\n        raise ValueError(f\"invalid relation table: {rel_table}\")\n    if tgt_table not in (\"senses\", \"synsets\"):\n        raise ValueError(f\"invalid target table: {tgt_table}\")\n    params: list = [*lexicons, *target_lexicons]\n    query = f\"\"\"\n        WITH\n          lexrowids(rowid) AS\n            (SELECT rowid FROM lexicons WHERE specifier IN ({_vs(lexicons)})),\n          tgtlexrowids(rowid) AS\n            (SELECT rowid FROM lexicons WHERE 
specifier IN ({_vs(target_lexicons)}))\n        SELECT DISTINCT tgt.id\n          FROM {rel_table} AS srel\n          JOIN lexicons AS lex ON srel.lexicon_rowid = lex.rowid\n          JOIN {tgt_table} AS tgt ON tgt.rowid = srel.target_rowid\n         WHERE srel.lexicon_rowid IN lexrowids\n           AND tgt.lexicon_rowid IN tgtlexrowids\n    \"\"\"\n    rows: Iterator[str] = connect().execute(query, params)\n    return {row[0] for row in rows}\n\n\n_SANITIZED_METADATA_TABLES = {\n    # 'ilis': 'ilis',\n    # 'proposed_ilis': 'proposed_ilis',\n    # 'lexicons': 'lexicons',\n    \"entries\": \"entries\",\n    \"senses\": \"senses\",\n    \"synsets\": \"synsets\",\n    # 'sense_relations': 'sense_relations',\n    # 'sense_synset_relations': 'sense_synset_relations',\n    # 'synset_relations': 'synset_relations',\n    # 'sense_examples': 'sense_examples',\n    # 'counts': 'counts',\n    # 'synset_examples': 'synset_examples',\n    # 'definitions': 'definitions',\n}\n\n\ndef get_metadata(id: str, lexicon: str, table: str) -> Metadata:\n    tablename = _SANITIZED_METADATA_TABLES.get(table)\n    if tablename is None:\n        raise Error(f\"'{table}' does not contain metadata\")\n    query = f\"\"\"\n        SELECT tbl.metadata\n          FROM {tablename} AS tbl\n          JOIN lexicons AS lex ON lex.rowid = lexicon_rowid\n         WHERE tbl.id=?\n           AND lex.specifier = ?\n    \"\"\"\n    return cast(\n        \"Metadata\",\n        connect().execute(query, (id, lexicon)).fetchone()[0] or {},\n    )  # TODO: benchmark using a TypeGuard\n\n\ndef get_ili_metadata(id: str) -> Metadata:\n    query = \"SELECT metadata FROM ilis WHERE id = ?\"\n    return cast(\n        \"Metadata\",\n        connect().execute(query, (id,)).fetchone()[0] or {},\n    )\n\n\ndef get_proposed_ili_metadata(synset: str, lexicon: str) -> Metadata:\n    query = \"\"\"\n        SELECT pili.metadata\n          FROM proposed_ilis AS pili\n          JOIN synsets AS ss ON ss.rowid = synset_rowid\n          JOIN lexicons AS lex ON lex.rowid = ss.lexicon_rowid\n         WHERE ss.id = ?\n           AND lex.specifier = ?\n    \"\"\"\n    return cast(\n        \"Metadata\",\n        connect().execute(query, (synset, lexicon)).fetchone()[0] or {},\n    )\n\n\n_SANITIZED_LEXICALIZED_TABLES = {\n    \"senses\": (\"senses\", \"sense_rowid\"),\n    \"synsets\": (\"synsets\", \"synset_rowid\"),\n}\n\n\ndef get_lexicalized(id: str, lexicon: str, table: str) -> bool:\n    conn = connect()\n    if table not in _SANITIZED_LEXICALIZED_TABLES:\n        raise Error(f\"'{table}' does not mark lexicalization\")\n    tablename, column = _SANITIZED_LEXICALIZED_TABLES[table]\n    if not id or not lexicon:\n        return False\n    query = f\"\"\"\n        SELECT NOT EXISTS\n               (SELECT {column}\n                  FROM unlexicalized_{tablename} AS un\n                  JOIN {tablename} AS tbl ON tbl.rowid = un.{column}\n                  JOIN lexicons AS lex ON lex.rowid = tbl.lexicon_rowid\n                 WHERE tbl.id = ?\n                   AND lex.specifier = ?)\n    \"\"\"\n    return bool(conn.execute(query, (id, lexicon)).fetchone()[0])\n\n\ndef get_adjposition(sense_id: str, lexicon: str) -> str | None:\n    conn = connect()\n    query = \"\"\"\n        SELECT adjposition\n          FROM adjpositions\n          JOIN senses AS s ON s.rowid = sense_rowid\n          JOIN lexicons AS lex ON lex.rowid = s.lexicon_rowid\n         WHERE s.id = ?\n           AND lex.specifier = ?\n    \"\"\"\n    row = conn.execute(query, (sense_id, 
lexicon)).fetchone()\n    if row:\n        return row[0]\n    return None\n\n\ndef get_sense_counts(sense_id: str, lexicons: Sequence[str]) -> list[_Count]:\n    conn = connect()\n    query = f\"\"\"\n        SELECT c.count, lex.specifier, c.metadata\n          FROM counts AS c\n          JOIN senses AS s ON s.rowid = c.sense_rowid\n          JOIN lexicons AS lex ON lex.rowid = c.lexicon_rowid\n         WHERE s.id = ?\n           AND lex.specifier IN ({_qs(lexicons)})\n    \"\"\"\n    rows: list[_Count] = conn.execute(query, (sense_id, *lexicons)).fetchall()\n    return rows\n\n\ndef get_lexfile(synset_id: str, lexicon: str) -> str | None:\n    conn = connect()\n    query = \"\"\"\n        SELECT lf.name\n          FROM lexfiles AS lf\n          JOIN synsets AS ss ON ss.lexfile_rowid = lf.rowid\n          JOIN lexicons AS lex ON lex.rowid = ss.lexicon_rowid\n         WHERE ss.id = ?\n           AND lex.specifier = ?\n    \"\"\"\n    row = conn.execute(query, (synset_id, lexicon)).fetchone()\n    if row is not None and row[0] is not None:\n        return row[0]\n    return None\n\n\ndef get_entry_index(entry_id: str, lexicon: str) -> str | None:\n    conn = connect()\n    query = \"\"\"\n        SELECT idx.lemma\n          FROM entries AS e\n          JOIN lexicons AS lex ON lex.rowid = e.lexicon_rowid\n          JOIN entry_index AS idx ON idx.entry_rowid = e.rowid\n         WHERE e.id = ?\n           AND lex.specifier = ?\n    \"\"\"\n    row = conn.execute(query, (entry_id, lexicon)).fetchone()\n    if row is not None:\n        return row[0]\n    return None\n\n\ndef get_sense_n(sense_id: str, lexicon: str) -> int | None:\n    conn = connect()\n    query = \"\"\"\n        SELECT s.entry_rank\n          FROM senses AS s\n          JOIN lexicons AS lex ON lex.rowid = s.lexicon_rowid\n         WHERE s.id = ?\n           AND lex.specifier = ?\n    \"\"\"\n    row = conn.execute(query, (sense_id, lexicon)).fetchone()\n    if row is not None:\n        return row[0]\n    return None\n\n\ndef _qs(xs: Collection) -> str:\n    return \",\".join(\"?\" * len(xs))\n\n\ndef _vs(xs: Collection) -> str:\n    return \",\".join([\"(?)\"] * len(xs))\n\n\ndef _kws(xs: Collection) -> str:\n    return \",\".join(f\":{x}\" for x in xs)\n\n\ndef _query_forms(\n    forms: Sequence[str],\n    normalized: bool,\n    search_all_forms: bool,\n    indexed: bool = True,\n) -> tuple[list[str], str]:\n    or_norm = \"OR f.normalized_form IN wordforms\" if normalized else \"\"\n    and_rank = \"\" if search_all_forms else \"AND f.rank = 0\"\n    ctes: list[str] = [\n        f\"wordforms(s) AS (VALUES {_vs(forms)})\",\n        f\"\"\"matched_entries(rowid) AS\n          (SELECT f.entry_rowid\n             FROM forms AS f\n            WHERE (f.form IN wordforms {or_norm}) {and_rank})\"\"\",\n    ]\n    subquery = \"matched_entries\"\n    if indexed:\n        subquery = \"\"\"\\\n          (SELECT rowid\n             FROM matched_entries\n            UNION SELECT idx.entry_rowid\n                    FROM matched_entries AS _me\n                    JOIN entry_index AS _idx ON _idx.entry_rowid = _me.rowid\n                    JOIN entry_index AS idx ON idx.lemma = _idx.lemma)\n        \"\"\"\n    return ctes, subquery\n\n\ndef _build_entry_conditions(\n    forms: Sequence[str],\n    pos: str | None,\n    lexicons: Sequence[str],\n    normalized: bool,\n    search_all_forms: bool,\n) -> tuple[str, list[str], list[str], list[str]]:\n    \"\"\"Build CTE, conditions, and parameters for entry-based queries.\n\n    Returns:\n        
tuple of (cte, cteparams, conditions, condparams)\n    \"\"\"\n    ctes: list[str] = []\n    cteparams: list[str] = []\n    subquery = \"\"\n    conditions: list[str] = []\n    condparams: list[str] = []\n\n    if lexicons:\n        ctes.append(f\"lexspecs(s) AS (VALUES {_vs(lexicons)})\")\n        conditions.append(\"lex.specifier IN lexspecs\")\n        cteparams.extend(lexicons)\n    if forms:\n        ctes_, subquery = _query_forms(forms, normalized, search_all_forms)\n        ctes.extend(ctes_)\n        conditions.append(f\"e.rowid IN {subquery}\")\n        cteparams.extend(forms)\n    if pos:\n        conditions.append(\"e.pos = ?\")\n        condparams.append(pos)\n\n    cte = \"\"\n    if ctes:\n        cte = \"WITH \" + \",\\n               \".join(ctes)\n\n    return cte, cteparams, conditions, condparams\n"
  },
  {
    "path": "wn/_types.py",
    "content": "from collections.abc import Callable, Mapping, Sequence\nfrom pathlib import Path\nfrom typing import Any, TypeAlias\n\n# For the below, use type statement instead of TypeAlias from Python 3.12\n\n# For functions taking a filesystem path as a str or a pathlib.Path\nAnyPath: TypeAlias = str | Path\n\n# LMF versions for comparison\nVersionInfo: TypeAlias = tuple[int, ...]\n\n# Synset and Sense relations map a relation type to one or more ids\nRelationMap: TypeAlias = Mapping[str, Sequence[str]]\n\n# User-facing metadata representation\nMetadata: TypeAlias = dict[str, Any]\n\n# A callable that returns a normalized word form for a given word form\nNormalizeFunction: TypeAlias = Callable[[str], str]\n\n# Lemmatization returns a mapping of parts of speech (or None) to\n# lists of wordforms that are potential lemmas for some query word\nLemmatizeResult: TypeAlias = dict[str | None, set[str]]\n\n# A callable that returns a LemmatizationResult for a given word form\n# and optional part of speech\nLemmatizeFunction: TypeAlias = Callable[[str, str | None], LemmatizeResult]\n"
  },
  {
    "path": "wn/_util.py",
    "content": "\"\"\"Non-public Wn utilities.\"\"\"\n\nimport hashlib\nfrom collections.abc import Hashable, Iterable\nfrom pathlib import Path\nfrom typing import Any, TypeGuard, TypeVar\nfrom unicodedata import combining, normalize\n\nfrom wn._types import VersionInfo\n\n\ndef version_info(version_string: str) -> VersionInfo:\n    return tuple(map(int, version_string.split(\".\")))\n\n\ndef is_url(string: str) -> bool:\n    \"\"\"Return True if *string* appears to be a URL.\"\"\"\n    # TODO: ETags?\n    return any(string.startswith(scheme) for scheme in (\"http://\", \"https://\"))\n\n\ndef is_gzip(path: Path) -> bool:\n    \"\"\"Return True if the file at *path* appears to be gzipped.\"\"\"\n    return _inspect_file_signature(path, b\"\\x1f\\x8b\")\n\n\ndef is_lzma(path: Path) -> bool:\n    \"\"\"Return True if the file at *path* appears to be lzma-compressed.\"\"\"\n    return _inspect_file_signature(path, b\"\\xfd7zXZ\\x00\")\n\n\ndef is_xml(path: Path) -> bool:\n    \"\"\"Return True if the file at *path* appears to be an XML file.\"\"\"\n    return _inspect_file_signature(path, b\"<?xml \")\n\n\ndef is_str_key_dict(obj: Any) -> TypeGuard[dict[str, Any]]:\n    return isinstance(obj, dict) and all(isinstance(key, str) for key in obj)\n\n\ndef _inspect_file_signature(path: Path, signature: bytes) -> bool:\n    if path.is_file():\n        with path.open(\"rb\") as f:\n            return f.read(len(signature)) == signature\n    return False\n\n\ndef short_hash(string: str) -> str:\n    \"\"\"Return a short hash of *string*.\"\"\"\n    b2 = hashlib.blake2b(digest_size=20)\n    b2.update(string.encode(\"utf-8\"))\n    return b2.hexdigest()\n\n\nT = TypeVar(\"T\")\n\n\ndef flatten(iterable: Iterable[Iterable[T]]) -> list[T]:\n    return [x for xs in iterable for x in xs]\n\n\nH = TypeVar(\"H\", bound=Hashable)\n\n\ndef unique_list(items: Iterable[H]) -> list[H]:\n    # use a dictionary as an order-preserving set\n    targets = dict.fromkeys(items, True)\n    return list(targets)\n\n\ndef normalize_form(s: str) -> str:\n    return \"\".join(c for c in normalize(\"NFKD\", s.casefold()) if not combining(c))\n\n\ndef format_lexicon_specifier(id: str, version: str) -> str:\n    return f\"{id}:{version}\"\n\n\ndef split_lexicon_specifier(lexicon: str) -> tuple[str, str]:\n    id, _, ver = lexicon.partition(\":\")\n    return id, ver\n"
  },
  {
    "path": "wn/_wordnet.py",
    "content": "import textwrap\nimport warnings\nfrom collections.abc import Callable, Iterator, Sequence\nfrom typing import Literal, TypeVar, overload\n\nfrom wn._core import Form, Pronunciation, Sense, Synset, Tag, Word\nfrom wn._exceptions import Error, WnWarning\nfrom wn._lexicon import Lexicon, LexiconConfiguration\nfrom wn._queries import (\n    find_entries,\n    find_lemmas,\n    find_senses,\n    find_synsets,\n    get_lexicon_dependencies,\n    resolve_lexicon_specifiers,\n)\nfrom wn._types import (\n    LemmatizeFunction,\n    NormalizeFunction,\n)\nfrom wn._util import normalize_form\n\n# Useful for factory functions of Word, Sense, or Synset\nC = TypeVar(\"C\", Word, Sense, Synset)\n\n\nclass Wordnet:\n    \"\"\"Class for interacting with wordnet data.\n\n    A wordnet object acts essentially as a filter by first selecting\n    matching lexicons and then searching only within those lexicons\n    for later queries. Lexicons can be selected on instantiation with\n    the *lexicon* or *lang* parameters. The *lexicon* parameter is a\n    string with a space-separated list of :ref:`lexicon specifiers\n    <lexicon-specifiers>`. The *lang* argument is a `BCP 47`_ language\n    code that selects any lexicon matching the given language code. As\n    the *lexicon* argument more precisely selects lexicons, it is the\n    recommended method of instantiation. Omitting both *lexicon* and\n    *lang* arguments triggers :ref:`default-mode <default-mode>`\n    queries.\n\n    Some wordnets were created by translating the words from a larger\n    wordnet, namely the Princeton WordNet, and then relying on the\n    larger wordnet for structural relations. An *expand* argument is a\n    second space-separated list of lexicon specifiers which are used\n    for traversing relations, but not as the results of\n    queries. Setting *expand* to an empty string (:python:`expand=''`)\n    disables expand lexicons. For more information, see\n    :ref:`cross-lingual-relation-traversal`.\n\n    The *normalizer* argument takes a callable that normalizes word\n    forms in order to expand the search. The default function\n    downcases the word and removes diacritics via NFKD_ normalization\n    so that, for example, searching for *san josé* in the English\n    WordNet will find the entry for *San Jose*. Setting *normalizer*\n    to :python:`None` disables normalization and forces exact-match\n    searching. For more information, see :ref:`normalization`.\n\n    The *lemmatizer* argument may be :python:`None`, which is the\n    default and disables lemmatizer-based query expansion, or a\n    callable that takes a word form and optional part of speech and\n    returns base forms of the original word. To support lemmatizers\n    that use the wordnet for instantiation, such as :mod:`wn.morphy`,\n    the lemmatizer may be assigned to the :attr:`lemmatizer` attribute\n    after creation. For more information, see :ref:`lemmatization`.\n\n    If the *search_all_forms* argument is :python:`True` (the\n    default), searches of word forms consider all forms in the\n    lexicon; if :python:`False`, only lemmas are searched. Non-lemma\n    forms may include, depending on the lexicon, morphological\n    exceptions, alternate scripts or spellings, etc.\n\n    .. _BCP 47: https://en.wikipedia.org/wiki/IETF_language_tag\n    .. 
_NFKD: https://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms\n\n    Attributes:\n\n        lemmatizer: A lemmatization function or :python:`None`.\n\n    \"\"\"\n\n    __slots__ = (\n        \"_default_mode\",\n        \"_lexconf\",\n        \"_normalizer\",\n        \"_search_all_forms\",\n        \"lemmatizer\",\n    )\n    __module__ = \"wn\"\n\n    def __init__(\n        self,\n        lexicon: str | None = None,\n        *,\n        lang: str | None = None,\n        expand: str | None = None,\n        normalizer: NormalizeFunction | None = normalize_form,\n        lemmatizer: LemmatizeFunction | None = None,\n        search_all_forms: bool = True,\n    ):\n        if lexicon or lang:\n            lexicons = tuple(resolve_lexicon_specifiers(lexicon or \"*\", lang=lang))\n        else:\n            lexicons = ()\n        if lang and len(lexicons) > 1:\n            warnings.warn(\n                f\"multiple lexicons match {lang=}: {lexicons!r}; \"\n                \"use the lexicon parameter instead to avoid this warning\",\n                WnWarning,\n                stacklevel=2,\n            )\n\n        # default mode means any lexicon is searched or expanded upon,\n        # but relation traversals only target the source's lexicon\n        default_mode = not lexicon and not lang\n        expand = _resolve_lexicon_dependencies(expand, lexicons, default_mode)\n        expands = tuple(resolve_lexicon_specifiers(expand)) if expand else ()\n\n        self._lexconf = LexiconConfiguration(\n            lexicons=lexicons,\n            expands=expands,\n            default_mode=default_mode,\n        )\n\n        self._normalizer = normalizer\n        self.lemmatizer = lemmatizer\n        self._search_all_forms = search_all_forms\n\n    def lexicons(self) -> list[Lexicon]:\n        \"\"\"Return the list of lexicons covered by this wordnet.\"\"\"\n        return list(map(Lexicon.from_specifier, self._lexconf.lexicons))\n\n    def expanded_lexicons(self) -> list[Lexicon]:\n        \"\"\"Return the list of expand lexicons for this wordnet.\"\"\"\n        return list(map(Lexicon.from_specifier, self._lexconf.expands))\n\n    def word(self, id: str) -> Word:\n        \"\"\"Return the first word in this wordnet with identifier *id*.\"\"\"\n        iterable = find_entries(id=id, lexicons=self._lexconf.lexicons)\n        try:\n            id, pos, lex = next(iterable)\n            return Word(id, pos, _lexicon=lex, _lexconf=self._lexconf)\n        except StopIteration:\n            raise Error(f\"no such lexical entry: {id}\") from None\n\n    def words(self, form: str | None = None, pos: str | None = None) -> list[Word]:\n        \"\"\"Return the list of matching words in this wordnet.\n\n        Without any arguments, this function returns all words in the\n        wordnet's selected lexicons. 
A *form* argument restricts the\n        words to those matching the given word form, and *pos*\n        restricts words by their part of speech.\n\n        \"\"\"\n        return _find_helper(self, Word, find_entries, form, pos)\n\n    @overload\n    def lemmas(\n        self,\n        form: str | None = None,\n        pos: str | None = None,\n        *,\n        data: Literal[False] = False,\n    ) -> list[str]: ...\n    @overload\n    def lemmas(\n        self,\n        form: str | None = None,\n        pos: str | None = None,\n        *,\n        data: Literal[True] = True,\n    ) -> list[Form]: ...\n\n    # fallback for non-literal bool argument\n    @overload\n    def lemmas(\n        self, form: str | None = None, pos: str | None = None, *, data: bool\n    ) -> list[str] | list[Form]: ...\n\n    def lemmas(\n        self, form: str | None = None, pos: str | None = None, *, data: bool = False\n    ) -> list[str] | list[Form]:\n        \"\"\"Return the list of lemmas for matching words in this wordnet.\n\n        Without any arguments, this function returns all distinct lemma\n        forms in the wordnet's selected lexicons. A *form* argument\n        restricts the words to those matching the given word form, and\n        *pos* restricts words by their part of speech.\n\n        If the *data* argument is :python:`False` (the default), only\n        distinct lemma forms are returned as :class:`str` types. If it\n        is :python:`True`, :class:`wn.Form` objects are returned for\n        all matching entries, which may include multiple Form objects\n        with the same lemma string.\n\n        Example:\n\n            >>> wn.Wordnet().lemmas(\"wolves\")\n            ['wolf']\n            >>> wn.Wordnet().lemmas(\"wolves\", data=True)\n            [Form(value='wolf')]\n\n        \"\"\"\n        form_data = _find_lemmas(self, form, pos, load_details=data)\n\n        if data:\n            return [\n                Form(\n                    form,\n                    id=id,\n                    script=script,\n                    _lexicon=lex,\n                    _pronunciations=tuple(Pronunciation(*p) for p in prons),\n                    _tags=tuple(Tag(*t) for t in tags),\n                )\n                for form, id, script, lex, prons, tags in form_data\n            ]\n\n        # When data=False, extract and deduplicate strings\n        return list(dict.fromkeys(fd[0] for fd in form_data))\n\n    def synset(self, id: str) -> Synset:\n        \"\"\"Return the first synset in this wordnet with identifier *id*.\"\"\"\n        iterable = find_synsets(id=id, lexicons=self._lexconf.lexicons)\n        try:\n            id, pos, ili, lex = next(iterable)\n            return Synset(id, pos, ili=ili, _lexicon=lex, _lexconf=self._lexconf)\n        except StopIteration:\n            raise Error(f\"no such synset: {id}\") from None\n\n    def synsets(\n        self, form: str | None = None, pos: str | None = None, ili: str | None = None\n    ) -> list[Synset]:\n        \"\"\"Return the list of matching synsets in this wordnet.\n\n        Without any arguments, this function returns all synsets in\n        the wordnet's selected lexicons. A *form* argument restricts\n        synsets to those whose member words match the given word\n        form. A *pos* argument restricts synsets to those with the\n        given part of speech. 
An *ili* argument restricts synsets to\n        those with the given interlingual index; generally this should\n        select a unique synset within a single lexicon.\n\n        \"\"\"\n        return _find_helper(self, Synset, find_synsets, form, pos, ili=ili)\n\n    def sense(self, id: str) -> Sense:\n        \"\"\"Return the first sense in this wordnet with identifier *id*.\"\"\"\n        iterable = find_senses(id=id, lexicons=self._lexconf.lexicons)\n        try:\n            id, eid, ssid, lex = next(iterable)\n            return Sense(id, eid, ssid, _lexicon=lex, _lexconf=self._lexconf)\n        except StopIteration:\n            raise Error(f\"no such sense: {id}\") from None\n\n    def senses(self, form: str | None = None, pos: str | None = None) -> list[Sense]:\n        \"\"\"Return the list of matching senses in this wordnet.\n\n        Without any arguments, this function returns all senses in the\n        wordnet's selected lexicons. A *form* argument restricts the\n        senses to those whose word matches the given word form, and\n        *pos* restricts senses by their word's part of speech.\n\n        \"\"\"\n        return _find_helper(self, Sense, find_senses, form, pos)\n\n    def describe(self) -> str:\n        \"\"\"Return a formatted string describing the lexicons in this wordnet.\n\n        Example:\n\n            >>> oewn = wn.Wordnet(\"oewn:2021\")\n            >>> print(oewn.describe())\n            Primary lexicons:\n              oewn:2021\n                Label  : Open English WordNet\n                URL    : https://github.com/globalwordnet/english-wordnet\n                License: https://creativecommons.org/licenses/by/4.0/\n                Words  : 163161 (a: 8386, n: 123456, r: 4481, s: 15231, v: 11607)\n                Senses : 211865\n                Synsets: 120039 (a: 7494, n: 84349, r: 3623, s: 10727, v: 13846)\n                ILIs   : 120039\n\n        \"\"\"\n        substrings = [\"Primary lexicons:\"]\n        for lex in self.lexicons():\n            substrings.append(textwrap.indent(lex.describe(), \"  \"))\n        if self._lexconf.expands:\n            substrings.append(\"Expand lexicons:\")\n            for lex in self.expanded_lexicons():\n                substrings.append(textwrap.indent(lex.describe(full=False), \"  \"))\n        return \"\\n\".join(substrings)\n\n\ndef _resolve_lexicon_dependencies(\n    expand: str | None,\n    lexicons: Sequence[str],\n    default_mode: bool,\n) -> str:\n    if expand is not None:\n        return expand.strip()\n    if default_mode:\n        return \"*\"\n    # find dependencies specified by the lexicons\n    deps = [\n        (depspec, added)\n        for lexspec in lexicons\n        for depspec, _, added in get_lexicon_dependencies(lexspec)\n    ]\n    missing = \" \".join(spec for spec, added in deps if not added)\n    if missing:\n        warnings.warn(\n            f\"lexicon dependencies not available: {missing}\",\n            WnWarning,\n            stacklevel=3,\n        )\n    return \" \".join(spec for spec, added in deps if added)\n\n\ndef _find_lemmas(\n    w: Wordnet, form: str | None, pos: str | None, load_details: bool = False\n) -> Iterator[tuple]:\n    \"\"\"Return an iterator of matching lemma form data.\n\n    This works like _find_helper but returns raw form tuples instead of\n    Word/Sense/Synset objects. 
The load_details parameter controls whether\n    pronunciations and tags are loaded from the database.\n    \"\"\"\n    kwargs: dict = {\n        \"lexicons\": w._lexconf.lexicons,\n        \"search_all_forms\": w._search_all_forms,\n        \"load_details\": load_details,\n    }\n\n    # easy case is when there is no form\n    if form is None:\n        yield from find_lemmas(pos=pos, **kwargs)\n        return\n\n    # if there's a form, we may need to lemmatize and normalize\n    normalize = w._normalizer\n    kwargs[\"normalized\"] = bool(normalize)\n\n    lemmatize = w.lemmatizer\n    forms = lemmatize(form, pos) if lemmatize else {}\n    # if no lemmatizer or word not covered by lemmatizer, back off to\n    # the original form and pos\n    if not forms:\n        forms = {pos: {form}}\n\n    yield from _query_with_forms(find_lemmas, forms, normalize, kwargs)\n\n\ndef _query_with_forms(\n    query_func: Callable,\n    forms: dict[str | None, set[str]],\n    normalize: NormalizeFunction | None,\n    kwargs: dict,\n) -> list[tuple]:\n    \"\"\"Query database with forms, falling back to normalized forms if needed.\n\n    Queries the database for each pos/forms combination. If a normalizer\n    is available and the original forms return no results, queries again\n    with normalized forms.\n    \"\"\"\n    results = []\n    for _pos, _forms in forms.items():\n        results.extend(query_func(forms=_forms, pos=_pos, **kwargs))\n\n    # Only try normalized forms if we got no results with original forms\n    if not results and normalize:\n        for _pos, _forms in forms.items():\n            normalized_forms = [normalize(f) for f in _forms]\n            results.extend(query_func(forms=normalized_forms, pos=_pos, **kwargs))\n\n    return results\n\n\ndef _find_helper(\n    w: Wordnet,\n    cls: type[C],\n    query_func: Callable,\n    form: str | None,\n    pos: str | None,\n    ili: str | None = None,\n) -> list[C]:\n    \"\"\"Return the list of matching wordnet entities.\n\n    If the wordnet has a normalizer and the search includes a word\n    form, the original word form is searched against both the\n    original and normalized columns in the database. Then, if no\n    results are found, the search is repeated with the normalized\n    form. 
If the wordnet does not have a normalizer, only exact\n    string matches are used.\n\n    \"\"\"\n    kwargs: dict = {\n        \"lexicons\": w._lexconf.lexicons,\n        \"search_all_forms\": w._search_all_forms,\n    }\n    if ili:\n        kwargs[\"ili\"] = ili\n\n    # easy case is when there is no form\n    # (for type checking, it is hard to guess the correct number of\n    #  fields in data, so ignore here and further down)\n    if form is None:\n        return [\n            cls(*data, _lexconf=w._lexconf)  # type: ignore\n            for data in query_func(pos=pos, **kwargs)\n        ]\n\n    # if there's a form, we may need to lemmatize and normalize\n    normalize = w._normalizer\n    kwargs[\"normalized\"] = bool(normalize)\n\n    lemmatize = w.lemmatizer\n    forms = lemmatize(form, pos) if lemmatize else {}\n    # if no lemmatizer or word not covered by lemmatizer, back off to\n    # the original form and pos\n    if not forms:\n        forms = {pos: {form}}\n\n    results_data = _query_with_forms(query_func, forms, normalize, kwargs)\n\n    # we want unique results here, but a set can make the order\n    # erratic, so filter manually\n    results = [\n        cls(*data, _lexconf=w._lexconf)  # type: ignore\n        for data in results_data\n    ]\n    unique_results: list[C] = []\n    seen: set[C] = set()\n    for result in results:\n        if result not in seen:\n            unique_results.append(result)\n            seen.add(result)\n    return unique_results\n"
  },
  {
    "path": "wn/compat/__init__.py",
    "content": ""
  },
  {
    "path": "wn/compat/sensekey.py",
    "content": "\"\"\"Functions Related to Sense Keys\n\nSense keys are identifiers of senses that (mostly) persist across\nwordnet versions. They are only used by the English wordnets. For the\nOMW lexicons derived from the Princeton WordNet and the EWN 2019/2020\nlexicons, the sense key is encoded in the ``identifier`` metadata of a\nSense:\n\n>>> import wn\n>>> en = wn.Wordnet(\"omw-en:1.4\")\n>>> sense = en.sense(\"omw-en-carrousel-02966372-n\")\n>>> sense.metadata()\n{'identifier': 'carrousel%1:06:01::'}\n\nFor OEWN 2021+ lexicons, the sense key is encoded in the sense ID, but\nsome characters are escaped or replaced to ensure it is a valid XML\nID.\n\n>>> oewn = wn.Wordnet(\"oewn:2024\")\n>>> sense = oewn.sense(\"oewn-carousel__1.06.01..\")\n>>> sense.id\n'oewn-carousel__1.06.01..'\n\nThis module has four functions:\n\n1. :func:`escape` transforms a sense key into a form that is valid for\n   XML IDs. The *flavor* keyword argument specifies the escaping\n   mechanism and it defaults to :python:`\"oewn-v2\"`.\n\n2. :func:`unescape` transforms an escaped sense key back into the\n   original form. The *flavor* keyword is the same as with\n   :func:`escape`.\n\n3. :func:`sense_key_getter` creates a function for retrieving the\n   sense key for a given :class:`wn.Sense` object. Depending on the\n   lexicon, it will retrieve the sense key from metadata or it will\n   unescape the sense ID.\n\n4. :func:`sense_getter` creates a function for retrieving a\n   :class:`wn.Sense` object given a sense key. Depending on the\n   lexicon, it will build and use a mapping of sense key metadata to\n   :class:`wn.Sense` objects, or it will escape the sense key and use\n   the escaped form as the ``id`` argument for\n   :meth:`wn.Wordnet.sense`.\n\n.. seealso::\n\n   The documentation from the Princeton WordNet:\n   https://wordnet.princeton.edu/documentation/senseidx5wn\n\n\"\"\"\n\nfrom collections.abc import Callable\nfrom typing import TypeAlias\n\nimport wn\nfrom wn._util import split_lexicon_specifier\n\nSensekeyGetter: TypeAlias = Callable[[wn.Sense], str | None]\nSenseGetter: TypeAlias = Callable[[str], wn.Sense | None]\n\nMETADATA_LEXICONS = {\n    # OMW 1.4\n    \"omw-en:1.4\",\n    \"omw-en31:1.4\",\n    # OMW 2.0\n    \"omw-en15:2.0\",\n    \"omw-en16:2.0\",\n    \"omw-en17:2.0\",\n    \"omw-en171:2.0\",\n    \"omw-en20:2.0\",\n    \"omw-en21:2.0\",\n    \"omw-en:2.0\",\n    \"omw-en31:2.0\",\n    # EWN (OEWN) 2019, 2020\n    \"ewn:2019\",\n    \"ewn:2020\",\n}\n\nSENSE_ID_LEXICONS = {  # specifier:flavor\n    \"oewn:2021\": \"oewn\",\n    \"oewn:2022\": \"oewn\",\n    \"oewn:2023\": \"oewn\",\n    \"oewn:2024\": \"oewn\",\n    \"oewn:2025\": \"oewn-v2\",\n    \"oewn:2025+\": \"oewn-v2\",\n}\n\nOEWN_LEMMA_UNESCAPE_SEQUENCES = [\n    (\"-ap-\", \"'\"),\n    (\"-ex-\", \"!\"),\n    (\"-cm-\", \",\"),\n    (\"-cn-\", \":\"),\n    (\"-pl-\", \"+\"),\n    (\"-sl-\", \"/\"),\n]\n\nOEWN_V2_LEMMA_UNESCAPE_SEQUENCES = [\n    (\"-apos-\", \"'\"),\n    (\"-colon-\", \":\"),\n    (\"-excl-\", \"!\"),\n    (\"-num-\", \"#\"),\n    (\"-dollar-\", \"$\"),\n    (\"-percnt-\", \"%\"),\n    (\"-amp-\", \"&\"),\n    (\"-lpar-\", \"(\"),\n    (\"-rpar-\", \")\"),\n    (\"-ast-\", \"*\"),\n    (\"-plus-\", \"+\"),\n    (\"-comma-\", \",\"),\n    (\"-sol-\", \"/\"),\n    (\"-lbrace-\", \"{\"),\n    (\"-vert-\", \"|\"),\n    (\"-rbrace-\", \"}\"),\n    (\"-tilde-\", \"~\"),\n    (\"-cent-\", \"¢\"),\n    (\"-pound-\", \"£\"),\n    (\"-sect-\", \"§\"),\n    (\"-copy-\", \"©\"),\n    (\"-reg-\", \"®\"),\n    
(\"-deg-\", \"°\"),\n    (\"-acute-\", \"´\"),  # noqa: RUF001\n    (\"-para-\", \"¶\"),\n    (\"-ordm-\", \"º\"),\n    (\"--\", \"-\"),\n]\n\n\ndef unescape(s: str, /, flavor: str = \"oewn-v2\") -> str:\n    \"\"\"Return the original form of an escaped sense key.\n\n    The *flavor* argument specifies how the unescaping will be done.\n    Its default value is :python:`\"oewn-v2\"`, which unescapes like the\n    Open English Wordnet 2025 editions, including separate rules for the\n    left and right side of the ``__`` delimiter. The other possible\n    value is ``\"oewn\"``, which unescapes like the Open English Wordnet\n    2024 and prior editions.\n\n    >>> from wn.compat import sensekey\n    >>> sensekey.unescape(\"ceramic__3.01.00..\")\n    'ceramic%3:01:00::'\n\n    Note that this function does not remove any lexicon ID prefixes on\n    sense IDs, so that may need to be done manually:\n\n    >>> sensekey.unescape(\"oewn-ceramic__3.01.00..\")\n    'oewn-ceramic%3:01:00::'\n    >>> sensekey.unescape(\"oewn-ceramic__3.01.00..\".removeprefix(\"oewn-\"))\n    'ceramic%3:01:00::'\n\n    \"\"\"\n    match flavor:\n        case \"oewn\":\n            return _unescape_oewn(s, OEWN_LEMMA_UNESCAPE_SEQUENCES)\n        case \"oewn-v2\":\n            return _unescape_oewn(s, OEWN_V2_LEMMA_UNESCAPE_SEQUENCES)\n        case _:\n            raise ValueError(f\"unsupported flavor: {flavor}\")\n\n\ndef _unescape_oewn(s: str, escape_sequences: list[tuple[str, str]]) -> str:\n    lemma, _, rest = s.partition(\"__\")\n    for esc, char in escape_sequences:\n        lemma = lemma.replace(esc, char)\n    rest = rest.replace(\".\", \":\").replace(\"-sp-\", \"_\")\n    if rest:\n        return f\"{lemma}%{rest}\"\n    else:\n        return lemma\n\n\ndef escape(sense_key: str, /, flavor: str = \"oewn-v2\") -> str:\n    \"\"\"Return an escaped sense key that is valid for XML IDs.\n\n    The *flavor* argument specifies how the escaping will be done. Its\n    default value is :python:`\"oewn-v2\"`, which escapes like the Open\n    English Wordnet 2025 editions, including separate rules for the left\n    and right side of the ``%`` delimiter. The other possible\n    value is ``\"oewn\"``, which escapes like the Open English Wordnet\n    2024 and prior editions.\n\n    >>> from wn.compat import sensekey\n    >>> sensekey.escape(\"ceramic%3:01:00::\")\n    'ceramic__3.01.00..'\n\n    \"\"\"\n    match flavor:\n        case \"oewn\":\n            return _escape_oewn(sense_key, OEWN_LEMMA_UNESCAPE_SEQUENCES)\n        case \"oewn-v2\":\n            return _escape_oewn(sense_key, OEWN_V2_LEMMA_UNESCAPE_SEQUENCES)\n        case _:\n            raise ValueError(f\"unsupported flavor: {flavor}\")\n\n\ndef _escape_oewn(sense_key: str, escape_sequences: list[tuple[str, str]]) -> str:\n    lemma, _, rest = sense_key.partition(\"%\")\n    for esc, char in reversed(escape_sequences):\n        lemma = lemma.replace(char, esc)\n    rest = rest.replace(\":\", \".\").replace(\"_\", \"-sp-\")\n    if rest:\n        return f\"{lemma}__{rest}\"\n    else:\n        return lemma\n\n\ndef sense_key_getter(lexicon: str) -> SensekeyGetter:\n    \"\"\"Return a function that gets sense keys from senses.\n\n    The *lexicon* argument determines how the function will retrieve\n    the sense key; i.e., whether it is from the ``identifier``\n    metadata or unescaping the sense ID. 
For any unsupported lexicon,\n    an error is raised.\n\n    The function that is returned accepts one argument, a\n    :class:`wn.Sense` (ideally from the same lexicon specified in the\n    *lexicon* argument), and returns a :class:`str` if the sense key\n    exists in the lexicon or :data:`None` otherwise.\n\n    >>> import wn\n    >>> from wn.compat import sensekey\n    >>> oewn = wn.Wordnet(\"oewn:2024\")\n    >>> get_sense_key = sensekey.sense_key_getter(\"oewn:2024\")\n    >>> get_sense_key(oewn.senses(\"alabaster\")[0])\n    'alabaster%3:01:00::'\n\n    When unescaping a sense ID, if the ID starts with its lexicon's ID\n    and a hyphen (e.g., `\"oewn-\"`), it is assumed to be a conventional\n    ID prefix and is removed prior to unescaping.\n\n    \"\"\"\n    if lexicon in METADATA_LEXICONS:\n\n        def getter(sense: wn.Sense) -> str | None:\n            return sense.metadata().get(\"identifier\")\n\n    elif lexicon in SENSE_ID_LEXICONS:\n        flavor = SENSE_ID_LEXICONS[lexicon]\n        lexid, _ = split_lexicon_specifier(lexicon)\n        prefix = f\"{lexid}-\"\n\n        def getter(sense: wn.Sense) -> str | None:\n            sense_key = sense.id.removeprefix(prefix)\n            # check if sense id is likely an escaped sense key\n            if \"__\" in sense_key:\n                return unescape(sense_key, flavor=flavor)\n            return None\n\n    else:\n        raise wn.Error(f\"no sense key getter is defined for {lexicon}\")\n\n    return getter\n\n\ndef sense_getter(lexicon: str, wordnet: wn.Wordnet | None = None) -> SenseGetter:\n    \"\"\"Return a function that gets the sense for a sense key.\n\n    The *lexicon* argument determines how the function will retrieve\n    the sense; i.e., whether a mapping between a sense's\n    ``identifier`` metadata and the sense will be created and used or\n    the escaped sense key is used as the sense ID. For any unsupported\n    lexicon, an error is raised.\n\n    The optional *wordnet* object is used as the source of the\n    returned :class:`wn.Sense` objects. If none is provided, a new\n    :class:`wn.Wordnet` object is created using the *lexicon*\n    argument.\n\n    The function that is returned accepts one argument, a :class:`str`\n    of the sense key, and returns a :class:`wn.Sense` if the sense key\n    exists in the lexicon or :data:`None` otherwise.\n\n    >>> import wn\n    >>> from wn.compat import sensekey\n    >>> get_sense = sensekey.sense_getter(\"oewn:2024\")\n    >>> get_sense(\"alabaster%3:01:00::\")\n    Sense('oewn-alabaster__3.01.00..')\n\n    .. warning::\n\n       The mapping built for the ``omw-en*`` or ``ewn`` lexicons\n       requires significant memory---around 100MiB---to use. 
The\n       ``oewn`` lexicons do not require such a mapping and the memory\n       usage is negligible.\n\n    \"\"\"\n    if wordnet is None:\n        wordnet = wn.Wordnet(lexicon)\n\n    if lexicon in METADATA_LEXICONS:\n        get_sense_key = sense_key_getter(lexicon)\n        sense_key_map = {get_sense_key(s): s.id for s in wordnet.senses()}\n        if None in sense_key_map:\n            sense_key_map.pop(None)  # senses without sense keys\n\n        def getter(sense_key: str) -> wn.Sense | None:\n            if sense_id := sense_key_map.get(sense_key):\n                return wordnet.sense(sense_id)\n            return None\n\n    elif lexicon in SENSE_ID_LEXICONS:\n        flavor = SENSE_ID_LEXICONS[lexicon]\n        lexid, _ = split_lexicon_specifier(lexicon)\n\n        def getter(sense_key: str) -> wn.Sense | None:\n            sense_id = f\"{lexid}-{escape(sense_key, flavor=flavor)}\"\n            try:\n                return wordnet.sense(sense_id)\n            except wn.Error:\n                return None\n\n    else:\n        raise wn.Error(f\"no sense getter is defined for {lexicon}\")\n\n    return getter\n"
  },
  {
    "path": "wn/constants.py",
    "content": "\"\"\"\nConstants and literals used in wordnets.\n\"\"\"\n\nSENSE_RELATIONS = frozenset(\n    [\n        \"antonym\",\n        \"also\",\n        \"participle\",\n        \"pertainym\",\n        \"derivation\",\n        \"domain_topic\",\n        \"has_domain_topic\",\n        \"domain_region\",\n        \"has_domain_region\",\n        \"exemplifies\",\n        \"is_exemplified_by\",\n        \"similar\",\n        \"other\",\n        \"feminine\",\n        \"has_feminine\",\n        \"masculine\",\n        \"has_masculine\",\n        \"young\",\n        \"has_young\",\n        \"diminutive\",\n        \"has_diminutive\",\n        \"augmentative\",\n        \"has_augmentative\",\n        \"anto_gradable\",\n        \"anto_simple\",\n        \"anto_converse\",\n        \"simple_aspect_ip\",\n        \"secondary_aspect_ip\",\n        \"simple_aspect_pi\",\n        \"secondary_aspect_pi\",\n        \"metaphor\",\n        \"has_metaphor\",\n        \"metonym\",\n        \"has_metonym\",\n        \"agent\",\n        \"body_part\",\n        \"by_means_of\",\n        \"destination\",\n        \"event\",\n        \"instrument\",\n        \"location\",\n        \"material\",\n        \"property\",\n        \"result\",\n        \"state\",\n        \"undergoer\",\n        \"uses\",\n        \"vehicle\",\n    ]\n)\n\nSENSE_SYNSET_RELATIONS = frozenset(\n    [\n        \"other\",\n        \"domain_topic\",\n        \"domain_region\",\n        \"exemplifies\",\n    ]\n)\n\nSYNSET_RELATIONS = frozenset(\n    [\n        \"agent\",\n        \"also\",\n        \"attribute\",\n        \"be_in_state\",\n        \"causes\",\n        \"classified_by\",\n        \"classifies\",\n        \"co_agent_instrument\",\n        \"co_agent_patient\",\n        \"co_agent_result\",\n        \"co_instrument_agent\",\n        \"co_instrument_patient\",\n        \"co_instrument_result\",\n        \"co_patient_agent\",\n        \"co_patient_instrument\",\n        \"co_result_agent\",\n        \"co_result_instrument\",\n        \"co_role\",\n        \"direction\",\n        \"domain_region\",\n        \"domain_topic\",\n        \"exemplifies\",\n        \"entails\",\n        \"eq_synonym\",\n        \"has_domain_region\",\n        \"has_domain_topic\",\n        \"is_exemplified_by\",\n        \"holo_location\",\n        \"holo_member\",\n        \"holo_part\",\n        \"holo_portion\",\n        \"holo_substance\",\n        \"holonym\",\n        \"hypernym\",\n        \"hyponym\",\n        \"in_manner\",\n        \"instance_hypernym\",\n        \"instance_hyponym\",\n        \"instrument\",\n        \"involved\",\n        \"involved_agent\",\n        \"involved_direction\",\n        \"involved_instrument\",\n        \"involved_location\",\n        \"involved_patient\",\n        \"involved_result\",\n        \"involved_source_direction\",\n        \"involved_target_direction\",\n        \"is_caused_by\",\n        \"is_entailed_by\",\n        \"location\",\n        \"manner_of\",\n        \"mero_location\",\n        \"mero_member\",\n        \"mero_part\",\n        \"mero_portion\",\n        \"mero_substance\",\n        \"meronym\",\n        \"similar\",\n        \"other\",\n        \"patient\",\n        \"restricted_by\",\n        \"restricts\",\n        \"result\",\n        \"role\",\n        \"source_direction\",\n        \"state_of\",\n        \"target_direction\",\n        \"subevent\",\n        \"is_subevent_of\",\n        \"antonym\",\n        \"feminine\",\n        \"has_feminine\",\n        \"masculine\",\n 
       \"has_masculine\",\n        \"young\",\n        \"has_young\",\n        \"diminutive\",\n        \"has_diminutive\",\n        \"augmentative\",\n        \"has_augmentative\",\n        \"anto_gradable\",\n        \"anto_simple\",\n        \"anto_converse\",\n        \"ir_synonym\",\n    ]\n)\n\n\nREVERSE_RELATIONS = {\n    \"hypernym\": \"hyponym\",\n    \"hyponym\": \"hypernym\",\n    \"instance_hypernym\": \"instance_hyponym\",\n    \"instance_hyponym\": \"instance_hypernym\",\n    \"antonym\": \"antonym\",\n    \"eq_synonym\": \"eq_synonym\",\n    \"similar\": \"similar\",\n    \"meronym\": \"holonym\",\n    \"holonym\": \"meronym\",\n    \"mero_location\": \"holo_location\",\n    \"holo_location\": \"mero_location\",\n    \"mero_member\": \"holo_member\",\n    \"holo_member\": \"mero_member\",\n    \"mero_part\": \"holo_part\",\n    \"holo_part\": \"mero_part\",\n    \"mero_portion\": \"holo_portion\",\n    \"holo_portion\": \"mero_portion\",\n    \"mero_substance\": \"holo_substance\",\n    \"holo_substance\": \"mero_substance\",\n    # 'also': '',\n    \"state_of\": \"be_in_state\",\n    \"be_in_state\": \"state_of\",\n    \"causes\": \"is_caused_by\",\n    \"is_caused_by\": \"causes\",\n    \"subevent\": \"is_subevent_of\",\n    \"is_subevent_of\": \"subevent\",\n    \"manner_of\": \"in_manner\",\n    \"in_manner\": \"manner_of\",\n    \"attribute\": \"attribute\",\n    \"restricts\": \"restricted_by\",\n    \"restricted_by\": \"restricts\",\n    \"classifies\": \"classified_by\",\n    \"classified_by\": \"classifies\",\n    \"entails\": \"is_entailed_by\",\n    \"is_entailed_by\": \"entails\",\n    \"domain_topic\": \"has_domain_topic\",\n    \"has_domain_topic\": \"domain_topic\",\n    \"domain_region\": \"has_domain_region\",\n    \"has_domain_region\": \"domain_region\",\n    \"exemplifies\": \"is_exemplified_by\",\n    \"is_exemplified_by\": \"exemplifies\",\n    \"role\": \"involved\",\n    \"involved\": \"role\",\n    \"agent\": \"involved_agent\",\n    \"involved_agent\": \"agent\",\n    \"patient\": \"involved_patient\",\n    \"involved_patient\": \"patient\",\n    \"result\": \"involved_result\",\n    \"involved_result\": \"result\",\n    \"instrument\": \"involved_instrument\",\n    \"involved_instrument\": \"instrument\",\n    \"location\": \"involved_location\",\n    \"involved_location\": \"location\",\n    \"direction\": \"involved_direction\",\n    \"involved_direction\": \"direction\",\n    \"target_direction\": \"involved_target_direction\",\n    \"involved_target_direction\": \"target_direction\",\n    \"source_direction\": \"involved_source_direction\",\n    \"involved_source_direction\": \"source_direction\",\n    \"co_role\": \"co_role\",\n    \"co_agent_patient\": \"co_patient_agent\",\n    \"co_patient_agent\": \"co_agent_patient\",\n    \"co_agent_instrument\": \"co_instrument_agent\",\n    \"co_instrument_agent\": \"co_agent_instrument\",\n    \"co_agent_result\": \"co_result_agent\",\n    \"co_result_agent\": \"co_agent_result\",\n    \"co_patient_instrument\": \"co_instrument_patient\",\n    \"co_instrument_patient\": \"co_patient_instrument\",\n    \"co_result_instrument\": \"co_instrument_result\",\n    \"co_instrument_result\": \"co_result_instrument\",\n    # 'pertainym': '',\n    \"derivation\": \"derivation\",\n    \"simple_aspect_ip\": \"simple_aspect_pi\",\n    \"simple_aspect_pi\": \"simple_aspect_ip\",\n    \"secondary_aspect_ip\": \"secondary_aspect_pi\",\n    \"secondary_aspect_pi\": \"secondary_aspect_ip\",\n    \"feminine\": 
\"has_feminine\",\n    \"has_feminine\": \"feminine\",\n    \"masculine\": \"has_masculine\",\n    \"has_masculine\": \"masculine\",\n    \"young\": \"has_young\",\n    \"has_young\": \"young\",\n    \"diminutive\": \"has_diminutive\",\n    \"has_diminutive\": \"diminutive\",\n    \"augmentative\": \"has_augmentative\",\n    \"has_augmentative\": \"augmentative\",\n    \"anto_gradable\": \"anto_gradable\",\n    \"anto_simple\": \"anto_simple\",\n    \"anto_converse\": \"anto_converse\",\n    \"ir_synonym\": \"ir_synonym\",\n    # 'participle': '',\n    # 'other': '',\n    \"metaphor\": \"has_metaphor\",\n    \"metonym\": \"has_metonym\",\n}\n\n# Adjective Positions\n\nADJPOSITIONS = frozenset(\n    (\n        \"a\",  # attributive\n        \"ip\",  # immediate postnominal\n        \"p\",  # predicative\n    )\n)\n\n\n# Parts of Speech\n\nNOUN = \"n\"  #:\nVERB = \"v\"  #:\nADJ = ADJECTIVE = \"a\"  #:\nADV = ADVERB = \"r\"  #:\nADJ_SAT = ADJECTIVE_SATELLITE = \"s\"  #:\nPHRASE = \"t\"  #:\nCONJ = CONJUNCTION = \"c\"  #:\nADP = ADPOSITION = \"p\"  #:\nOTHER = \"x\"  #:\nUNKNOWN = \"u\"  #:\n\nPARTS_OF_SPEECH = frozenset(\n    (\n        NOUN,\n        VERB,\n        ADJECTIVE,\n        ADVERB,\n        ADJECTIVE_SATELLITE,\n        PHRASE,\n        CONJUNCTION,\n        ADPOSITION,\n        OTHER,\n        UNKNOWN,\n    )\n)\n\n\n# Lexicographer Files\n# from https://wordnet.princeton.edu/documentation/lexnames5wn\n\nLEXICOGRAPHER_FILES = {\n    \"adj.all\": 0,\n    \"adj.pert\": 1,\n    \"adv.all\": 2,\n    \"noun.Tops\": 3,\n    \"noun.act\": 4,\n    \"noun.animal\": 5,\n    \"noun.artifact\": 6,\n    \"noun.attribute\": 7,\n    \"noun.body\": 8,\n    \"noun.cognition\": 9,\n    \"noun.communication\": 10,\n    \"noun.event\": 11,\n    \"noun.feeling\": 12,\n    \"noun.food\": 13,\n    \"noun.group\": 14,\n    \"noun.location\": 15,\n    \"noun.motive\": 16,\n    \"noun.object\": 17,\n    \"noun.person\": 18,\n    \"noun.phenomenon\": 19,\n    \"noun.plant\": 20,\n    \"noun.possession\": 21,\n    \"noun.process\": 22,\n    \"noun.quantity\": 23,\n    \"noun.relation\": 24,\n    \"noun.shape\": 25,\n    \"noun.state\": 26,\n    \"noun.substance\": 27,\n    \"noun.time\": 28,\n    \"verb.body\": 29,\n    \"verb.change\": 30,\n    \"verb.cognition\": 31,\n    \"verb.communication\": 32,\n    \"verb.competition\": 33,\n    \"verb.consumption\": 34,\n    \"verb.contact\": 35,\n    \"verb.creation\": 36,\n    \"verb.emotion\": 37,\n    \"verb.motion\": 38,\n    \"verb.perception\": 39,\n    \"verb.possession\": 40,\n    \"verb.social\": 41,\n    \"verb.stative\": 42,\n    \"verb.weather\": 43,\n    \"adj.ppl\": 44,\n}\n"
  },
  {
    "path": "wn/ic.py",
    "content": "\"\"\"Information Content is a corpus-based metrics of synset or sense\nspecificity.\n\n\"\"\"\n\nfrom collections import Counter\nfrom collections.abc import Callable, Iterable, Iterator\nfrom math import log\nfrom pathlib import Path\nfrom typing import TextIO, TypeAlias\n\nfrom wn import Synset, Wordnet\nfrom wn._types import AnyPath\nfrom wn.constants import ADJ, ADJ_SAT, ADV, NOUN, VERB\nfrom wn.util import synset_id_formatter\n\n# Just use a subset of all available parts of speech\nIC_PARTS_OF_SPEECH = frozenset((NOUN, VERB, ADJ, ADV))\nFreq: TypeAlias = dict[str, dict[str | None, float]]\n\n\ndef information_content(synset: Synset, freq: Freq) -> float:\n    \"\"\"Calculate the Information Content value for a synset.\n\n    The information content of a synset is the negative log of the\n    synset probability (see :func:`synset_probability`).\n\n    \"\"\"\n    return -log(synset_probability(synset, freq))\n\n\ndef synset_probability(synset: Synset, freq: Freq) -> float:\n    \"\"\"Calculate the synset probability.\n\n    The synset probability is defined as freq(ss)/N where freq(ss) is\n    the IC weight for the synset and N is the total IC weight for all\n    synsets with the same part of speech.\n\n    Note: this function is not generally used directly, but indirectly\n    through :func:`information_content`.\n\n    \"\"\"\n    pos_freq = freq[synset.pos]\n    return pos_freq[synset.id] / pos_freq[None]\n\n\ndef _initialize(\n    wordnet: Wordnet,\n    smoothing: float,\n) -> Freq:\n    \"\"\"Populate an Information Content weight mapping to a smoothing value.\n\n    All synsets in *wordnet* are inserted into the dictionary and\n    mapped to *smoothing*.\n\n    \"\"\"\n    freq: Freq = {\n        pos: {synset.id: smoothing for synset in wordnet.synsets(pos=pos)}\n        for pos in IC_PARTS_OF_SPEECH\n    }\n    # pretend ADJ_SAT is just ADJ\n    for synset in wordnet.synsets(pos=ADJ_SAT):\n        freq[ADJ][synset.id] = smoothing\n    # also initialize totals (when synset is None) for each part-of-speech\n    for pos in IC_PARTS_OF_SPEECH:\n        freq[pos][None] = smoothing\n    return freq\n\n\ndef compute(\n    corpus: Iterable[str],\n    wordnet: Wordnet,\n    distribute_weight: bool = True,\n    smoothing: float = 1.0,\n) -> Freq:\n    \"\"\"Compute Information Content weights from a corpus.\n\n    Arguments:\n        corpus: An iterable of string tokens. This is a flat list of\n            words and the order does not matter. 
Tokens may be single\n            words or multiple words separated by a space.\n\n        wordnet: An instantiated :class:`wn.Wordnet` object, used to\n            look up synsets from words.\n\n        distribute_weight: If :python:`True`, the counts for a word\n            are divided evenly among all synsets for the word.\n\n        smoothing: The initial value given to each synset.\n\n    Example:\n        >>> import wn, wn.ic, wn.morphy\n        >>> ewn = wn.Wordnet(\"ewn:2020\", lemmatizer=wn.morphy.morphy)\n        >>> freq = wn.ic.compute([\"Dogs\", \"run\", \".\", \"Cats\", \"sleep\", \".\"], ewn)\n        >>> dog = ewn.synsets(\"dog\", pos=\"n\")[0]\n        >>> cat = ewn.synsets(\"cat\", pos=\"n\")[0]\n        >>> frog = ewn.synsets(\"frog\", pos=\"n\")[0]\n        >>> freq[\"n\"][dog.id]\n        1.125\n        >>> freq[\"n\"][cat.id]\n        1.1\n        >>> freq[\"n\"][frog.id]  # no occurrence; smoothing value only\n        1.0\n        >>> carnivore = dog.lowest_common_hypernyms(cat)[0]\n        >>> freq[\"n\"][carnivore.id]\n        1.3250000000000002\n    \"\"\"\n    freq = _initialize(wordnet, smoothing)\n    counts = Counter(corpus)\n\n    hypernym_cache: dict[Synset, list[Synset]] = {}\n    for word, count in counts.items():\n        synsets = wordnet.synsets(word)\n        num = len(synsets)\n        if num == 0:\n            continue\n\n        weight = float(count / num if distribute_weight else count)\n\n        for synset in synsets:\n            pos = synset.pos\n            if pos == ADJ_SAT:\n                pos = ADJ\n            if pos not in IC_PARTS_OF_SPEECH:\n                continue\n\n            freq[pos][None] += weight\n\n            # The following while-loop is equivalent to:\n            #\n            # freq[pos][synset.id] += weight\n            # for path in synset.hypernym_paths():\n            #     for ss in path:\n            #         freq[pos][ss.id] += weight\n            #\n            # ...but it caches hypernym lookups for speed\n\n            agenda: list[tuple[Synset, set[Synset]]] = [(synset, set())]\n            while agenda:\n                ss, seen = agenda.pop()\n\n                # avoid cycles\n                if ss in seen:\n                    continue\n\n                freq[pos][ss.id] += weight\n\n                if ss not in hypernym_cache:\n                    hypernym_cache[ss] = ss.hypernyms()\n                agenda.extend((hyp, seen | {ss}) for hyp in hypernym_cache[ss])\n\n    return freq\n\n\ndef load(\n    source: AnyPath,\n    wordnet: Wordnet,\n    get_synset_id: Callable | None = None,\n) -> Freq:\n    \"\"\"Load an Information Content mapping from a file.\n\n    Arguments:\n\n        source: A path to an information content weights file.\n\n        wordnet: A :class:`wn.Wordnet` instance with synset\n            identifiers matching the offsets in the weights file.\n\n        get_synset_id: A callable that takes a synset offset and part\n            of speech and returns a synset ID valid in *wordnet*.\n\n    Raises:\n\n        :class:`wn.Error`: If *wordnet* does not have exactly one\n            lexicon.\n\n    Example:\n\n        >>> import wn, wn.ic\n        >>> pwn = wn.Wordnet(\"pwn:3.0\")\n        >>> path = \"~/nltk_data/corpora/wordnet_ic/ic-brown-resnik-add1.dat\"\n        >>> freq = wn.ic.load(path, pwn)\n\n    \"\"\"\n    source = Path(source).expanduser().resolve(strict=True)\n    assert len(wordnet.lexicons()) == 1\n    lexid = wordnet.lexicons()[0].id\n    if get_synset_id is None:\n        
get_synset_id = synset_id_formatter(prefix=lexid)\n\n    freq = _initialize(wordnet, 0.0)\n\n    with source.open() as icfile:\n        for offset, pos, weight, is_root in _parse_ic_file(icfile):\n            ssid = get_synset_id(offset=offset, pos=pos)\n            # synset = wordnet.synset(ssid)\n            freq[pos][ssid] = weight\n            if is_root:\n                freq[pos][None] += weight\n    return freq\n\n\ndef _parse_ic_file(icfile: TextIO) -> Iterator[tuple[int, str, float, bool]]:\n    \"\"\"Parse the Information Content file.\n\n    A sample of the format is::\n\n        wnver::eOS9lXC6GvMWznF1wkZofDdtbBU\n        1740n 1915712 ROOT\n        1930n 859272\n        2137n 1055337\n\n    \"\"\"\n    next(icfile)  # skip header\n    for line in icfile:\n        ssinfo, value, *isroot = line.split()\n        yield (int(ssinfo[:-1]), ssinfo[-1], float(value), bool(isroot))\n"
  },
  {
    "path": "wn/ili.py",
    "content": "\"\"\"Interlingual Indices\n\nThis module provides classes and functions for inspecting Interlingual\nIndex (ILI) objects, both existing and proposed and including their\ndefinitions and any metadata, for synsets and lexicons.\n\"\"\"\n\nfrom __future__ import annotations\n\nfrom dataclasses import dataclass, field\nfrom enum import Enum\nfrom itertools import zip_longest\nfrom pathlib import Path\nfrom typing import TYPE_CHECKING, Literal, Protocol, overload\n\nfrom wn._lexicon import Lexicon, LexiconElementWithMetadata\nfrom wn._metadata import HasMetadata\nfrom wn._queries import (\n    find_ilis,\n    find_proposed_ilis,\n    get_ili,\n)\nfrom wn._wordnet import Wordnet\n\nif TYPE_CHECKING:\n    from collections.abc import Iterator\n\n    from wn._core import Synset\n    from wn._metadata import Metadata\n    from wn._types import AnyPath\n\n\nclass ILIStatus(str, Enum):\n    __module__ = \"wn\"\n\n    UNKNOWN = \"unknown\"  # no information available\n    ACTIVE = \"active\"  # attested in ILI file and marked as active\n    PRESUPPOSED = \"presupposed\"  # used by lexicon, ILI file not loaded\n    PROPOSED = \"proposed\"  # proposed by lexicon for addition to ILI\n\n\n@dataclass(slots=True)\nclass ILIDefinition(HasMetadata):\n    \"\"\"Class for modeling ILI definitions.\"\"\"\n\n    __module__ = \"wn\"\n\n    text: str\n    _metadata: Metadata | None = field(default=None, compare=False, repr=False)\n    _lexicon: str | None = field(default=None, compare=False, repr=False)\n\n    def metadata(self) -> Metadata:\n        \"\"\"Return the ILI's metadata.\"\"\"\n        return self._metadata if self._metadata is not None else {}\n\n    def confidence(self) -> float:\n        c = self.metadata().get(\"confidenceScore\")\n        if c is None:\n            if self._lexicon:\n                # ProposedILIs are lexicon elements and inherit their\n                # lexicon's confidence value\n                c = Lexicon.from_specifier(self._lexicon).confidence()\n            else:\n                # Regular ILIs are not lexicon elements\n                c = 1.0\n        return float(c)\n\n\nclass ILIProtocol(Protocol):\n    _definition_text: str | None\n    _definition_metadata: Metadata | None\n\n    @property\n    def id(self) -> str | None:\n        \"\"\"The ILI identifier.\"\"\"\n        ...\n\n    @property\n    def status(self) -> ILIStatus:\n        \"\"\"The status of the ILI.\"\"\"\n        ...\n\n    @overload\n    def definition(self, *, data: Literal[False] = False) -> str | None: ...\n    @overload\n    def definition(self, *, data: Literal[True] = True) -> ILIDefinition | None: ...\n\n    # fallback for non-literal bool argument\n    @overload\n    def definition(self, *, data: bool) -> str | ILIDefinition | None: ...\n\n    def definition(self, *, data: bool = False) -> str | ILIDefinition | None:\n        \"\"\"Return the ILI's definition.\n\n        If the *data* argument is :python:`False` (the default), the\n        definition is returned as a :class:`str` type. 
If it is\n        :python:`True`, a :class:`wn.ILIDefinition` object is used instead.\n\n        Note that :class:`ILI` objects will not have definitions unless\n        an ILI resource has been added, but :class:`ProposedILI` objects\n        will have definitions if one is provided by the proposing lexicon.\n\n        \"\"\"\n        if data and self._definition_text:\n            return ILIDefinition(\n                self._definition_text,\n                _metadata=self._definition_metadata,\n                # lexicon is defined only for proposed ILIs\n                _lexicon=getattr(self, \"_lexicon\", None),\n            )\n        return self._definition_text\n\n\n@dataclass(frozen=True, slots=True)\nclass ILI(ILIProtocol):\n    \"\"\"A class for interlingual indices.\"\"\"\n\n    __module__ = \"wn\"\n\n    id: str\n    status: ILIStatus = field(\n        default=ILIStatus.UNKNOWN, repr=False, hash=False, compare=False\n    )\n    _definition_text: str | None = field(\n        default=None, repr=False, hash=False, compare=False\n    )\n    _definition_metadata: Metadata | None = field(\n        default=None, repr=False, hash=False, compare=False\n    )\n\n\n@dataclass(frozen=True, slots=True)\nclass ProposedILI(LexiconElementWithMetadata, ILIProtocol):\n    __module__ = \"wn\"\n\n    _synset: str\n    _lexicon: str\n    _definition_text: str | None = field(\n        default=None, repr=False, hash=False, compare=False\n    )\n    _definition_metadata: Metadata | None = field(\n        default=None, repr=False, hash=False, compare=False\n    )\n\n    @property\n    def id(self) -> Literal[None]:\n        \"\"\"Always return :python:`None`.\n\n        Proposed ILIs do not have identifiers. This method is kept for\n        interface consistency.\n\n        \"\"\"\n        return None\n\n    @property\n    def status(self) -> Literal[ILIStatus.PROPOSED]:\n        \"\"\"Always return :attr:`ILIStatus.PROPOSED`.\n\n        Proposed ILI objects are only used for ILIs that are proposed.\n\n        \"\"\"\n        return ILIStatus.PROPOSED\n\n    def synset(self) -> Synset:\n        \"\"\"Return the synset object associated with the proposed ILI.\"\"\"\n        return Wordnet(self._lexicon).synset(self._synset)\n\n\ndef get(id: str) -> ILI | None:\n    \"\"\"Get the ILI object with the given id.\n\n    The *id* argument is a string ILI identifier. If *id* does not\n    match a known ILI, :python:`None` is returned. Note that a\n    :python:`None` value does not necessarily mean that there is no\n    such ILI, but rather that no resource declaring that ILI has been\n    loaded into Wn's database.\n\n    Example:\n\n    >>> from wn import ili\n    >>> ili.get(\"i12345\")\n    ILI('i12345')\n    >>> ili.get(\"i0\") is None\n    True\n\n    \"\"\"\n    if row := get_ili(id=id):\n        id, status, defn, meta = row\n        return ILI(\n            id,\n            status=ILIStatus(status),\n            _definition_text=defn,\n            _definition_metadata=meta,\n        )\n    return None\n\n\ndef get_all(\n    *,\n    status: ILIStatus | str | None = None,\n    lexicon: str | None = None,\n) -> list[ILI]:\n    \"\"\"Get the list of all matching ILI objects.\n\n    The *status* argument may be a string matching a single\n    :class:`ILIStatus`, or a union of one or more :class:`ILIStatus`\n    values. The *lexicon* argument is a space-separated string of\n    lexicon specifiers. 
All ILIs with a matching status and lexicon\n    will be returned.\n\n    Example:\n\n    >>> from wn import ili\n    >>> len(ili.get_all())\n    117442\n\n    \"\"\"\n    if isinstance(status, str):\n        status = ILIStatus(status)\n    lexicons = lexicon.split() if lexicon else []\n    return [\n        ILI(\n            id,\n            status=ILIStatus(status),\n            _definition_text=defn,\n            _definition_metadata=meta,\n        )\n        for id, status, defn, meta in find_ilis(status=status, lexicons=lexicons)\n    ]\n\n\ndef get_proposed(synset: Synset) -> ProposedILI | None:\n    \"\"\"Get a proposed ILI for *synset* if it exists.\n\n    The synset itself does not give a good indication if it has an\n    associated proposed ILI. The :attr:`wn.Synset.ili` value will be\n    :python:`None`, but this is also true if there is no ILI at all.\n    In most cases it is easier to list the proposed ILIs for a lexicon\n    using :func:`get_all_proposed`, then to retrieve their associated\n    synsets.\n\n    Example:\n\n    >>> import wn\n    >>> from wn import ili\n    >>> en = wn.Wordnet(\"oewn:2024\")\n    >>> en.synset(\"oewn-00002935-r\").ili is None\n    True\n    >>> ili.get_proposed(en.synset(\"oewn-00002935-r\"))\n    ProposedILI(_synset='oewn-00002935-r', _lexicon='oewn:2024')\n\n    \"\"\"\n    results = find_proposed_ilis(\n        synset_id=synset.id,\n        lexicons=(synset.lexicon().specifier(),),\n    )\n    if row := next(results, None):\n        return ProposedILI(*row)\n    return None\n\n\ndef get_all_proposed(lexicon: str | None = None) -> list[ProposedILI]:\n    \"\"\"Get the list of all proposed ILI objects.\n\n    The *lexicon* argument is a space-separated string of lexicon\n    specifiers. Proposed ILIs matching the lexicon will be returned.\n\n    Example:\n\n    >>> from wn import ili\n    >>> proposed = ili.get_all_proposed(\"oewn:2024\")\n    >>> proposed[0]\n    ProposedILI(_synset='oewn-00002935-r', _lexicon='oewn:2024')\n    >>> proposed[0].synset()\n    Synset('oewn-00002935-r')\n\n    \"\"\"\n    lexicons = lexicon.split() if lexicon else []\n    return [ProposedILI(*row) for row in find_proposed_ilis(lexicons=lexicons)]\n\n\ndef is_ili_tsv(source: AnyPath) -> bool:\n    \"\"\"Return True if *source* is an ILI tab-separated-value file.\n\n    This only checks that the first column, split by tabs, of the\n    first line is 'ili' or 'ILI'. 
It does not check if each line has\n    the correct number of columns.\n\n    \"\"\"\n    source = Path(source).expanduser()\n    if source.is_file():\n        try:\n            with source.open(\"rb\") as fh:\n                return next(fh).split(b\"\\t\")[0] in (b\"ili\", b\"ILI\")\n        except (StopIteration, IndexError):\n            pass\n    return False\n\n\ndef load_tsv(source: AnyPath) -> Iterator[dict[str, str]]:\n    \"\"\"Yield data from an ILI tab-separated-value file.\n\n    This function yields dictionaries mapping field names to values.\n    The *source* argument is a path to an ILI file.\n\n    Example:\n\n    >>> from wn import ili\n    >>> obj = next(ili.load_tsv(\"cili.tsv\"))\n    >>> obj.keys()\n    dict_keys(['ili', 'definition'])\n    >>> obj[\"ili\"]\n    'i1'\n\n    \"\"\"\n    source = Path(source).expanduser()\n    with source.open(encoding=\"utf-8\") as fh:\n        header = next(fh).rstrip(\"\\r\\n\")\n        fields = tuple(map(str.lower, header.split(\"\\t\")))\n        for line in fh:\n            yield dict(\n                zip_longest(\n                    fields,\n                    line.rstrip(\"\\r\\n\").split(\"\\t\"),\n                    fillvalue=\"\",\n                )\n            )\n"
  },
  {
    "path": "wn/index.toml",
    "content": "[cili]\n  type = \"ili\"\n  label = \"Collaborative Interlingual Index\"\n  license = \"https://creativecommons.org/licenses/by/4.0/\"\n  [cili.versions.\"1.0\"]\n    url = \"https://github.com/globalwordnet/cili/releases/download/v1.0/cili.tsv.xz\"\n\n\n[oewn]\n  label = \"Open English WordNet\"\n  language = \"en\"\n  license = \"https://creativecommons.org/licenses/by/4.0/\"\n  [oewn.versions.\"2025+\"]\n    url = \"https://en-word.net/static/english-wordnet-2025-plus.xml.gz\"\n  [oewn.versions.2025]\n    url = \"https://en-word.net/static/english-wordnet-2025.xml.gz\"\n  [oewn.versions.2024]\n    url = \"\"\"\n      https://en-word.net/static/english-wordnet-2024.xml.gz\n      https://github.com/globalwordnet/english-wordnet/releases/download/2024-edition/english-wordnet-2024.xml.gz\n    \"\"\"\n  [oewn.versions.2023]\n    url = \"\"\"\n      https://en-word.net/static/english-wordnet-2023.xml.gz\n      https://github.com/globalwordnet/english-wordnet/releases/download/2023-edition/english-wordnet-2023.xml.gz\n    \"\"\"\n  [oewn.versions.2022]\n    url = \"\"\"\n      https://en-word.net/static/english-wordnet-2022.xml.gz\n      https://github.com/globalwordnet/english-wordnet/releases/download/2022-edition/english-wordnet-2022.xml.gz\n    \"\"\"\n  [oewn.versions.2021]\n    url = \"https://en-word.net/static/english-wordnet-2021.xml.gz\"\n  [oewn.versions.2020]\n    error = \"Use 'ewn' as the ID prior to version 2021 ('ewn:2020')\"\n  [oewn.versions.2019]\n    error = \"Use 'ewn' as the ID prior to version 2021 ('ewn:2019')\"\n\n\n[ewn]\n  label = \"Open English WordNet\"\n  language = \"en\"\n  license = \"https://creativecommons.org/licenses/by/4.0/\"\n  [ewn.versions.2021]\n    error = \"Use 'oewn' as the ID from version 2021 ('oewn:2021')\"\n  [ewn.versions.2020]\n    url = \"https://en-word.net/static/english-wordnet-2020.xml.gz\"\n  [ewn.versions.2019]\n    url = \"https://en-word.net/static/english-wordnet-2019.xml.gz\"\n\n\n[odenet]\n  label = \"Open German WordNet\"\n  language = \"de\"\n  license = \"https://creativecommons.org/licenses/by-sa/4.0/\"\n  [odenet.versions.\"1.4\"]\n    url = \"https://github.com/hdaSprachtechnologie/odenet/releases/download/v1.4/odenet-1.4.tar.xz\"\n  [odenet.versions.\"1.3\"]\n    url = \"https://github.com/hdaSprachtechnologie/odenet/releases/download/v1.3/odenet-1.3.tar.xz\"\n\n\n[omw]\n  label = \"Open Multilingual Wordnet\"\n  language = \"mul\"\n  license = \"Please consult the LICENSE files included with the individual wordnets. 
Note that all permit redistribution.\"\n  [omw.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-2.0.tar.xz\"\n  [omw.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-1.4.tar.xz\"\n  [omw.versions.\"1.3\"]\n    error = \"OMW 1.3 is no longer indexed; See https://github.com/goodmami/wn#changes-to-the-index\"\n\n\n[omw-en]\n  label = \"OMW English Wordnet based on WordNet 3.0\"\n  language = \"en\"\n  license = \"https://wordnet.princeton.edu/license-and-commercial-use\"\n  [omw-en.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-en-2.0.tar.xz\"\n  [omw-en.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-en-1.4.tar.xz\"\n\n[omw-en15]\n  label = \"OMW English Wordnet based on WordNet-1.5\"\n  language = \"en\"\n  license = \"WordNet-1.5 License\"\n\n  [omw-en15.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-en15-2.0.tar.xz\"\n\n[omw-en16]\n  label = \"OMW English Wordnet based on WordNet-1.6\"\n  language = \"en\"\n  license = \"WordNet-1.6 License\"\n\n  [omw-en16.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-en16-2.0.tar.xz\"\n\n[omw-en17]\n  label = \"OMW English Wordnet based on WordNet-1.7\"\n  language = \"en\"\n  license = \"https://wordnetcode.princeton.edu/1.7/LICENSE\"\n\n  [omw-en17.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-en17-2.0.tar.xz\"\n\n[omw-en171]\n  label = \"OMW English Wordnet based on WordNet-1.7.1\"\n  language = \"en\"\n  license = \"https://wordnetcode.princeton.edu/1.7.1/LICENSE\"\n\n  [omw-en171.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-en171-2.0.tar.xz\"\n\n[omw-en20]\n  label = \"OMW English Wordnet based on WordNet-2.0\"\n  language = \"en\"\n  license = \"https://wordnetcode.princeton.edu/2.0/LICENSE\"\n\n  [omw-en20.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-en20-2.0.tar.xz\"\n\n[omw-en21]\n  label = \"OMW English Wordnet based on WordNet-2.1\"\n  language = \"en\"\n  license = \"https://wordnetcode.princeton.edu/2.1/LICENSE\"\n\n  [omw-en21.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-en21-2.0.tar.xz\"\n\n[omw-en31]\n  label = \"OMW English Wordnet based on WordNet 3.1\"\n  language = \"en\"\n  license = \"https://wordnet.princeton.edu/license-and-commercial-use\"\n  [omw-en31.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-en31-2.0.tar.xz\"\n  [omw-en31.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-en31-1.4.tar.xz\"\n\n\n[omw-arb]\n  label = \"Arabic WordNet (AWN v2)\"\n  language = \"arb\"\n  license = \"https://creativecommons.org/licenses/by-sa/3.0/\"\n  [omw-arb.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-arb-2.0.tar.xz\"\n  [omw-arb.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-arb-1.4.tar.xz\"\n\n\n[omw-bg]\n  label = \"BulTreeBank Wordnet (BTB-WN)\"\n  language = \"bg\"\n  license = \"https://creativecommons.org/licenses/by/3.0/\"\n  [omw-bg.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-bg-2.0.tar.xz\"\n  [omw-bg.versions.\"1.4\"]\n    url = 
\"https://github.com/omwn/omw-data/releases/download/v1.4/omw-bg-1.4.tar.xz\"\n\n\n[omw-ca]\n  label = \"Multilingual Central Repository (Catalan)\"\n  language = \"ca\"\n  license = \"https://creativecommons.org/licenses/by/3.0/\"\n  [omw-ca.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-ca-2.0.tar.xz\"\n  [omw-ca.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-ca-1.4.tar.xz\"\n\n\n[omw-cmn]\n  label = \"Chinese Open Wordnet\"\n  language = \"cmn-Hans\"\n  license = \"wordnet\"\n  [omw-cmn.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-cmn-2.0.tar.xz\"\n  [omw-cmn.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-cmn-1.4.tar.xz\"\n\n\n[omw-da]\n  label = \"DanNet\"\n  language = \"da\"\n  license = \"wordnet\"\n  [omw-da.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-da-2.0.tar.xz\"\n  [omw-da.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-da-1.4.tar.xz\"\n\n\n[omw-el]\n  label = \"Greek Wordnet\"\n  language = \"el\"\n  license = \"https://opensource.org/licenses/Apache-2.0\"\n  [omw-el.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-el-2.0.tar.xz\"\n  [omw-el.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-el-1.4.tar.xz\"\n\n\n[omw-es]\n  label = \"Multilingual Central Repository (Spanish)\"\n  language = \"es\"\n  license = \"https://creativecommons.org/licenses/by/3.0/\"\n  [omw-es.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-es-2.0.tar.xz\"\n  [omw-es.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-es-1.4.tar.xz\"\n\n\n[omw-eu]\n  label = \"Multilingual Central Repository (Basque)\"\n  language = \"eu\"\n  license = \"https://creativecommons.org/licenses/by/3.0/\"\n  [omw-eu.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-eu-2.0.tar.xz\"\n  [omw-eu.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-eu-1.4.tar.xz\"\n\n\n[omw-fi]\n  label = \"FinnWordNet\"\n  language = \"fi\"\n  license = \"https://creativecommons.org/licenses/by/3.0/\"\n  [omw-fi.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-fi-2.0.tar.xz\"\n  [omw-fi.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-fi-1.4.tar.xz\"\n\n\n[omw-fr]\n  label = \"WOLF (Wordnet Libre du Français)\"\n  language = \"fr\"\n  license = \"http://www.cecill.info/licenses/Licence_CeCILL-C_V1-en.html\"\n  [omw-fr.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-fr-2.0.tar.xz\"\n  [omw-fr.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-fr-1.4.tar.xz\"\n\n\n[omw-gl]\n  label = \"Multilingual Central Repository (Galician)\"\n  language = \"gl\"\n  license = \"https://creativecommons.org/licenses/by/3.0/\"\n  [omw-gl.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-gl-2.0.tar.xz\"\n  [omw-gl.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-gl-1.4.tar.xz\"\n\n\n[omw-he]\n  label = \"Hebrew Wordnet\"\n  language = \"he\"\n  license = \"wordnet\"\n  [omw-he.versions.\"2.0\"]\n    url = 
\"https://github.com/omwn/omw-data/releases/download/v2.0/omw-he-2.0.tar.xz\"\n  [omw-he.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-he-1.4.tar.xz\"\n\n\n[omw-hr]\n  label = \"Croatian Wordnet\"\n  language = \"hr\"\n  license = \"https://creativecommons.org/licenses/by/3.0/\"\n  [omw-hr.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-hr-2.0.tar.xz\"\n  [omw-hr.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-hr-1.4.tar.xz\"\n\n\n[omw-id]\n  label = \"Wordnet Bahasa (Indonesian)\"\n  language = \"id\"\n  license = \"https://opensource.org/licenses/MIT/\"\n  [omw-id.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-id-2.0.tar.xz\"\n  [omw-id.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-id-1.4.tar.xz\"\n\n\n[omw-is]\n  label = \"IceWordNet\"\n  language = \"is\"\n  license = \"https://creativecommons.org/licenses/by/3.0/\"\n  [omw-is.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-is-2.0.tar.xz\"\n  [omw-is.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-is-1.4.tar.xz\"\n\n\n[omw-it]\n  label = \"MultiWordNet (Italian)\"\n  language = \"it\"\n  license = \"https://creativecommons.org/licenses/by/3.0/\"\n  [omw-it.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-it-2.0.tar.xz\"\n  [omw-it.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-it-1.4.tar.xz\"\n\n\n[omw-iwn]\n  label = \"ItalWordNet\"\n  language = \"it\"\n  license = \"http://opendefinition.org/licenses/odc-by/\"\n  [omw-iwn.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-iwn-2.0.tar.xz\"\n  [omw-iwn.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-iwn-1.4.tar.xz\"\n\n\n[omw-ja]\n  label = \"Japanese Wordnet\"\n  language = \"ja\"\n  license = \"wordnet\"\n  [omw-ja.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-ja-2.0.tar.xz\"\n  [omw-ja.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-ja-1.4.tar.xz\"\n\n\n[omw-lt]\n  label = \"Lithuanian  WordNet\"\n  language = \"lt\"\n  license = \"https://creativecommons.org/licenses/by-sa/3.0/\"\n  [omw-lt.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-lt-2.0.tar.xz\"\n  [omw-lt.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-lt-1.4.tar.xz\"\n\n\n[omw-nb]\n  label = \"Norwegian Wordnet (Bokmål)\"\n  language = \"nb\"\n  license = \"wordnet\"\n  [omw-nb.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-nb-2.0.tar.xz\"\n  [omw-nb.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-nb-1.4.tar.xz\"\n\n\n[omw-nl]\n  label = \"Open Dutch WordNet\"\n  language = \"nl\"\n  license = \"https://creativecommons.org/licenses/by-sa/4.0/\"\n  [omw-nl.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-nl-2.0.tar.xz\"\n  [omw-nl.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-nl-1.4.tar.xz\"\n\n\n[omw-nn]\n  label = \"Norwegian Wordnet (Nynorsk)\"\n  language = \"nn\"\n  license = \"wordnet\"\n  
[omw-nn.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-nn-2.0.tar.xz\"\n  [omw-nn.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-nn-1.4.tar.xz\"\n\n\n[omw-pl]\n  label = \"plWordNet\"\n  language = \"pl\"\n  license = \"wordnet\"\n  [omw-pl.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-pl-2.0.tar.xz\"\n  [omw-pl.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-pl-1.4.tar.xz\"\n\n\n[omw-pt]\n  label = \"OpenWN-PT\"\n  language = \"pt\"\n  license = \"https://creativecommons.org/licenses/by-sa/\"\n  [omw-pt.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-pt-2.0.tar.xz\"\n  [omw-pt.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-pt-1.4.tar.xz\"\n\n\n[omw-ro]\n  label = \"Romanian Wordnet\"\n  language = \"ro\"\n  license = \"https://creativecommons.org/licenses/by-sa/\"\n  [omw-ro.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-ro-2.0.tar.xz\"\n  [omw-ro.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-ro-1.4.tar.xz\"\n\n\n[omw-sk]\n  label = \"Slovak WordNet\"\n  language = \"sk\"\n  license = \"https://creativecommons.org/licenses/by-sa/3.0/\"\n  [omw-sk.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-sk-2.0.tar.xz\"\n  [omw-sk.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-sk-1.4.tar.xz\"\n\n\n[omw-sl]\n  label = \"sloWNet\"\n  language = \"sl\"\n  license = \"https://creativecommons.org/licenses/by-sa/3.0/\"\n  [omw-sl.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-sl-2.0.tar.xz\"\n  [omw-sl.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-sl-1.4.tar.xz\"\n\n\n[omw-sq]\n  label = \"Albanet\"\n  language = \"sq\"\n  license = \"https://creativecommons.org/licenses/by/3.0/\"\n  [omw-sq.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-sq-2.0.tar.xz\"\n  [omw-sq.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-sq-1.4.tar.xz\"\n\n\n[omw-sv]\n  label = \"WordNet-SALDO\"\n  language = \"sv\"\n  license = \"https://creativecommons.org/licenses/by/3.0/\"\n  [omw-sv.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-sv-2.0.tar.xz\"\n  [omw-sv.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-sv-1.4.tar.xz\"\n\n\n[omw-th]\n  label = \"Thai Wordnet\"\n  language = \"th\"\n  license = \"wordnet\"\n  [omw-th.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-th-2.0.tar.xz\"\n  [omw-th.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-th-1.4.tar.xz\"\n\n\n[omw-zsm]\n  label = \"Wordnet Bahasa (Malaysian)\"\n  language = \"zsm\"\n  license = \"https://opensource.org/licenses/MIT/\"\n  [omw-zsm.versions.\"2.0\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v2.0/omw-zsm-2.0.tar.xz\"\n  [omw-zsm.versions.\"1.4\"]\n    url = \"https://github.com/omwn/omw-data/releases/download/v1.4/omw-zsm-1.4.tar.xz\"\n\n\n[own]\n  label = \"Open Wordnets for Portuguese and English\"\n  language = \"mul\"\n  license = \"Please consult the LICENSE 
files.\"\n  [own.versions.\"1.0.0\"]\n    url = \"https://github.com/own-pt/openWordnet-PT/releases/download/v1.0.0/own.tar.gz\"\n\n\n[own-en]\n  label = \"Open Wordnet for English\"\n  language = \"en\"\n  license = \"Please consult the LICENSE files.\"\n  [own-en.versions.\"1.0.0\"]\n    url = \"https://github.com/own-pt/openWordnet-PT/releases/download/v1.0.0/own-en.tar.gz\"\n\n\n[own-pt]\n  label = \"Open Wordnet for Portuguese\"\n  language = \"pt\"\n  license = \"Please consult the LICENSE files.\"\n  [own-pt.versions.\"1.0.0\"]\n    url = \"https://github.com/own-pt/openWordnet-PT/releases/download/v1.0.0/own-pt.tar.gz\"\n\n\n[kurdnet]\n  label = \"KurdNet (Kurdish WordNet)\"\n  language = \"ckb\"\n  license = \"https://creativecommons.org/licenses/by-sa/4.0/\"\n  [kurdnet.versions.\"1.0\"]\n    url = \"https://github.com/sinaahmadi/kurdnet/releases/download/kurdnet-1.0.tar.xz/kurdnet-1.0.tar.xz\"\n\n\n# Delisted wordnets\n\n[pwn]\n  [pwn.versions.\"3.0\"]\n    error = \"'pwn:3.0' is no longer indexed; use 'omw-en:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n  [pwn.versions.\"3.1\"]\n    error = \"'pwn:3.1' is no longer indexed; use 'omw-en31:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n\n[alswn]\n  error = \"'alswn:1.3+omw' is no longer indexed; use 'omw-sq:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[arbwn]\n  error = \"'arbwn:1.3+omw' is no longer indexed; use 'omw-arb:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[bulwn]\n  error = \"'bulwn:1.3+omw' is no longer indexed; use 'omw-bg:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[catwn]\n  error = \"'catwn:1.3+omw' is no longer indexed; use 'omw-ca:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[cmnwn]\n  error = \"'cmnwn:1.3+omw' is no longer indexed; use 'omw-cmn:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[danwn]\n  error = \"'danwn:1.3+omw' is no longer indexed; use 'omw-da:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[ellwn]\n  error = \"'ellwn:1.3+omw' is no longer indexed; use 'omw-el:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[euswn]\n  error = \"'euswn:1.3+omw' is no longer indexed; use 'omw-eu:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[finwn]\n  error = \"'finwn:1.3+omw' is no longer indexed; use 'omw-fi:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[frawn]\n  error = \"'frawn:1.3+omw' is no longer indexed; use 'omw-fr:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[glgwn]\n  error = \"'glgwn:1.3+omw' is no longer indexed; use 'omw-gl:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[hebwn]\n  error = \"'hebwn:1.3+omw' is no longer indexed; use 'omw-he:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[hrvwn]\n  error = \"'hrvwn:1.3+omw' is no longer indexed; use 'omw-hr:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[indwn]\n  error = \"'indwn:1.3+omw' is no longer indexed; use 'omw-id:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[islwn]\n  error = \"'islwn:1.3+omw' is no longer indexed; use 'omw-is:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[itawn]\n  error = \"'itawn:1.3+omw' is no longer indexed; use 'omw-it:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[iwn]\n  error = 
\"'iwn:1.3+omw' is no longer indexed; use 'omw-iwn:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[jpnwn]\n  error = \"'jpnwn:1.3+omw' is no longer indexed; use 'omw-ja:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[litwn]\n  error = \"'litwn:1.3+omw' is no longer indexed; use 'omw-lt:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[nldwn]\n  error = \"'nldwn:1.3+omw' is no longer indexed; use 'omw-nl:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[nnown]\n  error = \"'nnown:1.3+omw' is no longer indexed; use 'omw-nn:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[nobwn]\n  error = \"'nobwn:1.3+omw' is no longer indexed; use 'omw-nb:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[polwn]\n  error = \"'polwn:1.3+omw' is no longer indexed; use 'omw-pl:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[porwn]\n  error = \"'porwn:1.3+omw' is no longer indexed; use 'omw-pt:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[ronwn]\n  error = \"'ronwn:1.3+omw' is no longer indexed; use 'omw-ro:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[slkwn]\n  error = \"'slkwn:1.3+omw' is no longer indexed; use 'omw-sk:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[slvwn]\n  error = \"'slvwn:1.3+omw' is no longer indexed; use 'omw-sl:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[spawn]\n  error = \"'spawn:1.3+omw' is no longer indexed; use 'omw-es:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[swewn]\n  error = \"'swewn:1.3+omw' is no longer indexed; use 'omw-sv:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[thawn]\n  error = \"'thawn:1.3+omw' is no longer indexed; use 'omw-th:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n[zsmwn]\n  error = \"'zsmwn:1.3+omw' is no longer indexed; use 'omw-zsm:1.4' instead (https://github.com/goodmami/wn#changes-to-the-index)\"\n"
  },
  {
    "path": "wn/lmf.py",
    "content": "\"\"\"\nReader for the Lexical Markup Framework (LMF) format.\n\"\"\"\n\nimport re\nimport xml.etree.ElementTree as ET  # for general XML parsing\nimport xml.parsers.expat  # for fast scanning of Lexicon versions\nfrom pathlib import Path\nfrom typing import Any, BinaryIO, Literal, TextIO, TypedDict, cast\nfrom xml.sax.saxutils import quoteattr\n\nfrom wn._exceptions import Error\nfrom wn._metadata import Metadata\nfrom wn._types import AnyPath, VersionInfo\nfrom wn._util import is_xml, version_info\nfrom wn.util import ProgressBar, ProgressHandler\n\n\nclass LMFError(Error):\n    \"\"\"Raised on invalid LMF-XML documents.\"\"\"\n\n\nclass LMFWarning(Warning):\n    \"\"\"Issued on non-conforming LFM values.\"\"\"\n\n\nSUPPORTED_VERSIONS = {\"1.0\", \"1.1\", \"1.2\", \"1.3\", \"1.4\"}\n_XMLDECL = b'<?xml version=\"1.0\" encoding=\"UTF-8\"?>'\n_XMLSPACEATTR = \"http://www.w3.org/XML/1998/namespace space\"  # xml:space\n_DOCTYPE = '<!DOCTYPE LexicalResource SYSTEM \"{schema}\">'\n_SCHEMAS = {\n    \"1.0\": \"https://globalwordnet.github.io/schemas/WN-LMF-1.0.dtd\",\n    \"1.1\": \"https://globalwordnet.github.io/schemas/WN-LMF-1.1.dtd\",\n    \"1.2\": \"https://globalwordnet.github.io/schemas/WN-LMF-1.2.dtd\",\n    \"1.3\": \"https://globalwordnet.github.io/schemas/WN-LMF-1.3.dtd\",\n    \"1.4\": \"https://globalwordnet.github.io/schemas/WN-LMF-1.4.dtd\",\n}\n_DOCTYPES = {\n    _DOCTYPE.format(schema=schema): version for version, schema in _SCHEMAS.items()\n}\n_DOCTYPES.update(\n    (_DOCTYPE.format(schema=schema.replace(\"https://\", \"http://\")), version)\n    for version, schema in _SCHEMAS.items()\n)\n\n_DC_URIS = {\n    \"1.0\": \"http://purl.org/dc/elements/1.1/\",\n    \"1.1\": \"https://globalwordnet.github.io/schemas/dc/\",\n    \"1.2\": \"https://globalwordnet.github.io/schemas/dc/\",\n    \"1.3\": \"https://globalwordnet.github.io/schemas/dc/\",\n    \"1.4\": \"https://globalwordnet.github.io/schemas/dc/\",\n}\n_DC_ATTRS = [\n    \"contributor\",\n    \"coverage\",\n    \"creator\",\n    \"date\",\n    \"description\",\n    \"format\",\n    \"identifier\",\n    \"publisher\",\n    \"relation\",\n    \"rights\",\n    \"source\",\n    \"subject\",\n    \"title\",\n    \"type\",\n]\n_NS_ATTRS = {\n    version: dict(\n        [(f\"{uri} {attr}\", attr) for attr in _DC_ATTRS]\n        + [\n            (\"status\", \"status\"),\n            (\"note\", \"note\"),\n            (\"confidenceScore\", \"confidenceScore\"),\n        ]\n    )\n    for version, uri in _DC_URIS.items()\n}\n\n_LMF_1_0_ELEMS: dict[str, str] = {\n    \"LexicalResource\": \"lexical-resource\",\n    \"Lexicon\": \"lexicons\",\n    \"LexicalEntry\": \"entries\",\n    \"Lemma\": \"lemma\",\n    \"Form\": \"forms\",\n    \"Tag\": \"tags\",\n    \"Sense\": \"senses\",\n    \"SenseRelation\": \"relations\",\n    \"Example\": \"examples\",\n    \"Count\": \"counts\",\n    \"SyntacticBehaviour\": \"frames\",\n    \"Synset\": \"synsets\",\n    \"Definition\": \"definitions\",\n    \"ILIDefinition\": \"ili_definition\",\n    \"SynsetRelation\": \"relations\",\n}\n_LMF_1_1_ELEMS = dict(_LMF_1_0_ELEMS)\n_LMF_1_1_ELEMS.update(\n    {\n        \"Requires\": \"requires\",\n        \"Extends\": \"extends\",\n        \"Pronunciation\": \"pronunciations\",\n        \"LexiconExtension\": \"lexicons\",\n        \"ExternalLexicalEntry\": \"entries\",\n        \"ExternalLemma\": \"lemma\",\n        \"ExternalForm\": \"forms\",\n        \"ExternalSense\": \"senses\",\n        \"ExternalSynset\": \"synsets\",\n    
}\n)\n_VALID_ELEMS = {\n    \"1.0\": _LMF_1_0_ELEMS,\n    \"1.1\": _LMF_1_1_ELEMS,\n    \"1.2\": _LMF_1_1_ELEMS,  # no new elements\n    \"1.3\": _LMF_1_1_ELEMS,  # no new elements\n    \"1.4\": _LMF_1_1_ELEMS,  # no new elements\n}\n_LIST_ELEMS = {  # elements that collect into lists\n    \"Lexicon\",\n    \"LexicalEntry\",\n    \"Form\",\n    \"Pronunciation\",\n    \"Tag\",\n    \"Sense\",\n    \"SenseRelation\",\n    \"Example\",\n    \"Count\",\n    \"Synset\",\n    \"Definition\",\n    \"SynsetRelation\",\n    \"SyntacticBehaviour\",\n    \"LexiconExtension\",\n    \"Requires\",\n    \"ExternalLexicalEntry\",\n    \"ExternalForm\",\n    \"ExternalSense\",\n    \"ExternalSynset\",\n}\n_CDATA_ELEMS = {  # elements with inner text\n    \"Pronunciation\",\n    \"Tag\",\n    \"Definition\",\n    \"ILIDefinition\",\n    \"Example\",\n    \"Count\",\n}\n_META_ELEMS = {  # elements with metadata\n    \"Lexicon\",\n    \"LexicalEntry\",\n    \"Sense\",\n    \"SenseRelation\",\n    \"Example\",\n    \"Count\",\n    \"Synset\",\n    \"Definition\",\n    \"ILIDefinition\",\n    \"SynsetRelation\",\n    \"LexiconExtension\",\n}\n\n\n# WN-LMF Modeling ######################################################\n\n# WN-LMF type-checking is handled via TypedDicts.  Inheritance and\n# `total=False` are used to model optionality. For more information\n# about this tactic, see https://www.python.org/dev/peps/pep-0589/.\n# From Python 3.11, we can use typing.Required / typing.NotRequired.\n\n\nclass _HasId(TypedDict):\n    id: str\n\n\nclass _HasILI(TypedDict):\n    ili: str\n\n\nclass _HasSynset(TypedDict):\n    synset: str\n\n\nclass _MaybeId(TypedDict, total=False):\n    id: str\n\n\nclass _HasText(TypedDict):\n    text: str\n\n\nclass _MaybeScript(TypedDict, total=False):\n    script: str\n\n\nclass _HasMeta(TypedDict, total=False):\n    meta: Metadata | None\n\n\nclass _External(TypedDict):\n    external: Literal[True]\n\n\nclass ILIDefinition(_HasText, _HasMeta): ...\n\n\nclass Definition(_HasText, _HasMeta, total=False):\n    language: str\n    sourceSense: str\n\n\nclass Relation(_HasMeta):\n    target: str\n    relType: str\n\n\nclass Example(_HasText, _HasMeta, total=False):\n    language: str\n\n\nclass Synset(_HasId, _HasILI, _HasMeta, total=False):\n    ili_definition: ILIDefinition\n    partOfSpeech: str\n    definitions: list[Definition]\n    relations: list[Relation]\n    examples: list[Example]\n    lexicalized: bool\n    members: list[str]\n    lexfile: str\n\n\nclass ExternalSynset(_HasId, _External, total=False):\n    definitions: list[Definition]\n    relations: list[Relation]\n    examples: list[Example]\n\n\nclass Count(_HasMeta):\n    value: int\n\n\nclass Sense(_HasId, _HasSynset, _HasMeta, total=False):\n    relations: list[Relation]\n    examples: list[Example]\n    counts: list[Count]\n    n: int\n    lexicalized: bool\n    adjposition: str\n    subcat: list[str]\n\n\nclass ExternalSense(_HasId, _External, total=False):\n    relations: list[Relation]\n    examples: list[Example]\n    counts: list[Count]\n\n\nclass Pronunciation(_HasText, total=False):\n    variety: str\n    notation: str\n    phonemic: bool\n    audio: str\n\n\nclass Tag(_HasText):\n    category: str\n\n\nclass _FormChildren(TypedDict, total=False):\n    pronunciations: list[Pronunciation]\n    tags: list[Tag]\n\n\nclass Lemma(_MaybeScript, _FormChildren):\n    writtenForm: str\n    partOfSpeech: str\n\n\nclass ExternalLemma(_FormChildren, _External): ...\n\n\nclass Form(_MaybeId, _MaybeScript, _FormChildren):\n   
 writtenForm: str\n\n\nclass ExternalForm(_HasId, _FormChildren, _External): ...\n\n\nclass _SyntacticBehaviourBase(_MaybeId):\n    subcategorizationFrame: str\n\n\nclass SyntacticBehaviour(_SyntacticBehaviourBase, total=False):\n    senses: list[str]\n\n\nclass _LexicalEntryBase(_HasId, _HasMeta, total=False):\n    index: str\n    forms: list[Form]\n    senses: list[Sense]\n    frames: list[SyntacticBehaviour]\n\n\nclass LexicalEntry(_LexicalEntryBase):\n    lemma: Lemma\n\n\nclass ExternalLexicalEntry(_HasId, _External, total=False):\n    lemma: ExternalLemma | None\n    forms: list[Form | ExternalForm]\n    senses: list[Sense | ExternalSense]\n\n\nclass LexiconSpecifier(_HasId):  # public but not an LMF entry\n    version: str\n\n\nclass Dependency(LexiconSpecifier, total=False):\n    url: str | None\n\n\nclass _LexiconRequired(LexiconSpecifier, _HasMeta):\n    label: str\n    language: str\n    email: str\n    license: str\n\n\nclass _LexiconBase(_LexiconRequired, total=False):\n    url: str\n    citation: str\n    logo: str\n\n\nclass Lexicon(_LexiconBase, total=False):\n    requires: list[Dependency]\n    entries: list[LexicalEntry]\n    synsets: list[Synset]\n    frames: list[SyntacticBehaviour]\n\n\nclass _LexiconExtensionBase(_LexiconBase):\n    extends: Dependency\n\n\nclass LexiconExtension(_LexiconExtensionBase, total=False):\n    requires: list[Dependency]\n    entries: list[LexicalEntry | ExternalLexicalEntry]\n    synsets: list[Synset | ExternalSynset]\n    frames: list[SyntacticBehaviour]\n\n\nclass LexicalResource(TypedDict):\n    lmf_version: str\n    lexicons: list[Lexicon | LexiconExtension]\n\n\n# Reading ##############################################################\n\n\ndef is_lmf(source: AnyPath) -> bool:\n    \"\"\"Return True if *source* is a WN-LMF file.\"\"\"\n    source = Path(source).expanduser()\n    if not is_xml(source):\n        return False\n    with source.open(mode=\"rb\") as fh:\n        try:\n            _read_header(fh)\n        except LMFError:\n            return False\n    return True\n\n\ndef _read_header(fh: BinaryIO) -> str:\n    xmldecl = fh.readline().rstrip().replace(b\"'\", b'\"')\n    doctype = fh.readline().rstrip().replace(b\"'\", b'\"')\n\n    if xmldecl != _XMLDECL:\n        raise LMFError(\"invalid or missing XML declaration\")\n\n    # the XML declaration states that the file is UTF-8 (other\n    # encodings are not allowed)\n    doctype_decoded = doctype.decode(\"utf-8\")\n    if doctype_decoded not in _DOCTYPES:\n        raise LMFError(\"invalid or missing DOCTYPE declaration\")\n\n    return _DOCTYPES[doctype_decoded]\n\n\nclass ScanInfo(LexiconSpecifier):\n    label: str | None\n    extends: LexiconSpecifier | None\n\n\ndef scan_lexicons(source: AnyPath) -> list[ScanInfo]:\n    \"\"\"Scan *source* and return only the top-level lexicon info.\n\n    The returned info is a dictionary containing the `id`, `version`,\n    and `label` attributes from a lexicon. 
If the Lexicon is an\n    extension, an `extends` key maps to a dictionary with the `id` and\n    `version` of the base lexicon, otherwise it maps to\n    :python:`None`.\n    \"\"\"\n\n    source = Path(source).expanduser()\n    infos: list[ScanInfo] = []\n\n    lex_re = re.compile(b\"<(Lexicon|LexiconExtension|Extends)\\\\b([^>]*)>\", flags=re.M)\n    attr_re = re.compile(b\"\"\"\\\\b(id|ref|version|label)=[\"']([^\"']+)[\"']\"\"\", flags=re.M)\n\n    with open(source, \"rb\") as fh:\n        for m in lex_re.finditer(fh.read()):\n            lextype, remainder = m.groups()\n            attrs = {\n                _m.group(1).decode(\"utf-8\"): _m.group(2).decode(\"utf-8\")\n                for _m in attr_re.finditer(remainder)\n            }\n            info: ScanInfo = {\n                \"id\": attrs.get(\"id\", attrs.get(\"ref\", \"(unknown id)\")),\n                \"version\": attrs.get(\"version\", \"(unknown version)\"),\n                \"label\": attrs.get(\"label\"),\n                \"extends\": None,\n            }\n            if info[\"id\"] is None or info[\"version\"] is None:\n                raise LMFError(f\"<{lextype.decode('utf-8')}> missing id or version\")\n            if lextype != b\"Extends\":\n                infos.append(info)\n            elif len(infos) > 0:\n                infos[-1][\"extends\"] = {\"id\": info[\"id\"], \"version\": info[\"version\"]}\n            else:\n                raise LMFError(\"invalid use of <Extends> in WN-LMF file\")\n\n    return infos\n\n\n_Elem = dict[str, Any]  # basic type for the loaded XML data\n\n\ndef load(\n    source: AnyPath, progress_handler: type[ProgressHandler] | None = ProgressBar\n) -> LexicalResource:\n    \"\"\"Load wordnets encoded in the WN-LMF format.\n\n    Args:\n        source: path to a WN-LMF file\n    \"\"\"\n    source = Path(source).expanduser()\n    if progress_handler is None:\n        progress_handler = ProgressHandler\n\n    version, num_elements = _quick_scan(source)\n    progress = progress_handler(\n        message=\"Read\", total=num_elements, refresh_interval=10000\n    )\n\n    root: dict[str, _Elem] = {}\n    parser = _make_parser(root, version, progress)\n\n    with open(source, \"rb\") as fh:\n        try:\n            parser.ParseFile(fh)\n        except xml.parsers.expat.ExpatError as exc:\n            raise LMFError(\"invalid or ill-formed XML\") from exc\n\n    progress.close()\n\n    resource: LexicalResource = {\n        \"lmf_version\": version,\n        \"lexicons\": [\n            _validate(lex) for lex in root[\"lexical-resource\"].get(\"lexicons\", [])\n        ],\n    }\n\n    return resource\n\n\ndef _quick_scan(source: Path) -> tuple[str, int]:\n    with source.open(\"rb\") as fh:\n        version = _read_header(fh)\n        # _read_header() only reads the first 2 lines\n        remainder = fh.read()\n        num_elements = remainder.count(b\"</\") + remainder.count(b\"/>\")\n    return version, num_elements\n\n\ndef _make_parser(root, version, progress):  # noqa: C901\n    stack = [root]\n    ELEMS = _VALID_ELEMS[version]\n    NS_ATTRS = _NS_ATTRS[version]\n    CDATA_ELEMS = _CDATA_ELEMS & set(ELEMS)\n    LIST_ELEMS = _LIST_ELEMS & set(ELEMS)\n\n    p = xml.parsers.expat.ParserCreate(namespace_separator=\" \")\n\n    def start(name, attrs):\n        if name in _META_ELEMS:\n            meta = {}\n            for attr in list(attrs):\n                if attr in NS_ATTRS:\n                    meta[NS_ATTRS[attr]] = attrs.pop(attr)\n            attrs[\"meta\"] = meta or 
None\n\n        if name in CDATA_ELEMS:\n            attrs[\"text\"] = \"\"\n\n        if name.startswith(\"External\"):\n            attrs[\"external\"] = True\n\n        parent = stack[-1]\n        key = ELEMS.get(name)\n        if name in LIST_ELEMS:\n            parent.setdefault(key, []).append(attrs)\n        elif key is None or key in parent:\n            raise _unexpected(name, p)\n        else:\n            parent[key] = attrs\n\n        stack.append(attrs)\n\n    def char_data(data):\n        parent = stack[-1]\n        if \"text\" in parent:\n            # sometimes the buffering occurs in the middle of text, so\n            # append the current data, don't just assign it\n            parent[\"text\"] += data\n\n    def end(name):\n        elem = stack.pop()\n        # normalize whitespace unless xml:space=preserve\n        if \"text\" in elem and elem.get(_XMLSPACEATTR, \"\") != \"preserve\":\n            elem[\"text\"] = \" \".join(elem[\"text\"].split())\n        progress.update(force=(name == \"LexicalResource\"))\n\n    p.StartElementHandler = start\n    p.EndElementHandler = end\n    p.CharacterDataHandler = char_data\n\n    return p\n\n\ndef _unexpected(name: str, p: xml.parsers.expat.XMLParserType) -> LMFError:\n    return LMFError(f\"unexpected element at line {p.CurrentLineNumber}: {name}\")\n\n\n# Validation ###########################################################\n\n\ndef _validate(elem: _Elem) -> Lexicon | LexiconExtension:\n    ext = elem.get(\"extends\")\n    if ext:\n        if \"ref\" in ext:\n            ext[\"id\"] = ext.pop(\"ref\")  # normalize ref to id internally\n        assert \"id\" in ext\n        assert \"version\" in ext\n        _validate_lexicon(elem, True)\n        return cast(\"LexiconExtension\", elem)\n    else:\n        _validate_lexicon(elem, False)\n        return cast(\"Lexicon\", elem)\n\n\ndef _validate_lexicon(elem: _Elem, extension: bool) -> None:\n    for attr in \"id\", \"version\", \"label\", \"language\", \"email\", \"license\":\n        assert attr in elem, f\"<Lexicon> missing required attribute: {attr}\"\n    for dep in elem.get(\"requires\", []):\n        if \"ref\" in dep:\n            dep[\"id\"] = dep.pop(\"ref\")  # normalize ref to id internally\n        assert \"id\" in dep\n        assert \"version\" in dep\n    _validate_entries(elem.get(\"entries\", []), extension)\n    _validate_synsets(elem.get(\"synsets\", []), extension)\n    _validate_frames(elem.get(\"frames\", []))\n\n\ndef _validate_entries(elems: list[_Elem], extension: bool) -> None:\n    for elem in elems:\n        assert \"id\" in elem\n        if not extension:\n            assert not elem.get(\"external\")\n        lemma = elem.get(\"lemma\")\n        if not elem.get(\"external\"):\n            assert lemma is not None\n            elem.setdefault(\"meta\")\n        # lemma and forms are the same except for partOfSpeech and id\n        if lemma is not None and not lemma.get(\"external\"):\n            assert \"partOfSpeech\" in lemma\n        for form in elem.get(\"forms\", []):\n            assert not form.get(\"external\") or form.get(\"id\")\n        _validate_forms(([lemma] if lemma else []) + elem.get(\"forms\", []), extension)\n        _validate_senses(elem.get(\"senses\", []), extension)\n        _validate_frames(elem.get(\"frames\", []))\n\n\ndef _validate_forms(elems: list[_Elem], extension: bool) -> None:\n    for elem in elems:\n        if not extension:\n            assert not elem.get(\"external\")\n        if not elem.get(\"external\"):\n  
          assert \"writtenForm\" in elem\n        for pron in elem.get(\"pronunciations\", []):\n            pron.setdefault(\"text\", \"\")\n            if pron.get(\"phonemic\"):\n                pron[\"phonemic\"] = pron[\"phonemic\"] != \"false\"\n        for tag in elem.get(\"tags\", []):\n            tag.setdefault(\"text\", \"\")\n            assert \"category\" in tag\n\n\ndef _validate_senses(elems: list[_Elem], extension: bool) -> None:\n    for elem in elems:\n        assert \"id\" in elem\n        if not extension:\n            assert not elem.get(\"external\")\n        if not elem.get(\"external\"):\n            assert \"synset\" in elem\n            elem.setdefault(\"meta\")\n        for rel in elem.get(\"relations\", []):\n            assert \"target\" in rel\n            assert \"relType\" in rel\n            rel.setdefault(\"meta\")\n        for ex in elem.get(\"examples\", []):\n            ex.setdefault(\"text\", \"\")\n            ex.setdefault(\"meta\")\n        for cnt in elem.get(\"counts\", []):\n            assert \"text\" in cnt\n            cnt[\"value\"] = int(cnt.pop(\"text\"))\n            cnt.setdefault(\"meta\")\n        if elem.get(\"lexicalized\"):\n            elem[\"lexicalized\"] = elem[\"lexicalized\"] != \"false\"\n        if elem.get(\"subcat\"):\n            elem[\"subcat\"] = elem[\"subcat\"].split()\n        if elem.get(\"n\"):\n            elem[\"n\"] = int(elem[\"n\"])\n\n\ndef _validate_frames(elems: list[_Elem]) -> None:\n    for elem in elems:\n        assert \"subcategorizationFrame\" in elem\n        if elem.get(\"senses\"):\n            elem[\"senses\"] = elem[\"senses\"].split()\n\n\ndef _validate_synsets(elems: list[_Elem], extension: bool) -> None:\n    for elem in elems:\n        assert \"id\" in elem\n        if not extension:\n            assert not elem.get(\"external\")\n        if not elem.get(\"external\"):\n            assert \"ili\" in elem\n            elem.setdefault(\"meta\")\n        for defn in elem.get(\"definitions\", []):\n            defn.setdefault(\"text\", \"\")\n            defn.setdefault(\"meta\")\n        for rel in elem.get(\"relations\", []):\n            assert \"target\" in rel\n            assert \"relType\" in rel\n            rel.setdefault(\"meta\")\n        for ex in elem.get(\"examples\", []):\n            ex.setdefault(\"text\", \"\")\n            ex.setdefault(\"meta\")\n        if elem.get(\"lexicalized\"):\n            elem[\"lexicalized\"] = elem[\"lexicalized\"] != \"false\"\n        if elem.get(\"members\"):\n            elem[\"members\"] = elem[\"members\"].split()\n\n\ndef _validate_metadata(elem: _Elem) -> None:\n    if elem.get(\"confidenceScore\"):\n        elem[\"confidenceScore\"] = float(elem[\"confidenceScore\"])\n\n\n# Serialization ########################################################\n\n\ndef dump(resource: LexicalResource, destination: AnyPath) -> None:\n    \"\"\"Write wordnets in the WN-LMF format.\n\n    Args:\n        lexicons: a list of :class:`Lexicon` objects\n    \"\"\"\n    version = resource[\"lmf_version\"]\n    if version not in SUPPORTED_VERSIONS:\n        raise LMFError(f\"invalid version: {version}\")\n    destination = Path(destination).expanduser()\n    doctype = _DOCTYPE.format(schema=_SCHEMAS[version])\n    dc_uri = _DC_URIS[version]\n    _version = version_info(version)\n    with destination.open(\"wt\", encoding=\"utf-8\") as out:\n        print(_XMLDECL.decode(\"utf-8\"), file=out)\n        print(doctype, file=out)\n        print(f'<LexicalResource 
xmlns:dc=\"{dc_uri}\">', file=out)\n        for lexicon in resource[\"lexicons\"]:\n            _dump_lexicon(lexicon, out, _version)\n        print(\"</LexicalResource>\", file=out)\n\n\ndef _dump_lexicon(\n    lexicon: Lexicon | LexiconExtension, out: TextIO, version: VersionInfo\n) -> None:\n    lexicontype = \"LexiconExtension\" if lexicon.get(\"extends\") else \"Lexicon\"\n    attrib = _build_lexicon_attrib(lexicon, version)\n    attrdelim = \"\\n\" + (\" \" * len(f\"  <{lexicontype} \"))\n    attrs = attrdelim.join(\n        f\"{attr}={quoteattr(str(val))}\" for attr, val in attrib.items()\n    )\n    print(f\"  <{lexicontype} {attrs}>\", file=out)\n\n    if version >= (1, 1):\n        if lexicontype == \"LexiconExtension\":\n            assert lexicon.get(\"extends\")\n            lexicon = cast(\"LexiconExtension\", lexicon)\n            _dump_dependency(lexicon[\"extends\"], \"Extends\", out, version)\n        for req in lexicon.get(\"requires\", []):\n            _dump_dependency(req, \"Requires\", out, version)\n\n    for entry in lexicon.get(\"entries\", []):\n        _dump_lexical_entry(entry, out, version)\n\n    for synset in lexicon.get(\"synsets\", []):\n        _dump_synset(synset, out, version)\n\n    if version >= (1, 1):\n        for sb in lexicon.get(\"frames\", []):\n            _dump_syntactic_behaviour(sb, out, version)\n\n    print(f\"  </{lexicontype}>\", file=out)\n\n\ndef _build_lexicon_attrib(\n    lexicon: Lexicon | LexiconExtension, version: VersionInfo\n) -> dict[str, str]:\n    attrib = {\n        \"id\": lexicon[\"id\"],\n        \"label\": lexicon[\"label\"],\n        \"language\": lexicon[\"language\"],\n        \"email\": lexicon[\"email\"],\n        \"license\": lexicon[\"license\"],\n        \"version\": lexicon[\"version\"],\n    }\n    if lexicon.get(\"url\"):\n        attrib[\"url\"] = lexicon[\"url\"]\n    if lexicon.get(\"citation\"):\n        attrib[\"citation\"] = lexicon[\"citation\"]\n    if version >= (1, 1) and lexicon.get(\"logo\"):\n        attrib[\"logo\"] = lexicon[\"logo\"]\n    attrib.update(_meta_dict(lexicon.get(\"meta\")))\n    return attrib\n\n\ndef _dump_dependency(\n    dep: Dependency, deptype: str, out: TextIO, version: VersionInfo\n) -> None:\n    id_ref_key = \"id\" if version < (1, 4) else \"ref\"\n    attrib = {id_ref_key: dep[\"id\"], \"version\": dep[\"version\"]}\n    if (url := dep.get(\"url\")) is not None:\n        attrib[\"url\"] = url\n    elem = ET.Element(deptype, attrib=attrib)\n    print(_tostring(elem, 2), file=out)\n\n\ndef _dump_lexical_entry(\n    entry: LexicalEntry | ExternalLexicalEntry,\n    out: TextIO,\n    version: VersionInfo,\n) -> None:\n    frames = []\n    attrib = {\"id\": entry[\"id\"]}\n    if entry.get(\"external\", False):\n        elem = ET.Element(\"ExternalLexicalEntry\", attrib=attrib)\n        if (lemma := entry.get(\"lemma\")) is not None:\n            assert lemma.get(\"external\", False)\n            elem.append(_build_lemma(lemma, version))\n    else:\n        entry = cast(\"LexicalEntry\", entry)\n        if version >= (1, 4) and entry.get(\"index\"):\n            attrib[\"index\"] = entry[\"index\"]\n        attrib.update(_meta_dict(entry.get(\"meta\")))\n        elem = ET.Element(\"LexicalEntry\", attrib=attrib)\n        elem.append(_build_lemma(entry[\"lemma\"], version))\n        if version < (1, 1):\n            frames = [\n                _build_syntactic_behaviour(sb, version)\n                for sb in entry.get(\"frames\", [])\n            ]\n    
elem.extend([_build_form(form, version) for form in entry.get(\"forms\", [])])\n    elem.extend([_build_sense(sense, version) for sense in entry.get(\"senses\", [])])\n    elem.extend(frames)\n    print(_tostring(elem, 2), file=out)\n\n\ndef _build_lemma(lemma: Lemma | ExternalLemma, version: VersionInfo) -> ET.Element:\n    if lemma.get(\"external\", False):\n        elem = ET.Element(\"ExternalLemma\")\n    else:\n        lemma = cast(\"Lemma\", lemma)\n        attrib = {\"writtenForm\": lemma[\"writtenForm\"]}\n        if lemma.get(\"script\"):\n            attrib[\"script\"] = lemma[\"script\"]\n        attrib[\"partOfSpeech\"] = lemma[\"partOfSpeech\"]\n        elem = ET.Element(\"Lemma\", attrib=attrib)\n    if version >= (1, 1):\n        for pron in lemma.get(\"pronunciations\", []):\n            elem.append(_build_pronunciation(pron))\n    for tag in lemma.get(\"tags\", []):\n        elem.append(_build_tag(tag))\n    return elem\n\n\ndef _build_form(form: Form | ExternalForm, version: VersionInfo) -> ET.Element:\n    attrib = {}\n    if version >= (1, 1) and form.get(\"id\"):\n        attrib[\"id\"] = form[\"id\"]\n    if form.get(\"external\", False):\n        elem = ET.Element(\"ExternalForm\", attrib=attrib)\n    else:\n        form = cast(\"Form\", form)\n        attrib[\"writtenForm\"] = form[\"writtenForm\"]\n        if form.get(\"script\"):\n            attrib[\"script\"] = form[\"script\"]\n        elem = ET.Element(\"Form\", attrib=attrib)\n    if version >= (1, 1):\n        for pron in form.get(\"pronunciations\", []):\n            elem.append(_build_pronunciation(pron))\n    for tag in form.get(\"tags\", []):\n        elem.append(_build_tag(tag))\n    return elem\n\n\ndef _build_pronunciation(pron: Pronunciation) -> ET.Element:\n    attrib = {}\n    if pron.get(\"variety\"):\n        attrib[\"variety\"] = pron[\"variety\"]\n    if pron.get(\"notation\"):\n        attrib[\"notation\"] = pron[\"notation\"]\n    if not pron.get(\"phonemic\", True):\n        attrib[\"phonemic\"] = \"false\"\n    if pron.get(\"audio\"):\n        attrib[\"audio\"] = pron[\"audio\"]\n    elem = ET.Element(\"Pronunciation\", attrib=attrib)\n    elem.text = pron[\"text\"]\n    return elem\n\n\ndef _build_tag(tag: Tag) -> ET.Element:\n    elem = ET.Element(\"Tag\", category=tag[\"category\"])\n    elem.text = tag[\"text\"]\n    return elem\n\n\ndef _build_sense(\n    sense: Sense | ExternalSense,\n    version: VersionInfo,\n) -> ET.Element:\n    attrib = {\"id\": sense[\"id\"]}\n    if sense.get(\"external\"):\n        elem = ET.Element(\"ExternalSense\", attrib=attrib)\n    else:\n        sense = cast(\"Sense\", sense)\n        attrib[\"synset\"] = sense[\"synset\"]\n        if version >= (1, 4) and sense.get(\"n\"):\n            attrib[\"n\"] = str(sense[\"n\"])\n        attrib.update(_meta_dict(sense.get(\"meta\")))\n        if not sense.get(\"lexicalized\", True):\n            attrib[\"lexicalized\"] = \"false\"\n        if sense.get(\"adjposition\"):\n            attrib[\"adjposition\"] = sense[\"adjposition\"]\n        if version >= (1, 1) and sense.get(\"subcat\"):\n            attrib[\"subcat\"] = \" \".join(sense[\"subcat\"])\n        elem = ET.Element(\"Sense\", attrib=attrib)\n    elem.extend(\n        [_build_relation(rel, \"SenseRelation\") for rel in sense.get(\"relations\", [])]\n    )\n    elem.extend([_build_example(ex) for ex in sense.get(\"examples\", [])])\n    elem.extend([_build_count(cnt) for cnt in sense.get(\"counts\", [])])\n    return elem\n\n\ndef 
_build_example(example: Example) -> ET.Element:\n    attrib: dict[str, str] = {}\n    if example.get(\"language\"):\n        attrib[\"language\"] = example[\"language\"]\n    attrib.update(_meta_dict(example.get(\"meta\")))\n    elem = ET.Element(\"Example\", attrib=attrib)\n    elem.text = example[\"text\"]\n    return elem\n\n\ndef _build_count(count: Count) -> ET.Element:\n    elem = ET.Element(\"Count\", attrib=_meta_dict(count.get(\"meta\")))\n    elem.text = str(count[\"value\"])\n    return elem\n\n\ndef _dump_synset(\n    synset: Synset | ExternalSynset, out: TextIO, version: VersionInfo\n) -> None:\n    attrib: dict[str, str] = {\"id\": synset[\"id\"]}\n    if synset.get(\"external\", False):\n        elem = ET.Element(\"ExternalSynset\", attrib=attrib)\n        elem.extend([_build_definition(defn) for defn in synset.get(\"definitions\", [])])\n    else:\n        synset = cast(\"Synset\", synset)\n        attrib[\"ili\"] = synset[\"ili\"]\n        if synset.get(\"partOfSpeech\"):\n            attrib[\"partOfSpeech\"] = synset[\"partOfSpeech\"]\n        if not synset.get(\"lexicalized\", True):\n            attrib[\"lexicalized\"] = \"false\"\n        if version >= (1, 1):\n            if synset.get(\"members\"):\n                attrib[\"members\"] = \" \".join(synset[\"members\"])\n            if synset.get(\"lexfile\"):\n                attrib[\"lexfile\"] = synset[\"lexfile\"]\n        attrib.update(_meta_dict(synset.get(\"meta\")))\n        elem = ET.Element(\"Synset\", attrib=attrib)\n        elem.extend([_build_definition(defn) for defn in synset.get(\"definitions\", [])])\n        if synset.get(\"ili_definition\"):\n            elem.append(_build_ili_definition(synset[\"ili_definition\"]))\n    elem.extend(\n        [_build_relation(rel, \"SynsetRelation\") for rel in synset.get(\"relations\", [])]\n    )\n    elem.extend([_build_example(ex) for ex in synset.get(\"examples\", [])])\n    print(_tostring(elem, 2), file=out)\n\n\ndef _build_definition(definition: Definition) -> ET.Element:\n    attrib = {}\n    if definition.get(\"language\"):\n        attrib[\"language\"] = definition[\"language\"]\n    if definition.get(\"sourceSense\"):\n        attrib[\"sourceSense\"] = definition[\"sourceSense\"]\n    attrib.update(_meta_dict(definition.get(\"meta\")))\n    elem = ET.Element(\"Definition\", attrib=attrib)\n    elem.text = definition[\"text\"]\n    return elem\n\n\ndef _build_ili_definition(ili_definition: ILIDefinition) -> ET.Element:\n    elem = ET.Element(\"ILIDefinition\", attrib=_meta_dict(ili_definition.get(\"meta\")))\n    elem.text = ili_definition[\"text\"]\n    return elem\n\n\ndef _build_relation(relation: Relation, elemtype: str) -> ET.Element:\n    attrib = {\"target\": relation[\"target\"], \"relType\": relation[\"relType\"]}\n    attrib.update(_meta_dict(relation.get(\"meta\")))\n    return ET.Element(elemtype, attrib=attrib)\n\n\ndef _dump_syntactic_behaviour(\n    syntactic_behaviour: SyntacticBehaviour, out: TextIO, version: VersionInfo\n) -> None:\n    elem = _build_syntactic_behaviour(syntactic_behaviour, version)\n    print(_tostring(elem, 2), file=out)\n\n\ndef _build_syntactic_behaviour(\n    syntactic_behaviour: SyntacticBehaviour, version: VersionInfo\n) -> ET.Element:\n    attrib = {\"subcategorizationFrame\": syntactic_behaviour[\"subcategorizationFrame\"]}\n    if version >= (1, 1) and syntactic_behaviour.get(\"id\"):\n        attrib[\"id\"] = syntactic_behaviour[\"id\"]\n    elif version < (1, 1) and syntactic_behaviour.get(\"senses\"):\n       
 attrib[\"senses\"] = \" \".join(syntactic_behaviour[\"senses\"])\n    return ET.Element(\"SyntacticBehaviour\", attrib=attrib)\n\n\ndef _tostring(elem: ET.Element, level: int, short_empty_elements: bool = True) -> str:\n    _indent(elem, level)\n    return (\"  \" * level) + ET.tostring(\n        elem, encoding=\"unicode\", short_empty_elements=short_empty_elements\n    )\n\n\ndef _indent(elem: ET.Element, level: int) -> None:\n    self_indent = \"\\n\" + \"  \" * level\n    child_indent = self_indent + \"  \"\n    if len(elem):\n        if not elem.text or not elem.text.strip():\n            elem.text = child_indent\n        for child in elem[:-1]:\n            _indent(child, level + 1)\n            child.tail = child_indent\n        _indent(elem[-1], level + 1)\n        elem[-1].tail = self_indent\n\n\ndef _meta_dict(meta: Metadata | None) -> dict[str, str]:\n    if meta is not None:\n        # Literal keys are required for typing purposes, so first\n        # construct the dict and then remove those that weren't specified.\n        d = {\n            \"dc:contributor\": meta.get(\"contributor\", \"\"),\n            \"dc:coverage\": meta.get(\"coverage\", \"\"),\n            \"dc:creator\": meta.get(\"creator\", \"\"),\n            \"dc:date\": meta.get(\"date\", \"\"),\n            \"dc:description\": meta.get(\"description\", \"\"),\n            \"dc:format\": meta.get(\"format\", \"\"),\n            \"dc:identifier\": meta.get(\"identifier\", \"\"),\n            \"dc:publisher\": meta.get(\"publisher\", \"\"),\n            \"dc:relation\": meta.get(\"relation\", \"\"),\n            \"dc:rights\": meta.get(\"rights\", \"\"),\n            \"dc:source\": meta.get(\"source\", \"\"),\n            \"dc:subject\": meta.get(\"subject\", \"\"),\n            \"dc:title\": meta.get(\"title\", \"\"),\n            \"dc:type\": meta.get(\"type\", \"\"),\n            \"status\": meta.get(\"status\", \"\"),\n            \"note\": meta.get(\"note\", \"\"),\n        }\n        d = {key: val for key, val in d.items() if val}\n        # this one requires a conversion, so do it separately\n        if \"confidenceScore\" in meta:\n            d[\"confidenceScore\"] = str(meta[\"confidenceScore\"])\n    else:\n        d = {}\n    return d\n"
  },
  {
    "path": "wn/metrics.py",
    "content": "from wn._core import Synset, Word\n\n# Word-based Metrics\n\n\ndef ambiguity(word: Word) -> int:\n    return len(word.synsets())\n\n\ndef average_ambiguity(synset: Synset) -> float:\n    words = synset.words()\n    return sum(len(word.synsets()) for word in words) / len(words)\n"
  },
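  {
    "path": "examples/metrics_demo.py",
    "content": "\"\"\"Illustrative usage sketch for wn.metrics (not part of the wn distribution).\n\nThe file path, lexicon specifier, and query word here are assumptions made\nfor the example; it presumes the \"ewn:2020\" lexicon has already been added,\ne.g. with wn.download(\"ewn:2020\").\n\"\"\"\n\nimport wn\nfrom wn.metrics import ambiguity, average_ambiguity\n\newn = wn.Wordnet(\"ewn:2020\")\n\n# ambiguity(): the number of synsets a single word participates in\nword = ewn.words(\"bank\", pos=\"n\")[0]\nprint(word.lemma(), ambiguity(word))\n\n# average_ambiguity(): the mean ambiguity of the words in one synset\nsynset = word.synsets()[0]\nprint(synset.id, average_ambiguity(synset))\n"
  },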
  {
    "path": "wn/morphy.py",
    "content": "\"\"\"A simple English lemmatizer that finds and removes known suffixes.\"\"\"\n\nfrom enum import Flag, auto\nfrom typing import TypeAlias\n\nimport wn\nfrom wn._types import LemmatizeResult\nfrom wn.constants import ADJ, ADJ_SAT, ADV, NOUN, PARTS_OF_SPEECH, VERB\n\nPOSExceptionMap: TypeAlias = dict[str, set[str]]\nExceptionMap: TypeAlias = dict[str, POSExceptionMap]\n\n\nclass _System(Flag):\n    \"\"\"Flags to track suffix rules in various implementations of Morphy.\"\"\"\n\n    PWN = auto()\n    NLTK = auto()\n    WN = auto()\n    ALL = PWN | NLTK | WN\n\n\n_PWN = _System.PWN\n_NLTK = _System.NLTK\n_WN = _System.WN\n_ALL = _System.ALL\n\n\nRule: TypeAlias = tuple[str, str, _System]\n\nDETACHMENT_RULES: dict[str, list[Rule]] = {\n    NOUN: [\n        (\"s\", \"\", _ALL),\n        (\"ces\", \"x\", _WN),\n        (\"ses\", \"s\", _ALL),\n        (\"ves\", \"f\", _NLTK | _WN),\n        (\"ives\", \"ife\", _WN),\n        (\"xes\", \"x\", _ALL),\n        (\"xes\", \"xis\", _WN),\n        (\"zes\", \"z\", _ALL),\n        (\"ches\", \"ch\", _ALL),\n        (\"shes\", \"sh\", _ALL),\n        (\"men\", \"man\", _ALL),\n        (\"ies\", \"y\", _ALL),\n    ],\n    VERB: [\n        (\"s\", \"\", _ALL),\n        (\"ies\", \"y\", _ALL),\n        (\"es\", \"e\", _ALL),\n        (\"es\", \"\", _ALL),\n        (\"ed\", \"e\", _ALL),\n        (\"ed\", \"\", _ALL),\n        (\"ing\", \"e\", _ALL),\n        (\"ing\", \"\", _ALL),\n    ],\n    ADJ: [\n        (\"er\", \"\", _ALL),\n        (\"est\", \"\", _ALL),\n        (\"er\", \"e\", _ALL),\n        (\"est\", \"e\", _ALL),\n    ],\n    ADV: [],\n}\nDETACHMENT_RULES[ADJ_SAT] = DETACHMENT_RULES[ADJ]\n\n\nclass Morphy:\n    \"\"\"The Morphy lemmatizer class.\n\n    Objects of this class are callables that take a wordform and an\n    optional part of speech and return a dictionary mapping parts of\n    speech to lemmas. 
If objects of this class are not created with a\n    :class:`wn.Wordnet` object, the returned lemmas may be invalid.\n\n    Arguments:\n        wordnet: optional :class:`wn.Wordnet` instance\n\n    Example:\n\n        >>> import wn\n        >>> from wn.morphy import Morphy\n        >>> ewn = wn.Wordnet(\"ewn:2020\")\n        >>> m = Morphy(ewn)\n        >>> m(\"axes\", pos=\"n\")\n        {'n': {'axe', 'ax', 'axis'}}\n        >>> m(\"geese\", pos=\"n\")\n        {'n': {'goose'}}\n        >>> m(\"gooses\")\n        {'n': {'goose'}, 'v': {'goose'}}\n        >>> m(\"goosing\")\n        {'v': {'goose'}}\n\n    \"\"\"\n\n    def __init__(self, wordnet: wn.Wordnet | None = None):\n        self._rules = {\n            pos: [rule for rule in rules if rule[2] & _System.WN]\n            for pos, rules in DETACHMENT_RULES.items()\n        }\n        exceptions: ExceptionMap = {pos: {} for pos in PARTS_OF_SPEECH}\n        all_lemmas: dict[str, set[str]] = {pos: set() for pos in PARTS_OF_SPEECH}\n        if wordnet:\n            for word in wordnet.words():\n                pos = word.pos\n                pos_exc = exceptions[pos]\n                lemma, *others = word.forms()\n                # store every lemma whether it has other forms or not\n                all_lemmas[pos].add(lemma)\n                # those with other forms map to the original lemmas\n                for other in others:\n                    if other in pos_exc:\n                        pos_exc[other].add(lemma)\n                    else:\n                        pos_exc[other] = {lemma}\n            self._initialized = True\n        else:\n            self._initialized = False\n        self._exceptions = exceptions\n        self._all_lemmas = all_lemmas\n\n    def __call__(self, form: str, pos: str | None = None) -> LemmatizeResult:\n        result = {}\n        if not self._initialized:\n            result[pos] = {form}  # always include original when not initialized\n\n        if pos is None:\n            pos_list = list(DETACHMENT_RULES)\n        elif pos in DETACHMENT_RULES:\n            pos_list = [pos]\n        else:\n            pos_list = []  # not handled by morphy\n\n        no_pos_forms = result.get(None, set())  # avoid unnecessary duplicates\n        for _pos in pos_list:\n            candidates = self._morphstr(form, _pos) - no_pos_forms\n            if candidates:\n                result.setdefault(_pos, set()).update(candidates)\n\n        return result\n\n    def _morphstr(self, form: str, pos: str) -> set[str]:\n        candidates: set[str] = set()\n\n        initialized = self._initialized\n        if initialized:\n            all_lemmas = self._all_lemmas[pos]\n            if form in all_lemmas:\n                candidates.add(form)\n            candidates.update(self._exceptions[pos].get(form, set()))\n        else:\n            all_lemmas = set()\n\n        for suffix, repl, _ in self._rules[pos]:\n            # avoid applying rules that perform full suppletion\n            if form.endswith(suffix) and len(suffix) < len(form):\n                candidate = f\"{form[: -len(suffix)]}{repl}\"\n                if not initialized or candidate in all_lemmas:\n                    candidates.add(candidate)\n\n        return candidates\n\n\nmorphy = Morphy()\n"
  },
  {
    "path": "wn/project.py",
    "content": "\"\"\"\nWordnet and ILI Packages and Collections\n\"\"\"\n\nimport gzip\nimport lzma\nimport shutil\nimport tarfile\nimport tempfile\nfrom collections.abc import Iterator\nfrom pathlib import Path\n\nfrom wn import ili, lmf\nfrom wn._config import ResourceType, config\nfrom wn._exceptions import Error\nfrom wn._types import AnyPath\nfrom wn._util import is_gzip, is_lzma\n\n_ADDITIONAL_FILE_SUFFIXES = (\"\", \".txt\", \".md\", \".rst\")\n\n\ndef is_package_directory(path: AnyPath) -> bool:\n    \"\"\"Return ``True`` if *path* appears to be a wordnet or ILI package.\"\"\"\n    path = Path(path).expanduser()\n    return len(_package_directory_types(path)) == 1\n\n\ndef _package_directory_types(path: Path) -> list[tuple[Path, str]]:\n    types: list[tuple[Path, str]] = []\n    if path.is_dir():\n        for p in path.iterdir():\n            typ = _resource_file_type(p)\n            if typ is not None:\n                types.append((p, typ))\n    return types\n\n\ndef _resource_file_type(path: Path) -> str | None:\n    if lmf.is_lmf(path):\n        return ResourceType.WORDNET\n    elif ili.is_ili_tsv(path):\n        return ResourceType.ILI\n    return None\n\n\ndef is_collection_directory(path: AnyPath) -> bool:\n    \"\"\"Return ``True`` if *path* appears to be a wordnet collection.\"\"\"\n    path = Path(path).expanduser()\n    return (\n        path.is_dir() and len(list(filter(is_package_directory, path.iterdir()))) >= 1\n    )\n\n\nclass Project:\n    \"\"\"The base class for packages and collections.\"\"\"\n\n    __slots__ = (\"_path\",)\n\n    def __init__(self, path: AnyPath):\n        self._path: Path = Path(path).expanduser()\n\n    @property\n    def path(self) -> Path:\n        \"\"\"The path of the project directory or resource file.\n\n        For :class:`Package` and :class:`Collection` objects, the path\n        is its directory. 
For :class:`ResourceOnlyPackage` objects,\n        the path is the same as from\n        :meth:`resource_file() <Package.resource_file>`\n        \"\"\"\n        return self._path\n\n    def readme(self) -> Path | None:\n        \"\"\"Return the path of the README file, or :data:`None` if none exists.\"\"\"\n        return self._find_file(self._path / \"README\", _ADDITIONAL_FILE_SUFFIXES)\n\n    def license(self) -> Path | None:\n        \"\"\"Return the path of the license, or :data:`None` if none exists.\"\"\"\n        return self._find_file(self._path / \"LICENSE\", _ADDITIONAL_FILE_SUFFIXES)\n\n    def citation(self) -> Path | None:\n        \"\"\"Return the path of the citation, or :data:`None` if none exists.\"\"\"\n        return self._find_file(self._path / \"citation\", (\".bib\",))\n\n    def _find_file(self, base: Path, suffixes: tuple[str, ...]) -> Path | None:\n        for suffix in suffixes:\n            base = base.with_suffix(suffix)\n            if base.is_file():\n                return base\n        return None\n\n\nclass Package(Project):\n    \"\"\"A wordnet or ILI package.\n\n    A package is a directory with a resource file and optional\n    metadata files.\n\n    \"\"\"\n\n    @property\n    def type(self) -> str | None:\n        \"\"\"Return the name of the type of resource contained by the package.\n\n        Valid return values are:\n        - :python:`\"wordnet\"` -- the resource is a WN-LMF lexicon file\n        - :python:`\"ili\"` -- the resource is an interlingual index file\n        - :data:`None` -- the resource type is undetermined\n        \"\"\"\n        return _resource_file_type(self.resource_file())\n\n    def resource_file(self) -> Path:\n        \"\"\"Return the path of the package's resource file.\"\"\"\n        files = _package_directory_types(self._path)\n        if not files:\n            raise Error(f\"no resource found in package: {self._path!s}\")\n        elif len(files) > 1:\n            raise Error(f\"multiple resource found in package: {self._path!s}\")\n        return files[0][0]\n\n\nclass ResourceOnlyPackage(Package):\n    \"\"\"A virtual package for a single-file resource.\n\n    This class is for resource files that are not distributed in a\n    package directory. The :meth:`readme() <Project.readme>`,\n    :meth:`license() <Project.license>`, and\n    :meth:`citation() <Project.citation>` methods all return\n    :data:`None`.\n    \"\"\"\n\n    def resource_file(self) -> Path:\n        return self._path\n\n    def readme(self):\n        return None\n\n    def license(self):\n        return None\n\n    def citation(self):\n        return None\n\n\nclass Collection(Project):\n    \"\"\"A wordnet or ILI collection\n\n    Collections are directories that contain package directories and\n    optional metadata files.\n    \"\"\"\n\n    def packages(self) -> list[Package]:\n        \"\"\"Return the list of packages in the collection.\"\"\"\n        return [\n            Package(path) for path in self._path.iterdir() if is_package_directory(path)\n        ]\n\n\ndef get_project(\n    *,\n    project: str | None = None,\n    path: AnyPath | None = None,\n) -> Project:\n    \"\"\"Return the :class:`Project` object for *project* or *path*.\n\n    The *project* argument is a project specifier and will look in the\n    download cache for the project data. If the project has not been\n    downloaded and cached, an error will be raised.\n\n    The *path* argument looks for project data at the given path. 
It\n    can point to a resource file, a package directory, or a collection\n    directory. Unlike :func:`iterpackages`, this function does not\n    iterate over packages within a collection, and instead the\n    :class:`Collection` object is returned.\n\n    .. note::\n\n       If the target is compressed or archived, the data will be\n       extracted to a temporary directory. It is the user's\n       responsibility to delete this temporary directory, which is\n       indicated by :data:`Project.path`.\n    \"\"\"\n    if project and path:\n        raise TypeError(\"expected a project specifier or a path, not both\")\n    if not project and not path:\n        raise TypeError(\"expected a project specifier or a path\")\n\n    if project:\n        info = config.get_project_info(project)\n        if not info[\"cache\"]:\n            raise Error(f\"{project} is not cached; try `wn.download({project!r})` first\")\n        path = info[\"cache\"]\n    assert path\n\n    proj, _ = _get_project_from_path(path)\n    return proj\n\n\ndef _get_project_from_path(\n    path: AnyPath,\n    tmp_path: Path | None = None,\n) -> tuple[Project, Path | None]:\n    path = Path(path).expanduser()\n\n    if path.is_dir():\n        if is_package_directory(path):\n            return Package(path), tmp_path\n\n        elif is_collection_directory(path):\n            return Collection(path), tmp_path\n\n        else:\n            raise Error(\n                f\"does not appear to be a valid package or collection: {path!s}\"\n            )\n\n    elif tarfile.is_tarfile(path):\n        tmpdir_ = Path(tempfile.mkdtemp())\n        with tarfile.open(path) as tar:\n            _check_tar(tar)\n            tar.extractall(path=tmpdir_)\n            contents = list(tmpdir_.iterdir())\n            if len(contents) != 1:\n                raise Error(\n                    \"archive may only have one resource, package, or collection\"\n                )\n            return _get_project_from_path(contents[0], tmp_path=tmpdir_)\n\n    else:\n        decompressed, tmp_path = _get_decompressed(path, tmp_path)\n        if lmf.is_lmf(decompressed) or ili.is_ili_tsv(decompressed):\n            return ResourceOnlyPackage(decompressed), tmp_path\n        else:\n            raise Error(f\"not a valid lexical resource: {path!s}\")\n\n\ndef iterpackages(path: AnyPath, delete: bool = True) -> Iterator[Package]:\n    \"\"\"Yield any wordnet or ILI packages found at *path*.\n\n    The *path* argument can point to one of the following:\n      - a lexical resource file or ILI file\n      - a wordnet package directory\n      - a wordnet collection directory\n      - a tar archive containing one of the above\n      - a compressed (gzip or lzma) resource file or tar archive\n\n    The *delete* argument determines whether any created temporary\n    directories will be deleted after iteration is complete. When it\n    is :data:`True`, the package objects can only be inspected during\n    iteration. If one needs persistent objects (e.g.,\n    :python:`pkgs = list(iterpackages(...))`), then set *delete* to\n    :data:`False`.\n\n    .. warning::\n\n       When *delete* is set to :data:`False`, the user is responsible\n       for cleaning up any temporary directories. 
The\n       :data:`Project.path` attribute indicates the path of the\n       temporary directory.\n\n    \"\"\"\n    project, tmp_path = _get_project_from_path(path)\n\n    try:\n        match project:\n            case Package():\n                yield project\n            case Collection():\n                yield from project.packages()\n            case _:\n                raise Error(f\"unexpected project type: {project.__class__.__name__}\")\n    finally:\n        if tmp_path and delete:\n            if tmp_path.is_dir():\n                shutil.rmtree(tmp_path)\n            elif tmp_path.is_file():\n                tmp_path.unlink()\n            else:\n                raise Error(f\"could not remove temporary path: {tmp_path}\")\n\n\ndef _get_decompressed(\n    source: Path,\n    tmp_path: Path | None,\n) -> tuple[Path, Path | None]:\n    gzipped = is_gzip(source)\n    xzipped = is_lzma(source)\n    if not (gzipped or xzipped):\n        return source, tmp_path\n    else:\n        tmp = tempfile.NamedTemporaryFile(suffix=\".xml\", delete=False)  # noqa: SIM115\n        path = Path(tmp.name)\n        try:\n            if gzipped:\n                with gzip.open(source, \"rb\") as gzip_src:\n                    shutil.copyfileobj(gzip_src, tmp)\n            else:  # xzipped\n                with lzma.open(source, \"rb\") as lzma_src:\n                    shutil.copyfileobj(lzma_src, tmp)\n\n            tmp.close()  # Windows cannot reliably reopen until it's closed\n\n        except (OSError, EOFError, lzma.LZMAError) as exc:\n            raise Error(f\"could not decompress file: {source}\") from exc\n\n        # if tmp_path is not None, the compressed file was in a\n        # temporary directory, so return that. Otherwise the new path\n        # becomes the tmp_path\n        return path, tmp_path or path\n\n\ndef _check_tar(tar: tarfile.TarFile) -> None:\n    \"\"\"Check the tarfile to avoid potential security issues.\n\n    Currently collections and packages have the following constraints:\n    - Only regular files or directories\n    - No paths starting with '/' or containing '..'\n    \"\"\"\n    for info in tar.getmembers():\n        if not (info.isfile() or info.isdir()):\n            raise Error(\n                f\"tarfile member is not a regular file or directory: {info.name}\"\n            )\n        if info.name.startswith(\"/\") or \"..\" in info.name:\n            raise Error(\n                f\"tarfile member paths may not be absolute or contain ..: {info.name}\"\n            )\n"
  },
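  {
    "path": "examples/project_demo.py",
    "content": "\"\"\"Illustrative usage sketch for wn.project (not part of the wn distribution).\n\nIterates over the packages found at a path, which may be a resource file, a\npackage or collection directory, or a (possibly compressed) tar archive. The\npath below is a placeholder; point it at real data before running.\n\"\"\"\n\nfrom wn.project import iterpackages\n\npath = \"path/to/wordnet-data.tar.xz\"  # placeholder path (assumption)\n\n# Packages yielded here are only valid during iteration because any temporary\n# extraction directory is removed afterwards (delete=True is the default).\nfor package in iterpackages(path):\n    print(\"type:\", package.type)\n    print(\"resource file:\", package.resource_file())\n    readme = package.readme()\n    if readme is not None:\n        print(\"README:\", readme)\n"
  },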
  {
    "path": "wn/py.typed",
    "content": "\n"
  },
  {
    "path": "wn/schema.sql",
    "content": "\n-- ILI : Interlingual Index\n\nCREATE TABLE ilis (\n    rowid INTEGER PRIMARY KEY,\n    id TEXT NOT NULL,\n    status_rowid INTEGER NOT NULL REFERENCES ili_statuses (rowid),\n    definition TEXT,\n    metadata META,\n    UNIQUE (id)\n);\nCREATE INDEX ili_id_index ON ilis (id);\n\nCREATE TABLE proposed_ilis (\n    rowid INTEGER PRIMARY KEY,\n    synset_rowid INTEGER REFERENCES synsets (rowid) ON DELETE CASCADE,\n    definition TEXT,\n    metadata META,\n    UNIQUE (synset_rowid)\n);\nCREATE INDEX proposed_ili_synset_rowid_index ON proposed_ilis (synset_rowid);\n\n\n-- Wordnet lexicons\n\nCREATE TABLE lexicons (\n    rowid INTEGER PRIMARY KEY,  -- unique database-internal id\n    specifier TEXT NOT NULL,    -- lexicon specifer -> id:version\n    id TEXT NOT NULL,           -- user-facing id\n    label TEXT NOT NULL,\n    language TEXT NOT NULL,     -- bcp-47 language tag\n    email TEXT NOT NULL,\n    license TEXT NOT NULL,\n    version TEXT NOT NULL,\n    url TEXT,\n    citation TEXT,\n    logo TEXT,\n    metadata META,\n    modified BOOLEAN CHECK( modified IN (0, 1) ) DEFAULT 0 NOT NULL,\n    UNIQUE (id, version),\n    UNIQUE (specifier)\n);\nCREATE INDEX lexicon_specifier_index ON lexicons (specifier);\n\nCREATE TABLE lexicon_dependencies (\n    dependent_rowid INTEGER NOT NULL REFERENCES lexicons (rowid) ON DELETE CASCADE,\n    provider_id TEXT NOT NULL,\n    provider_version TEXT NOT NULL,\n    provider_url TEXT,\n    provider_rowid INTEGER REFERENCES lexicons (rowid) ON DELETE SET NULL\n);\nCREATE INDEX lexicon_dependent_index ON lexicon_dependencies(dependent_rowid);\n\nCREATE TABLE lexicon_extensions (\n    extension_rowid INTEGER NOT NULL REFERENCES lexicons (rowid) ON DELETE CASCADE,\n    base_id TEXT NOT NULL,\n    base_version TEXT NOT NULL,\n    base_url TEXT,\n    base_rowid INTEGER REFERENCES lexicons (rowid),\n    UNIQUE (extension_rowid, base_rowid)\n);\nCREATE INDEX lexicon_extension_index ON lexicon_extensions(extension_rowid);\n\n\n-- Lexical Entries\n\nCREATE TABLE entry_index (\n    entry_rowid INTEGER NOT NULL REFERENCES entries (rowid) ON DELETE CASCADE,\n    lemma TEXT NOT NULL,\n    UNIQUE (entry_rowid)\n);\nCREATE INDEX entry_index_entry_index ON entry_index(entry_rowid);\nCREATE INDEX entry_index_lemma_index ON entry_index(lemma);\n\n/* The 'lemma' entity of a lexical entry is just a form, but it should\n   be the only form with rank = 0. After that, rank can be used to\n   indicate preference for a form. 
*/\n\nCREATE TABLE entries (\n    rowid INTEGER PRIMARY KEY,\n    id TEXT NOT NULL,\n    lexicon_rowid INTEGER NOT NULL REFERENCES lexicons (rowid) ON DELETE CASCADE,\n    pos TEXT NOT NULL,\n    metadata META,\n    UNIQUE (id, lexicon_rowid)\n);\nCREATE INDEX entry_id_index ON entries (id);\n\nCREATE TABLE forms (\n    rowid INTEGER PRIMARY KEY,\n    id TEXT,\n    lexicon_rowid INTEGER NOT NULL REFERENCES lexicons(rowid) ON DELETE CASCADE,\n    entry_rowid INTEGER NOT NULL REFERENCES entries(rowid) ON DELETE CASCADE,\n    form TEXT NOT NULL,\n    normalized_form TEXT,\n    script TEXT,\n    rank INTEGER DEFAULT 1,  -- rank 0 is the preferred lemma\n    UNIQUE (entry_rowid, form, script)\n);\nCREATE INDEX form_entry_index ON forms (entry_rowid);\nCREATE INDEX form_index ON forms (form);\nCREATE INDEX form_norm_index ON forms (normalized_form);\n\nCREATE TABLE pronunciations (\n    form_rowid INTEGER NOT NULL REFERENCES forms (rowid) ON DELETE CASCADE,\n    lexicon_rowid INTEGER NOT NULL REFERENCES lexicons(rowid) ON DELETE CASCADE,\n    value TEXT,\n    variety TEXT,\n    notation TEXT,\n    phonemic BOOLEAN CHECK( phonemic IN (0, 1) ) DEFAULT 1 NOT NULL,\n    audio TEXT\n);\nCREATE INDEX pronunciation_form_index ON pronunciations (form_rowid);\n\nCREATE TABLE tags (\n    form_rowid INTEGER NOT NULL REFERENCES forms (rowid) ON DELETE CASCADE,\n    lexicon_rowid INTEGER NOT NULL REFERENCES lexicons(rowid) ON DELETE CASCADE,\n    tag TEXT,\n    category TEXT\n);\nCREATE INDEX tag_form_index ON tags (form_rowid);\n\n\n-- Synsets\n\nCREATE TABLE synsets (\n    rowid INTEGER PRIMARY KEY,\n    id TEXT NOT NULL,\n    lexicon_rowid INTEGER NOT NULL REFERENCES lexicons (rowid) ON DELETE CASCADE,\n    ili_rowid INTEGER REFERENCES ilis (rowid),\n    pos TEXT,\n    lexfile_rowid INTEGER REFERENCES lexfiles (rowid),\n    metadata META\n);\nCREATE INDEX synset_id_index ON synsets (id);\nCREATE INDEX synset_ili_rowid_index ON synsets (ili_rowid);\n\nCREATE TABLE unlexicalized_synsets (\n    synset_rowid INTEGER NOT NULL REFERENCES synsets (rowid) ON DELETE CASCADE\n);\nCREATE INDEX unlexicalized_synsets_index ON unlexicalized_synsets (synset_rowid);\n\nCREATE TABLE synset_relations (\n    rowid INTEGER PRIMARY KEY,\n    lexicon_rowid INTEGER NOT NULL REFERENCES lexicons (rowid) ON DELETE CASCADE,\n    source_rowid INTEGER NOT NULL REFERENCES synsets(rowid) ON DELETE CASCADE,\n    target_rowid INTEGER NOT NULL REFERENCES synsets(rowid) ON DELETE CASCADE,\n    type_rowid INTEGER NOT NULL REFERENCES relation_types(rowid),\n    metadata META\n);\nCREATE INDEX synset_relation_source_index ON synset_relations (source_rowid);\nCREATE INDEX synset_relation_target_index ON synset_relations (target_rowid);\n\nCREATE TABLE definitions (\n    rowid INTEGER PRIMARY KEY,\n    lexicon_rowid INTEGER NOT NULL REFERENCES lexicons(rowid) ON DELETE CASCADE,\n    synset_rowid INTEGER NOT NULL REFERENCES synsets(rowid) ON DELETE CASCADE,\n    definition TEXT,\n    language TEXT,  -- bcp-47 language tag\n    sense_rowid INTEGER REFERENCES senses(rowid) ON DELETE SET NULL,\n    metadata META\n);\nCREATE INDEX definition_rowid_index ON definitions (synset_rowid);\nCREATE INDEX definition_sense_index ON definitions (sense_rowid);\n\nCREATE TABLE synset_examples (\n    rowid INTEGER PRIMARY KEY,\n    lexicon_rowid INTEGER NOT NULL REFERENCES lexicons(rowid) ON DELETE CASCADE,\n    synset_rowid INTEGER NOT NULL REFERENCES synsets(rowid) ON DELETE CASCADE,\n    example TEXT,\n    language TEXT,  -- bcp-47 language tag\n    metadata 
META\n);\nCREATE INDEX synset_example_rowid_index ON synset_examples(synset_rowid);\n\n\n-- Senses\n\nCREATE TABLE senses (\n    rowid INTEGER PRIMARY KEY,\n    id TEXT NOT NULL,\n    lexicon_rowid INTEGER NOT NULL REFERENCES lexicons(rowid) ON DELETE CASCADE,\n    entry_rowid INTEGER NOT NULL REFERENCES entries(rowid) ON DELETE CASCADE,\n    entry_rank INTEGER DEFAULT 1,\n    synset_rowid INTEGER NOT NULL REFERENCES synsets(rowid) ON DELETE CASCADE,\n    synset_rank INTEGER DEFAULT 1,\n    metadata META\n);\nCREATE INDEX sense_id_index ON senses(id);\nCREATE INDEX sense_entry_rowid_index ON senses (entry_rowid);\nCREATE INDEX sense_synset_rowid_index ON senses (synset_rowid);\n\nCREATE TABLE unlexicalized_senses (\n    sense_rowid INTEGER NOT NULL REFERENCES senses (rowid) ON DELETE CASCADE\n);\nCREATE INDEX unlexicalized_senses_index ON unlexicalized_senses (sense_rowid);\n\nCREATE TABLE sense_relations (\n    rowid INTEGER PRIMARY KEY,\n    lexicon_rowid INTEGER NOT NULL REFERENCES lexicons (rowid) ON DELETE CASCADE,\n    source_rowid INTEGER NOT NULL REFERENCES senses(rowid) ON DELETE CASCADE,\n    target_rowid INTEGER NOT NULL REFERENCES senses(rowid) ON DELETE CASCADE,\n    type_rowid INTEGER NOT NULL REFERENCES relation_types(rowid),\n    metadata META\n);\nCREATE INDEX sense_relation_source_index ON sense_relations (source_rowid);\nCREATE INDEX sense_relation_target_index ON sense_relations (target_rowid);\n\nCREATE TABLE sense_synset_relations (\n    rowid INTEGER PRIMARY KEY,\n    lexicon_rowid INTEGER NOT NULL REFERENCES lexicons (rowid) ON DELETE CASCADE,\n    source_rowid INTEGER NOT NULL REFERENCES senses(rowid) ON DELETE CASCADE,\n    target_rowid INTEGER NOT NULL REFERENCES synsets(rowid) ON DELETE CASCADE,\n    type_rowid INTEGER NOT NULL REFERENCES relation_types(rowid),\n    metadata META\n);\nCREATE INDEX sense_synset_relation_source_index ON sense_synset_relations (source_rowid);\nCREATE INDEX sense_synset_relation_target_index ON sense_synset_relations (target_rowid);\n\nCREATE TABLE adjpositions (\n    sense_rowid INTEGER NOT NULL REFERENCES senses(rowid) ON DELETE CASCADE,\n    adjposition TEXT NOT NULL\n);\nCREATE INDEX adjposition_sense_index ON adjpositions (sense_rowid);\n\nCREATE TABLE sense_examples (\n    rowid INTEGER PRIMARY KEY,\n    lexicon_rowid INTEGER NOT NULL REFERENCES lexicons(rowid) ON DELETE CASCADE,\n    sense_rowid INTEGER NOT NULL REFERENCES senses(rowid) ON DELETE CASCADE,\n    example TEXT,\n    language TEXT,  -- bcp-47 language tag\n    metadata META\n);\nCREATE INDEX sense_example_index ON sense_examples (sense_rowid);\n\nCREATE TABLE counts (\n    rowid INTEGER PRIMARY KEY,\n    lexicon_rowid INTEGER NOT NULL REFERENCES lexicons(rowid) ON DELETE CASCADE,\n    sense_rowid INTEGER NOT NULL REFERENCES senses(rowid) ON DELETE CASCADE,\n    count INTEGER NOT NULL,\n    metadata META\n);\nCREATE INDEX count_index ON counts(sense_rowid);\n\n\n-- Syntactic Behaviours\n\nCREATE TABLE syntactic_behaviours (\n    rowid INTEGER PRIMARY KEY,\n    id TEXT,\n    lexicon_rowid INTEGER NOT NULL REFERENCES lexicons (rowid) ON DELETE CASCADE,\n    frame TEXT NOT NULL,\n    UNIQUE (lexicon_rowid, id),\n    UNIQUE (lexicon_rowid, frame)\n);\nCREATE INDEX syntactic_behaviour_id_index ON syntactic_behaviours (id);\n\nCREATE TABLE syntactic_behaviour_senses (\n    syntactic_behaviour_rowid INTEGER NOT NULL REFERENCES syntactic_behaviours (rowid) ON DELETE CASCADE,\n    sense_rowid INTEGER NOT NULL REFERENCES senses (rowid) ON DELETE CASCADE\n);\nCREATE INDEX 
syntactic_behaviour_sense_sb_index\n    ON syntactic_behaviour_senses (syntactic_behaviour_rowid);\nCREATE INDEX syntactic_behaviour_sense_sense_index\n    ON syntactic_behaviour_senses (sense_rowid);\n\n\n-- Lookup Tables\n\nCREATE TABLE relation_types (\n    rowid INTEGER PRIMARY KEY,\n    type TEXT NOT NULL,\n    UNIQUE (type)\n);\nCREATE INDEX relation_type_index ON relation_types (type);\n\nCREATE TABLE ili_statuses (\n    rowid INTEGER PRIMARY KEY,\n    status TEXT NOT NULL,\n    UNIQUE (status)\n);\nCREATE INDEX ili_status_index ON ili_statuses (status);\n\nCREATE TABLE lexfiles (\n    rowid INTEGER PRIMARY KEY,\n    name TEXT NOT NULL,\n    UNIQUE (name)\n);\nCREATE INDEX lexfile_index ON lexfiles (name);\n"
  },
  {
    "path": "wn/similarity.py",
    "content": "\"\"\"Synset similarity metrics.\"\"\"\n\nimport math\n\nimport wn\nfrom wn._core import Synset\nfrom wn.constants import ADJ, ADJ_SAT\nfrom wn.ic import Freq, information_content\n\n\ndef path(synset1: Synset, synset2: Synset, simulate_root: bool = False) -> float:\n    \"\"\"Return the Path similarity of *synset1* and *synset2*.\n\n    Arguments:\n        synset1: The first synset to compare.\n        synset2: The second synset to compare.\n        simulate_root: When :python:`True`, a fake root node connects\n            all other roots; default: :python:`False`.\n\n    Example:\n        >>> import wn\n        >>> from wn.similarity import path\n        >>> ewn = wn.Wordnet(\"ewn:2020\")\n        >>> spatula = ewn.synsets(\"spatula\")[0]\n        >>> path(spatula, ewn.synsets(\"pancake\")[0])\n        0.058823529411764705\n        >>> path(spatula, ewn.synsets(\"utensil\")[0])\n        0.2\n        >>> path(spatula, spatula)\n        1.0\n        >>> flip = ewn.synsets(\"flip\", pos=\"v\")[0]\n        >>> turn_over = ewn.synsets(\"turn over\", pos=\"v\")[0]\n        >>> path(flip, turn_over)\n        0.0\n        >>> path(flip, turn_over, simulate_root=True)\n        0.16666666666666666\n\n    \"\"\"\n    _check_if_pos_compatible(synset1.pos, synset2.pos)\n    try:\n        path = synset1.shortest_path(synset2, simulate_root=simulate_root)\n    except wn.Error:\n        distance = float(\"inf\")\n    else:\n        distance = len(path)\n    return 1 / (distance + 1)\n\n\ndef wup(synset1: Synset, synset2: Synset, simulate_root=False) -> float:\n    \"\"\"Return the Wu-Palmer similarity of *synset1* and *synset2*.\n\n    Arguments:\n        synset1: The first synset to compare.\n        synset2: The second synset to compare.\n        simulate_root: When :python:`True`, a fake root node connects\n            all other roots; default: :python:`False`.\n\n    Raises:\n        wn.Error: When no path connects the *synset1* and *synset2*.\n\n    Example:\n        >>> import wn\n        >>> from wn.similarity import wup\n        >>> ewn = wn.Wordnet(\"ewn:2020\")\n        >>> spatula = ewn.synsets(\"spatula\")[0]\n        >>> wup(spatula, ewn.synsets(\"pancake\")[0])\n        0.2\n        >>> wup(spatula, ewn.synsets(\"utensil\")[0])\n        0.8\n        >>> wup(spatula, spatula)\n        1.0\n        >>> flip = ewn.synsets(\"flip\", pos=\"v\")[0]\n        >>> turn_over = ewn.synsets(\"turn over\", pos=\"v\")[0]\n        >>> wup(flip, turn_over, simulate_root=True)\n        0.2857142857142857\n\n    \"\"\"\n    _check_if_pos_compatible(synset1.pos, synset2.pos)\n    lcs_list = _least_common_subsumers(synset1, synset2, simulate_root)\n    lcs = lcs_list[0]\n    i = len(synset1.shortest_path(lcs, simulate_root=simulate_root))\n    j = len(synset2.shortest_path(lcs, simulate_root=simulate_root))\n    k = lcs.max_depth() + 1\n    return (2 * k) / (i + j + 2 * k)\n\n\ndef lch(\n    synset1: Synset, synset2: Synset, max_depth: int, simulate_root: bool = False\n) -> float:\n    \"\"\"Return the Leacock-Chodorow similarity between *synset1* and *synset2*.\n\n    Arguments:\n        synset1: The first synset to compare.\n        synset2: The second synset to compare.\n        max_depth: The taxonomy depth (see :func:`wn.taxonomy.taxonomy_depth`)\n        simulate_root: When :python:`True`, a fake root node connects\n            all other roots; default: :python:`False`.\n\n    Example:\n        >>> import wn, wn.taxonomy\n        >>> from wn.similarity import lch\n        >>> ewn = 
wn.Wordnet(\"ewn:2020\")\n        >>> n_depth = wn.taxonomy.taxonomy_depth(ewn, \"n\")\n        >>> spatula = ewn.synsets(\"spatula\")[0]\n        >>> lch(spatula, ewn.synsets(\"pancake\")[0], n_depth)\n        0.8043728156701697\n        >>> lch(spatula, ewn.synsets(\"utensil\")[0], n_depth)\n        2.0281482472922856\n        >>> lch(spatula, spatula, n_depth)\n        3.6375861597263857\n        >>> v_depth = taxonomy.taxonomy_depth(ewn, \"v\")\n        >>> flip = ewn.synsets(\"flip\", pos=\"v\")[0]\n        >>> turn_over = ewn.synsets(\"turn over\", pos=\"v\")[0]\n        >>> lch(flip, turn_over, v_depth, simulate_root=True)\n        1.3862943611198906\n\n    \"\"\"\n    _check_if_pos_compatible(synset1.pos, synset2.pos)\n    distance = len(synset1.shortest_path(synset2, simulate_root=simulate_root))\n    if max_depth <= 0:\n        raise wn.Error(\"max_depth must be greater than 0\")\n    return -math.log((distance + 1) / (2 * max_depth))\n\n\ndef res(synset1: Synset, synset2: Synset, ic: Freq) -> float:\n    \"\"\"Return the Resnik similarity between *synset1* and *synset2*.\n\n    Arguments:\n        synset1: The first synset to compare.\n        synset2: The second synset to compare.\n        ic: Information Content weights.\n\n    Example:\n        >>> import wn, wn.ic, wn.taxonomy\n        >>> from wn.similarity import res\n        >>> pwn = wn.Wordnet(\"pwn:3.0\")\n        >>> ic = wn.ic.load(\"~/nltk_data/corpora/wordnet_ic/ic-brown.dat\", pwn)\n        >>> spatula = pwn.synsets(\"spatula\")[0]\n        >>> res(spatula, pwn.synsets(\"pancake\")[0], ic)\n        0.8017591149538994\n        >>> res(spatula, pwn.synsets(\"utensil\")[0], ic)\n        5.87738923441087\n\n    \"\"\"\n    _check_if_pos_compatible(synset1.pos, synset2.pos)\n    lcs = _most_informative_lcs(synset1, synset2, ic)\n    return information_content(lcs, ic)\n\n\ndef jcn(synset1: Synset, synset2: Synset, ic: Freq) -> float:\n    \"\"\"Return the Jiang-Conrath similarity of two synsets.\n\n    Arguments:\n        synset1: The first synset to compare.\n        synset2: The second synset to compare.\n        ic: Information Content weights.\n\n    Example:\n        >>> import wn, wn.ic, wn.taxonomy\n        >>> from wn.similarity import jcn\n        >>> pwn = wn.Wordnet(\"pwn:3.0\")\n        >>> ic = wn.ic.load(\"~/nltk_data/corpora/wordnet_ic/ic-brown.dat\", pwn)\n        >>> spatula = pwn.synsets(\"spatula\")[0]\n        >>> jcn(spatula, pwn.synsets(\"pancake\")[0], ic)\n        0.04061799236354239\n        >>> jcn(spatula, pwn.synsets(\"utensil\")[0], ic)\n        0.10794048564613007\n\n    \"\"\"\n    _check_if_pos_compatible(synset1.pos, synset2.pos)\n    ic1 = information_content(synset1, ic)\n    ic2 = information_content(synset2, ic)\n    lcs = _most_informative_lcs(synset1, synset2, ic)\n    ic_lcs = information_content(lcs, ic)\n    if ic1 == ic2 == ic_lcs == 0:\n        return 0\n    elif ic1 + ic2 == 2 * ic_lcs:\n        return float(\"inf\")\n    else:\n        return 1 / (ic1 + ic2 - 2 * ic_lcs)\n\n\ndef lin(synset1: Synset, synset2: Synset, ic: Freq) -> float:\n    \"\"\"Return the Lin similarity of two synsets.\n\n    Arguments:\n        synset1: The first synset to compare.\n        synset2: The second synset to compare.\n        ic: Information Content weights.\n\n    Example:\n        >>> import wn, wn.ic, wn.taxonomy\n        >>> from wn.similarity import lin\n        >>> pwn = wn.Wordnet(\"pwn:3.0\")\n        >>> ic = wn.ic.load(\"~/nltk_data/corpora/wordnet_ic/ic-brown.dat\", pwn)\n        
>>> spatula = pwn.synsets(\"spatula\")[0]\n        >>> lin(spatula, pwn.synsets(\"pancake\")[0], ic)\n        0.061148956278604116\n        >>> lin(spatula, pwn.synsets(\"utensil\")[0], ic)\n        0.5592415686750427\n\n    \"\"\"\n    _check_if_pos_compatible(synset1.pos, synset2.pos)\n    lcs = _most_informative_lcs(synset1, synset2, ic)\n    ic1 = information_content(synset1, ic)\n    ic2 = information_content(synset2, ic)\n    if ic1 == 0 or ic2 == 0:\n        return 0.0\n    return 2 * information_content(lcs, ic) / (ic1 + ic2)\n\n\n# Helper functions\n\n\ndef _least_common_subsumers(\n    synset1: Synset, synset2: Synset, simulate_root: bool\n) -> list[Synset]:\n    lcs = synset1.lowest_common_hypernyms(synset2, simulate_root=simulate_root)\n    if not lcs:\n        raise wn.Error(f\"no common hypernyms for {synset1!r} and {synset2!r}\")\n    return lcs\n\n\ndef _most_informative_lcs(synset1: Synset, synset2: Synset, ic: Freq) -> Synset:\n    pos_ic = ic[synset1.pos]\n    lcs = _least_common_subsumers(synset1, synset2, False)\n    return max(lcs, key=lambda ss: pos_ic[ss.id])\n\n\ndef _check_if_pos_compatible(pos1: str, pos2: str) -> None:\n    _pos1 = ADJ if pos1 == ADJ_SAT else pos1\n    _pos2 = ADJ if pos2 == ADJ_SAT else pos2\n    if _pos1 != _pos2:\n        raise wn.Error(\"synsets must have the same part of speech\")\n"
  },
  {
    "path": "wn/taxonomy.py",
    "content": "\"\"\"Functions for working with hypernym/hyponym taxonomies.\"\"\"\n\nfrom __future__ import annotations\n\nimport wn\nfrom wn._util import flatten\nfrom wn.constants import ADJ, ADJ_SAT\n\n_FAKE_ROOT = \"*ROOT*\"\n\n\ndef roots(wordnet: wn.Wordnet, pos: str | None = None) -> list[wn.Synset]:\n    \"\"\"Return the list of root synsets in *wordnet*.\n\n    Arguments:\n\n        wordnet: The wordnet from which root synsets are found.\n\n        pos: If given, only return synsets with the specified part of\n            speech.\n\n    Example:\n\n        >>> import wn, wn.taxonomy\n        >>> ewn = wn.Wordnet(\"ewn:2020\")\n        >>> len(wn.taxonomy.roots(ewn, pos=\"v\"))\n        573\n\n\n    \"\"\"\n    return [ss for ss in _synsets_for_pos(wordnet, pos) if not ss.hypernyms()]\n\n\ndef leaves(wordnet: wn.Wordnet, pos: str | None = None) -> list[wn.Synset]:\n    \"\"\"Return the list of leaf synsets in *wordnet*.\n\n    Arguments:\n\n        wordnet: The wordnet from which leaf synsets are found.\n\n        pos: If given, only return synsets with the specified part of\n            speech.\n\n    Example:\n\n        >>> import wn, wn.taxonomy\n        >>> ewn = wn.Wordnet(\"ewn:2020\")\n        >>> len(wn.taxonomy.leaves(ewn, pos=\"v\"))\n        10525\n\n    \"\"\"\n    return [ss for ss in _synsets_for_pos(wordnet, pos) if not ss.hyponyms()]\n\n\ndef taxonomy_depth(wordnet: wn.Wordnet, pos: str) -> int:\n    \"\"\"Return the maximum depth of the taxonomy for the given part of speech.\n\n    Arguments:\n\n        wordnet: The wordnet for which the taxonomy depth will be\n            calculated.\n\n        pos: The part of speech for which the taxonomy depth will be\n            calculated.\n\n    Example:\n\n        >>> import wn, wn.taxonomy\n        >>> ewn = wn.Wordnet(\"ewn:2020\")\n        >>> wn.taxonomy.taxonomy_depth(ewn, \"n\")\n        19\n\n    \"\"\"\n    seen: set[wn.Synset] = set()\n    depth = 0\n    for ss in _synsets_for_pos(wordnet, pos):\n        if all(hyp in seen for hyp in ss.hypernyms()):\n            continue\n        paths = ss.hypernym_paths()\n        if paths:\n            depth = max(depth, max(len(path) for path in paths))\n            seen.update(hyp for path in paths for hyp in path)\n    return depth\n\n\ndef _synsets_for_pos(wordnet: wn.Wordnet, pos: str | None) -> list[wn.Synset]:\n    \"\"\"Get the list of synsets for a part of speech. 
If *pos* is 'a' or\n    's', also include those for the other.\n\n    \"\"\"\n    synsets = wordnet.synsets(pos=pos)\n    if pos == ADJ:\n        synsets.extend(wordnet.synsets(pos=ADJ_SAT))\n    elif pos == ADJ_SAT:\n        synsets.extend(wordnet.synsets(pos=ADJ))\n    return synsets\n\n\ndef _hypernym_paths(\n    synset: wn.Synset,\n    simulate_root: bool,\n    include_self: bool,\n) -> list[list[wn.Synset]]:\n    paths = list(synset.relation_paths(\"hypernym\", \"instance_hypernym\"))\n    if include_self:\n        paths = [[synset, *path] for path in paths] or [[synset]]\n    if simulate_root and synset.id != _FAKE_ROOT:\n        root = wn.Synset.empty(\n            id=_FAKE_ROOT, _lexicon=synset._lexicon, _lexconf=synset._lexconf\n        )\n        paths = [[*path, root] for path in paths] or [[root]]\n    return paths\n\n\ndef hypernym_paths(\n    synset: wn.Synset,\n    simulate_root: bool = False,\n) -> list[list[wn.Synset]]:\n    \"\"\"Return the list of hypernym paths to a root synset.\n\n    Arguments:\n\n        synset: The starting synset for paths to a root.\n\n        simulate_root: If :python:`True`, find the path to a simulated\n            root node.\n\n    Example:\n\n        >>> import wn, wn.taxonomy\n        >>> dog = wn.synsets(\"dog\", pos=\"n\")[0]\n        >>> for path in wn.taxonomy.hypernym_paths(dog):\n        ...     for i, ss in enumerate(path):\n        ...         print(\" \" * i, ss, ss.lemmas()[0])\n         Synset('pwn-02083346-n') canine\n          Synset('pwn-02075296-n') carnivore\n           Synset('pwn-01886756-n') eutherian mammal\n            Synset('pwn-01861778-n') mammalian\n             Synset('pwn-01471682-n') craniate\n              Synset('pwn-01466257-n') chordate\n               Synset('pwn-00015388-n') animal\n                Synset('pwn-00004475-n') organism\n                 Synset('pwn-00004258-n') animate thing\n                  Synset('pwn-00003553-n') unit\n                   Synset('pwn-00002684-n') object\n                    Synset('pwn-00001930-n') physical entity\n                     Synset('pwn-00001740-n') entity\n         Synset('pwn-01317541-n') domesticated animal\n          Synset('pwn-00015388-n') animal\n           Synset('pwn-00004475-n') organism\n            Synset('pwn-00004258-n') animate thing\n             Synset('pwn-00003553-n') unit\n              Synset('pwn-00002684-n') object\n               Synset('pwn-00001930-n') physical entity\n                Synset('pwn-00001740-n') entity\n\n    \"\"\"\n    return _hypernym_paths(synset, simulate_root, False)\n\n\ndef min_depth(synset: wn.Synset, simulate_root: bool = False) -> int:\n    \"\"\"Return the minimum taxonomy depth of the synset.\n\n    Arguments:\n\n        synset: The starting synset for paths to a root.\n\n        simulate_root: If :python:`True`, find the depth to a\n            simulated root node.\n\n    Example:\n\n        >>> import wn, wn.taxonomy\n        >>> dog = wn.synsets(\"dog\", pos=\"n\")[0]\n        >>> wn.taxonomy.min_depth(dog)\n        8\n\n    \"\"\"\n    return min(\n        (len(path) for path in synset.hypernym_paths(simulate_root=simulate_root)),\n        default=0,\n    )\n\n\ndef max_depth(synset: wn.Synset, simulate_root: bool = False) -> int:\n    \"\"\"Return the maximum taxonomy depth of the synset.\n\n    Arguments:\n\n        synset: The starting synset for paths to a root.\n\n        simulate_root: If :python:`True`, find the depth to a\n            simulated root node.\n\n    Example:\n\n        >>> import wn, 
wn.taxonomy\n        >>> dog = wn.synsets(\"dog\", pos=\"n\")[0]\n        >>> wn.taxonomy.max_depth(dog)\n        13\n\n    \"\"\"\n    return max(\n        (len(path) for path in synset.hypernym_paths(simulate_root=simulate_root)),\n        default=0,\n    )\n\n\ndef _shortest_hyp_paths(\n    synset: wn.Synset,\n    other: wn.Synset,\n    simulate_root: bool,\n) -> dict[tuple[wn.Synset, int], list[wn.Synset]]:\n    if synset == other:\n        return {(synset, 0): []}\n\n    from_self = _hypernym_paths(synset, simulate_root, True)\n    from_other = _hypernym_paths(other, simulate_root, True)\n    common = set(flatten(from_self)).intersection(flatten(from_other))\n\n    if not common:\n        return {}\n\n    # Compute depths of common hypernyms from their distances.\n    # Doing this now avoids more expensive lookups later.\n    depths: dict[wn.Synset, int] = {}\n    # subpaths accumulates paths to common hypernyms from both sides\n    subpaths: dict[wn.Synset, tuple[list[list[wn.Synset]], list[list[wn.Synset]]]]\n    subpaths = {ss: ([], []) for ss in common}\n    for which, paths in (0, from_self), (1, from_other):\n        for path in paths:\n            for dist, ss in enumerate(path):\n                if ss in common:\n                    # synset or other subpath to ss (not including ss)\n                    subpaths[ss][which].append(path[: dist + 1])\n                    # keep maximum depth\n                    depth = len(path) - dist - 1\n                    if ss not in depths or depths[ss] < depth:\n                        depths[ss] = depth\n\n    shortest: dict[tuple[wn.Synset, int], list[wn.Synset]] = {}\n    for ss in common:\n        from_self_subpaths, from_other_subpaths = subpaths[ss]\n        shortest_from_self = min(from_self_subpaths, key=len)\n        # for the other path, we need to reverse it and remove the pivot synset\n        # (ty doesn't infer the result of min() correctly, hence the ignore)\n        shortest_from_other = min(from_other_subpaths, key=len)[-2::-1]  # type: ignore\n        shortest[(ss, depths[ss])] = shortest_from_self + shortest_from_other\n\n    return shortest\n\n\ndef shortest_path(\n    synset: wn.Synset,\n    other: wn.Synset,\n    simulate_root: bool = False,\n) -> list[wn.Synset]:\n    \"\"\"Return the shortest path from *synset* to the *other* synset.\n\n    Arguments:\n        other: endpoint synset of the path\n        simulate_root: if :python:`True`, ensure any two synsets\n          are always connected by positing a fake root node\n\n    Example:\n\n        >>> import wn, wn.taxonomy\n        >>> ewn = wn.Wordnet(\"ewn:2020\")\n        >>> dog = ewn.synsets(\"dog\", pos=\"n\")[0]\n        >>> squirrel = ewn.synsets(\"squirrel\", pos=\"n\")[0]\n        >>> for ss in wn.taxonomy.shortest_path(dog, squirrel):\n        ...     
print(ss.lemmas())\n        ['canine', 'canid']\n        ['carnivore']\n        ['eutherian mammal', 'placental', 'placental mammal', 'eutherian']\n        ['rodent', 'gnawer']\n        ['squirrel']\n\n    \"\"\"\n    pathmap = _shortest_hyp_paths(synset, other, simulate_root)\n    key = min(pathmap, key=lambda key: len(pathmap[key]), default=None)\n    if key is None:\n        raise wn.Error(f\"no path between {synset!r} and {other!r}\")\n    return pathmap[key][1:]\n\n\ndef common_hypernyms(\n    synset: wn.Synset,\n    other: wn.Synset,\n    simulate_root: bool = False,\n) -> list[wn.Synset]:\n    \"\"\"Return the common hypernyms for the current and *other* synsets.\n\n    Arguments:\n        other: synset that is a hyponym of any shared hypernyms\n        simulate_root: if :python:`True`, ensure any two synsets\n          always share a hypernym by positing a fake root node\n\n    Example:\n\n        >>> import wn, wn.taxonomy\n        >>> ewn = wn.Wordnet(\"ewn:2020\")\n        >>> dog = ewn.synsets(\"dog\", pos=\"n\")[0]\n        >>> squirrel = ewn.synsets(\"squirrel\", pos=\"n\")[0]\n        >>> for ss in wn.taxonomy.common_hypernyms(dog, squirrel):\n        ...     print(ss.lemmas())\n        ['entity']\n        ['physical entity']\n        ['object', 'physical object']\n        ['unit', 'whole']\n        ['animate thing', 'living thing']\n        ['organism', 'being']\n        ['fauna', 'beast', 'animate being', 'brute', 'creature', 'animal']\n        ['chordate']\n        ['craniate', 'vertebrate']\n        ['mammalian', 'mammal']\n        ['eutherian mammal', 'placental', 'placental mammal', 'eutherian']\n\n    \"\"\"\n    from_self = _hypernym_paths(synset, simulate_root, True)\n    from_other = _hypernym_paths(other, simulate_root, True)\n    common = set(flatten(from_self)).intersection(flatten(from_other))\n    return sorted(common, key=lambda ss: ss.id)\n\n\ndef lowest_common_hypernyms(\n    synset: wn.Synset,\n    other: wn.Synset,\n    simulate_root: bool = False,\n) -> list[wn.Synset]:\n    \"\"\"Return the common hypernyms furthest from the root.\n\n    Arguments:\n        other: synset that is a hyponym of any shared hypernyms\n        simulate_root: if :python:`True`, ensure any two synsets\n          always share a hypernym by positing a fake root node\n\n    Example:\n\n        >>> import wn, wn.taxonomy\n        >>> ewn = wn.Wordnet(\"ewn:2020\")\n        >>> dog = ewn.synsets(\"dog\", pos=\"n\")[0]\n        >>> squirrel = ewn.synsets(\"squirrel\", pos=\"n\")[0]\n        >>> len(wn.taxonomy.lowest_common_hypernyms(dog, squirrel))\n        1\n        >>> wn.taxonomy.lowest_common_hypernyms(dog, squirrel)[0].lemmas()\n        ['eutherian mammal', 'placental', 'placental mammal', 'eutherian']\n\n    \"\"\"\n    pathmap = _shortest_hyp_paths(synset, other, simulate_root)\n    # keys of pathmap are (synset, depth_of_synset)\n    max_depth: int = max([depth for _, depth in pathmap], default=-1)\n    if max_depth == -1:\n        return []\n    else:\n        return [ss for ss, d in pathmap if d == max_depth]\n"
  },
  {
    "path": "wn/util.py",
    "content": "\"\"\"Wn utility classes.\"\"\"\n\nimport sys\nfrom collections.abc import Callable\nfrom typing import TextIO\n\n\ndef synset_id_formatter(fmt: str = \"{prefix}-{offset:08}-{pos}\", **kwargs) -> Callable:\n    \"\"\"Return a function for formatting synset ids.\n\n    The *fmt* argument can be customized. It will be formatted using\n    any other keyword arguments given to this function and any given\n    to the resulting function. By default, the format string expects a\n    ``prefix`` string argument for the namespace (such as a lexicon\n    id), an ``offset`` integer argument (such as a WNDB offset), and a\n    ``pos`` string argument.\n\n    Arguments:\n        fmt: A Python format string\n        **kwargs: Keyword arguments for the format string.\n\n    Example:\n\n        >>> pwn_synset_id = synset_id_formatter(prefix=\"pwn\")\n        >>> pwn_synset_id(offset=1174, pos=\"n\")\n        'pwn-00001174-n'\n\n    \"\"\"\n\n    def format_synset_id(**_kwargs) -> str:\n        return fmt.format(**kwargs, **_kwargs)\n\n    return format_synset_id\n\n\nclass ProgressHandler:\n    \"\"\"An interface for updating progress in long-running processes.\n\n    Long-running processes in Wn, such as :func:`wn.download` and\n    :func:`wn.add`, call to a progress handler object as they go.  The\n    default progress handler used by Wn is :class:`ProgressBar`, which\n    updates progress by formatting and printing a textual bar to\n    stderr. The :class:`ProgressHandler` class may be used directly,\n    which does nothing, or users may create their own subclasses for,\n    e.g., updating a GUI or some other handler.\n\n    The initialization parameters, except for ``file``, are stored in\n    a :attr:`kwargs` member and may be updated after the handler is\n    created through the :meth:`set` method. The :meth:`update` method\n    is the primary way a counter is updated. The :meth:`flash` method\n    is sometimes called for simple messages. When the process is\n    complete, the :meth:`close` method is called, optionally with a\n    message.\n\n    \"\"\"\n\n    def __init__(\n        self,\n        *,\n        message: str = \"\",\n        count: int = 0,\n        total: int = 0,\n        refresh_interval: int = 0,\n        unit: str = \"\",\n        status: str = \"\",\n        file: TextIO = sys.stderr,\n    ):\n        self.file = file\n        self.kwargs = {\n            \"count\": count,\n            \"total\": total,\n            \"refresh_interval\": refresh_interval,\n            \"message\": message,\n            \"unit\": unit,\n            \"status\": status,\n        }\n        self._refresh_quota: int = refresh_interval\n\n    def update(self, n: int = 1, force: bool = False) -> None:\n        \"\"\"Update the counter with the increment value *n*.\n\n        This method should update the ``count`` key of :attr:`kwargs`\n        with the increment value *n*. 
After this, it is expected to\n        update some user-facing progress indicator.\n\n        If *force* is :python:`True`, any indicator will be refreshed\n        regardless of the value of the refresh interval.\n\n        \"\"\"\n        self.kwargs[\"count\"] += n  # type: ignore\n\n    def set(self, **kwargs) -> None:\n        \"\"\"Update progress handler parameters.\n\n        Calling this method also runs :meth:`update` with an increment\n        of 0, which causes a refresh of any indicator without changing\n        the counter.\n\n        \"\"\"\n        self.kwargs.update(**kwargs)\n        self.update(0, force=True)\n\n    def flash(self, message: str) -> None:\n        \"\"\"Issue a message unrelated to the current counter.\n\n        This may be useful for multi-stage processes to indicate the\n        move to a new stage, or to log unexpected situations.\n\n        \"\"\"\n        pass\n\n    def close(self) -> None:\n        \"\"\"Close the progress handler.\n\n        This might be useful for closing file handles or cleaning up\n        resources.\n\n        \"\"\"\n        pass\n\n\nclass ProgressBar(ProgressHandler):\n    \"\"\"A :class:`ProgressHandler` subclass for printing a progress bar.\n\n    Example:\n        >>> p = ProgressBar(message=\"Progress: \", total=10, unit=\" units\")\n        >>> p.update(3)\n        Progress: [#########                     ] (3/10 units)\n\n    See :meth:`format` for a description of how the progress bar is\n    formatted.\n\n    \"\"\"\n\n    #: The default formatting template.\n    FMT = \"\\r{message}{bar}{counter}{status}\"\n\n    def update(self, n: int = 1, force: bool = False) -> None:\n        \"\"\"Increment the count by *n* and print the reformatted bar.\"\"\"\n        self.kwargs[\"count\"] += n  # type: ignore\n        self._refresh_quota -= n\n        if force or self._refresh_quota <= 0:\n            self._refresh_quota = self.kwargs[\"refresh_interval\"]  # type: ignore\n            s = self.format()\n            if self.file:\n                print(\"\\r\\033[K\", end=\"\", file=self.file)\n                print(s, end=\"\", file=self.file)\n\n    def format(self) -> str:\n        \"\"\"Format and return the progress bar.\n\n        The bar is formatted according to :attr:`FMT`, using\n        variables from :attr:`kwargs` and two computed variables:\n\n        - ``bar``: visualization of the progress bar, empty when\n          ``total`` is 0\n\n        - ``counter``: display of ``count``, ``total``, and ``unit``\n\n        >>> p = ProgressBar(message=\"Progress\", count=2, total=10, unit=\"K\")\n        >>> p.format()\n        '\\\\rProgress [######                        ] (2/10K) '\n        >>> p = ProgressBar(count=2, status=\"Counting...\")\n        >>> p.format()\n        '\\\\r (2) Counting...'\n\n        \"\"\"\n        _kw = self.kwargs\n        width = 30\n        total: int = _kw[\"total\"]  # type: ignore\n        count: int = _kw[\"count\"]  # type: ignore\n\n        if total > 0:\n            num = min(count, total) * width\n            fill = (num // total) * \"#\"\n            part = ((num % total) * 3) // total\n            if part:\n                fill += \"-=\"[part - 1]\n            bar = f\" [{fill:<{width}}]\"\n            counter = f\" ({count}/{total}{_kw['unit']}) \"\n        else:\n            bar = \"\"\n            counter = f\" ({count}{_kw['unit']}) \"\n\n        return self.FMT.format(bar=bar, counter=counter, **_kw)\n\n    def flash(self, message: str) -> None:\n        \"\"\"Overwrite the progress bar with *message*.\"\"\"\n        print(f\"\\r\\033[K{message}\", end=\"\", file=self.file)\n\n    def close(self) -> None:\n        \"\"\"Print a newline so the last printed bar remains on screen.\"\"\"\n        print(file=self.file)\n"
  },
  {
    "path": "wn/validate.py",
    "content": "\"\"\"Wordnet lexicon validation.\n\nThis module is for checking whether the contents of a lexicon are\nvalid according to a series of checks. Those checks are:\n\n====  ==========================================================\nCode  Message\n====  ==========================================================\nE101  ID is not unique within the lexicon.\nW201  Lexical entry has no senses.\nW202  Redundant sense between lexical entry and synset.\nW203  Redundant lexical entry with the same lemma and synset.\nE204  Synset of sense is missing.\nW301  Synset is empty (not associated with any lexical entries).\nW302  ILI is repeated across synsets.\nW303  Proposed ILI is missing a definition.\nW304  Existing ILI has a spurious definition.\nW305  Synset has a blank definition.\nW306  Synset has a blank example.\nW307  Synset repeats an existing definition.\nE401  Relation target is missing or invalid.\nW402  Relation type is invalid for the source and target.\nW403  Redundant relation between source and target.\nW404  Reverse relation is missing.\nW501  Synset's part-of-speech is different from its hypernym's.\nW502  Relation is a self-loop.\n====  ==========================================================\n\n\"\"\"\n\nfrom collections import Counter\nfrom collections.abc import (\n    Callable,\n    Iterator,\n    Sequence,\n)\nfrom itertools import chain\nfrom typing import TypedDict, cast\n\nfrom wn import lmf\nfrom wn.constants import (\n    REVERSE_RELATIONS,\n    SENSE_RELATIONS,\n    SENSE_SYNSET_RELATIONS,\n    SYNSET_RELATIONS,\n)\nfrom wn.util import ProgressBar, ProgressHandler\n\n_Ids = dict[str, Counter]\n_Result = dict[str, dict]\n_CheckFunction = Callable[[lmf.Lexicon, _Ids], _Result]\n\n\nclass _Check(TypedDict):\n    message: str\n    items: _Result\n\n\n_Report = dict[str, _Check]\n\n\ndef _non_unique_id(lex: lmf.Lexicon, ids: _Ids) -> _Result:\n    \"\"\"ID is not unique within the lexicon\"\"\"\n    return _multiples(\n        chain(\n            [lex[\"id\"]],\n            (f[\"id\"] for e in _entries(lex) for f in _forms(e) if f.get(\"id\")),\n            (sb[\"id\"] for sb in lex.get(\"frames\", []) if sb.get(\"id\")),\n            ids[\"entry\"].elements(),\n            ids[\"sense\"].elements(),\n            ids[\"synset\"].elements(),\n        )\n    )\n\n\ndef _has_no_senses(lex: lmf.Lexicon, ids: _Ids) -> _Result:\n    \"\"\"lexical entry has no senses\"\"\"\n    return {e[\"id\"]: {} for e in _entries(lex) if not _senses(e)}\n\n\ndef _redundant_sense(lex: lmf.Lexicon, ids: _Ids) -> _Result:\n    \"\"\"redundant sense between lexical entry and synset\"\"\"\n    result: _Result = {}\n    for e in _entries(lex):\n        redundant = _multiples(s[\"synset\"] for s in _senses(e))\n        result.update(\n            (s[\"id\"], {\"entry\": e[\"id\"], \"synset\": s[\"synset\"]})\n            for s in _senses(e)\n            if s[\"synset\"] in redundant\n        )\n    return result\n\n\ndef _redundant_entry(lex: lmf.Lexicon, ids: _Ids) -> _Result:\n    \"\"\"redundant lexical entry with the same lemma and synset\"\"\"\n    redundant = _multiples(\n        (e[\"lemma\"][\"writtenForm\"], s[\"synset\"])\n        for e in _entries(lex)\n        for s in _senses(e)\n    )\n    return {form: {\"synset\": synset} for form, synset in redundant}\n\n\ndef _missing_synset(lex: lmf.Lexicon, ids: _Ids) -> _Result:\n    \"\"\"synset of sense is missing\"\"\"\n    synset_ids = ids[\"synset\"]\n    return {\n        s[\"id\"]: {\"synset\": s[\"synset\"]}\n        for e in _entries(lex)\n        for s in _senses(e)\n        if s[\"synset\"] not in synset_ids\n    }\n\n\ndef _empty_synset(lex: lmf.Lexicon, ids: _Ids) -> _Result:\n    \"\"\"synset is empty (not associated with any lexical entries)\"\"\"\n    synsets = {s[\"synset\"] for e in _entries(lex) for s in _senses(e)}\n    return {ss[\"id\"]: {} for ss in _synsets(lex) if ss[\"id\"] not in synsets}\n\n\ndef _repeated_ili(lex: lmf.Lexicon, ids: _Ids) -> _Result:\n    \"\"\"ILI is repeated across synsets\"\"\"\n    repeated = _multiples(\n        ss[\"ili\"] for ss in _synsets(lex) if ss[\"ili\"] and ss[\"ili\"] != \"in\"\n    )\n    return {\n        ss[\"id\"]: {\"ili\": ss[\"ili\"]} for ss in _synsets(lex) if ss[\"ili\"] in repeated\n    }\n\n\ndef _missing_ili_definition(lex: lmf.Lexicon, ids: _Ids) -> _Result:\n    \"\"\"proposed ILI is missing a definition\"\"\"\n    return {\n        ss[\"id\"]: {}\n        for ss in _synsets(lex)\n        if ss[\"ili\"] == \"in\" and not ss.get(\"ili_definition\")\n    }\n\n\ndef _spurious_ili_definition(lex: lmf.Lexicon, ids: _Ids) -> _Result:\n    \"\"\"existing ILI has a spurious definition\"\"\"\n    return {\n        ss[\"id\"]: {\"ili_definition\": ss[\"ili_definition\"]}\n        for ss in _synsets(lex)\n        if ss[\"ili\"] and ss[\"ili\"] != \"in\" and ss.get(\"ili_definition\")\n    }\n\n\ndef _blank_synset_definition(lex: lmf.Lexicon, ids: _Ids) -> _Result:\n    \"\"\"synset has a blank definition\"\"\"\n    return {\n        ss[\"id\"]: {}\n        for ss in _synsets(lex)\n        if any(dfn[\"text\"].strip() == \"\" for dfn in ss.get(\"definitions\", []))\n    }\n\n\ndef _blank_synset_example(lex: lmf.Lexicon, ids: _Ids) -> _Result:\n    \"\"\"synset has a blank example\"\"\"\n    return {\n        ss[\"id\"]: {}\n        for ss in _synsets(lex)\n        if any(ex[\"text\"].strip() == \"\" for ex in ss.get(\"examples\", []))\n    }\n\n\ndef _repeated_synset_definition(lex: lmf.Lexicon, ids: _Ids) -> _Result:\n    \"\"\"synset repeats an existing definition\"\"\"\n    repeated = _multiples(\n        dfn[\"text\"] for ss in _synsets(lex) for dfn in ss.get(\"definitions\", [])\n    )\n    return {\n        ss[\"id\"]: {}\n        for ss in _synsets(lex)\n        if any(dfn[\"text\"] in repeated for dfn in ss.get(\"definitions\", []))\n    }\n\n\ndef _missing_relation_target(lex: lmf.Lexicon, ids: _Ids) -> _Result:\n    \"\"\"relation target is missing or invalid\"\"\"\n    result = {\n        s[\"id\"]: {\"type\": r[\"relType\"], \"target\": r[\"target\"]}\n        for s, r in _sense_relations(lex)\n        if r[\"target\"] not in ids[\"sense\"] and r[\"target\"] not in ids[\"synset\"]\n    }\n    result.update(\n        (ss[\"id\"], {\"type\": r[\"relType\"], \"target\": r[\"target\"]})\n        for ss, r in _synset_relations(lex)\n        if r[\"target\"] not in ids[\"synset\"]\n    )\n    return result\n\n\ndef _invalid_relation_type(lex: lmf.Lexicon, ids: _Ids) -> _Result:\n    \"\"\"relation type is invalid for the source and target\"\"\"\n    result = {\n        s[\"id\"]: {\"type\": r[\"relType\"], \"target\": r[\"target\"]}\n        for s, r in _sense_relations(lex)\n        if (r[\"target\"] in ids[\"sense\"] and r[\"relType\"] not in SENSE_RELATIONS)\n        or (r[\"target\"] in ids[\"synset\"] and r[\"relType\"] not in SENSE_SYNSET_RELATIONS)\n    }\n    result.update(\n        (ss[\"id\"], {\"type\": r[\"relType\"], \"target\": r[\"target\"]})\n        for ss, r in _synset_relations(lex)\n        if r[\"relType\"] not in 
SYNSET_RELATIONS\n    )\n    return result\n\n\ndef _redundant_relation(lex: lmf.Lexicon, ids: _Ids) -> _Result:\n    \"\"\"redundant relation between source and target\"\"\"\n    redundant = _multiples(\n        chain(\n            (\n                (s[\"id\"], r[\"relType\"], r[\"target\"], _get_dc_type(r))\n                for s, r in _sense_relations(lex)\n            ),\n            (\n                (ss[\"id\"], r[\"relType\"], r[\"target\"], _get_dc_type(r))\n                for ss, r in _synset_relations(lex)\n            ),\n        )\n    )\n    return {\n        src: ({\"type\": typ, \"target\": tgt} | ({\"dc:type\": dctyp} if dctyp else {}))\n        for src, typ, tgt, dctyp in redundant\n    }\n\n\ndef _missing_reverse_relation(lex: lmf.Lexicon, ids: _Ids) -> _Result:\n    \"\"\"reverse relation is missing\"\"\"\n    regular = {\n        (s[\"id\"], r[\"relType\"], r[\"target\"])\n        for s, r in _sense_relations(lex)\n        if r[\"target\"] in ids[\"sense\"]\n    }\n    regular.update(\n        (ss[\"id\"], r[\"relType\"], r[\"target\"]) for ss, r in _synset_relations(lex)\n    )\n    return {\n        tgt: {\"type\": REVERSE_RELATIONS[typ], \"target\": src}\n        for src, typ, tgt in regular\n        if typ in REVERSE_RELATIONS\n        and (tgt, REVERSE_RELATIONS[typ], src) not in regular\n    }\n\n\ndef _hypernym_wrong_pos(lex: lmf.Lexicon, ids: _Ids) -> _Result:\n    \"\"\"synset's part-of-speech is different from its hypernym's\"\"\"\n    sspos = {ss[\"id\"]: ss.get(\"partOfSpeech\") for ss in _synsets(lex)}\n    return {\n        ss[\"id\"]: {\"type\": r[\"relType\"], \"target\": r[\"target\"]}\n        for ss, r in _synset_relations(lex)\n        if r[\"relType\"] == \"hypernym\" and ss.get(\"partOfSpeech\") != sspos[r[\"target\"]]\n    }\n\n\ndef _self_loop(lex: lmf.Lexicon, ids: _Ids) -> _Result:\n    \"\"\"relation is a self-loop\"\"\"\n    relations = chain(_sense_relations(lex), _synset_relations(lex))\n    return {\n        x[\"id\"]: {\"type\": r[\"relType\"], \"target\": r[\"target\"]}\n        for x, r in relations\n        if x[\"id\"] == r[\"target\"]\n    }\n\n\n# Helpers\n\n\ndef _multiples(iterable):\n    counts = Counter(iterable)\n    return {x: {\"count\": cnt} for x, cnt in counts.items() if cnt > 1}\n\n\ndef _entries(lex: lmf.Lexicon) -> list[lmf.LexicalEntry]:\n    return lex.get(\"entries\", [])\n\n\ndef _forms(e: lmf.LexicalEntry) -> list[lmf.Form]:\n    return e.get(\"forms\", [])\n\n\ndef _senses(e: lmf.LexicalEntry) -> list[lmf.Sense]:\n    return e.get(\"senses\", [])\n\n\ndef _synsets(lex: lmf.Lexicon) -> list[lmf.Synset]:\n    return lex.get(\"synsets\", [])\n\n\ndef _sense_relations(lex: lmf.Lexicon) -> Iterator[tuple[lmf.Sense, lmf.Relation]]:\n    for e in _entries(lex):\n        for s in _senses(e):\n            for r in s.get(\"relations\", []):\n                yield (s, r)\n\n\ndef _synset_relations(lex: lmf.Lexicon) -> Iterator[tuple[lmf.Synset, lmf.Relation]]:\n    for ss in _synsets(lex):\n        for r in ss.get(\"relations\", []):\n            yield (ss, r)\n\n\ndef _get_dc_type(r: lmf.Relation) -> str | None:\n    return (r.get(\"meta\") or {}).get(\"type\")\n\n\n# Check codes and messages\n#\n# categories:\n#   E - errors\n#   W - warnings\n# subcategories:\n#   100 - general\n#   200 - words and senses\n#   300 - synsets and ilis\n#   400 - relations\n#   500 - graph and taxonomy\n\n_codes: dict[str, _CheckFunction] = {\n    # 100 - general\n    \"E101\": _non_unique_id,\n    # 200 - words and senses\n    \"W201\": 
_has_no_senses,\n    \"W202\": _redundant_sense,\n    \"W203\": _redundant_entry,\n    \"E204\": _missing_synset,\n    # 300 - synsets and ilis\n    \"W301\": _empty_synset,\n    \"W302\": _repeated_ili,\n    \"W303\": _missing_ili_definition,\n    \"W304\": _spurious_ili_definition,\n    \"W305\": _blank_synset_definition,\n    \"W306\": _blank_synset_example,\n    \"W307\": _repeated_synset_definition,\n    # 400 - relations\n    \"E401\": _missing_relation_target,\n    \"W402\": _invalid_relation_type,\n    \"W403\": _redundant_relation,\n    \"W404\": _missing_reverse_relation,\n    # 500 - graph\n    \"W501\": _hypernym_wrong_pos,\n    \"W502\": _self_loop,\n}\n\n\ndef _select_checks(select: Sequence[str]) -> list[tuple[str, _CheckFunction, str]]:\n    selectset = set(select)\n    return [\n        (code, func, func.__doc__ or \"\")\n        for code, func in _codes.items()\n        if code in selectset or code[0] in selectset\n    ]\n\n\n# Main function\n\n\ndef validate(\n    lex: lmf.Lexicon | lmf.LexiconExtension,\n    select: Sequence[str] = (\"E\", \"W\"),\n    progress_handler: type[ProgressHandler] | None = ProgressBar,\n) -> _Report:\n    \"\"\"Check *lex* for validity and return a report of the results.\n\n    The *select* argument is a sequence of check codes (e.g.,\n    ``E101``) or categories (``E`` or ``W``).\n\n    The *progress_handler* parameter takes a subclass of\n    :class:`wn.util.ProgressHandler`. An instance of the class will be\n    created, used, and closed by this function. If *progress_handler*\n    is ``None``, the no-op :class:`wn.util.ProgressHandler` class is\n    used.\n\n    Validation of lexicon extensions is not supported; if *lex*\n    extends another lexicon, an empty report is returned.\n    \"\"\"\n    if lex.get(\"extends\"):\n        print(\"validation of lexicon extensions is not supported\")\n        return {}\n    lex = cast(\"lmf.Lexicon\", lex)\n\n    if progress_handler is None:\n        progress_handler = ProgressHandler\n\n    ids: _Ids = {\n        \"entry\": Counter(entry[\"id\"] for entry in _entries(lex)),\n        \"sense\": Counter(\n            sense[\"id\"] for entry in _entries(lex) for sense in _senses(entry)\n        ),\n        \"synset\": Counter(synset[\"id\"] for synset in _synsets(lex)),\n    }\n\n    checks = _select_checks(select)\n\n    progress = progress_handler(message=\"Validate\", total=len(checks))\n\n    report: _Report = {}\n    for code, func, message in checks:\n        progress.set(\n            status=getattr(func, \"__name__\", \"(unknown test)\").replace(\"_\", \" \")\n        )\n        report[code] = _Check(message=message, items=func(lex, ids))\n        progress.update()\n    progress.set(status=\"\")\n    progress.close()\n    return report\n"
  }
]