Repository: goodmami/wn
Branch: main
Commit: 22073eacc478
Files: 118
Total size: 651.7 KB
Directory structure:
gitextract_ym42fg_3/
├── .github/
│ ├── ISSUE_TEMPLATE/
│ │ ├── bug_report.md
│ │ ├── data-issue.md
│ │ └── feature_request.md
│ └── workflows/
│ ├── checks.yml
│ └── publish.yml
├── .gitignore
├── CHANGELOG.md
├── CITATION.cff
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── bench/
│ ├── README.md
│ ├── conftest.py
│ └── test_bench.py
├── docs/
│ ├── .readthedocs.yaml
│ ├── Makefile
│ ├── _static/
│ │ ├── css/
│ │ │ └── svg.css
│ │ └── demo.ipynb
│ ├── api/
│ │ ├── wn.compat.rst
│ │ ├── wn.compat.sensekey.rst
│ │ ├── wn.constants.rst
│ │ ├── wn.ic.rst
│ │ ├── wn.ili.rst
│ │ ├── wn.lmf.rst
│ │ ├── wn.morphy.rst
│ │ ├── wn.project.rst
│ │ ├── wn.rst
│ │ ├── wn.similarity.rst
│ │ ├── wn.taxonomy.rst
│ │ ├── wn.util.rst
│ │ └── wn.validate.rst
│ ├── cli.rst
│ ├── conf.py
│ ├── docutils.conf
│ ├── faq.rst
│ ├── guides/
│ │ ├── basic.rst
│ │ ├── interlingual.rst
│ │ ├── lemmatization.rst
│ │ ├── lexicons.rst
│ │ ├── nltk-migration.rst
│ │ └── wordnet.rst
│ ├── index.rst
│ ├── make.bat
│ ├── requirements.txt
│ └── setup.rst
├── pyproject.toml
├── tests/
│ ├── _config_test.py
│ ├── _util_test.py
│ ├── compat_sensekey_test.py
│ ├── conftest.py
│ ├── data/
│ │ ├── E101-0.xml
│ │ ├── E101-1.xml
│ │ ├── E101-2.xml
│ │ ├── E101-3.xml
│ │ ├── README.md
│ │ ├── W305-0.xml
│ │ ├── W306-0.xml
│ │ ├── W307-0.xml
│ │ ├── mini-ili-with-status.tsv
│ │ ├── mini-ili.tsv
│ │ ├── mini-lmf-1.0.xml
│ │ ├── mini-lmf-1.1.xml
│ │ ├── mini-lmf-1.3.xml
│ │ ├── mini-lmf-1.4.xml
│ │ ├── sense-key-variations.xml
│ │ ├── sense-key-variations2.xml
│ │ ├── sense-member-order.xml
│ │ └── test-package/
│ │ ├── LICENSE
│ │ ├── README.md
│ │ ├── citation.bib
│ │ └── test-wn.xml
│ ├── db_test.py
│ ├── export_test.py
│ ├── ic_test.py
│ ├── ili_test.py
│ ├── lmf_test.py
│ ├── morphy_test.py
│ ├── primary_query_test.py
│ ├── project_test.py
│ ├── relations_test.py
│ ├── secondary_query_test.py
│ ├── similarity_test.py
│ ├── taxonomy_test.py
│ ├── util_test.py
│ ├── validate_test.py
│ └── wordnet_test.py
└── wn/
├── __init__.py
├── __main__.py
├── _add.py
├── _config.py
├── _core.py
├── _db.py
├── _download.py
├── _exceptions.py
├── _export.py
├── _lexicon.py
├── _metadata.py
├── _module_functions.py
├── _queries.py
├── _types.py
├── _util.py
├── _wordnet.py
├── compat/
│ ├── __init__.py
│ └── sensekey.py
├── constants.py
├── ic.py
├── ili.py
├── index.toml
├── lmf.py
├── metrics.py
├── morphy.py
├── project.py
├── py.typed
├── schema.sql
├── similarity.py
├── taxonomy.py
├── util.py
└── validate.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.md
================================================
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: bug
assignees: ''
---
**Describe the bug**
A clear and concise description of what the bug is.
:warning: If this is a question about Wn or how to use it, please create a [discussion](https://github.com/goodmami/wn/discussions) instead of an issue.
**To Reproduce**
Please enter a minimal working example of the command or Python code that illustrates the problem. To avoid formatting issues, enter the code in a Markdown code block:
```console
$ python -m wn ...
output...
```
or
```pycon
>>> import wn
>>> ...
output
```
**Expected behavior**
A clear and concise description of what you expected to happen.
**Environment**
Please enter the versions of Python and Wn you are using as well as the installed lexicons. You can find these by executing the following commands (adjust your platform-specific Python command as necessary, e.g., `python3` or `py -3`):
```console
python --version
python -m wn --version
python -m wn lexicons
```
**Additional context**
Add any other context about the problem here.
================================================
FILE: .github/ISSUE_TEMPLATE/data-issue.md
================================================
---
name: Data issue
about: Report an issue with Wn's data index
title: ''
labels: data
assignees: ''
---
**If your issue is regarding the contents of the data** (e.g., a lexicon is missing a word, synset, relation, etc.), then please find the upstream project and file the issue there. You can find links to the projects on Wn's [README](https://github.com/goodmami/wn/). Projects without links are probably managed by the [Open Multilingual Wordnet](https://github.com/omwn/omw-data).
**Use this issue template for the following kinds of issues:**
1. Request a wordnet lexicon (including new versions of existing lexicons) to be indexed by Wn
Please provide:
- the project name
- the name and contact info of the current maintainer
- the language of the lexicon (BCP-47 code preferred)
- a URL to the project (e.g., on GitHub or other homepage)
- a URL to the [WN-LMF](https://github.com/globalwordnet/schemas/) resource
2. Report an issue with an indexed lexicon (e.g., the source URL has changed)
Please indicate the lexicon id and version and the correct project information, if available.
================================================
FILE: .github/ISSUE_TEMPLATE/feature_request.md
================================================
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: enhancement
assignees: ''
---
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.
**Additional context**
Add any other context or screenshots about the feature request here.
================================================
FILE: .github/workflows/checks.yml
================================================
name: tests
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - name: Install Hatch
        run: pipx install hatch
      - name: Lint
        run: hatch fmt --linter --check
      - name: Type Check
        run: hatch run mypy:check
      - name: Check Buildable
        run: hatch build
  tests:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"]
        os: [ubuntu-latest, windows-latest]
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install Hatch
        run: pipx install hatch
      - name: Test
        run: hatch test
================================================
FILE: .github/workflows/publish.yml
================================================
name: Build and Publish to PyPI or TestPyPI
on: push
jobs:
  build:
    name: Build distribution
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.x"
      - name: Install Hatch
        run: pipx install hatch
      - name: Build
        run: hatch build
      - name: Store the distribution packages
        uses: actions/upload-artifact@v4
        with:
          name: python-package-distributions
          path: dist/
  publish-to-pypi:
    name: Publish distributions to PyPI
    if: startsWith(github.ref, 'refs/tags/')  # only publish to PyPI on tag pushes
    needs:
      - build
    runs-on: ubuntu-latest
    environment:
      name: pypi
      url: https://pypi.org/p/wn
    permissions:
      id-token: write  # IMPORTANT: mandatory for trusted publishing
    steps:
      - name: Download the dists
        uses: actions/download-artifact@v4.1.8
        with:
          name: python-package-distributions
          path: dist/
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
  publish-to-testpypi:
    name: Publish distributions to TestPyPI
    needs:
      - build
    runs-on: ubuntu-latest
    environment:
      name: testpypi
      url: https://test.pypi.org/p/wn
    permissions:
      id-token: write  # IMPORTANT: mandatory for trusted publishing
    steps:
      - name: Download the dists
        uses: actions/download-artifact@v4.1.8
        with:
          name: python-package-distributions
          path: dist/
      - name: Publish to TestPyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          repository-url: https://test.pypi.org/legacy/
          skip-existing: true
================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
# Ruff (has its own .gitignore, but in case that ever changes...)
.ruff_cache
# Sphinx documentation
docs/_build/
# Jupyter Notebook
.ipynb_checkpoints
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# PyCharm
.idea/
# VS Code
.vscode/
# benchmarking results
.benchmarks/
================================================
FILE: CHANGELOG.md
================================================
# Change Log
## [Unreleased][unreleased]
## [v1.1.0]
**Release date: 2026-03-21**
### Added
* `cache` subcommand ([#313])
* `wn.config.list_cache_entries()` method ([#313])
* Support for `WN_DATA_DIR` environment variable ([#314])
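A sketch of how the new environment variable might be used (an illustration only; it assumes `WN_DATA_DIR` is consulted when `wn` is imported):
```python
import os

# Hypothetical usage: point Wn at an alternate data directory before import
os.environ["WN_DATA_DIR"] = "/tmp/wn_data"

import wn
print(wn.config.data_directory)  # should now reflect WN_DATA_DIR
```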
### Changed
* The schema hashing function is now resilient to ordering and SQL DB
operations ([#319])
### Fixed
* `Synset.translate()` resets the lexicon configuration of the
  translated synsets ([#316]); `Sense.translate()` and
  `Word.translate()` derive from `Synset.translate()`, so nothing
  special needs to be done for them.
## [v1.0.0]
**Release date: 2026-01-31**
Notable changes in this release:
* A new version of the database schema requires a database rebuild
* A new `wn.ili` module deals with ILI files and objects; interlingual
queries still use the `Synset.ili` member, which is now a simple `str`
* The Open English Wordnet versions 2025 and 2025+ are added to the index
* The Open Multilingual Wordnet 2.0 is added to the index
### Index
* Add `oewn:2025` ([#294])
* Add `oewn:2025+` ([#294])
* Add `omw:2.0`, including `2.0` versions of individual OMW lexicons ([#300])
### Schema
* Add `specifier` column to `lexicon` table ([#234])
* Remove `lexicalized` column from `synsets` and `senses` ([#248])
* Add `unlexicalized_synsets` and `unlexicalized_senses` tables ([#248])
* Add `lexicon_rowid` column to `pronunciations` and `tags` ([#303])
### Added
* `wn.lemmas()` function and `Wordnet.lemmas()` method to query all
lemmas at once.
* Support for WN-LMF 1.4 ([#260])
  - Sense ordering: `index` on `<LexicalEntry>` and `n` on `<Sense>`
  - New sense relations:
    - `metaphor`
    - `has_metaphor`
    - `metonym`
    - `has_metonym`
    - `agent`
    - `material`
    - `event`
    - `instrument`
    - `location`
    - `by_means_of`
    - `undergoer`
    - `property`
    - `result`
    - `state`
    - `uses`
    - `destination`
    - `body_part`
    - `vehicle`
  - `ref` attribute for `<Requires>` and `<Extends>` ([#301])
* `wn.ili` module
* `wn.Sense.synset_relations()` ([#271])
* `wn.Pronunciation.lexicon()` method ([#303])
* `wn.Tag.lexicon()` method ([#303])
* Support for exporting lexicon extensions ([#103])
* `wn.compat.sensekey` supports the `oewn-v2` flavor for escaping and
unescaping for the scheme used by OEWN 2025 ([#292])
* `wn.compat.sensekey` supports the `oewn:2025` and `oewn:2025+` lexicons for
the `sense_key_getter` and `sense_getter` functions ([#292])
* `wn.reset_database()` function for reinitializing an outdated database.
### Removed
* `wn.web` module ([#295])
* `wn.Synset.relation_map()` method ([#271])
* `wn.Sense.relation_map()` method ([#271])
### Changed
* Default form normalizer uses casefold instead of lower ([#233])
* `Synset.ili` is a `str` instead of an `ILI` object.
* The `Wordnet.synsets()` method and `wn.synsets()` function only
  accept `str` arguments for the `ili` parameter again, reverting a
  change from v0.12.0. This is because `Synset.ili` is now a simple
  string and `ILI` objects are no longer part of the core `wn`
  package namespace.
* `wn.Synset.relations()`: return `wn.Relation` to `wn.Synset` mapping when
using `data=True` ([#271])
* `wn.Sense.relations()`: return `wn.Relation` to `wn.Sense` mapping when
using `data=True` ([#271])
* Queries of relations can specify different lexicons for source and target
(part of [#103]; not a user-facing change)
### Fixed
* WN-LMF 1.1+ `<Pronunciation>` exported properly ([#302])
* WN-LMF 1.1+ `subcat` attribute exported properly ([#302])
### Documentation
* Correct docstring for `wn.taxonomy.taxonomy_depth()` ([#291])
## [v0.14.0]
**Release date: 2025-11-16**
### Python Support
* Removed support for Python 3.9
* Added support for Python 3.14
### Added
* Preliminary XML-only support for WN-LMF 1.4 ([#260])
* `lexicon()` method on `Form`, `Example`, `Definition`, and `Count` ([#286])
* `confidence()` method ([#263])
  - On `Lexicon` defaults to 1.0
  - On existing `ILI`, defaults to 1.0
  - On `Word`, `Sense`, `Synset`, `Relation`, `Example`, `Definition`, and
    `Count`, defaults to the confidence of their lexicon.
* `/` (index) and `/health` endpoints for `wn.web` (see [#268])
### Changed
* `wn.web`: returns `JSONResponse` on most errors ([#277])
### Fixed
* Encode example metadata on export ([#285])
* Update LMF to use `https` in `dc` namespace
### Maintenance
* Added `py.typed` file to repository ([#266])
* Use `tomllib` instead of `tomli` for Python 3.11+
## [v0.13.0]
**Release date: 2025-06-13**
### Added
* Support for WN-LMF 1.4 ([#260])
* `wn.compat` namespace (see [#55])
* `wn.compat.sensekey` module ([#55]) with methods:
  - `sense_key_getter()`
  - `sense_getter()`
  - `unescape_oewn_sense_key()`
  - `escape_oewn_sense_key()`
* `wn.project.get_project()` ([#53])
* `wn.project.Project` ([#53])
* `wn.project.ResourceOnlyPackage` ([#53])
* `path` property on `wn.project.Project` classes ([#53])
* `delete` parameter on `wn.project.iterpackages()` ([#53])
### Changed
* `wn.add()` allows synset members to be lexical entry IDs for rank
calculations ([#255])
* `wn.add()` no longer requires `partOfSpeech` on synsets; this was
not a requirement of WN-LMF nor was it enforced in the database
* `wn.export()` defaults to `version="1.4"` instead of `"1.0"`
## [v0.12.0]
**Release date: 2025-04-22**
### Added
* `wn.add_lexical_resource()` to add the result of `wn.lmf.load()` to
  the database rather than from a file (pertinent to [#98])
* `bench/` directory with benchmark tests ([#98])
* `Synset.definitions()` ([#246])
### Fixed
* `wn.web` casts URL objects to strings for JSON serialization ([#238])
* Setting `wn.config.data_directory` to an uninitialized directory no
longer raises a `sqlite3.OperationalError` ([#250])
### Changed
* `Wordnet` and module-level query functions now issue a warning when
the `lang` argument matches more than one lexicon ([#241])
* `Wordnet.synsets()` now accepts `wn.ILI` objects for the `ili`
parameter ([#235])
* DB-internal rowids are no longer used outside of SQL queries ([#226])
* The following methods now return standard `str` objects by default
  and custom classes with a `data=True` argument ([#246]):
  - `Word.lemma()`
  - `Word.forms()`
  - `Sense.examples()`
  - `Synset.examples()`
  - `Synset.definition()`
* `Sense.counts()` now returns a standard `int` object by default and
  a custom class with a `data=True` argument ([#246])
* The following classes no longer subclass standard `str` or `int`
  types and therefore no longer inherit their behavior or interface
  ([#246]):
  - `Form`
  - `Example`
  - `Definition`
  - `Count`
## [v0.11.0]
**Release date: 2024-12-11**
### Index
* Added `oewn:2024` ([#221])
### Added
* `Relation` class ([#216])
* `Sense.relation_map()` method ([#216])
* `Synset.relation_map()` method ([#167], [#216])
* `W305` blank definition on synset validation ([#151])
* `W306` blank example on synset validation ([#151])
* `W307` repeated definition on synset validation ([#151])
### Fixed
* Enumerate repeated entry, sense, synset IDs for validation ([#228])
## [v0.10.1]
**Release date: 2024-10-29**
### Fixed
* Follow redirects with `httpx.Client` in `wn._download` ([#211])
* Remove reverse relations for `pertainym` and `also` ([#213])
* Validate redundant relations considering `dc:type` ([#215])
### Maintenance
* Added `docs/.readthedocs.yaml` for building docs ([#214])
## [v0.10.0]
**Release date: 2024-10-29**
### Python Support
* Removed support for Python 3.8 ([#202])
* Added support for Python 3.13 ([#202])
### Added
* Support for WN-LMF 1.2 and 1.3 ([#200])
### Fixed
* Don't assume 'id' on form elements in WN-LMF 1.2+ ([#207])
### Maintenance
* Switched packaging from flit to Hatch ([#201])
* Updated dependencies, CI warnings, old workarounds ([#203])
* Change CI publishing to OIDC trusted publishing
## [v0.9.5]
**Release date: 2023-12-05**
### Python Support
* Removed support for Python 3.7 ([#191])
* Added support for Python 3.12 ([#191])
### Index
* Added `oewn:2023` ([#194])
## [v0.9.4]
**Release date: 2023-05-07**
### Index
* Added `oewn:2022` ([#181])
## [v0.9.3]
**Release date: 2022-11-13**
### Python Support
* Removed support for Python 3.6
* Added support for Python 3.11
### Fixed
* `wn.Synset.relations()` no longer raises a `KeyError` when no
relation types are given and relations are found via ILI ([#177])
## [v0.9.2]
**Release date: 2022-10-02**
### Provisional Changes
* The `editor` installation extra installs the `wn-editor`
package. This is not a normal way of using extras, as it installs a
dependent and not a dependency, and may be removed. ([#17])
### Fixed
* `wn.download()` no longer uses Python features unavailable in 3.7
when recovering from download errors
* `Sense.synset()` now creates a `Synset` properly linked to the same
`Wordnet` object ([#157], [#168])
* `Sense.word()` now creates a `Word` properly linked to the same
`Wordnet` object ([#157])
* `Synset.relations()` uses the correct relation type for those
obtained from expand lexicons ([#169])
## [v0.9.1]
**Release date: 2021-11-23**
### Fixed
* Correctly add syntactic behaviours for WN-LMF 1.1 lexicons ([#156])
## [v0.9.0]
**Release date: 2021-11-17**
### Added
* `wn.constants.REVERSE_RELATIONS`
* `wn.validate` module ([#143])
* `validate` subcommand ([#143])
* `wn.Lexicon.describe()` ([#144])
* `wn.Wordnet.describe()` ([#144])
* `wn.ConfigurationError`
* `wn.ProjectError`
### Fixed
* WN-LMF 1.0 Syntactic Behaviours with no `senses` are now assigned to
  all senses in the lexical entry. If a WN-LMF 1.1 lexicon extension
  puts Syntactic Behaviour elements on lexical entries (which it
  shouldn't), they will only be assigned to the senses and external
  senses listed.
* `wn.Form` now always hashes like `str`, so things like
  `set.__contains__` work as expected.
* `wn.download()` raises an exception on bad responses ([#147])
* Avoid returning duplicate matches when a lemmatizer is used ([#154])
### Removed
* `wn.lmf.dump()` no longer has the `version` parameter
### Changed
* `wn.lmf.load()`
  - returns a dictionary for the resource instead of a list of
    lexicons, now including the WN-LMF version, as below:
    ```python
    {
        'lmf_version': '...',
        'lexicons': [...]
    }
    ```
  - returned lexicons are modeled with Python lists and dicts instead
    of custom classes ([#80])
* `wn.lmf.scan_lexicons()` only returns info about present lexicons,
not element counts ([#113])
* Improper configurations (e.g., invalid data directory, malformed
index) now raise a `wn.ConfigurationError`
* Attempting to get an unknown project or version now raises
`wn.ProjectError` instead of `wn.Error` or `KeyError`
* Projects and versions in the index now take an `error` key. Calling
`wn.config.get_project_info()` on such an entry will raise
`wn.ProjectError`. Such entries may not also specify a url. The
entry can still be viewed without triggering the error via
`wn.config.index`. ([#146])
* Project versions in the index may specify multiple, space-separated
URLs on the url key. If one fails, the next will be attempted when
downloading. ([#142])
* `wn.config.get_project_info()` now returns a `resource_urls` key
mapped to a list of URLs instead of `resource_url` mapped to a
single URL. ([#142])
* `wn.config.get_cache_path()` now only accepts URL arguments
* The `lexicon` parameter in many functions now allows glob patterns
like `omw-*:1.4` ([#155])
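For example, a minimal sketch of such a glob pattern (assuming the OMW 1.4 lexicons are installed):
```python
import wn

# "omw-*:1.4" matches every installed lexicon whose id starts with "omw-"
for synset in wn.synsets("dog", lexicon="omw-*:1.4"):
    print(synset.id, synset.lexicon().id)
```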
### Index
* Added `oewn:2021` (new ID, previously `ewn`) ([#152])
* Added `own`, `own-pt`, and `own-en` ([#97])
* Added `odenet:1.4`
* Added `omw:1.4`, including `omw-en`, formerly `pwn:3.0` ([#152])
* Added `omw-en31:1.4`, formerly `pwn:3.1` ([#152])
* Removed `omw:1.3`, `pwn:3.0`, and `pwn:3.1` ([#152])
* Added `kurdnet:1.0` ([#140])
## [v0.8.3]
**Release date: 2021-11-03**
### Fixed
* `wn.lmf` now serializes DC and non-DC metadata correctly ([#148])
## [v0.8.2]
**Release date: 2021-11-01**
This release only resolves some dependency issues with the previous
release.
## [v0.8.1]
**Release date: 2021-10-29**
Note: the release on PyPI was yanked because a dependency was not
specified properly.
### Fixed
* `wn.lmf` uses `https://` for the `dc` namespace instead of
`http://`, following the DTD
## [v0.8.0]
**Release date: 2021-07-07**
### Added
* `wn.ic` module ([#40])
* `wn.taxonomy` module ([#125])
* `wn.similarity.res` Resnik similarity ([#122])
* `wn.similarity.jcn` Jiang-Conrath similarity ([#123])
* `wn.similarity.lin` Lin similarity ([#124])
* `wn.util.synset_id_formatter` ([#119])
### Changed
* Taxonomy methods on `wn.Synset` are moved to `wn.taxonomy`, but
shortcut methods remain for compatibility ([#125]).
* Similarity metrics in `wn.similarity` now raise an error when
synsets come from different parts of speech.
## [v0.7.0]
**Release date: 2021-06-09**
### Added
* Support for approximate word searches; on by default, configurable
only by instantiating a `wn.Wordnet` object ([#105])
* `wn.morphy` ([#19])
* `wn.Wordnet.lemmatizer` attribute ([#8])
* `wn.web` ([#116])
* `wn.Sense.relations()` ([#82])
* `wn.Synset.relations()` ([#82])
### Changed
* `wn.lmf.load()` now takes a `progress_handler` parameter ([#46])
* `wn.lmf.scan_lexicons()` no longer returns sets of relation types or
lexfiles; `wn.add()` now gets these from loaded lexicons instead
* `wn.util.ProgressHandler`
  - Now has a `refresh_interval` parameter; updates only trigger a
    refresh after the counter hits the threshold set by the interval
  - The `update()` method now takes a `force` parameter to trigger a
    refresh regardless of the refresh interval
* `wn.Wordnet`
  - Initialization now takes a `normalizer` parameter ([#105])
  - Initialization now takes a `lemmatizer` parameter ([#8])
  - Initialization now takes a `search_all_forms` parameter ([#115])
  - `Wordnet.words()`, `Wordnet.senses()` and `Wordnet.synsets()` now
    use any specified lemmatization or normalization functions to
    expand queries on word forms ([#105])
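A sketch of the new initialization parameters (assuming an installed English lexicon; the specifier below is illustrative):
```python
import wn
from wn.morphy import Morphy

# attach a lemmatizer so inflected forms are expanded during queries
en = wn.Wordnet("oewn:2021", lemmatizer=Morphy())
print(en.words("dogs"))  # "dogs" is lemmatized to "dog" before lookup
```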
### Fixed
* `wn.Synset.ili` for proposed ILIs now works again ([#117])
## [v0.6.2]
**Release date: 2021-03-22**
### Fixed
* Disable `sqlite3` progress reporting after `wn.remove()` ([#108])
## [v0.6.1]
**Release date: 2021-03-05**
### Added
* `wn.DatabaseError` as a more specific error type for schema changes
([#106])
## [v0.6.0]
**Release date: 2021-03-04**
**Notice:** This release introduces backwards-incompatible changes to
the schema that require users upgrading from previous versions to
rebuild their database.
### Added
* For WN-LMF 1.0 support ([#65])
  - `wn.Sense.frames()`
  - `wn.Sense.adjposition()`
  - `wn.Tag`
  - `wn.Form.tags()`
  - `wn.Count`
  - `wn.Sense.counts()`
* For ILI modeling ([#23])
  - `wn.ILI` class
  - `wn.Wordnet.ili()`
  - `wn.Wordnet.ilis()`
  - `wn.ili()`
  - `wn.ilis()`
  - `wn.project.Package.type` property
  - Index entries of different types; default is `'wordnet'`, `'ili'`
    is also available
  - Support for detecting and loading ILI tab-separated-value exports;
    not directly accessible through the public API at this time
  - Support for adding ILI resources to the database
  - A CILI index entry ([#23])
* `wn.lmf` WN-LMF 1.1 support ([#7])
  - `<Requires>`
  - `<LexiconExtension>`, `<Extends>`, `<ExternalSynset>`,
    `<ExternalLexicalEntry>`, `<ExternalSense>`,
    `<ExternalLemma>`, `<ExternalForm>`
  - `subcat` on `<Sense>`
  - `members` on `<Synset>`
  - `lexfile` on `<Synset>`
  - `<Pronunciation>`
  - `id` on `<Form>`
  - New relations
* Other WN-LMF 1.1 support
  - `wn.Lexicon.requires()`
  - `wn.Lexicon.extends()` ([#99])
  - `wn.Lexicon.extensions()` ([#99])
  - `wn.Pronunciation` ([#7])
  - `wn.Form.pronunciations()` ([#7])
  - `wn.Form.id` ([#7])
  - `wn.Synset.lexfile()`
* `wn.constants.SENSE_SYNSET_RELATIONS`
* `wn.WnWarning` (related to [#92])
* `wn.Lexicon.modified()` ([#17])
### Fixed
* Adding a wordnet with sense relations with invalid target IDs now
raises an error instead of ignoring the relation.
* Detect LMF-vs-CILI projects even when files are uncompressed ([#104])
### Changed
* WN-LMF 1.0 entities now modeled and exported to XML ([#65]):
  - Syntactic behaviour ([#65])
  - Adjpositions ([#65])
  - Form tags
  - Sense counts
  - Definition source senses
  - ILI definitions
* WN-LMF 1.1 entities now modeled and exported to XML ([#89]):
  - Lexicon requirements and extensions ([#99])
  - Form pronunciations
  - Lexicographer files via the `lexfile` attribute
  - Form ids
* `wn.Synset.ili` now returns an `ILI` object
* `wn.remove()` now takes a `progress_handler` parameter
* `wn.util.ProgressBar` uses a simpler formatting string with two new
computed variables
* `wn.project.is_package_directory()` and
`wn.project.is_collection_directory()` now detect
packages/collection with ILI resource files ([#23])
* `wn.project.iterpackages()` now includes ILI packages
* `wn.Wordnet` now sets the default `expand` value to a lexicon's
dependencies if they are specified (related to [#92])
### Schema
* General changes:
  - Parts of speech are stored as text
  - Added indexes and `ON DELETE` actions to speed up `wn.remove()`
  - All extendable tables are now linked to their lexicon ([#91])
  - Added rowid to tables with metadata
  - Preemptively added a `modified` column to `lexicons` table ([#17])
  - Preemptively added a `normalized_form` column to `forms` ([#105])
  - Relation type tables are combined for synsets and senses ([#75])
* ILI-related changes ([#23]):
  - ILIs now have an integer rowid and a status
  - Proposed ILIs also have an integer rowid for metadata access
  - Added a table for ILI statuses
* WN-LMF 1.0 changes ([#65]):
  - SyntacticBehaviour (previously unused) no longer requires an ID
    and does not use it in the primary key
  - Added table for adjposition values
  - Added source-sense to definitions table
* WN-LMF 1.1 changes ([#7], [#89]):
  - Added a table for lexicon dependencies
  - Added a table for lexicon extensions ([#99])
  - Added `logo` column to `lexicons` table
  - Added a `synset_rank` column to `senses` table
  - Added a `pronunciations` table
  - Added column for lexicographer files to the `synsets` table
  - Added a table for lexicographer file names
  - Added an `id` column to `forms` table
## [v0.5.1]
**Release date: 2021-01-29**
### Fixed
* `wn.lmf` specifies `utf-8` when opening files ([#95])
* `wn.lmf.dump()` casts attribute values to strings
## [v0.5.0]
**Release date: 2021-01-28**
### Added
* `wn.Lexicon.specifier()`
* `wn.config.allow_multithreading` ([#86])
* `wn.util` module for public-API utilities
* `wn.util.ProgressHandler` ([#87])
* `wn.util.ProgressBar` ([#87])
### Removed
* `wn.Wordnet.lang`
### Changed
* `wn.Synset.get_related()` does same-lexicon traversals first, then
ILI expansions ([#90])
* `wn.Synset.get_related()` only targets the source synset lexicon in
default mode ([#90], [#92])
* `wn.Wordnet` has a "default mode" when no lexicon or language is
  selected, which searches any lexicon, but relation traversals only
  target the lexicon of the source synset ([#92])
* `wn.Wordnet` has an empty expand set when a lexicon or language is
specified and no expand set is specified ([#92])
* `wn.Wordnet` now allows versions in lexicon specifiers when the id
is `*` (e.g., `*:1.3+omw`)
* `wn.Wordnet` class signature has `lexicon` first, `lang` is
keyword-only ([#93])
* `lang` and `lexicon` parameters are keyword-only on `wn.lexicons()`,
`wn.word()`, `wn.words()`, `wn.sense()`, `wn.senses()`,
`wn.synset()`, `wn.synsets()`, and the `translate()` methods of
`wn.Word`, `wn.Sense`, and `wn.Synset` ([#93])
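A sketch of the resulting call signatures (the specifiers are illustrative):
```python
import wn

en = wn.Wordnet("oewn:2021")  # the lexicon specifier is the first positional argument
ja = wn.Wordnet(lang="ja")    # lang must now be passed as a keyword
```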
## [v0.4.1]
**Release date: 2021-01-19**
### Removed
* `wn.config.database_filename` (only `wn.config.data_directory` is
configurable now)
### Changed
* Schema validation is now done when creating a new connection,
instead of on import of `wn`
* One connection is shared per database path, rather than storing
connections on the modeling classes ([#81])
### Fixed
* More robustly check for LMF validity ([#83])
## [v0.4.0]
**Release date: 2020-12-29**
### Added
* `wn.export()` to export lexicon(s) from the database ([#15])
* `wn.lmf.dump()` to dump WN-LMF lexicons to disk ([#15])
* `metadata` method on `wn.Word`, `wn.Sense`, and `wn.Synset`
* `lexicalized` method on `wn.Sense` and `wn.Synset`
* `wn.Form` class ([#79])
* `--verbose` / `-v` option for the command-line interface ([#71])
### Changed
* `wn.Lexicon.metadata` is now a method
* `wn.Word.lemma()` returns a `wn.Form` object ([#79])
* `wn.Word.forms()` returns a list of `wn.Form` objects ([#79])
* `wn.project.iterpackages()` raises `wn.Error` on decompression
problems ([#77])
* `wn.lmf.LMFError` now inherits from `wn.Error`
* `wn.lmf.scan_lexicons()` raises `LMFError` on XML parsing errors
([#77])
* `wn.download()` reraises caught `wn.Error` with more informative
message ([#77])
* `wn.add()` improves its error message when lexicons are already
  added ([#77])
* Basic logging added for `wn.download()` and `wn.add()` ([#71])
* `Synset.get_related()` and `Sense.get_related()` may take a `'*'`
parameter to get all relations
* `wn.Wordnet` objects keep an open connection to the database ([#81])
### Fixed
* `wn.project.iterpackages()` tries harder to prevent potential race
  conditions when reading temporary files ([#76])
* `wn.Lexicon.metadata` now returns a dictionary ([#78])
## [v0.3.0]
**Release date: 2020-12-16**
### Added
* `add` parameter to `wn.download()` ([#73])
* `--no-add` option to `wn download` command ([#73])
* `progress_handler` parameter to `wn.download()` ([#70])
* `progress_handler` parameter to `wn.add()` ([#70])
### Fixed
* `Synset.shortest_path()` no longer includes starting node ([#63])
* `Synset.closure()`/`Sense.closure()` may take multiple relations
([#74])
* `Synset.hypernym_paths(simulate_root=True)` returns just the fake
root node if no paths were found (related to [#64])
* `wn.lexicons()` returns empty list on unknown lang/lexicon ([#59])
### Changed
* Renamed `lgcode` parameter to `lang` throughout ([#66])
* Renamed `Wordnet.lgcode` property to `Wordnet.lang` ([#66])
* Renamed `--lgcode` command-line option to `--lang` ([#66])
* Use better-performing/less-safe database options when adding
lexicons ([#69])
## [v0.2.0]
**Release date: 2020-12-02**
### Added
* `wn.config.get_cache_path()` returns the path of a cached resource
* `wn.projects()` returns the info about known projects ([#60])
* `projects` subcommand to command-line interface ([#60])
* Open German WordNet 1.3 added to the index
### Changed
* On import, Wn now raises an error if the database has an outdated
schema ([#61])
* `wn.config.get_project_info()` now includes a `cache` key
* Output of `lexicons` CLI subcommand now tab-delimited
## [v0.1.1]
**Release date: 2020-11-26**
### Added
* Command-line interface for downloading and listing lexicons ([#47])
### Fixed
* Cast `pathlib.Path` to `str` for `sqlite3.connect()` ([#58])
* Pass `lgcode` to `Wordnet` object in `wn.synset()`
## [v0.1.0]
**Release date: 2020-11-25**
This is the initial release of the new Wn library. On PyPI it replaces
the https://github.com/nltk/wordnet/ code which had been effectively
abandoned, but this is an entirely new codebase.
[v1.1.0]: ../../releases/tag/v1.1.0
[v1.0.0]: ../../releases/tag/v1.0.0
[v0.14.0]: ../../releases/tag/v0.14.0
[v0.13.0]: ../../releases/tag/v0.13.0
[v0.12.0]: ../../releases/tag/v0.12.0
[v0.11.0]: ../../releases/tag/v0.11.0
[v0.10.1]: ../../releases/tag/v0.10.1
[v0.10.0]: ../../releases/tag/v0.10.0
[v0.9.5]: ../../releases/tag/v0.9.5
[v0.9.4]: ../../releases/tag/v0.9.4
[v0.9.3]: ../../releases/tag/v0.9.3
[v0.9.2]: ../../releases/tag/v0.9.2
[v0.9.1]: ../../releases/tag/v0.9.1
[v0.9.0]: ../../releases/tag/v0.9.0
[v0.8.3]: ../../releases/tag/v0.8.3
[v0.8.2]: ../../releases/tag/v0.8.2
[v0.8.1]: ../../releases/tag/v0.8.1
[v0.8.0]: ../../releases/tag/v0.8.0
[v0.7.0]: ../../releases/tag/v0.7.0
[v0.6.2]: ../../releases/tag/v0.6.2
[v0.6.1]: ../../releases/tag/v0.6.1
[v0.6.0]: ../../releases/tag/v0.6.0
[v0.5.1]: ../../releases/tag/v0.5.1
[v0.5.0]: ../../releases/tag/v0.5.0
[v0.4.1]: ../../releases/tag/v0.4.1
[v0.4.0]: ../../releases/tag/v0.4.0
[v0.3.0]: ../../releases/tag/v0.3.0
[v0.2.0]: ../../releases/tag/v0.2.0
[v0.1.1]: ../../releases/tag/v0.1.1
[v0.1.0]: ../../releases/tag/v0.1.0
[unreleased]: ../../tree/main
[#7]: https://github.com/goodmami/wn/issues/7
[#8]: https://github.com/goodmami/wn/issues/8
[#15]: https://github.com/goodmami/wn/issues/15
[#17]: https://github.com/goodmami/wn/issues/17
[#19]: https://github.com/goodmami/wn/issues/19
[#23]: https://github.com/goodmami/wn/issues/23
[#40]: https://github.com/goodmami/wn/issues/40
[#46]: https://github.com/goodmami/wn/issues/46
[#47]: https://github.com/goodmami/wn/issues/47
[#53]: https://github.com/goodmami/wn/issues/53
[#55]: https://github.com/goodmami/wn/issues/55
[#58]: https://github.com/goodmami/wn/issues/58
[#59]: https://github.com/goodmami/wn/issues/59
[#60]: https://github.com/goodmami/wn/issues/60
[#61]: https://github.com/goodmami/wn/issues/61
[#63]: https://github.com/goodmami/wn/issues/63
[#64]: https://github.com/goodmami/wn/issues/64
[#65]: https://github.com/goodmami/wn/issues/65
[#66]: https://github.com/goodmami/wn/issues/66
[#69]: https://github.com/goodmami/wn/issues/69
[#70]: https://github.com/goodmami/wn/issues/70
[#71]: https://github.com/goodmami/wn/issues/71
[#73]: https://github.com/goodmami/wn/issues/73
[#74]: https://github.com/goodmami/wn/issues/74
[#75]: https://github.com/goodmami/wn/issues/75
[#76]: https://github.com/goodmami/wn/issues/76
[#77]: https://github.com/goodmami/wn/issues/77
[#78]: https://github.com/goodmami/wn/issues/78
[#79]: https://github.com/goodmami/wn/issues/79
[#80]: https://github.com/goodmami/wn/issues/80
[#81]: https://github.com/goodmami/wn/issues/81
[#82]: https://github.com/goodmami/wn/issues/82
[#83]: https://github.com/goodmami/wn/issues/83
[#86]: https://github.com/goodmami/wn/issues/86
[#87]: https://github.com/goodmami/wn/issues/87
[#89]: https://github.com/goodmami/wn/issues/89
[#90]: https://github.com/goodmami/wn/issues/90
[#91]: https://github.com/goodmami/wn/issues/91
[#92]: https://github.com/goodmami/wn/issues/92
[#93]: https://github.com/goodmami/wn/issues/93
[#95]: https://github.com/goodmami/wn/issues/95
[#97]: https://github.com/goodmami/wn/issues/97
[#98]: https://github.com/goodmami/wn/issues/98
[#99]: https://github.com/goodmami/wn/issues/99
[#103]: https://github.com/goodmami/wn/issues/103
[#104]: https://github.com/goodmami/wn/issues/104
[#105]: https://github.com/goodmami/wn/issues/105
[#106]: https://github.com/goodmami/wn/issues/106
[#108]: https://github.com/goodmami/wn/issues/108
[#113]: https://github.com/goodmami/wn/issues/113
[#115]: https://github.com/goodmami/wn/issues/115
[#116]: https://github.com/goodmami/wn/issues/116
[#117]: https://github.com/goodmami/wn/issues/117
[#119]: https://github.com/goodmami/wn/issues/119
[#122]: https://github.com/goodmami/wn/issues/122
[#123]: https://github.com/goodmami/wn/issues/123
[#124]: https://github.com/goodmami/wn/issues/124
[#125]: https://github.com/goodmami/wn/issues/125
[#140]: https://github.com/goodmami/wn/issues/140
[#142]: https://github.com/goodmami/wn/issues/142
[#143]: https://github.com/goodmami/wn/issues/143
[#144]: https://github.com/goodmami/wn/issues/144
[#146]: https://github.com/goodmami/wn/issues/146
[#147]: https://github.com/goodmami/wn/issues/147
[#148]: https://github.com/goodmami/wn/issues/148
[#151]: https://github.com/goodmami/wn/issues/151
[#152]: https://github.com/goodmami/wn/issues/152
[#154]: https://github.com/goodmami/wn/issues/154
[#155]: https://github.com/goodmami/wn/issues/155
[#156]: https://github.com/goodmami/wn/issues/156
[#157]: https://github.com/goodmami/wn/issues/157
[#167]: https://github.com/goodmami/wn/issues/167
[#168]: https://github.com/goodmami/wn/issues/168
[#169]: https://github.com/goodmami/wn/issues/169
[#177]: https://github.com/goodmami/wn/issues/177
[#181]: https://github.com/goodmami/wn/issues/181
[#191]: https://github.com/goodmami/wn/issues/191
[#194]: https://github.com/goodmami/wn/issues/194
[#200]: https://github.com/goodmami/wn/issues/200
[#201]: https://github.com/goodmami/wn/issues/201
[#202]: https://github.com/goodmami/wn/issues/202
[#203]: https://github.com/goodmami/wn/issues/203
[#207]: https://github.com/goodmami/wn/issues/207
[#211]: https://github.com/goodmami/wn/issues/211
[#213]: https://github.com/goodmami/wn/issues/213
[#214]: https://github.com/goodmami/wn/issues/214
[#215]: https://github.com/goodmami/wn/issues/215
[#216]: https://github.com/goodmami/wn/issues/216
[#221]: https://github.com/goodmami/wn/issues/221
[#226]: https://github.com/goodmami/wn/issues/226
[#228]: https://github.com/goodmami/wn/issues/228
[#233]: https://github.com/goodmami/wn/issues/233
[#234]: https://github.com/goodmami/wn/issues/234
[#235]: https://github.com/goodmami/wn/issues/235
[#238]: https://github.com/goodmami/wn/issues/238
[#241]: https://github.com/goodmami/wn/issues/241
[#246]: https://github.com/goodmami/wn/issues/246
[#248]: https://github.com/goodmami/wn/issues/248
[#250]: https://github.com/goodmami/wn/issues/250
[#255]: https://github.com/goodmami/wn/issues/255
[#260]: https://github.com/goodmami/wn/issues/260
[#263]: https://github.com/goodmami/wn/issues/263
[#266]: https://github.com/goodmami/wn/issues/266
[#268]: https://github.com/goodmami/wn/pull/268
[#271]: https://github.com/goodmami/wn/issues/271
[#277]: https://github.com/goodmami/wn/issues/277
[#285]: https://github.com/goodmami/wn/issues/285
[#286]: https://github.com/goodmami/wn/issues/286
[#291]: https://github.com/goodmami/wn/issues/291
[#292]: https://github.com/goodmami/wn/issues/292
[#294]: https://github.com/goodmami/wn/issues/294
[#295]: https://github.com/goodmami/wn/issues/295
[#300]: https://github.com/goodmami/wn/issues/300
[#301]: https://github.com/goodmami/wn/issues/301
[#302]: https://github.com/goodmami/wn/issues/302
[#303]: https://github.com/goodmami/wn/issues/303
[#313]: https://github.com/goodmami/wn/issues/313
[#314]: https://github.com/goodmami/wn/issues/314
[#316]: https://github.com/goodmami/wn/issues/316
[#319]: https://github.com/goodmami/wn/issues/319
================================================
FILE: CITATION.cff
================================================
cff-version: 1.2.0
title: Wn
message: >-
  Please cite this software using the metadata from
  'preferred-citation'.
type: software
authors:
  - given-names: Michael Wayne
    family-names: Goodman
    email: goodman.m.w@gmail.com
    orcid: 'https://orcid.org/0000-0002-2896-5141'
  - given-names: Francis
    family-names: Bond
    email: bond@ieee.org
    orcid: 'https://orcid.org/0000-0003-4973-8068'
repository-code: 'https://github.com/goodmami/wn/'
preferred-citation:
  type: conference-paper
  authors:
    - given-names: Michael Wayne
      family-names: Goodman
      email: goodmami@uw.edu
      orcid: 'https://orcid.org/0000-0002-2896-5141'
      affiliation: Nanyang Technological University
    - given-names: Francis
      family-names: Bond
      email: bond@ieee.org
      orcid: 'https://orcid.org/0000-0003-4973-8068'
      affiliation: Nanyang Technological University
  start: 100  # First page number
  end: 107  # Last page number
  conference:
    name: "Proceedings of the 11th Global Wordnet Conference"
  title: "Intrinsically Interlingual: The Wn Python Library for Wordnets"
  year: 2021
  month: 1
  url: 'https://aclanthology.org/2021.gwc-1.12/'
  publisher: "Global Wordnet Association"
================================================
FILE: CONTRIBUTING.md
================================================
# Contributing to Wn
Thanks for helping to make Wn better!
**Quick Links:**
- [Report a bug or request a feature](https://github.com/goodmami/wn/issues/new)
- [Ask a question](https://github.com/goodmami/wn/discussions)
- [View documentation](https://wn.readthedocs.io/)
**Developer Information:**
- Versioning scheme: [Semantic Versioning](https://semver.org/)
- Branching scheme: [GitHub Flow](https://guides.github.com/introduction/flow/)
- Changelog: [keep a changelog](https://keepachangelog.com/en/1.0.0/)
- Documentation framework: [Sphinx](https://www.sphinx-doc.org/)
- Docstring style: [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings) (via [sphinx.ext.napoleon](https://www.sphinx-doc.org/en/master/usage/extensions/napoleon.html))
- Unit/regression testing: [pytest](https://pytest.org/)
- Benchmarking: [pytest-benchmark](https://pytest-benchmark.readthedocs.io/)
- Packaging framework: [Hatch](https://hatch.pypa.io/)
- Coding style: [PEP-8](https://www.python.org/dev/peps/pep-0008/) (via [Ruff](https://beta.ruff.rs/docs/))
- Type checking: [Mypy](http://mypy-lang.org/)
## Get Help
Confused about wordnets in general? See the [Global Wordnet
Association Documentation](https://globalwordnet.github.io/gwadoc/)
Confused about using Wn or wish to share some tips? [Start a
discussion](https://github.com/goodmami/wn/discussions)
Encountering a problem with Wn or wish to propose a new feature? [Raise an
issue](https://github.com/goodmami/wn/issues/new)
## Report a Bug
When reporting a bug, please provide enough information for someone to
reproduce the problem. This might include the version of Python you're
running, the version of Wn you have installed, the wordnet lexicons
you have installed, and possibly the platform (Linux, Windows, macOS)
you're on. Please give a minimal working example that illustrates the
problem. For example:
> I'm using Wn 0.9.5 with Python 3.11 on Linux and [description of
> problem...]. Here's what I have tried:
>
> ```pycon
> >>> import wn
> >>> # some code
> ... # some result or error
> ```
## Request a Feature
If there's a feature that you think would make a good addition to Wn,
raise an issue describing what the feature is and what problems it
would address.
## Guidelines for Contributing
See the "developer information" above for a brief description of
guidelines and conventions used in Wn. If you have a fix, please
submit a pull request to the `main` branch. In general, every pull
request should have an associated issue.
Developers should run and test Wn locally from source using
[Hatch](https://hatch.pypa.io/). Hatch may be installed
system-wide or within a virtual environment:
```console
$ pip install hatch
```
You can then use the `hatch` commands like the following:
```console
$ hatch shell # activate a Wn virtual environment
$ hatch fmt --check # lint the code and check code style
$ hatch run mypy:check # type check with mypy
$ hatch test # run unit tests
$ hatch test bench # run benchmarks
$ hatch build # build a source distribution and wheel
$ hatch publish # publish build artifacts to PyPI
```
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2020 Michael Wayne Goodman
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
<p align="center">
<img src="https://raw.githubusercontent.com/goodmami/wn/main/docs/_static/wn-logo.svg" alt="Wn logo">
<br>
<strong>a Python library for wordnets</strong>
<br>
<a href="https://pypi.org/project/wn/"><img src="https://img.shields.io/pypi/v/wn.svg?style=flat-square" alt="PyPI link"></a>
<img src="https://img.shields.io/pypi/pyversions/wn.svg?style=flat-square" alt="Python Support">
<a href="https://github.com/goodmami/wn/actions?query=workflow%3A%22tests%22"><img src="https://github.com/goodmami/wn/workflows/tests/badge.svg" alt="tests"></a>
<a href="https://wn.readthedocs.io/en/latest/?badge=latest"><img src="https://readthedocs.org/projects/wn/badge/?version=latest&style=flat-square" alt="Documentation Status"></a>
<br>
<a href="https://github.com/goodmami/wn#available-wordnets">Available Wordnets</a>
| <a href="https://wn.readthedocs.io/">Documentation</a>
| <a href="https://wn.readthedocs.io/en/latest/faq.html">FAQ</a>
| <a href="https://wn.readthedocs.io/en/latest/guides/nltk-migration.html">Migrating from NLTK</a>
| <a href="https://github.com/goodmami/wn#citation">Citation</a>
</p>
---
Wn is a Python library for exploring information in wordnets.
## Installation
Install it from PyPI using **pip**:
```sh
pip install wn
```
or **uv**:
```sh
uv add wn
```
> [!IMPORTANT]
> Existing users of Wn may encounter an error about an incompatible database schema.
> The remedy is to rebuild the database. There is a new function to help with this:
> ```pycon
> >>> wn.reset_database(rebuild=True) # re-add any indexed lexicons
> ```
> or
> ```pycon
> >>> wn.reset_database() # initialize without re-adding; start from scratch
> ```
## Getting Started
First, download some data:
```sh
python -m wn download oewn:2025+ # the Open English WordNet 2025+
```
Now start exploring:
```pycon
>>> import wn
>>> en = wn.Wordnet('oewn:2025+') # Create Wordnet object to query
>>> ss = en.synsets('win', pos='v')[0] # Get the first synset for 'win'
>>> ss.definition() # Get the synset's definition
'be the winner in a contest or competition; be victorious'
```
## Features
- Multilingual by design; first-class support for wordnets in any language
- Interlingual queries via the [Collaborative Interlingual Index](https://github.com/globalwordnet/cili/)
- Six [similarity metrics](https://wn.readthedocs.io/en/latest/api/wn.similarity.html)
- Functions for [exploring taxonomies](https://wn.readthedocs.io/en/latest/api/wn.taxonomy.html)
- Support for [lemmatization] ([Morphy] for English is built-in) and unicode [normalization]
- Full support of the [WN-LMF 1.4](https://globalwordnet.github.io/schemas/) format, including word pronunciations and lexicon extensions
- SQL-based backend offers very fast startup and improved performance on many kinds of queries
[lemmatization]: https://wn.readthedocs.io/en/latest/guides/lemmatization.html#lemmatization
[normalization]: https://wn.readthedocs.io/en/latest/guides/lemmatization.html#normalization
[Morphy]: https://wn.readthedocs.io/en/latest/api/wn.morphy.html
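The sketch below exercises a few of these features; it assumes the `oewn:2025+` and `omw-fr:1.4` lexicons have already been downloaded:
```python
import wn
from wn import similarity

en = wn.Wordnet('oewn:2025+')
win = en.synsets('win', pos='v')[0]
triumph = en.synsets('triumph', pos='v')[0]

# graph-based similarity between two verb synsets
print(similarity.path(win, triumph))

# interlingual query: map the English synset into the French wordnet
print(win.translate(lexicon='omw-fr:1.4'))
```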
## Available Wordnets
Any WN-LMF-formatted wordnet can be added to Wn's database from a local
file or remote URL, but Wn also maintains an index (see
[wn/index.toml](https://github.com/goodmami/wn/blob/main/wn/index.toml))
of available projects, similar to a package manager for software, to aid
in the discovery and downloading of new wordnets. The projects in this
index are listed below.
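The index can also be used programmatically; a minimal sketch (the contents of the project metadata are abbreviated here):
```python
import wn

for info in wn.projects():  # one metadata dict per indexed project
    print(info)

wn.download('odenet:1.4')   # fetch and add an indexed lexicon
print(wn.lexicons())        # lexicons now in the database
```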
### English Wordnets
There are several English wordnets available. In general it is
recommended to use the latest [Open English Wordnet], but if you have
stricter compatibility needs for, e.g., experiment replicability, you
may try the [OMW English Wordnet based on WordNet 3.0] (compatible with
the Princeton WordNet 3.0 and with the [NLTK]), or [OpenWordnet-EN] (for
use with the Portuguese wordnet [OpenWordnet-PT]).
| Name | Specifier | # Synsets | Notes |
| -------------------------------------------- | ---------------------- | --------: | ----- |
| [Open English WordNet] | `oewn:2025+`<br/> `oewn:2025`<br/> `oewn:2024`<br/> `oewn:2023`<br/> `oewn:2022`<br/> `oewn:2021`<br/> `ewn:2020`<br/> `ewn:2019` | 120564<br/>107519<br/>120630<br/>120135<br/>120068<br/>120039<br/>120053<br/>117791 | ← Recommended<br/> <br/> <br/> <br/> <br/> <br/> <br/> |
| [OMW English Wordnet based on WordNet 1.5] | `omw-en15:2.0` | 91591 | |
| [OMW English Wordnet based on WordNet 1.6] | `omw-en16:2.0` | 99642 | |
| [OMW English Wordnet based on WordNet 1.7] | `omw-en17:2.0` | 109377 | |
| [OMW English Wordnet based on WordNet 1.7.1] | `omw-en171:2.0` | 111223 | |
| [OMW English Wordnet based on WordNet 2.0] | `omw-en20:2.0` | 115424 | |
| [OMW English Wordnet based on WordNet 2.1] | `omw-en21:2.0` | 117597 | |
| [OMW English Wordnet based on WordNet 3.0] | `omw-en:2.0`<br/> `omw-en:1.4` | 117659<br/> 117659 | Included with `omw:2.0`<br/> Included with `omw:1.4` |
| [OMW English Wordnet based on WordNet 3.1] | `omw-en31:2.0`<br/> `omw-en31:1.4` | 117791<br/> 117791 | |
| [OpenWordnet-EN] | `own-en:1.0.0` | 117659 | Included with `own:1.0.0` |
[Open English WordNet]: https://en-word.net
[Open Multilingual Wordnet]: https://github.com/omwn
[OMW English Wordnet based on WordNet 1.5]: https://github.com/omwn/omw-data
[OMW English Wordnet based on WordNet 1.6]: https://github.com/omwn/omw-data
[OMW English Wordnet based on WordNet 1.7]: https://github.com/omwn/omw-data
[OMW English Wordnet based on WordNet 1.7.1]: https://github.com/omwn/omw-data
[OMW English Wordnet based on WordNet 2.0]: https://github.com/omwn/omw-data
[OMW English Wordnet based on WordNet 2.1]: https://github.com/omwn/omw-data
[OMW English Wordnet based on WordNet 3.0]: https://github.com/omwn/omw-data
[OMW English Wordnet based on WordNet 3.1]: https://github.com/omwn/omw-data
[OpenWordnet-EN]: https://github.com/own-pt/openWordnet-PT
[OpenWordnet-PT]: https://github.com/own-pt/openWordnet-PT
[NLTK]: https://www.nltk.org/
### Other Wordnets and Collections
These are standalone non-English wordnets and collections. The wordnets
of each collection are listed further down.
| Name | Specifier | # Synsets | Language |
| ------------------------------------------ | ----------------------------- | --------------: | ---------------- |
| [Open Multilingual Wordnet] | `omw:1.4` | n/a | multiple [[mul]] |
| [Open German WordNet] | `odenet:1.4`<br/>`odenet:1.3` | 36268<br/>36159 | German [de] |
| [Open Wordnets for Portuguese and English] | `own:1.0.0` | n/a | multiple [[mul]] |
| [KurdNet] | `kurdnet:1.0` | 2144 | Kurdish [ckb] |
[Open English WordNet]: https://github.com/globalwordnet/english-wordnet
[Open Multilingual Wordnet]: https://github.com/omwn
[OMW English Wordnet based on WordNet 3.0]: https://github.com/omwn
[OMW English Wordnet based on WordNet 3.1]: https://github.com/omwn
[Open German WordNet]: https://github.com/hdaSprachtechnologie/odenet
[Open Wordnets for Portuguese and English]: https://github.com/own-pt
[mul]: https://iso639-3.sil.org/code/mul
[KurdNet]: https://sinaahmadi.github.io/resources/kurdnet.html
### Open Multilingual Wordnet (OMW) Collection
The *Open Multilingual Wordnet* collection (`omw:1.4`) installs the
following lexicons (from
[here](https://github.com/omwn/omw-data/releases/tag/v1.4)) which can
also be downloaded and installed independently:
| Name | Specifier | # Synsets | Language |
| ---------------------------------------- | -------------------------------- | -----------------: | -------------------------------- |
| Albanet | `omw-sq:2.0`<br/> `omw-sq:1.4` | 4679<br/> 4675 | Albanian [sq] |
| Arabic WordNet (AWN v2) | `omw-arb:2.0`<br/> `omw-arb:1.4` | 9916<br/> 9916 | Arabic [arb] |
| BulTreeBank Wordnet (BTB-WN) | `omw-bg:2.0`<br/> `omw-bg:1.4` | 4959<br/> 4959 | Bulgarian [bg] |
| Chinese Open Wordnet | `omw-cmn:2.0`<br/> `omw-cmn:1.4` | 42300<br/> 42312 | Mandarin (Simplified) [cmn-Hans] |
| Croatian Wordnet | `omw-hr:2.0`<br/> `omw-hr:1.4` | 23115<br/> 23120 | Croatian [hr] |
| DanNet | `omw-da:2.0`<br/> `omw-da:1.4` | 4476<br/> 4476 | Danish [da] |
| FinnWordNet | `omw-fi:2.0`<br/> `omw-fi:1.4` | 116763<br/> 116763 | Finnish [fi] |
| Greek Wordnet | `omw-el:2.0`<br/> `omw-el:1.4` | 18113<br/> 18049 | Greek [el] |
| Hebrew Wordnet | `omw-he:2.0`<br/> `omw-he:1.4` | 5448<br/> 5448 | Hebrew [he] |
| IceWordNet | `omw-is:2.0`<br/> `omw-is:1.4` | 4951<br/> 4951 | Icelandic [is] |
| Italian Wordnet | `omw-iwn:2.0`<br/> `omw-iwn:1.4` | 15563<br/> 15563 | Italian [it] |
| Japanese Wordnet | `omw-ja:2.0`<br/> `omw-ja:1.4` | 117659<br/> 57184 | Japanese [ja] |
| Lithuanian WordNet | `omw-lt:2.0`<br/> `omw-lt:1.4` | 9462<br/> 9462 | Lithuanian [lt] |
| Multilingual Central Repository | `omw-ca:2.0`<br/> `omw-ca:1.4` | 60765<br/> 45826 | Catalan [ca] |
| Multilingual Central Repository | `omw-eu:2.0`<br/> `omw-eu:1.4` | 29420<br/> 29413 | Basque [eu] |
| Multilingual Central Repository | `omw-gl:2.0`<br/> `omw-gl:1.4` | 34776<br/> 19312 | Galician [gl] |
| Multilingual Central Repository | `omw-es:2.0`<br/> `omw-es:1.4` | 78948<br/> 38512 | Spanish [es] |
| MultiWordNet | `omw-it:2.0`<br/> `omw-it:1.4` | 35001<br/> 35001 | Italian [it] |
| Norwegian Wordnet | `omw-nb:2.0`<br/> `omw-nb:1.4` | 4455<br/> 4455 | Norwegian (Bokmål) [nb] |
| Norwegian Wordnet | `omw-nn:2.0`<br/> `omw-nn:1.4` | 3671<br/> 3671 | Norwegian (Nynorsk) [nn] |
| OMW English Wordnet based on WordNet 3.0 | `omw-en:2.0`<br/> `omw-en:1.4` | 117659<br/> 117659 | English [en] |
| Open Dutch WordNet | `omw-nl:2.0`<br/> `omw-nl:1.4` | 30177<br/> 30177 | Dutch [nl] |
| OpenWN-PT | `omw-pt:2.0`<br/> `omw-pt:1.4` | 43895<br/> 43895 | Portuguese [pt] |
| plWordNet | `omw-pl:2.0`<br/> `omw-pl:1.4` | 33826<br/> 33826 | Polish [pl] |
| Romanian Wordnet | `omw-ro:2.0`<br/> `omw-ro:1.4` | 58754<br/> 56026 | Romanian [ro] |
| Slovak WordNet | `omw-sk:2.0`<br/> `omw-sk:1.4` | 18507<br/> 18507 | Slovak [sk] |
| sloWNet | `omw-sl:2.0`<br/> `omw-sl:1.4` | 42590<br/> 42583 | Slovenian [sl] |
| Swedish (SALDO) | `omw-sv:2.0`<br/> `omw-sv:1.4` | 6796<br/> 6796 | Swedish [sv] |
| Thai Wordnet | `omw-th:2.0`<br/> `omw-th:1.4` | 73350<br/> 73350 | Thai [th] |
| WOLF (Wordnet Libre du Français) | `omw-fr:2.0`<br/> `omw-fr:1.4` | 59091<br/> 59091 | French [fr] |
| Wordnet Bahasa | `omw-id:2.0`<br/> `omw-id:1.4` | 46774<br/> 38085 | Indonesian [id] |
| Wordnet Bahasa | `omw-zsm:2.0`<br/> `omw-zsm:1.4` | 36911<br/> 36911 | Malaysian [zsm] |
### Open Wordnet (OWN) Collection
The *Open Wordnets for Portuguese and English* collection (`own:1.0.0`)
installs the following lexicons (from
[here](https://github.com/own-pt/openWordnet-PT/releases/tag/v1.0.0))
which can also be downloaded and installed independently:
| Name | Specifier | # Synsets | Language |
| -------------- | -------------- | --------: | --------------- |
| OpenWordnet-PT | `own-pt:1.0.0` | 52670 | Portuguese [pt] |
| OpenWordnet-EN | `own-en:1.0.0` | 117659 | English [en] |
### Collaborative Interlingual Index
While not a wordnet, the [Collaborative Interlingual Index] (CILI)
represents the interlingual backbone of many wordnets. Wn will
function without CILI loaded, including for interlingual queries, but
adding it to the database makes available the full list of concepts,
their status (active, deprecated, etc.), and their definitions.
| Name | Specifier | # Concepts |
| ---------------------------------- | ---------- | ---------: |
| [Collaborative Interlingual Index] | `cili:1.0` | 117659 |
[Collaborative Interlingual Index]: https://github.com/globalwordnet/cili/
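For example, a sketch of loading CILI alongside an already-installed English wordnet:
```python
import wn

wn.download('cili:1.0')             # add the interlingual index
ss = wn.synsets('win', pos='v')[0]
print(ss.ili)                       # the ILI identifier, a plain str
```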
## Changes to the Index
### `ewn` → `oewn`
The 2021 version of the *Open English WordNet* (`oewn:2021`) has
changed its lexicon ID from `ewn` to `oewn`, so the index is updated
accordingly. The previous versions are still available as `ewn:2019`
and `ewn:2020`.
### `pwn` → `omw-en`, `omw-en31`
The wordnet formerly called the *Princeton WordNet* (`pwn:3.0`,
`pwn:3.1`) is now called the *OMW English Wordnet based on WordNet
3.0* (`omw-en`) and the *OMW English Wordnet based on WordNet 3.1*
(`omw-en31`). This is more accurate, as it is an OMW-produced
derivative of the original WordNet data, and it also avoids license or
trademark issues.
### `*wn` → `omw-*` for OMW wordnets
All OMW wordnets have changed their ID scheme from `...wn` to
`omw-...`, and the version no longer includes `+omw` (e.g.,
`bulwn:1.3+omw` is now `omw-bg:1.4`).
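For example, a lexicon previously downloaded with the old specifier is
now retrieved with the new one:

```console
$ python -m wn download omw-bg:1.4
```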
## Citation
Michael Wayne Goodman and Francis Bond. 2021. [Intrinsically Interlingual: The Wn Python Library for Wordnets](https://aclanthology.org/2021.gwc-1.12/). In *Proceedings of the 11th Global Wordnet Conference*, pages 100–107, University of South Africa (UNISA). Global Wordnet Association.
================================================
FILE: bench/README.md
================================================
# Wn Benchmarking
This directory contains code and data for running benchmarks for
Wn. The benchmarks are implemented using
[pytest-benchmark](https://github.com/ionelmc/pytest-benchmark/), so
they are run using pytest as follows (from the top-level project
directory):
```console
$ hatch test bench/ # run the benchmarks
$ hatch test bench/ --benchmark-autosave # run benchmarks and store results
$ hatch test bench/ --benchmark-compare # run benchmarks and compare to stored result
$ hatch test -- --help # get help on options (look for those prefixed `--benchmark-`)
```
Notes:
* The tests are not exhaustive; when making a change that may affect
  performance, consider adding a new test if one doesn't already
  exist. It is helpful to check the test in to Git, but not the
  benchmark results, since those depend on the machine.
* Benchmark the code before and after the changes, storing the results
  locally for comparison.
* Ensure the testing environment has a steady load (wait for
  long-running processes to finish, close any active web browser tabs,
  etc.) prior to and while running the benchmarks.
* Expect high variance for IO-bound tasks.
================================================
FILE: bench/conftest.py
================================================
from collections.abc import Iterator
from itertools import cycle, product
from pathlib import Path

import pytest

import wn
from wn import lmf


@pytest.fixture
def clean_db():
    def clean_db():
        wn.remove("*")
        dummy_lex = lmf.Lexicon(
            id="dummy",
            version="1",
            label="placeholder to initialize the db",
            language="zxx",
            email="",
            license="",
        )
        wn.add_lexical_resource(
            lmf.LexicalResource(lmf_version="1.3", lexicons=[dummy_lex])
        )
    return clean_db


@pytest.fixture(scope="session")
def datadir():
    return Path(__file__).parent.parent / "tests" / "data"


@pytest.fixture
def empty_db(clean_db, tmp_path):
    dir = tmp_path / "wn_data_empty"
    with pytest.MonkeyPatch.context() as m:
        m.setattr(wn.config, "data_directory", dir)
        clean_db()
        yield


@pytest.fixture(scope="session")
def mock_lmf():
    synsets: list[lmf.Synset] = [
        *_make_synsets("n", 20000),
        *_make_synsets("v", 10000),
        *_make_synsets("a", 2000),
        *_make_synsets("r", 1000),
    ]
    entries = _make_entries(synsets)
    lexicon = lmf.Lexicon(
        id="mock",
        version="1",
        label="",
        language="zxx",
        email="",
        license="",
        entries=entries,
        synsets=synsets,
    )
    return lmf.LexicalResource(lmf_version="1.3", lexicons=[lexicon])


@pytest.fixture(scope="session")
def mock_db_dir(mock_lmf, tmp_path_factory):
    dir = tmp_path_factory.mktemp("wn_data_empty")
    with pytest.MonkeyPatch.context() as m:
        m.setattr(wn.config, "data_directory", dir)
        wn.add_lexical_resource(mock_lmf, progress_handler=None)
        wn._db.clear_connections()
    return Path(dir)


@pytest.fixture
def mock_db(monkeypatch, mock_db_dir):
    with monkeypatch.context() as m:
        m.setattr(wn.config, "data_directory", mock_db_dir)
        yield
        wn._db.clear_connections()


def _make_synsets(pos: str, n: int) -> list[lmf.Synset]:
    synsets: list[lmf.Synset] = [
        lmf.Synset(
            id=f"{i}-{pos}",
            ili="",
            partOfSpeech=pos,
            relations=[],
            meta={},
        )
        for i in range(1, n + 1)
    ]
    # add relations for nouns and verbs
    if pos in "nv":
        total = len(synsets)
        tgt_i = 1  # index of next target synset
        n = cycle([2])  # how many targets to relate
        for cur_i in range(total):
            if tgt_i <= cur_i:
                tgt_i = cur_i + 1
            source = synsets[cur_i]
            for cur_k in range(tgt_i, tgt_i + next(n)):
                if cur_k >= total:
                    break
                target = synsets[cur_k]
                source["relations"].append(
                    lmf.Relation(target=target["id"], relType="hyponym", meta={})
                )
                target["relations"].append(
                    lmf.Relation(target=source["id"], relType="hypernym", meta={})
                )
                tgt_i = cur_k + 1
    return synsets


def _words() -> Iterator[str]:
    consonants = "kgtdpbfvszrlmnhw"
    vowels = "aeiou"
    while True:
        yield from map("".join, product(consonants, vowels, consonants, vowels))


def _make_entries(synsets: list[lmf.Synset]) -> list[lmf.LexicalEntry]:
    words = _words()
    member_count = cycle(range(1, 4))  # 1, 2, or 3 synset members
    entries: dict[str, lmf.LexicalEntry] = {}
    prev_synsets: list[lmf.Synset] = []
    for synset in synsets:
        ssid = synset["id"]
        pos = synset["partOfSpeech"]
        for _ in range(next(member_count)):
            word = next(words)
            senses = [lmf.Sense(id=f"{word}-{ssid}", synset=ssid, meta={})]
            # add some polysemy
            if prev_synsets:
                ssid2 = prev_synsets.pop()["id"]
                senses.append(lmf.Sense(id=f"{word}-{ssid2}", synset=ssid2, meta={}))
            eid = f"{word}-{pos}"
            if eid not in entries:
                entries[eid] = lmf.LexicalEntry(
                    id=eid,
                    lemma=lmf.Lemma(
                        writtenForm=word,
                        partOfSpeech=pos,
                    ),
                    senses=[],
                    meta={},
                )
            entries[eid]["senses"].extend(senses)
        prev_synsets.append(synset)
    return list(entries.values())
================================================
FILE: bench/test_bench.py
================================================
import pytest

import wn
from wn import lmf


@pytest.mark.benchmark(group="lmf.load", warmup=True)
def test_load(datadir, benchmark):
    benchmark(lmf.load, datadir / "mini-lmf-1.0.xml")


@pytest.mark.benchmark(group="wn.add_lexical_resource")
@pytest.mark.usefixtures("empty_db")
def test_add_lexical_resource(mock_lmf, benchmark):
    # TODO: when pytest-benchmark's teardown option is released, use
    # that here with more rounds
    benchmark.pedantic(
        wn.add_lexical_resource,
        args=(mock_lmf,),
        # teardown=clean_db,
        iterations=1,
        rounds=1,
    )


@pytest.mark.benchmark(group="wn.add_lexical_resource")
@pytest.mark.usefixtures("empty_db")
def test_add_lexical_resource_no_progress(mock_lmf, benchmark):
    # TODO: when pytest-benchmark's teardown option is released, use
    # that here with more rounds
    benchmark.pedantic(
        wn.add_lexical_resource,
        args=(mock_lmf,),
        kwargs={"progress_handler": None},
        # teardown=clean_db,
        iterations=1,
        rounds=1,
    )


@pytest.mark.benchmark(group="primary queries")
@pytest.mark.usefixtures("mock_db")
def test_synsets(benchmark):
    benchmark(wn.synsets)


@pytest.mark.benchmark(group="primary queries")
@pytest.mark.usefixtures("mock_db")
def test_words(benchmark):
    benchmark(wn.words)


@pytest.mark.benchmark(group="secondary queries")
@pytest.mark.usefixtures("mock_db")
def test_word_senses_no_wordnet(benchmark):
    word = wn.words()[0]
    benchmark(word.senses)


@pytest.mark.benchmark(group="secondary queries")
@pytest.mark.usefixtures("mock_db")
def test_word_senses_with_wordnet(benchmark):
    w = wn.Wordnet("mock:1")
    word = w.words()[0]
    benchmark(word.senses)
================================================
FILE: docs/.readthedocs.yaml
================================================
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the version of Python and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.12"

# Build documentation in the docs/ directory with Sphinx
sphinx:
  configuration: docs/conf.py

# We recommend specifying your dependencies to enable reproducible builds:
# https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
python:
  install:
    - requirements: docs/requirements.txt

formats:
  - pdf
  - epub
================================================
FILE: docs/Makefile
================================================
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = .
BUILDDIR      = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
================================================
FILE: docs/_static/css/svg.css
================================================
svg {
    width: 500px;
    height: 300px;
    position: relative;
    left: 20%;
    -webkit-transform: translateX(-20%);
    -ms-transform: translateX(-20%);
    transform: translateX(-20%);
}
================================================
FILE: docs/_static/demo.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"# Wn Demonstration\n",
"\n",
"This is a demonstration of the [Wn](https://github.com/goodmami/wn/) library for working with wordnets in Python. To run this notebook locally, you will need to install the `wn` and `jupyter` packages, and download some wordnet data:\n",
"\n",
"* Linux/macOS\n",
"\n",
" ```console\n",
" $ python3 -m pip install wn jupyter\n",
" $ python3 -m wn download omw oewn:2021\n",
" ```\n",
" \n",
"* Windows\n",
"\n",
" ```console\n",
" > py -3 -m pip install wn jupyter\n",
" > py -3 -m wn download omw oewn:2021\n",
" ```\n",
"\n",
"Now you should be able to import the `wn` package:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import wn"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Primary Queries\n",
"\n",
"A **primary query** of the database is when basic parameters such as word forms, parts of speech, or public identifiers (e.g., synset IDs) are used to retrieve basic wordnet entities. You can perform these searches via module-level functions such as [wn.words()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.words), [wn.senses()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.senses), and [wn.synsets()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.synsets):"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Word('oewn-Malacca-n')]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wn.words(\"Malacca\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Synset('oewn-08985168-n')]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wn.synsets(\"Malacca\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Filtering by Language / Lexicon\n",
"\n",
"Once you've added multiple wordnets, however, you will often get many results for such queries. If that's not clear, then the following will give you some idea(s):"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Word('omw-en-idea-n'),\n",
" Word('omw-sk-idea-n'),\n",
" Word('omw-pl-idea-n'),\n",
" Word('omw-is-ídea-n'),\n",
" Word('omw-zsm-idea-n'),\n",
" Word('omw-iwn-idea-n'),\n",
" Word('omw-it-idea-n'),\n",
" Word('omw-gl-idea-n'),\n",
" Word('omw-fi-idea-n'),\n",
" Word('omw-ca-idea-n'),\n",
" Word('omw-eu-idea-n'),\n",
" Word('omw-es-idea-n'),\n",
" Word('oewn-idea-n')]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wn.words(\"idea\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can filter down the results by language, but that may not be enough if you have multiple wordnets for the same language (e.g., the [OMW English Wordnet based on WordNet 3.0](https://github.com/omwn/omw-data/) and the [Open English WordNet](https://en-word.net/)):"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Word('omw-en-idea-n'), Word('oewn-idea-n')]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wn.words(\"idea\", lang=\"en\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The [wn.lexicons()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.lexicons) function can show which lexicons have been added for a language:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<Lexicon omw-en:1.4 [en]>, <Lexicon oewn:2021 [en]>]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wn.lexicons(lang=\"en\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can use the `id:version` string to restrict queries to a particular lexicon:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Word('omw-en-idea-n')]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wn.words(\"idea\", lexicon=\"omw-en:1.4\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But it can become tedious to enter these specifiers each time. Instead, a [wn.Wordnet](https://wn.readthedocs.io/en/latest/api/wn.html#the-wordnet-class) object can be used to make the language/lexicon filters persistent:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Word('omw-en-idea-n')]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"en = wn.Wordnet(lexicon=\"omw-en:1.4\")\n",
"en.words(\"idea\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Filtering by Word Form and Part of Speech\n",
"\n",
"Even within a single lexicon a word may return multiple results:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Word('omw-en-pencil-n'), Word('omw-en-pencil-v')]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"en.words(\"pencil\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can restrict results by part of speech, as well. E.g., to get the verbal sense of *pencil* (e.g., *to pencil in an appointment*), use the `pos` filter:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Word('omw-en-pencil-v')]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"en.words(\"pencil\", pos=\"v\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This works for getting senses and synsets, too:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Sense('omw-en-pencil-03908204-n'),\n",
" Sense('omw-en-pencil-14796748-n'),\n",
" Sense('omw-en-pencil-13863020-n'),\n",
" Sense('omw-en-pencil-03908456-n'),\n",
" Sense('omw-en-pencil-01688604-v')]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"en.senses(\"pencil\")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Sense('omw-en-pencil-01688604-v')]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"en.senses(\"pencil\", pos=\"v\")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Synset('omw-en-01688604-v')]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"en.synsets(\"pencil\", pos=\"v\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The wordform itself is just a filter on the results. Leaving it off, you can get all results for a particular part of speech:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"11531"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(en.words(pos=\"v\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or all results, regardless of the part of speech:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"156584"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(en.words())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Secondary Queries\n",
"\n",
"**Secondary queries** are used when you want to get additional information from a retrieved entity, such as the forms of a word or the definition of a synset. They are also used for finding links between entities, such as the senses of a word or the relations of a sense or synset."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'pencil'"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pencil = en.words(\"pencil\", pos=\"v\")[0]\n",
"pencil.lemma()"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['pencil', 'pencilled', 'pencilling']"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pencil.forms()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'v'"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pencil.pos"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Sense('omw-en-pencil-01688604-v')]"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pencil.senses()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Synset('omw-en-01688604-v')"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pencil.senses()[0].synset()"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Synset('omw-en-01688604-v')]"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pencil.synsets() # shorthand for the above"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'write, draw, or trace with a pencil'"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pencil_ss = pencil.synsets()[0]\n",
"pencil_ss.definition()"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['he penciled a figure']"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pencil_ss.examples()"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Synset('omw-en-01690294-v')]"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pencil_ss.hypernyms()"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['draw']"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pencil_ss.hypernyms()[0].lemmas()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Taxonomy Queries\n",
"\n",
"A common usage of wordnets is exploring the taxonomic structure via hypernym and hyponym relations. These operations thus have some more dedicated functions. For instance, path functions show the synsets from the starting synset to some other synset or the taxonomic root, such as [Synset.hypernym_paths()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.Synset.hypernym_paths):"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Synset('omw-en-01690294-v') ['draw']\n",
" Synset('omw-en-01686132-v') ['represent', 'interpret']\n",
" Synset('omw-en-01619354-v') ['re-create']\n",
" Synset('omw-en-01617192-v') ['make', 'create']\n"
]
}
],
"source": [
"for path in pencil_ss.hypernym_paths():\n",
" for i, ss in enumerate(path):\n",
" print(\" \" * i, ss, ss.lemmas())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Paths do not include the starting synset, so the length of the path (i.e., number of edges) is the length of the list of synsets. The length from a synset to the root is called the *depth*. However, as some synsets have multiple paths to the root, there is not always one single depth. Instead, the [Synset.min_depth()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.Synset.min_depth) and [Synset.max_depth()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.Synset.max_depth) methods find the lengths of the shortest and longest paths."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dog = en.synsets(\"dog\", pos=\"n\")[0]\n",
"len(dog.hypernym_paths()) # two paths"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(8, 13)"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dog.min_depth(), dog.max_depth()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is also possible to find paths between two synsets by their lowest common hypernym (also called *least common subsumer*). Here I compare the verbs *pencil* and *pen*:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Synset('omw-en-01697816-v') ['create verbally']\n",
" Synset('omw-en-01617192-v') ['make', 'create']\n"
]
}
],
"source": [
"pen_ss = en.synsets(\"pen\", pos=\"v\")[0]\n",
"for path in pen_ss.hypernym_paths():\n",
" for i, ss in enumerate(path):\n",
" print(\" \" * i, ss, ss.lemmas())"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Synset('omw-en-01617192-v')]"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pencil_ss.lowest_common_hypernyms(pen_ss)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Synset('omw-en-01690294-v') ['draw']\n",
"Synset('omw-en-01686132-v') ['represent', 'interpret']\n",
"Synset('omw-en-01619354-v') ['re-create']\n",
"Synset('omw-en-01617192-v') ['make', 'create']\n",
"Synset('omw-en-01697816-v') ['create verbally']\n",
"Synset('omw-en-01698271-v') ['write', 'compose', 'pen', 'indite']\n"
]
}
],
"source": [
"for ss in pencil_ss.shortest_path(pen_ss):\n",
" print(ss, ss.lemmas())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Interlingual Queries\n",
"\n",
"In Wn, each wordnet (lexicon) added to the database is given its own, independent structure. All queries that traverse across wordnets make use of the Interlingual index (ILI) on synsets."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'a thin cylindrical pointed writing implement; a rod of marking substance encased in wood'"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pencil_ss = en.synsets(\"pencil\", pos=\"n\")[0] # for this we'll use the nominal sense\n",
"pencil_ss.definition()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To get the corresponding words, senses, or synsets in some other lexicon, use the [Word.translate()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.Word.translate), [Sense.translate()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.Sense.translate), and [Synset.translate()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.Synset.translate) functions. Of these, the function on the sense is the most natural, as it translates a specific meaning of a specific word, although all translations go through the synsets. As a word may have many senses, translating a word returns a mapping of each sense to its list of translations."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['lapis', 'matita']"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pencil_ss.translate(lang=\"it\")[0].lemmas()"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['ペンシル', '木筆', '鉛筆']"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pencil_ss.translate(lexicon=\"omw-ja\")[0].lemmas()"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{Sense('omw-en-pencil-03908204-n'): [Word('omw-ja-ペンシル-n'),\n",
" Word('omw-ja-木筆-n'),\n",
" Word('omw-ja-鉛筆-n')],\n",
" Sense('omw-en-pencil-14796748-n'): [Word('omw-ja-鉛筆-n')],\n",
" Sense('omw-en-pencil-13863020-n'): [],\n",
" Sense('omw-en-pencil-03908456-n'): [Word('omw-ja-ペンシル-n')]}"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"en.words(\"pencil\", pos=\"n\")[0].translate(lexicon=\"omw-ja\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Interlingual synsets are also used to traversing relations from another wordnet. For instance, many of the lexicons in the [Open Multilingual Wordnet](https://github.com/omwn/omw-data) were created using the *expand* method where only words were translated on top of Princeton WordNet synsets. All relations (hypernyms, hyponyms, etc.) then depend on those from WordNet. In Wn, a [Wordnet](https://wn.readthedocs.io/en/latest/api/wn.html#the-wordnet-class) object may be instantiated with an `expand` parameter which selects lexicons containing such relations. By default, all lexicons are used (i.e., `expand='*'`), but you can tell Wn to not use any expand lexicons (`expand=''`) or to use a specific lexicon (`expand='omw-en:1.4'`). By being specific, you can better control the behaviour of your program, e.g., for experimental reproducibility."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Synset('omw-ja-14796575-n')]"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# by default, any other installed lexicon may be used\n",
"wn.Wordnet(lexicon=\"omw-ja\").synsets(\"鉛筆\")[0].hypernyms()"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# disable interlingual query expansion\n",
"wn.Wordnet(lexicon=\"omw-ja\", expand=\"\").synsets(\"鉛筆\")[0].hypernyms()"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Synset('omw-ja-14796575-n')]"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# specify the expand set\n",
"wn.Wordnet(lexicon=\"omw-ja\", expand=\"omw-en:1.4\").synsets(\"鉛筆\")[0].hypernyms()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
================================================
FILE: docs/api/wn.compat.rst
================================================
wn.compat
=========
Compatibility modules for Wn.
This subpackage is a namespace for compatibility modules when working
with particular lexicons. Wn is designed to be agnostic to the
language or lexicon and not favor one over the other (with the
exception of :mod:`wn.morphy`, which is English-specific). However,
there are some kinds of functionality that would be useful to
include in Wn, even if they don't generalize to all lexicons.
Included modules
----------------
.. toctree::
   :maxdepth: 1

   wn.compat.sensekey.rst
================================================
FILE: docs/api/wn.compat.sensekey.rst
================================================
wn.compat.sensekey
==================
.. automodule:: wn.compat.sensekey
.. autofunction:: escape
.. autofunction:: unescape
.. autofunction:: sense_key_getter
.. autofunction:: sense_getter
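For example, converting between senses and sense keys might look like
the following sketch (assuming ``oewn:2024`` is installed; the exact
output is illustrative):

>>> import wn
>>> from wn.compat import sensekey
>>> get_sense_key = sensekey.sense_key_getter('oewn:2024')
>>> sense = wn.senses('alloy', pos='v', lexicon='oewn:2024')[0]
>>> get_sense_key(sense)
'alloy%2:30:00::'
>>> get_sense = sensekey.sense_getter('oewn:2024')
>>> get_sense('alloy%2:30:00::') == sense
True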
================================================
FILE: docs/api/wn.constants.rst
================================================
wn.constants
============
.. automodule:: wn.constants
Synset Relations
----------------
.. data:: SYNSET_RELATIONS
- ``agent``
- ``also``
- ``attribute``
- ``be_in_state``
- ``causes``
- ``classified_by``
- ``classifies``
- ``co_agent_instrument``
- ``co_agent_patient``
- ``co_agent_result``
- ``co_instrument_agent``
- ``co_instrument_patient``
- ``co_instrument_result``
- ``co_patient_agent``
- ``co_patient_instrument``
- ``co_result_agent``
- ``co_result_instrument``
- ``co_role``
- ``direction``
- ``domain_region``
- ``domain_topic``
- ``exemplifies``
- ``entails``
- ``eq_synonym``
- ``has_domain_region``
- ``has_domain_topic``
- ``is_exemplified_by``
- ``holo_location``
- ``holo_member``
- ``holo_part``
- ``holo_portion``
- ``holo_substance``
- ``holonym``
- ``hypernym``
- ``hyponym``
- ``in_manner``
- ``instance_hypernym``
- ``instance_hyponym``
- ``instrument``
- ``involved``
- ``involved_agent``
- ``involved_direction``
- ``involved_instrument``
- ``involved_location``
- ``involved_patient``
- ``involved_result``
- ``involved_source_direction``
- ``involved_target_direction``
- ``is_caused_by``
- ``is_entailed_by``
- ``location``
- ``manner_of``
- ``mero_location``
- ``mero_member``
- ``mero_part``
- ``mero_portion``
- ``mero_substance``
- ``meronym``
- ``similar``
- ``other``
- ``patient``
- ``restricted_by``
- ``restricts``
- ``result``
- ``role``
- ``source_direction``
- ``state_of``
- ``target_direction``
- ``subevent``
- ``is_subevent_of``
- ``antonym``
- ``feminine``
- ``has_feminine``
- ``masculine``
- ``has_masculine``
- ``young``
- ``has_young``
- ``diminutive``
- ``has_diminutive``
- ``augmentative``
- ``has_augmentative``
- ``anto_gradable``
- ``anto_simple``
- ``anto_converse``
- ``ir_synonym``
Sense Relations
---------------
.. data:: SENSE_RELATIONS
- ``antonym``
- ``also``
- ``participle``
- ``pertainym``
- ``derivation``
- ``domain_topic``
- ``has_domain_topic``
- ``domain_region``
- ``has_domain_region``
- ``exemplifies``
- ``is_exemplified_by``
- ``similar``
- ``other``
- ``feminine``
- ``has_feminine``
- ``masculine``
- ``has_masculine``
- ``young``
- ``has_young``
- ``diminutive``
- ``has_diminutive``
- ``augmentative``
- ``has_augmentative``
- ``anto_gradable``
- ``anto_simple``
- ``anto_converse``
- ``simple_aspect_ip``
- ``secondary_aspect_ip``
- ``simple_aspect_pi``
- ``secondary_aspect_pi``
.. data:: SENSE_SYNSET_RELATIONS
- ``domain_topic``
- ``domain_region``
- ``exemplifies``
- ``other``
.. data:: REVERSE_RELATIONS
.. code-block:: python

   {
       'hypernym': 'hyponym',
       'hyponym': 'hypernym',
       'instance_hypernym': 'instance_hyponym',
       'instance_hyponym': 'instance_hypernym',
       'antonym': 'antonym',
       'eq_synonym': 'eq_synonym',
       'similar': 'similar',
       'meronym': 'holonym',
       'holonym': 'meronym',
       'mero_location': 'holo_location',
       'holo_location': 'mero_location',
       'mero_member': 'holo_member',
       'holo_member': 'mero_member',
       'mero_part': 'holo_part',
       'holo_part': 'mero_part',
       'mero_portion': 'holo_portion',
       'holo_portion': 'mero_portion',
       'mero_substance': 'holo_substance',
       'holo_substance': 'mero_substance',
       'also': 'also',
       'state_of': 'be_in_state',
       'be_in_state': 'state_of',
       'causes': 'is_caused_by',
       'is_caused_by': 'causes',
       'subevent': 'is_subevent_of',
       'is_subevent_of': 'subevent',
       'manner_of': 'in_manner',
       'in_manner': 'manner_of',
       'attribute': 'attribute',
       'restricts': 'restricted_by',
       'restricted_by': 'restricts',
       'classifies': 'classified_by',
       'classified_by': 'classifies',
       'entails': 'is_entailed_by',
       'is_entailed_by': 'entails',
       'domain_topic': 'has_domain_topic',
       'has_domain_topic': 'domain_topic',
       'domain_region': 'has_domain_region',
       'has_domain_region': 'domain_region',
       'exemplifies': 'is_exemplified_by',
       'is_exemplified_by': 'exemplifies',
       'role': 'involved',
       'involved': 'role',
       'agent': 'involved_agent',
       'involved_agent': 'agent',
       'patient': 'involved_patient',
       'involved_patient': 'patient',
       'result': 'involved_result',
       'involved_result': 'result',
       'instrument': 'involved_instrument',
       'involved_instrument': 'instrument',
       'location': 'involved_location',
       'involved_location': 'location',
       'direction': 'involved_direction',
       'involved_direction': 'direction',
       'target_direction': 'involved_target_direction',
       'involved_target_direction': 'target_direction',
       'source_direction': 'involved_source_direction',
       'involved_source_direction': 'source_direction',
       'co_role': 'co_role',
       'co_agent_patient': 'co_patient_agent',
       'co_patient_agent': 'co_agent_patient',
       'co_agent_instrument': 'co_instrument_agent',
       'co_instrument_agent': 'co_agent_instrument',
       'co_agent_result': 'co_result_agent',
       'co_result_agent': 'co_agent_result',
       'co_patient_instrument': 'co_instrument_patient',
       'co_instrument_patient': 'co_patient_instrument',
       'co_result_instrument': 'co_instrument_result',
       'co_instrument_result': 'co_result_instrument',
       'pertainym': 'pertainym',
       'derivation': 'derivation',
       'simple_aspect_ip': 'simple_aspect_pi',
       'simple_aspect_pi': 'simple_aspect_ip',
       'secondary_aspect_ip': 'secondary_aspect_pi',
       'secondary_aspect_pi': 'secondary_aspect_ip',
       'feminine': 'has_feminine',
       'has_feminine': 'feminine',
       'masculine': 'has_masculine',
       'has_masculine': 'masculine',
       'young': 'has_young',
       'has_young': 'young',
       'diminutive': 'has_diminutive',
       'has_diminutive': 'diminutive',
       'augmentative': 'has_augmentative',
       'has_augmentative': 'augmentative',
       'anto_gradable': 'anto_gradable',
       'anto_simple': 'anto_simple',
       'anto_converse': 'anto_converse',
       'ir_synonym': 'ir_synonym',
   }
.. _parts-of-speech:
Parts of Speech
---------------
.. data:: PARTS_OF_SPEECH
- ``n`` -- Noun
- ``v`` -- Verb
- ``a`` -- Adjective
- ``r`` -- Adverb
- ``s`` -- Adjective Satellite
- ``t`` -- Phrase
- ``c`` -- Conjunction
- ``p`` -- Adposition
- ``x`` -- Other
- ``u`` -- Unknown
.. autodata:: NOUN
.. autodata:: VERB
.. autodata:: ADJECTIVE
.. data:: ADJ
Alias of :py:data:`ADJECTIVE`
.. autodata:: ADJECTIVE_SATELLITE
.. data:: ADJ_SAT
Alias of :py:data:`ADJECTIVE_SATELLITE`
.. autodata:: PHRASE
.. autodata:: CONJUNCTION
.. data:: CONJ
Alias of :py:data:`CONJUNCTION`
.. autodata:: ADPOSITION
.. autodata:: ADP
Alias of :py:data:`ADPOSITION`
.. autodata:: OTHER
.. autodata:: UNKNOWN
Adjective Positions
-------------------
.. data:: ADJPOSITIONS
- ``a`` -- Attributive
- ``ip`` -- Immediate Postnominal
- ``p`` -- Predicative
Lexicographer Files
-------------------
.. data:: LEXICOGRAPHER_FILES
.. code-block:: python

   {
       'adj.all': 0,
       'adj.pert': 1,
       'adv.all': 2,
       'noun.Tops': 3,
       'noun.act': 4,
       'noun.animal': 5,
       'noun.artifact': 6,
       'noun.attribute': 7,
       'noun.body': 8,
       'noun.cognition': 9,
       'noun.communication': 10,
       'noun.event': 11,
       'noun.feeling': 12,
       'noun.food': 13,
       'noun.group': 14,
       'noun.location': 15,
       'noun.motive': 16,
       'noun.object': 17,
       'noun.person': 18,
       'noun.phenomenon': 19,
       'noun.plant': 20,
       'noun.possession': 21,
       'noun.process': 22,
       'noun.quantity': 23,
       'noun.relation': 24,
       'noun.shape': 25,
       'noun.state': 26,
       'noun.substance': 27,
       'noun.time': 28,
       'verb.body': 29,
       'verb.change': 30,
       'verb.cognition': 31,
       'verb.communication': 32,
       'verb.competition': 33,
       'verb.consumption': 34,
       'verb.contact': 35,
       'verb.creation': 36,
       'verb.emotion': 37,
       'verb.motion': 38,
       'verb.perception': 39,
       'verb.possession': 40,
       'verb.social': 41,
       'verb.stative': 42,
       'verb.weather': 43,
       'adj.ppl': 44,
   }
================================================
FILE: docs/api/wn.ic.rst
================================================
wn.ic
=====
.. automodule:: wn.ic
The mathematical formulae for information content are defined in
`Formal Description`_, and the corresponding Python API functions are
described in `Calculating Information Content`_. These functions
require information content weights obtained either by `computing them
from a corpus <Computing Corpus Weights_>`_, or by `loading
pre-computed weights from a file <Reading Pre-computed Information
Content Files_>`_.
.. note::

   The term *information content* can be ambiguous. It often, and most
   accurately, refers to the result of the :func:`information_content`
   function (:math:`\text{IC}(c)` in the mathematical notation), but it
   is also sometimes used to refer to the corpus frequencies/weights
   (:math:`\text{freq}(c)` in the mathematical notation) returned by
   :func:`load` or :func:`compute`, as these weights are the basis of
   the value computed by :func:`information_content`. The Wn
   documentation tries to consistently refer to the former as the
   *information content value*, or just *information content*, and to
   the latter as *information content weights*, or *weights*.
Formal Description
------------------
The Information Content (IC) of a concept (synset) is a measure of its
specificity computed from the wordnet's taxonomy structure and corpus
frequencies. It is defined by Resnik 1995 ([RES95]_), following
information theory, as the negative log-probability of a concept:
.. math::

   \text{IC}(c) = -\log{p(c)}
A concept's probability is the empirical probability over a corpus:
.. math::

   p(c) = \frac{\text{freq}(c)}{N}
Here, :math:`N` is the total count of words of the same category as
concept :math:`c` ([RES95]_ only considered nouns) where each word has
some representation in the wordnet, and :math:`\text{freq}` is defined
as the sum of corpus counts of words in :math:`\text{words}(c)`, which
is the set of words subsumed by concept :math:`c`:
.. math::

   \text{freq}(c) = \sum_{w \in \text{words}(c)}{\text{count}(w)}
It is common for :math:`\text{freq}` to not contain actual frequencies
but instead weights distributed evenly among the synsets for a
word. These weights are calculated as the word frequency divided by
the number of synsets for the word:
.. math::

   \text{freq}_{\text{distributed}}(c)
   = \sum_{w \in \text{words}(c)}{\frac{\text{count}(w)}{|\text{synsets}(w)|}}
.. [RES95] Resnik, Philip. "Using information content to evaluate
semantic similarity." In Proceedings of the 14th International
Joint Conference on Artificial Intelligence (IJCAI-95), Montreal,
Canada, pp. 448-453. 1995.
Example
-------
In the Princeton WordNet 3.0 (hereafter *WordNet*, but note that the
equivalent lexicon in Wn is the *OMW English Wordnet based on WordNet
3.0* with specifier ``omw-en:1.4``), the frequency of a concept like
**stone fruit** is not just the number of occurrences of *stone
fruit*, but also includes the counts of the words for its hyponyms
(*almond*, *olive*, etc.) and other taxonomic descendants (*Jordan
almond*, *green olive*, etc.). The word *almond* has two synsets: one
for the fruit or nut, another for the plant. Thus, if the word
*almond* is encountered :math:`n` times in a corpus, then the weight
(either the frequency :math:`n` or distributed weight
:math:`\frac{n}{2}`) is added to the total weights for both synsets
and to those of their ancestors, but not for descendant synsets, such
as for **Jordan almond**. The fruit/nut synset of almond has two
hypernym paths which converge on **fruit**:
1. **almond** ⊃ **stone fruit** ⊃ **fruit**
2. **almond** ⊃ **nut** ⊃ **seed** ⊃ **fruit**
The weight is added to each ancestor (**stone fruit**, **nut**,
**seed**, **fruit**, ...) exactly once. That is, even though the two
paths converge on **fruit**, its weight is not added twice.
Calculating Information Content
-------------------------------
.. autofunction:: information_content
.. autofunction:: synset_probability
Computing Corpus Weights
------------------------
If pre-computed weights are not available for a wordnet or for some
domain, they can be computed given a corpus and a wordnet.
The corpus is an iterable of words. For large corpora it may help to
use a generator for this iterable, but the entire vocabulary (i.e.,
unique words and counts) will be held at once in memory. Multi-word
expressions are also possible if they exist in the wordnet. For
instance, WordNet has *stone fruit*, with a single space delimiting
the words, as an entry.
The :class:`wn.Wordnet` object must be instantiated with a single
lexicon, although it may have expand-lexicons for relation
traversal. For best results, the wordnet should use a lemmatizer to
help it deal with inflected wordforms from running text.
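A minimal sketch of this workflow (assuming ``omw-en:1.4`` is
installed; the toy corpus is for illustration only):

.. code-block:: python

   import wn
   from wn import ic

   en = wn.Wordnet('omw-en:1.4')
   # a toy corpus; real corpora should be much larger, and a
   # generator may be used instead of a list
   corpus = ['almond', 'olive', 'stone fruit', 'fruit']
   freq = ic.compute(corpus, en, distribute_weight=True, smoothing=1.0)
   synset = en.synsets('fruit', pos='n')[0]
   print(ic.information_content(synset, freq))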
.. autofunction:: compute
Reading Pre-computed Information Content Files
----------------------------------------------
The :func:`load` function reads pre-computed information content
weights files as used by the `WordNet::Similarity
<http://wn-similarity.sourceforge.net/>`_ Perl module or the `NLTK
<http://www.nltk.org/>`_ Python package. These files are computed for
a specific version of a wordnet using the synset offsets from the
`WNDB <https://wordnet.princeton.edu/documentation/wndb5wn>`_ format,
which Wn does not use. These offsets therefore must be converted into
an identifier that matches those used by the wordnet. By default,
:func:`load` uses the lexicon identifier from its *wordnet* argument
with synset offsets (padded with 0s to make 8 digits) and
parts-of-speech from the weights file to format an identifier, such as
``omw-en-00001174-n``. For wordnets that use a different identifier
scheme, the *get_synset_id* parameter of :func:`load` can be given a
callable created with :func:`wn.util.synset_id_formatter`. It can also
be given another callable with the same signature as shown below:
.. code-block:: python

   get_synset_id(*, offset: int, pos: str) -> str
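For wordnets following the default identifier scheme, such a callable
can be created with :func:`wn.util.synset_id_formatter` (a sketch;
``mywn`` is a hypothetical lexicon ID):

.. code-block:: python

   from wn.util import synset_id_formatter

   get_synset_id = synset_id_formatter(prefix='mywn')
   print(get_synset_id(offset=1174, pos='n'))  # mywn-00001174-n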
When loading pre-computed information content files, it is recommended
to use the ones with smoothing (i.e., ``*-add1.dat`` or
``*-resnik-add1.dat``) to avoid math domain errors when computing the
information content value.
.. warning::

   The weights files are only valid for the version of wordnet for
   which they were created. Files created for WordNet 3.0 do not work
   for WordNet 3.1 because the offsets used in its identifiers are
   different, although the *get_synset_id* parameter of :func:`load`
   could be given a function that performs a suitable mapping. Some
   `Open Multilingual Wordnet <https://github.com/omwn/omw-data>`_
   wordnets use the WordNet 3.0 offsets in their identifiers and can
   therefore technically use the weights, but this usage is
   discouraged because the distributional properties of text in
   another language and the structure of the other wordnet will not be
   compatible with that of the English WordNet. For these cases, it is
   recommended to compute new weights using :func:`compute`.
.. autofunction:: load
================================================
FILE: docs/api/wn.ili.rst
================================================
wn.ili
======
.. automodule:: wn.ili
.. note::
See :doc:`../guides/interlingual` for background and usage information about
ILIs.
Functions for Getting ILI Objects
---------------------------------
The following functions are for getting individual :class:`ILI` and
:class:`ProposedILI` objects from ILI identifiers or synsets, respectively, or
to list all such known objects.
.. autofunction:: get
.. autofunction:: get_all
.. autofunction:: get_proposed
.. autofunction:: get_all_proposed
ILI Status
----------
The status of an ILI object (:attr:`ILI.status` or :attr:`ProposedILI.status`)
indicates what is known about its validity. Explicit information about ILIs can
be added to Wn by downloading an ILI resource (e.g.,
:python:`wn.download("cili")`), but without it Wn can only make a guess.
If a lexicon has synsets referencing some ILI identifier and no ILI file has
been loaded, that ILI would have a status of :attr:`ILIStatus.PRESUPPOSED`. If
an ILI file has been loaded that lists the identifier, it would have a status of
:attr:`ILIStatus.ACTIVE`, whether or not a lexicon has been added that uses
the ILI. Both of these cases use :class:`ILI` objects.
A synset in the WN-LMF format may also propose a new ILI. It won't have an
identifier, but it should have a definition. These have the status of
:attr:`ILIStatus.PROPOSED`. The :class:`ProposedILI` is used for these objects,
and that is the only status they have.
The :attr:`ILIStatus.UNKNOWN` status is just a default (e.g., when manually
creating an :class:`ILI` object) and won't be encountered in normal scenarios.
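For example, inspecting an ILI might look like the following sketch
(the identifier is illustrative, and the status depends on which
resources have been added):

.. code-block:: python

   from wn import ili

   concept = ili.get('i67469')
   print(concept.id, concept.status)
   # ILIStatus.ACTIVE if a loaded ILI file lists the identifier;
   # ILIStatus.PRESUPPOSED if only a lexicon references it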
.. autoclass:: ILIStatus
.. autoattribute:: UNKNOWN
.. autoattribute:: ACTIVE
.. autoattribute:: PRESUPPOSED
.. autoattribute:: PROPOSED
ILI Classes
-----------
.. autoclass:: ILI
.. autoattribute:: id
The ILI identifier.
.. autoattribute:: status
The status of the ILI.
.. automethod:: definition
.. autoclass:: ProposedILI
.. autoproperty:: id
.. autoproperty:: status
.. automethod:: definition
.. automethod:: synset
.. automethod:: lexicon
ILI Definitions
---------------
Most likely someone inspecting the definition of an :class:`ILI` or
:class:`ProposedILI` only cares about the definition text, but for
completeness' sake the :class:`ILIDefinition` object models the text
along with any metadata that may have appeared in the WN-LMF lexicon
file. ILI files do not currently model metadata.
.. autoclass:: ILIDefinition
.. autoattribute:: text
.. automethod:: metadata
================================================
FILE: docs/api/wn.lmf.rst
================================================
wn.lmf
======
.. automodule:: wn.lmf
.. autofunction:: load
.. autofunction:: scan_lexicons
.. autofunction:: is_lmf
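For example, a WN-LMF file can be loaded and inspected without adding
it to the database (a minimal sketch; the path is hypothetical):

.. code-block:: python

   from wn import lmf

   resource = lmf.load('path/to/lexicon.xml')
   print(resource['lmf_version'])
   for lexicon in resource['lexicons']:
       print(lexicon['id'], lexicon['version'])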
================================================
FILE: docs/api/wn.morphy.rst
================================================
wn.morphy
=========
.. automodule:: wn.morphy
.. seealso::
The Princeton WordNet `documentation
<https://wordnet.princeton.edu/documentation/morphy7wn>`_ describes
the original implementation of Morphy.
The :doc:`../guides/lemmatization` guide describes how Wn handles
lemmatization in general.
Initialized and Uninitialized Morphy
------------------------------------
There are two ways of using Morphy in Wn: initialized and
uninitialized.
Uninitialized Morphy is a simple callable that returns lemma
*candidates* for some given wordform. That is, the results might not
be valid lemmas, but this is not a problem in practice because
subsequent queries against the database will filter out the invalid
ones. This callable is obtained by creating a :class:`Morphy` object
with no arguments:
>>> from wn import morphy
>>> m = morphy.Morphy()
As an uninitialized Morphy cannot predict which lemmas in the result
are valid, it always returns the original form and any transformations
it can find for each part of speech:
>>> m('lemmata', pos='n') # exceptional form
{'n': {'lemmata'}}
>>> m('lemmas', pos='n') # regular morphology with part-of-speech
{'n': {'lemma', 'lemmas'}}
>>> m('lemmas') # regular morphology for any part-of-speech
{None: {'lemmas'}, 'n': {'lemma'}, 'v': {'lemma'}}
>>> m('wolves') # invalid forms may be returned
{None: {'wolves'}, 'n': {'wolf', 'wolve'}, 'v': {'wolve', 'wolv'}}
This lemmatizer can also be used with a :class:`wn.Wordnet` object to
expand queries:
>>> import wn
>>> ewn = wn.Wordnet('ewn:2020')
>>> ewn.words('lemmas')
[]
>>> ewn = wn.Wordnet('ewn:2020', lemmatizer=morphy.Morphy())
>>> ewn.words('lemmas')
[Word('ewn-lemma-n')]
An initialized Morphy is created with a :class:`wn.Wordnet` object as
its argument. It then uses the wordnet to build lists of valid lemmas
and exceptional forms (this takes a few seconds). Once this is done,
it will only return lemmas it knows about:
>>> ewn = wn.Wordnet('ewn:2020')
>>> m = morphy.Morphy(ewn)
>>> m('lemmata', pos='n') # exceptional form
{'n': {'lemma'}}
>>> m('lemmas', pos='n') # regular morphology with part-of-speech
{'n': {'lemma'}}
>>> m('lemmas') # regular morphology for any part-of-speech
{'n': {'lemma'}}
>>> m('wolves') # invalid forms are pre-filtered
{'n': {'wolf'}}
In order to use an initialized Morphy lemmatizer with a
:class:`wn.Wordnet` object, it must be assigned to the object after
creation:
>>> ewn = wn.Wordnet('ewn:2020') # default: lemmatizer=None
>>> ewn.words('lemmas')
[]
>>> ewn.lemmatizer = morphy.Morphy(ewn)
>>> ewn.words('lemmas')
[Word('ewn-lemma-n')]
There is little to no difference in the results obtained from a
:class:`wn.Wordnet` object using an initialized or uninitialized
:class:`Morphy` object, but there may be slightly different
performance profiles for future queries.
Default Morphy Lemmatizer
-------------------------
As a convenience, an uninitialized Morphy lemmatizer is provided in
this module via the :data:`morphy` member.
.. data:: morphy
A :class:`Morphy` object created without a :class:`wn.Wordnet`
object.
The Morphy Class
----------------
.. autoclass:: Morphy
================================================
FILE: docs/api/wn.project.rst
================================================
wn.project
==========
.. automodule:: wn.project
.. autofunction:: get_project
.. autofunction:: iterpackages
.. autofunction:: is_package_directory
.. autofunction:: is_collection_directory
Project Classes
---------------
Projects can be simple resource files, :class:`Package` directories,
or :class:`Collection` directories. For API consistency, resource
files are modeled as a virtual package (:class:`ResourceOnlyPackage`).
.. class:: Project
The base class for packages and collections.
This class is not used directly, but all subclasses will implement
the methods listed here.
.. autoproperty:: path
.. automethod:: readme
.. automethod:: license
.. automethod:: citation
.. autoclass:: Package
:show-inheritance:
.. autoproperty:: type
.. automethod:: resource_file
.. autoclass:: ResourceOnlyPackage
:show-inheritance:
.. autoclass:: Collection
:show-inheritance:
.. automethod:: packages
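For example, the packages of a downloaded project can be iterated over
uniformly, whatever its form (a sketch; the path is hypothetical):

.. code-block:: python

   from wn import project

   for package in project.iterpackages('path/to/project'):
       print(package.type, package.resource_file())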
================================================
FILE: docs/api/wn.rst
================================================
wn
===
.. automodule:: wn
Project Management Functions
----------------------------
.. autofunction:: download
.. autofunction:: add
.. autofunction:: add_lexical_resource
.. autofunction:: remove
.. autofunction:: export
.. autofunction:: projects
.. autofunction:: reset_database
Wordnet Query Functions
-----------------------
While it is best to first instantiate a :class:`Wordnet` object with a
specific lexicon and use that for querying (see :ref:`default-mode`),
the following functions are also available for quick and simple
queries.
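For example, a quick query with the module-level functions (the output
assumes only ``omw-en:1.4`` has been added):

>>> import wn
>>> wn.synsets('pencil', pos='v')
[Synset('omw-en-01688604-v')]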
.. autofunction:: word
.. autofunction:: words
.. autofunction:: lemmas
.. autofunction:: sense
.. autofunction:: senses
.. autofunction:: synset
.. autofunction:: synsets
.. autofunction:: lexicons
The Wordnet Class
-----------------
.. autoclass:: Wordnet
.. automethod:: word
.. automethod:: words
.. automethod:: lemmas
.. automethod:: sense
.. automethod:: senses
.. automethod:: synset
.. automethod:: synsets
.. automethod:: lexicons
.. automethod:: expanded_lexicons
.. automethod:: describe
Words, Senses, and Synsets
--------------------------
The results of primary queries against a lexicon are :class:`Word`,
:class:`Sense`, or :class:`Synset` objects. See
:doc:`../guides/wordnet` for more information about the concepts these
objects represent.
Word Objects
''''''''''''
.. class:: Word
:class:`Word` (or "lexical entry") objects encode information about
word forms independent from their meaning.
.. autoattribute:: id
The identifier used within a lexicon.
.. autoattribute:: pos
The part of speech of the Word.
.. automethod:: lemma
.. automethod:: forms
.. automethod:: senses
.. automethod:: synsets
.. automethod:: lexicon
.. automethod:: metadata
.. automethod:: confidence
.. automethod:: derived_words
.. automethod:: translate
Sense Objects
'''''''''''''
.. class:: Sense
:class:`Sense` objects represent a pairing of a :class:`Word` and a
:class:`Synset`.
.. autoattribute:: id
The identifier used within a lexicon.
.. automethod:: word
.. automethod:: synset
.. automethod:: examples
.. automethod:: lexicalized
.. automethod:: adjposition
.. automethod:: frames
.. automethod:: counts
.. automethod:: lexicon
.. automethod:: metadata
.. automethod:: confidence
.. automethod:: relations
.. automethod:: synset_relations
.. automethod:: get_related
.. automethod:: get_related_synsets
.. automethod:: closure
.. automethod:: relation_paths
.. automethod:: translate
Synset Objects
''''''''''''''
.. class:: Synset
:class:`Synset` objects represent a set of words that share a
meaning.
.. autoattribute:: id
The identifier used within a lexicon.
.. autoattribute:: pos
The part of speech of the Synset.
.. autoproperty:: ili
The interlingual index of the Synset.
.. automethod:: definition
.. automethod:: definitions
.. automethod:: examples
.. automethod:: senses
.. automethod:: lexicalized
.. automethod:: lexfile
.. automethod:: lexicon
.. automethod:: metadata
.. automethod:: confidence
.. automethod:: words
.. automethod:: lemmas
.. automethod:: hypernyms
.. automethod:: hyponyms
.. automethod:: holonyms
.. automethod:: meronyms
.. automethod:: relations
.. automethod:: get_related
.. automethod:: closure
.. automethod:: relation_paths
.. automethod:: translate
.. The taxonomy methods below have been moved to wn.taxonomy
.. method:: hypernym_paths(simulate_root=False)
Shortcut for :func:`wn.taxonomy.hypernym_paths`.
.. method:: min_depth(simulate_root=False)
Shortcut for :func:`wn.taxonomy.min_depth`.
.. method:: max_depth(simulate_root=False)
Shortcut for :func:`wn.taxonomy.max_depth`.
.. method:: shortest_path(other, simulate_root=False)
Shortcut for :func:`wn.taxonomy.shortest_path`.
.. method:: common_hypernyms(other, simulate_root=False)
Shortcut for :func:`wn.taxonomy.common_hypernyms`.
.. method:: lowest_common_hypernyms(other, simulate_root=False)
Shortcut for :func:`wn.taxonomy.lowest_common_hypernyms`.
Relations
---------
The :meth:`Sense.relation_map` and :meth:`Synset.relation_map` methods
return a dictionary mapping :class:`Relation` objects to resolved
target senses or synsets. They differ from :meth:`Sense.relations`
and :meth:`Synset.relations` in two main ways:
1. Relation objects map 1-to-1 to their targets instead of to a list
of targets sharing the same relation name.
2. Relation objects encode not just relation names, but also the
identifiers of sources and targets, the lexicons they came from, and
any metadata they have.
One reason why :class:`Relation` objects are useful is for inspecting
relation metadata, particularly in order to distinguish ``other``
relations that differ only by the value of their ``dc:type`` metadata:
>>> oewn = wn.Wordnet('oewn:2024')
>>> alloy = oewn.senses("alloy", pos="v")[0]
>>> alloy.relations() # appears to only have one 'other' relation
{'derivation': [Sense('oewn-alloy__1.27.00..')], 'other': [Sense('oewn-alloy__1.27.00..')]}
>>> for rel in alloy.relation_map(): # but in fact there are two
... print(rel, rel.subtype)
...
Relation('derivation', 'oewn-alloy__2.30.00..', 'oewn-alloy__1.27.00..') None
Relation('other', 'oewn-alloy__2.30.00..', 'oewn-alloy__1.27.00..') material
Relation('other', 'oewn-alloy__2.30.00..', 'oewn-alloy__1.27.00..') result
Another reason why they are useful is to determine the source of a
relation used in :doc:`interlingual queries <../guides/interlingual>`.
>>> es = wn.Wordnet("omw-es", expand="omw-en")
>>> mapa = es.synsets("mapa", pos="n")[0]
>>> rel, tgt = next(iter(mapa.relation_map().items()))
>>> rel, rel.lexicon() # relation comes from omw-en
(Relation('hypernym', 'omw-en-03720163-n', 'omw-en-04076846-n'), <Lexicon omw-en:1.4 [en]>)
>>> tgt, tgt.words(), tgt.lexicon() # target is in omw-es
(Synset('omw-es-04076846-n'), [Word('omw-es-representación-n')], <Lexicon omw-es:1.4 [es]>)
.. class:: Relation
:class:`Relation` objects model relations between senses or synsets.
.. attribute:: name
The name of the relation. Also called the relation "type".
.. attribute:: source_id
The identifier of the source entity of the relation.
.. attribute:: target_id
The identifier of the target entity of the relation.
.. autoattribute:: subtype
.. automethod:: lexicon
.. automethod:: metadata
.. automethod:: confidence
Additional Classes
------------------
.. class:: Form
:class:`Form` objects are returned by :meth:`Word.lemma` and
:meth:`Word.forms` when the :python:`data=True` argument is used,
and they make accessible several optional properties of word forms.
The word form itself is available via the :attr:`value` attribute.
>>> inu = wn.words('犬', lexicon='wnja')[0]
>>> inu.forms(data=True)[3]
Form(value='いぬ')
>>> inu.forms(data=True)[3].script
'hira'
The :attr:`script` is often unspecified (i.e., :python:`None`) and
this carries the implicit meaning that the form uses the canonical
script for the word's language or wordnet, whatever it may be.
.. attribute:: value
The word form string.
.. attribute:: id
An optional form identifier used within a lexicon. These
identifiers are often :python:`None`.
.. attribute:: script
The script of the word form. This should be an `ISO 15924
<https://en.wikipedia.org/wiki/ISO_15924>`_ code, or :python:`None`.
.. method:: pronunciations
Return the list of :class:`Pronunciation` objects.
.. method:: tags
Return the list of :class:`Tag` objects.
.. automethod:: lexicon
.. class:: Pronunciation
:class:`Pronunciation` objects encode a text or audio
representation of how a word is pronounced. They are returned by
:meth:`Form.pronunciations`.
.. autoattribute:: value
The encoded pronunciation.
.. autoattribute:: variety
The language variety this pronunciation belongs to.
.. autoattribute:: notation
The notation used to encode the pronunciation. For example: the
International Phonetic Alphabet (IPA).
.. autoattribute:: phonemic
:python:`True` when the encoded pronunciation is a generalized
phonemic description, or :python:`False` for more precise
phonetic transcriptions.
.. autoattribute:: audio
A URI to an associated audio file.
.. automethod:: lexicon
.. autoclass:: Tag
:class:`Tag` objects encode categorical information about word
forms. They are returned by :meth:`Form.tags`.
.. autoattribute:: tag
The text value of the tag.
.. autoattribute:: category
The category, or kind, of the tag.
.. automethod:: lexicon
.. autoclass:: Count
:class:`Count` objects model sense counts previously computed over
some corpus. They are returned by :meth:`Sense.counts`.
.. autoattribute:: value
The count of sense occurrences.
.. automethod:: lexicon
.. automethod:: metadata
.. automethod:: confidence
.. class:: Example
:class:`Example` objects model example phrases for senses and
synsets. They are returned by :meth:`Sense.examples` and
:meth:`Synset.examples` when the :python:`data=True` argument is
given.
.. autoattribute:: text
The example text.
.. autoattribute:: language
The language of the example.
.. automethod:: lexicon
.. automethod:: metadata
.. automethod:: confidence
.. class:: Definition
:class:`Definition` objects model synset definitions. They are
returned by :meth:`Synset.definition` when the :python:`data=True`
argument is given.
.. autoattribute:: text
The definition text.
.. autoattribute:: language
The language of the definition.
.. autoattribute:: source_sense_id
The id of the particular sense the definition is for.
.. automethod:: lexicon
.. automethod:: metadata
.. automethod:: confidence
Interlingual Indices
--------------------
As of Wn v1.0.0, classes and functions for working with ILIs are
found in the :mod:`wn.ili` module.
Lexicon Objects
---------------
.. class:: Lexicon
Lexicon objects contain attributes and metadata about a single
:doc:`lexicon <../guides/lexicons>`.
.. autoattribute:: id
The lexicon's identifier.
.. autoattribute:: label
The full name of the lexicon.
.. autoattribute:: language
The BCP 47 language code of the lexicon.
.. autoattribute:: email
The email address of the wordnet maintainer.
.. autoattribute:: license
The URL or name of the wordnet's license.
.. autoattribute:: version
The version string of the resource.
.. autoattribute:: url
The project URL of the wordnet.
.. autoattribute:: citation
The canonical citation for the project.
.. autoattribute:: logo
A URL or path to a project logo.
.. automethod:: metadata
.. automethod:: confidence
.. automethod:: specifier
.. automethod:: modified
.. automethod:: requires
.. automethod:: extends
.. automethod:: extensions
.. automethod:: describe
The wn.config Object
--------------------
Wn's data storage and retrieval can be configured through the
:data:`wn.config` object.
.. seealso::
:doc:`../setup` describes how to configure Wn using the
:data:`wn.config` instance.
.. autodata:: config
It is an instance of the :class:`~wn._config.WNConfig` class, which is
defined in a non-public module and is not meant to be instantiated
directly. Configuration should occur through the single
:data:`wn.config` instance.
.. autoclass:: wn._config.WNConfig
.. autoattribute:: data_directory
.. autoattribute:: database_path
.. attribute:: allow_multithreading
If set to :python:`True`, the database connection may be shared
across threads. In this case, it is the user's responsibility to
ensure that multiple threads don't try to write to the database
at the same time. The default is :python:`False`.
.. autoattribute:: downloads_directory
.. automethod:: add_project
.. automethod:: add_project_version
.. automethod:: get_project_info
.. automethod:: get_cache_path
.. automethod:: list_cache_entries
.. automethod:: update
.. automethod:: load_index
Auxiliary WNConfig Types
''''''''''''''''''''''''
The following classes are argument or return types of
:class:`~wn._config.WNConfig` methods. They are documented here for
reference, but are not meant to be created directly.
.. autoclass:: wn._config.ResourceType
Enumeration of resource types.
.. autoattribute:: WORDNET
.. autoattribute:: ILI
.. autoclass:: wn._config.ProjectInfo
:members:
:undoc-members:
Dictionary of information about a project.
.. autoclass:: wn._config.VersionInfo
:members:
:undoc-members:
Dictionary of information about a resource version.
.. autoclass:: wn._config.ResolvedProjectInfo
:members:
:undoc-members:
Dictionary of information about a specific project resource.
.. autoclass:: wn._config.CacheEntry
:members:
:undoc-members:
Dictionary of information about files in the download cache.
Exceptions
----------
.. autoexception:: Error
.. autoexception:: DatabaseError
.. autoexception:: WnWarning
================================================
FILE: docs/api/wn.similarity.rst
================================================
wn.similarity
=============
.. automodule:: wn.similarity
Taxonomy-based Metrics
----------------------
The `Path <Path Similarity_>`_, `Leacock-Chodorow <Leacock-Chodorow
Similarity_>`_, and `Wu-Palmer <Wu-Palmer Similarity_>`_ similarity
metrics work by finding path distances in the hypernym/hyponym
taxonomy. As such, they are most useful when the synsets are, in fact,
arranged in a taxonomy. For the Princeton WordNet and derivative
wordnets, such as the `Open English Wordnet`_ and `OMW English Wordnet
based on WordNet 3.0`_ available to Wn, synsets for nouns and verbs
are arranged taxonomically: the nouns mostly form a single structure
with a single root while verbs form many smaller structures with many
roots. Synsets for the other parts of speech do not use
hypernym/hyponym relations at all. This situation may be different for
other wordnet projects or future versions of the English wordnets.
.. _Open English Wordnet: https://en-word.net
.. _OMW English Wordnet based on WordNet 3.0: https://github.com/omwn/omw-data
The similarity metrics tend to fail when the synsets are not connected
by some path. When the synsets are in different parts of speech, or
even in separate lexicons, this failure is acceptable and
expected. But for cases like the verbs in the Princeton WordNet, it
might be more useful to pretend that there is some unique root for all
verbs so as to create a path connecting any two of them. For this
purpose, the *simulate_root* parameter is available on the
:func:`path`, :func:`lch`, and :func:`wup` functions, where it is
passed on to calls to :meth:`wn.Synset.shortest_path` and
:meth:`wn.Synset.lowest_common_hypernyms`. Setting *simulate_root* to
:python:`True` can, however, give surprising results if the synsets
are from different lexicons. Currently, computing similarity for
synsets with different parts of speech raises an error.
Path Similarity
'''''''''''''''
When :math:`p` is the length of the shortest path between two synsets,
the path similarity is:
.. math::
\frac{1}{p + 1}
The similarity score ranges between 0.0 and 1.0, where the higher the
score is, the more similar the synsets are. The score is 1.0 when a
synset is compared to itself, and 0.0 when there is no path between
the two synsets (i.e., the path distance is infinite).
.. autofunction:: path
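For example (a sketch assuming the Open English WordNet ``oewn:2024``
is installed; exact scores depend on the lexicon's taxonomy):

>>> import wn
>>> from wn.similarity import path
>>> en = wn.Wordnet('oewn:2024')
>>> dog = en.synsets('dog', pos='n')[0]
>>> cat = en.synsets('cat', pos='n')[0]
>>> path(dog, dog)  # identical synsets: p = 0
1.0
>>> score = path(dog, cat)  # 1 / (p + 1)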
.. _leacock-chodorow-similarity:
Leacock-Chodorow Similarity
'''''''''''''''''''''''''''
When :math:`p` is the length of the shortest path between two synsets
and :math:`d` is the maximum taxonomy depth, the Leacock-Chodorow
similarity is:
.. math::
-\text{log}\left(\frac{p + 1}{2d}\right)
.. autofunction:: lch
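For example, continuing with the assumptions above and obtaining the
maximum depth from :func:`wn.taxonomy.taxonomy_depth`:

>>> import wn
>>> from wn import taxonomy
>>> from wn.similarity import lch
>>> en = wn.Wordnet('oewn:2024')
>>> d = taxonomy.taxonomy_depth(en, 'n')
>>> dog = en.synsets('dog', pos='n')[0]
>>> cat = en.synsets('cat', pos='n')[0]
>>> score = lch(dog, cat, d)  # -log((p + 1) / (2 * d))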
Wu-Palmer Similarity
''''''''''''''''''''
When *LCS* is the lowest common hypernym (also called "least common
subsumer") between two synsets, :math:`i` is the shortest path
distance from the first synset to *LCS*, :math:`j` is the shortest
path distance from the second synset to *LCS*, and :math:`k` is the
number of nodes (distance + 1) from *LCS* to the root node, then the
Wu-Palmer similarity is:
.. math::
\frac{2k}{i + j + 2k}
.. autofunction:: wup
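For example (same assumptions as the sketches above):

>>> import wn
>>> from wn.similarity import wup
>>> en = wn.Wordnet('oewn:2024')
>>> dog = en.synsets('dog', pos='n')[0]
>>> cat = en.synsets('cat', pos='n')[0]
>>> score = wup(dog, cat)  # 2k / (i + j + 2k)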
Information Content-based Metrics
---------------------------------
The `Resnik <Resnik Similarity_>`_, `Jiang-Conrath <Jiang-Conrath
Similarity_>`_, and `Lin <Lin Similarity_>`_ similarity metrics work
by computing the information content of the synsets and/or that of
their lowest common hypernyms. They therefore require information
content weights (see :mod:`wn.ic`), and the values returned
necessarily depend on the weights used.
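For illustration, a sketch computing weights from a toy corpus with
:func:`wn.ic.compute` and applying each metric (assuming ``oewn:2024``
is installed; real applications should use a sizable corpus or
precomputed weights, and scores vary with the weights used):

>>> import wn
>>> from wn import ic
>>> from wn.similarity import res, jcn, lin
>>> en = wn.Wordnet('oewn:2024')
>>> freq = ic.compute(['the', 'dog', 'chased', 'the', 'cat'], en)
>>> dog = en.synsets('dog', pos='n')[0]
>>> cat = en.synsets('cat', pos='n')[0]
>>> scores = [f(dog, cat, freq) for f in (res, jcn, lin)]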
Resnik Similarity
'''''''''''''''''
The Resnik similarity (`Resnik 1995
<https://arxiv.org/pdf/cmp-lg/9511007.pdf>`_) is the maximum
information content value of the common subsumers (hypernym ancestors)
of the two synsets. Formally it is defined as follows, where
:math:`c_1` and :math:`c_2` are the two synsets being compared.
.. math::
\text{max}_{c \in \text{S}(c_1, c_2)} \text{IC}(c)
Since a synset's information content is always equal to or greater than
the information content of its hypernyms, :math:`S(c_1, c_2)` above is
more efficiently computed using the lowest common hypernyms instead of
all common hypernyms.
.. autofunction:: res
Jiang-Conrath Similarity
''''''''''''''''''''''''
The Jiang-Conrath similarity metric (`Jiang and Conrath, 1997
<https://www.aclweb.org/anthology/O97-1002.pdf>`_) combines the ideas
of the taxonomy-based and information content-based metrics. It is
defined as follows, where :math:`c_1` and :math:`c_2` are the two
synsets being compared and :math:`c_0` is the lowest common hypernym
of the two with the highest information content weight:
.. math::
\frac{1}{\text{IC}(c_1) + \text{IC}(c_2) - 2(\text{IC}(c_0))}
This equation is the simplified form given in the paper, where
several parameterized terms cancel out; the full form is not often
used in practice.
There are two special cases:
1. If the information content of :math:`c_0`, :math:`c_1`, and
:math:`c_2` are all zero, the metric returns zero. This occurs when
both :math:`c_1` and :math:`c_2` are the root node, but it can also
occur if the synsets did not occur in the corpus and the smoothing
value was set to zero.
2. Otherwise, if :math:`\text{IC}(c_1) + \text{IC}(c_2) = 2\,\text{IC}(c_0)`,
the metric returns infinity. This occurs when the two synsets are the
same, when one is a descendant of the other, etc., such that they have
the same frequency as each other and as their lowest common hypernym.
.. autofunction:: jcn
Lin Similarity
''''''''''''''
Another formulation of information content-based similarity is the Lin
metric (`Lin 1997 <https://www.aclweb.org/anthology/P97-1009.pdf>`_),
which is defined as follows, where :math:`c_1` and :math:`c_2` are the
two synsets being compared and :math:`c_0` is the lowest common
hypernym with the highest information content weight:
.. math::
\frac{2(\text{IC}(c_0))}{\text{IC}(c_1) + \text{IC}(c_2)}
One special case is if either synset has an information content value
of zero, in which case the metric returns zero.
.. autofunction:: lin
================================================
FILE: docs/api/wn.taxonomy.rst
================================================
wn.taxonomy
===========
.. automodule:: wn.taxonomy
Overview
--------
Among the valid synset relations for wordnets (see
:data:`wn.constants.SYNSET_RELATIONS`), those used for describing
*is-a* `taxonomies <https://en.wikipedia.org/wiki/Taxonomy>`_ are
given special treatment and they are generally the most
well-developed relations in any wordnet. Typically these are the
``hypernym`` and ``hyponym`` relations, which encode *is-a-type-of*
relationships (e.g., a *hermit crab* is a type of *decapod*, which is
a type of *crustacean*, etc.). They also include ``instance_hypernym``
and ``instance_hyponym``, which encode *is-an-instance-of*
relationships (e.g., *Oregon* is an instance of *American state*).
The taxonomy forms a multiply-inheriting hierarchy with the synsets as
nodes. In the English wordnets, such as the Princeton WordNet and its
derivatives, nearly all nominal synsets form such a hierarchy with a
single root node, while verbal synsets form many smaller hierarchies
without a common root. Other wordnets may have different properties,
but as many are based on the Princeton WordNet, they tend to
follow this structure.
Functions to find paths within the taxonomies form the basis of all
:mod:`wordnet similarity measures <wn.similarity>`. For instance, the
:ref:`leacock-chodorow-similarity` measure uses both
:func:`shortest_path` and (indirectly) :func:`taxonomy_depth`.
Wordnet-level Functions
-----------------------
Root and leaf synsets in the taxonomy are those with no ancestors
(``hypernym``, ``instance_hypernym``, etc.) or hyponyms (``hyponym``,
``instance_hyponym``, etc.), respectively.
Finding root and leaf synsets
'''''''''''''''''''''''''''''
.. autofunction:: roots
.. autofunction:: leaves
Computing the taxonomy depth
''''''''''''''''''''''''''''
The taxonomy depth is the maximum depth from a root node to a leaf
node within synsets for a particular part of speech.
.. autofunction:: taxonomy_depth
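For example (a sketch assuming ``oewn:2024`` is installed):

>>> import wn
>>> from wn import taxonomy
>>> en = wn.Wordnet('oewn:2024')
>>> noun_roots = taxonomy.roots(en, pos='n')    # synsets with no hypernyms
>>> verb_leaves = taxonomy.leaves(en, pos='v')  # synsets with no hyponyms
>>> depth = taxonomy.taxonomy_depth(en, 'n')    # maximum root-to-leaf depth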
Synset-level Functions
----------------------
.. autofunction:: hypernym_paths
.. autofunction:: min_depth
.. autofunction:: max_depth
.. autofunction:: shortest_path
.. autofunction:: common_hypernyms
.. autofunction:: lowest_common_hypernyms
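For example (same assumption as above):

>>> import wn
>>> from wn import taxonomy
>>> en = wn.Wordnet('oewn:2024')
>>> dog = en.synsets('dog', pos='n')[0]
>>> cat = en.synsets('cat', pos='n')[0]
>>> paths = taxonomy.hypernym_paths(dog)              # all paths to a root
>>> lcs = taxonomy.lowest_common_hypernyms(dog, cat)  # shared ancestors
>>> steps = taxonomy.shortest_path(dog, cat)          # synsets along the path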
================================================
FILE: docs/api/wn.util.rst
================================================
wn.util
=======
.. automodule:: wn.util
.. autofunction:: synset_id_formatter
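For example, assuming the default format string
``{prefix}-{offset:08}-{pos}``:

>>> from wn.util import synset_id_formatter
>>> make_id = synset_id_formatter(prefix='omw-en')
>>> make_id(offset=1740, pos='n')
'omw-en-00001740-n'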
.. autoclass:: ProgressHandler
:members:
.. attribute:: kwargs
A dictionary storing the updateable parameters for the progress
handler. The keys are:
- ``message`` (:class:`str`) -- a generic message or name
- ``count`` (:class:`int`) -- the current progress counter
- ``total`` (:class:`int`) -- the expected final value of the counter
- ``unit`` (:class:`str`) -- the unit of measurement
- ``status`` (:class:`str`) -- the current status of the process
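As a sketch, a hypothetical subclass that reports a bare percentage
instead of a bar, overriding only ``update`` and reading the
``kwargs`` dictionary described above:

.. code-block:: python

   from wn.util import ProgressHandler

   class PercentHandler(ProgressHandler):
       """Report progress as a simple percentage."""
       def update(self, n: int = 1, force: bool = False) -> None:
           self.kwargs['count'] += n
           total = self.kwargs['total']
           if total:
               print(f"\r{100 * self.kwargs['count'] / total:.0f}%", end='')

Such a class could then be passed to functions that accept a
*progress_handler* argument, such as :func:`wn.download`.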
.. autoclass:: ProgressBar
:members:
================================================
FILE: docs/api/wn.validate.rst
================================================
wn.validate
===========
.. automodule:: wn.validate
.. autofunction:: validate
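A minimal sketch of programmatic use (``my-lexicon.xml`` is a
hypothetical path; the *select* argument mirrors the CLI's
``--select`` option):

.. code-block:: python

   from wn import lmf
   from wn.validate import validate

   resource = lmf.load('my-lexicon.xml')          # parse the WN-LMF file
   lexicon = resource['lexicons'][0]              # validate one lexicon at a time
   report = validate(lexicon, select=['E', 'W'])  # all error/warning checks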
================================================
FILE: docs/cli.rst
================================================
Command Line Interface
======================
Some of Wn's functionality is exposed via the command line.
Global Options
--------------
.. option:: -d DIR, --dir DIR
Change to use ``DIR`` as the data directory prior to invoking any
commands.
Subcommands
-----------
download
--------
Download and add projects to the database given one or more project
specifiers or URLs.
.. code-block:: console
$ python -m wn download oewn:2021 omw:1.4 cili
$ python -m wn download https://en-word.net/static/english-wordnet-2021.xml.gz
.. option:: --index FILE
Use the index at ``FILE`` to resolve project specifiers.
.. code-block:: console
$ python -m wn download --index my-index.toml mywn
.. option:: --no-add
Download and cache the remote file, but don't add it to the
database.
cache
-----
View the files in the download cache. The ``download`` command caches the (often
compressed) files to the filesystem prior to adding to Wn's database. The files
are renamed with a hash of the URL to avoid name clashes, but this also makes it
hard to determine what a particular file is. This command cross-references the
downloaded files with what is in the index. An optional project specifier
argument can help narrow down the results.
.. code-block:: console
$ python -m wn cache # many results; abbreviated here
af909070c29845b952d1799551bffc302e28d2c5 own-en 1.0.0 https://github.com/own-pt/openWordnet-PT/releases/download/v1.0.0/own-en.tar.gz
e25af66e46775b00d689619787013e6a35e5cbf7 oewn 2025 https://en-word.net/static/english-wordnet-2025.xml.gz
5a26d97a0081996db4cd621638a8a9b0da09aa25 odenet 1.4 https://github.com/hdaSprachtechnologie/odenet/releases/download/v1.4/odenet-1.4.tar.xz
[...]
$ python -m wn cache "oewn:2025*" # narrowed results
e25af66e46775b00d689619787013e6a35e5cbf7 oewn 2025 https://en-word.net/static/english-wordnet-2025.xml.gz
0f5371187dcfe7e05f2a93ab85b4e1168859a5c2 oewn 2025+ https://en-word.net/static/english-wordnet-2025-plus.xml.gz
.. option:: --full-paths-only
Only print the full path of each cache file. This can be useful when one
wants to pipe the results to other commands. For example, on Unix-like
systems, the following will delete matching cache entries:
.. code-block:: console
$ python -m wn cache --full-paths-only "omw*:1.4" | xargs rm
lexicons
--------
The ``lexicons`` subcommand lets you quickly see what is installed:
.. code-block:: console
$ python -m wn lexicons
omw-en 1.4 [en] OMW English Wordnet based on WordNet 3.0
omw-sk 1.4 [sk] Slovak WordNet
omw-pl 1.4 [pl] plWordNet
omw-is 1.4 [is] IceWordNet
omw-zsm 1.4 [zsm] Wordnet Bahasa (Malaysian)
omw-sl 1.4 [sl] sloWNet
omw-ja 1.4 [ja] Japanese Wordnet
...
.. option:: -l LG, --lang LG
.. option:: --lexicon SPEC
The ``--lang`` or ``--lexicon`` option can help you narrow down
the results:
.. code-block:: console
$ python -m wn lexicons --lang en
oewn 2021 [en] Open English WordNet
omw-en 1.4 [en] OMW English Wordnet based on WordNet 3.0
$ python -m wn lexicons --lexicon "omw-*"
omw-en 1.4 [en] OMW English Wordnet based on WordNet 3.0
omw-sk 1.4 [sk] Slovak WordNet
omw-pl 1.4 [pl] plWordNet
omw-is 1.4 [is] IceWordNet
omw-zsm 1.4 [zsm] Wordnet Bahasa (Malaysian)
projects
--------
The ``projects`` subcommand lists all known projects in Wn's
index. This is helpful to see what is available for downloading.
.. code-block:: console
$ python -m wn projects
ic cili 1.0 [---] Collaborative Interlingual Index
ic oewn 2025+ [en] Open English WordNet
ic oewn 2025 [en] Open English WordNet
ic oewn 2024 [en] Open English WordNet
ic oewn 2023 [en] Open English WordNet
ic oewn 2022 [en] Open English WordNet
ic oewn 2021 [en] Open English WordNet
ic ewn 2020 [en] Open English WordNet
ic ewn 2019 [en] Open English WordNet
ic odenet 1.4 [de] Open German WordNet
i- odenet 1.3 [de] Open German WordNet
ic omw 2.0 [mul] Open Multilingual Wordnet
ic omw 1.4 [mul] Open Multilingual Wordnet
...
validate
--------
Given a path to a WN-LMF XML file, check the file for structural
problems and print a report.
.. code-block:: console
$ python -m wn validate english-wordnet-2021.xml
.. option:: --select CHECKS
Run the checks with the given comma-separated list of check codes
or categories.
.. code-block:: console
$ python -m wn validate --select E,W201,W204 deWordNet.xml
.. option:: --output-file FILE
Write the report to FILE as a JSON object instead of printing the
report to stdout.
================================================
FILE: docs/conf.py
================================================
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))
# -- Project information -----------------------------------------------------
import wn
project = "wn"
copyright = "2020, Michael Wayne Goodman"
author = "Michael Wayne Goodman"
# The short X.Y version
version = ".".join(wn.__version__.split(".")[:2])
# The full version, including alpha/beta/rc tags
release = wn.__version__
# -- General configuration ---------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
"sphinx.ext.autodoc",
"sphinx.ext.intersphinx",
"sphinx.ext.coverage",
# 'sphinx.ext.viewcode',
"sphinx.ext.githubpages",
"sphinx.ext.napoleon",
"sphinx_copybutton",
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
# Global definitions
rst_prolog = """
.. role:: python(code)
:language: python
:class: highlight
"""
# smartquotes = False
smartquotes_action = "De" # D = en- and em-dash; e = ellipsis
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.#
html_theme = "furo"
html_theme_options = {
"light_css_variables": {
"color-brand-primary": "#006699",
"color-brand-content": "#006699",
# "color-background": "#f0f0f0",
# "color-sidebar-background": "#ddd",
},
"dark_css_variables": {
"color-brand-primary": "#00CCFF",
"color-brand-content": "#00CCFF",
},
}
html_logo = "_static/wn-logo.svg"
pygments_style = "manni"
pygments_dark_style = "monokai"
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]
html_css_files = [
"css/svg.css",
]
# Don't offer to show the source of the current page
html_show_sourcelink = False
# -- Options for autodoc extension -------------------------------------------
# autodoc_typehints = 'description'
autodoc_typehints = "signature"
# autodoc_typehints = 'none'
# -- Options for intersphinx extension ---------------------------------------
# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {
"python": ("https://docs.python.org/3", None),
"httpx": ("https://httpx.readthedocs.io/en/latest/", None),
}
# -- Options for sphinx_copybutton extension ---------------------------------
copybutton_prompt_text = (
r">>> " # regular Python prompt
r"|\.\.\. " # Python continuation prompt
r"|\$ " # Basic shell
r"|In \[\d*\]: " # Jupyter notebook
)
copybutton_prompt_is_regexp = True
================================================
FILE: docs/docutils.conf
================================================
[restructuredtext parser]
syntax_highlight = short
================================================
FILE: docs/faq.rst
================================================
FAQ
===
Is Wn related to the NLTK's `nltk.corpus.wordnet` module?
---------------------------------------------------------
Only in spirit. There was an effort to develop the `NLTK`_\ 's module as
a standalone package (see https://github.com/nltk/wordnet/), but
development had slowed. Wn has the same broad goals and a similar API
as that standalone package, but fundamental architectural differences
demanded a complete rewrite, so Wn was created as a separate
project. With approval from the other package's maintainer, Wn
acquired the `wn <https://pypi.org/project/wn>`_ project on PyPI and
can be seen as its successor.
Is Wn compatible with the NLTK's module?
----------------------------------------
The API is intentionally similar, but not exactly the same (for
instance see the next question), and there are differences in the ways
that results are retrieved, particularly for non-English wordnets. See
:doc:`guides/nltk-migration` for more information. Also see
:ref:`princeton-wordnet`.
Where are the ``Lemma`` objects? What are ``Word`` and ``Sense`` objects?
-------------------------------------------------------------------------
Unlike the `WNDB`_ data format of the original WordNet, the
`WN-LMF`_ XML format grants words (called *lexical entries* in WN-LMF
and a :class:`~wn.Word` object in Wn) and word senses
(:class:`~wn.Sense` in Wn) explicit, first-class status alongside
synsets. While senses are essentially links between words and
synsets, they may contain metadata and be the source or target of
sense relations, so in some ways they are more like nodes than edges
when the wordnet is viewed as a graph. The `NLTK`_\ 's module, using
the WNDB format, combines the information of a word and a sense into a
single object called a ``Lemma``. Wn also has an unrelated concept
called a :meth:`~wn.Word.lemma`, but it is merely the canonical form
of a word.
.. _princeton-wordnet:
Where is the Princeton WordNet data?
------------------------------------
The original English wordnet, named simply *WordNet* but often
referred to as the *Princeton WordNet* to better distinguish it from
other projects, is specifically the data distributed by Princeton in
the `WNDB`_ format. The `Open Multilingual Wordnet <OMW_>`_ (OMW)
packages an export of the WordNet data as the *OMW English Wordnet
based on WordNet 3.0* which is used by Wn (with the lexicon ID
``omw-en``). It also has a similar export for WordNets 1.5, 1.6, 1.7,
1.7.1, 2.0, 2.1, and 3.1 data (``omw-en15``, ``omw-en16``, ``omw-en17``,
``omw-en171``, ``omw-en20``, ``omw-en21``, and ``omw-en31``,
respectively). All of these are highly compatible with the original
data and can be used as drop-in replacements.
Prior to Wn version 0.9 (and, correspondingly, prior to the `OMW
data`_ version 1.4), the ``pwn:3.0`` and ``pwn:3.1`` English wordnets
distributed by OMW were incorrectly called the *Princeton WordNet*
(for WordNet 3.0 and 3.1, respectively). From Wn version 0.9 (and from
version 1.4 of the OMW data), these are called the *OMW English
Wordnet based on WordNet 3.0/3.1* (``omw-en:1.4`` and
``omw-en31:1.4``, respectively). These lexicons are intentionally
compatible with the original WordNet data, and the 1.4 versions are
even more compatible than the previous ``pwn:3.0`` and ``pwn:3.1``
lexicons, so it is strongly recommended to use them over the previous
versions. Similarly, the 2.0 version of OMW is more compatible yet.
The data corresponding to WordNet versions 1.5 through 2.1 are only
available from OMW 2.0.
.. _OMW data: https://github.com/omwn/omw-data
Why don't all wordnets share the same synsets?
----------------------------------------------
The `Open Multilingual Wordnet <OMW_>`_ (OMW) contains wordnets for
many languages created using the *expand* methodology [VOSSEN1998]_,
where non-English wordnets provide words on top of the English
wordnet's synset structure. This allows new wordnets to be built in
much less time than starting from scratch, but with a few drawbacks,
such as that words cannot be added if they do not have a synset in the
English wordnet, and that it is difficult to version the wordnets
independently (e.g., for reproducibility of experiments involving
wordnet data) as all are interconnected. Wn, therefore, creates new
synsets for each wordnet added to its database, and synsets then
specify which resource they belong to. Queries can specify which
resources may be examined. Also see :doc:`guides/interlingual`.
Why does Wn's database get so big?
----------------------------------
The *OMW English Wordnet based on WordNet 3.0* takes about 114 MiB of
disk space in Wn's database, which is only about 8 MiB more than it
takes as a `WN-LMF`_ XML file. The `NLTK`_, however, uses the obsolete
`WNDB`_ format which is more compact, requiring only 35 MiB of disk
space. The difference with the Open Multilingual Wordnet 1.4 is more
striking: it takes about 659 MiB of disk space in the database, but
only 49 MiB in the NLTK. Part of the difference here is that the OMW
files in the NLTK are simple tab-separated-value files listing only
the words added to each synset for each language. In addition, Wn
creates new synsets for each wordnet added (see the previous
question). One more reason is that Wn creates various indexes in the
database for efficient lookup.
.. _NLTK: https://www.nltk.org/
.. _OMW: http://github.com/omwn
.. [VOSSEN1998] Piek Vossen. 1998. *Introduction to EuroWordNet.* Computers and the Humanities, 32(2): 73--89.
.. _Open English Wordnet 2021: https://en-word.net/
.. _WNDB: https://wordnet.princeton.edu/documentation/wndb5wn
.. _WN-LMF: https://globalwordnet.github.io/schemas/
================================================
FILE: docs/guides/basic.rst
================================================
Basic Usage
===========
.. seealso::
This document covers the basics of querying wordnets, filtering
results, and performing secondary queries on the results. For
adding, removing, or inspecting lexicons, see :doc:`lexicons`. For
more information about interlingual queries, see
:doc:`interlingual`.
For the most basic queries, Wn provides several module functions for
retrieving words, senses, and synsets:
>>> import wn
>>> wn.words('pike')
[Word('ewn-pike-n')]
>>> wn.senses('pike')
[Sense('ewn-pike-n-03311555-04'), Sense('ewn-pike-n-07795351-01'), Sense('ewn-pike-n-03941974-01'), Sense('ewn-pike-n-03941726-01'), Sense('ewn-pike-n-02563739-01')]
>>> wn.synsets('pike')
[Synset('ewn-03311555-n'), Synset('ewn-07795351-n'), Synset('ewn-03941974-n'), Synset('ewn-03941726-n'), Synset('ewn-02563739-n')]
Once you start working with multiple wordnets, these simple queries
may return more than desired:
>>> wn.words('pike')
[Word('ewn-pike-n'), Word('wnja-n-66614')]
>>> wn.words('chat')
[Word('ewn-chat-n'), Word('ewn-chat-v'), Word('frawn-lex14803'), Word('frawn-lex21897')]
You can specify which language or lexicon you wish to query:
>>> wn.words('pike', lang='ja')
[Word('wnja-n-66614')]
>>> wn.words('chat', lexicon='frawn')
[Word('frawn-lex14803'), Word('frawn-lex21897')]
But it might be easier to create a :class:`~wn.Wordnet` object and use
it for queries:
>>> wnja = wn.Wordnet(lang='ja')
>>> wnja.words('pike')
[Word('wnja-n-66614')]
>>> frawn = wn.Wordnet(lexicon='frawn')
>>> frawn.words('chat')
[Word('frawn-lex14803'), Word('frawn-lex21897')]
In fact, the simple queries above implicitly create such a
:class:`~wn.Wordnet` object, but one that includes all installed
lexicons.
.. _primary-queries:
Primary Queries
---------------
The queries shown above are "primary" queries, meaning they are the
first step in a user's interaction with a wordnet. Operations
performed on the resulting objects are then `secondary
queries`_. Primary queries optionally take several fields for
filtering the results, namely the word form and part of
speech. Synsets may also be filtered by an interlingual index (ILI).
Searching for Words
'''''''''''''''''''
The :func:`wn.words()` function returns a list of :class:`~wn.Word`
objects that match the given word form or part of speech:
>>> wn.words('pencil')
[Word('ewn-pencil-n'), Word('ewn-pencil-v')]
>>> wn.words('pencil', pos='v')
[Word('ewn-pencil-v')]
Calling the function without a word form will return all words in the
database:
>>> len(wn.words())
311711
>>> len(wn.words(pos='v'))
29419
>>> len(wn.words(pos='v', lexicon='ewn'))
11595
If you know the word identifier used by a lexicon, you can retrieve
the word directly with the :func:`wn.word()` function. Identifiers are
guaranteed to be unique within a single lexicon, but not across
lexicons, so it's best to call this function from an instantiated
:class:`~wn.Wordnet` object or with the ``lexicon`` parameter
specified. If multiple words are found when querying multiple
lexicons, only the first is returned.
>>> wn.word('ewn-pencil-n', lexicon='ewn')
Word('ewn-pencil-n')
Searching for Senses
''''''''''''''''''''
The :func:`wn.senses()` and :func:`wn.sense()` functions behave
similarly to :func:`wn.words()` and :func:`wn.word()`, except that
they return matching :class:`~wn.Sense` objects.
>>> wn.senses('plow', pos='n')
[Sense('ewn-plow-n-03973894-01')]
>>> wn.sense('ewn-plow-v-01745745-01')
Sense('ewn-plow-v-01745745-01')
Senses represent a relationship between a :class:`~wn.Word` and a
:class:`~wn.Synset`. Seen as an edge between nodes, senses are often
given less prominence than words or synsets, but they are the natural
locus of several interesting features such as sense relations (e.g.,
for derived words) and the natural level of representation for
translations to other languages.
Searching for Synsets
'''''''''''''''''''''
The :func:`wn.synsets()` and :func:`wn.synset()` functions are like
those above but allow the ``ili`` parameter for filtering by
interlingual index, which is useful in interlingual queries:
>>> wn.synsets('scepter')
[Synset('ewn-14467142-n'), Synset('ewn-07282278-n')]
>>> wn.synset('ewn-07282278-n').ili
'i74874'
>>> wn.synsets(ili='i74874')
[Synset('ewn-07282278-n'), Synset('wnja-07267573-n'), Synset('frawn-07267573-n')]
Secondary Queries
-----------------
Once you have gotten some results from a primary query, you can
perform operations on the :class:`~wn.Word`, :class:`~wn.Sense`, or
:class:`~wn.Synset` objects to get at further information in the
wordnet.
Exploring Words
'''''''''''''''
Here are some of the things you can do with :class:`~wn.Word` objects:
>>> w = wn.words('goose')[0]
>>> w.pos # part of speech
'n'
>>> w.forms() # other word forms (e.g., irregular inflections)
['goose', 'geese']
>>> w.lemma() # canonical form
'goose'
>>> w.derived_words()
[Word('ewn-gosling-n'), Word('ewn-goosy-s'), Word('ewn-goosey-s')]
>>> w.senses()
[Sense('ewn-goose-n-01858313-01'), Sense('ewn-goose-n-10177319-06'), Sense('ewn-goose-n-07662430-01')]
>>> w.synsets()
[Synset('ewn-01858313-n'), Synset('ewn-10177319-n'), Synset('ewn-07662430-n')]
Since translations of a word into another language depend on the sense
used, :meth:`Word.translate <wn.Word.translate>` returns a dictionary
mapping each sense to words in the target language:
>>> for sense, ja_words in w.translate(lang='ja').items():
... print(sense, ja_words)
...
Sense('ewn-goose-n-01858313-01') [Word('wnja-n-1254'), Word('wnja-n-33090'), Word('wnja-n-38995')]
Sense('ewn-goose-n-10177319-06') []
Sense('ewn-goose-n-07662430-01') [Word('wnja-n-1254')]
Exploring Senses
''''''''''''''''
Compared to :class:`~wn.Word` and :class:`~wn.Synset` objects, there
are relatively few operations available on :class:`~wn.Sense`
objects. Sense relations and translations, however, are important
operations on senses.
>>> s = wn.senses('dark', pos='n')[0]
>>> s.word() # each sense links to a single word
Word('ewn-dark-n')
>>> s.synset() # each sense links to a single synset
Synset('ewn-14007000-n')
>>> s.get_related('antonym')
[Sense('ewn-light-n-14006789-01')]
>>> s.get_related('derivation')
[Sense('ewn-dark-a-00273948-01')]
>>> s.translate(lang='fr') # translation returns a list of senses
[Sense('frawn-lex52992--13983515-n')]
>>> s.translate(lang='fr')[0].word().lemma()
'obscurité'
Exploring Synsets
'''''''''''''''''
Many of the operations people care about happen on synsets, such as
hierarchical relations and metrics.
>>> ss = wn.synsets('hound', pos='n')[0]
>>> ss.senses()
[Sense('ewn-hound-n-02090203-01'), Sense('ewn-hound_dog-n-02090203-02')]
>>> ss.words()
[Word('ewn-hound-n'), Word('ewn-hound_dog-n')]
>>> ss.lemmas()
['hound', 'hound dog']
>>> ss.definition()
'any of several breeds of dog used for hunting typically having large drooping ears'
>>> ss.hypernyms()
[Synset('ewn-02089774-n')]
>>> ss.hypernyms()[0].lemmas()
['hunting dog']
>>> len(ss.hyponyms())
20
>>> ss.hyponyms()[0].lemmas()
['Afghan', 'Afghan hound']
>>> ss.max_depth()
15
>>> ss.shortest_path(wn.synsets('dog', pos='n')[0])
[Synset('ewn-02090203-n'), Synset('ewn-02089774-n'), Synset('ewn-02086723-n')]
>>> ss.translate(lang='fr') # translation returns a list of synsets
[Synset('frawn-02087551-n')]
>>> ss.translate(lang='fr')[0].lemmas()
['chien', 'chien de chasse']
Filtering by Language
---------------------
The ``lang`` parameter of :func:`wn.words()`, :func:`wn.senses()`,
:func:`wn.synsets()`, and :class:`~wn.Wordnet` allows a single `BCP 47
<https://en.wikipedia.org/wiki/IETF_language_tag>`_ language
code. When this parameter is used, only entries in the specified
language will be returned.
>>> import wn
>>> wn.words('chat')
[Word('ewn-chat-n'), Word('ewn-chat-v'), Word('frawn-lex14803'), Word('frawn-lex21897')]
>>> wn.words('chat', lang='fr')
[Word('frawn-lex14803'), Word('frawn-lex21897')]
If a language code not used by any lexicon is specified, a
:exc:`wn.Error` is raised.
Filtering by Lexicon
--------------------
The ``lexicon`` parameter of :func:`wn.words()`, :func:`wn.senses()`,
:func:`wn.synsets()`, and :class:`~wn.Wordnet` takes a string of
space-delimited :ref:`lexicon specifiers
<lexicon-specifiers>`. Entries in a lexicon whose ID matches one of
the lexicon specifiers will be returned. For these, the following
rules are used:
- A full ``id:version`` string (e.g., ``ewn:2020``) selects a specific
lexicon
- Only a lexicon ``id`` (e.g., ``ewn``) selects the most recently
added lexicon with that ID
- A star ``*`` may be used to match any lexicon; a star may not
include a version
>>> wn.words('chat', lexicon='ewn:2020')
[Word('ewn-chat-n'), Word('ewn-chat-v')]
>>> wn.words('chat', lexicon='wnja')
[]
>>> wn.words('chat', lexicon='wnja frawn')
[Word('frawn-lex14803'), Word('frawn-lex21897')]
================================================
FILE: docs/guides/interlingual.rst
================================================
Interlingual Queries
====================
This guide explains how interlingual queries work within Wn. To get
started, you'll need at least two lexicons that use interlingual
indices (ILIs). For this guide, we'll use the Open English WordNet
(``oewn:2024``), the Open German WordNet (``odenet:1.4``), also
known as OdeNet, and the Japanese wordnet (``omw-ja:1.4``).
>>> import wn
>>> wn.download('oewn:2024')
>>> wn.download('odenet:1.4')
>>> wn.download('omw-ja:1.4')
We will query these wordnets with the following :class:`~wn.Wordnet`
objects:
>>> en = wn.Wordnet('oewn:2024')
>>> de = wn.Wordnet('odenet:1.4')
The object for the Japanese wordnet will be discussed and created
below, in :ref:`cross-lingual-relation-traversal`.
What are Interlingual Indices?
------------------------------
It is common for users of the `Princeton WordNet
<https://wordnet.princeton.edu/>`_ to refer to synsets by their `WNDB
<https://wordnet.princeton.edu/documentation/wndb5wn>`_ offset and type,
but this is problematic because the offset is a byte-offset in the
wordnet data files and it will differ for wordnets in other languages
and even between versions of the same wordnet. Interlingual indices
(ILIs) address this issue by providing stable identifiers for concepts,
whether for a synset across versions of a wordnet or across languages.
The idea of ILIs was proposed by [Vossen99]_ and it came to fruition
with the release of the Collaborative Interlingual Index (CILI;
[Bond16]_). CILI therefore represents an instance of, and a namespace
for, ILIs. There could, in theory, be alternative indexes for
particular domains (e.g., names of people or places), but currently
there is only the one.
As an example, the synset for *apricot* (fruit) in WordNet 3.0 is
``07750872-n``, but it is ``07766848-n`` in WordNet 3.1. In OdeNet
1.4, which is not released in the WNDB format and therefore doesn't
use offsets at all, it is ``13235-n`` for the equivalent word
(*Aprikose*). However, all three use the same ILI: ``i77784``.
Generally, only one synset within a wordnet will be mapped to a
particular ILI, but this may not always be true, nor does every synset
necessarily map to an ILI. Some concepts that are lexicalized in one
language may not be in another language. For example, *rice* in English
may refer to the rice plant, rice grain, or cooked rice, but in
languages like Japanese they are distinct things (稲 *ine*, 米 *kome*,
and 飯 *meshi* / ご飯 *gohan*, respectively).
The ``ili`` property of Synsets serves two purposes in Wn. Mainly it is
for encoding the ILI identifier associated with the synset, but it is
also used to indicate when a lexicon is proposing a new concept that is
not yet part of CILI. In the latter case, a WN-LMF lexicon file will
have the special value of ``in`` for a synset's ILI and it will provide
an ``<ILIDefinition>`` element. In Wn, this translates to
:attr:`wn.Synset.ili` returning :python:`None`, the same as if no ILI
were mapped at all. Both synsets with proposed ILIs and those with no
ILI cannot be used in interlingual queries. Proposed ILIs can be
inspected using the :func:`wn.ili.get_proposed` function, if you
have the synset, or :func:`wn.ili.get_all_proposed` to get all of them.
.. [Vossen99]
Vossen, Piek, Wim Peters, and Julio Gonzalo.
"Towards a universal index of meaning."
In Proceedings of ACL-99 workshop, Siglex-99, standardizing lexical resources, pp. 81-90.
University of Maryland, 1999.
.. [Bond16]
Bond, Francis, Piek Vossen, John Philip McCrae, and Christiane Fellbaum.
"CILI: the Collaborative Interlingual Index."
In Proceedings of the 8th Global WordNet Conference (GWC), pp. 50-57. 2016.
Using Interlingual Indices
--------------------------
For synsets that have an associated ILI, you can retrieve it via the
:data:`wn.Synset.ili` property:
>>> apricot = en.synsets('apricot')[1]
>>> apricot.ili
'i77784'
The value is a :class:`str` ILI identifier. These may be used directly
for things like interlingual synset lookups:
>>> de.synsets(ili=apricot.ili)[0].lemmas()
['Marille', 'Aprikose']
There may be more information about the ILI itself which you can get
from the :mod:`wn.ili` module:
>>> from wn import ili
>>> apricot_ili = ili.get(apricot.ili)
>>> apricot_ili
ILI(id='i77784')
From this object you can get various properties of the ILI, such as
the ID string, its status, and its definition, but if you have
not added CILI to Wn's database, it will not be very informative:
>>> apricot_ili.id
'i77784'
>>> apricot_ili.status
'presupposed'
>>> apricot_ili.definition() is None
True
The ``presupposed`` status means that the ILI ID is in use by a
lexicon, but there is no other source of truth for the index. CILI can
be downloaded just like a lexicon:
>>> wn.download('cili:1.0')
Now the status and definition should be more useful:
>>> apricot_ili.status
'active'
>>> apricot_ili.definition()
'downy yellow to rosy-colored fruit resembling a small peach'
Translating Words, Senses, and Synsets
--------------------------------------
Rather than manually inserting the ILI IDs into Wn's lookup functions
as shown above, Wn provides the :meth:`wn.Synset.translate` method to
make it easier:
>>> apricot.translate(lexicon='odenet:1.4')
[Synset('odenet-13235-n')]
The method returns a list for two reasons: first, it's not guaranteed
that the target lexicon has only one synset with the ILI and, second,
you can translate to more than one lexicon at a time.
:class:`~wn.Sense` objects also have a :meth:`~wn.Sense.translate`
method, returning a list of senses instead of synsets:
>>> de_senses = apricot.senses()[0].translate(lexicon='odenet:1.4')
>>> [s.word().lemma() for s in de_senses]
['Marille', 'Aprikose']
:class:`~wn.Word` objects have a :meth:`~wn.Word.translate` method, too, but
it works a bit differently. Since each word may be part of multiple
synsets, the method returns a mapping of each word sense to the list
of translated words:
>>> result = en.words('apricot')[0].translate(lexicon='odenet:1.4')
>>> for sense, de_words in result.items():
... print(sense, [w.lemma() for w in de_words])
...
Sense('oewn-apricot__1.20.00..') []
Sense('oewn-apricot__1.13.00..') ['Marille', 'Aprikose']
Sense('oewn-apricot__1.07.00..') ['lachsrosa', 'lachsfarbig', 'in Lachs', 'lachsfarben', 'lachsrot', 'lachs']
The three senses above are for *apricot* as a tree, a fruit, and a
color. OdeNet does not have a synset for apricot trees, or it has one
not associated with the appropriate ILI, and therefore it could not
translate any words for that sense.
.. _cross-lingual-relation-traversal:
Cross-lingual Relation Traversal
--------------------------------
ILIs have a second use in Wn, which is relation traversal for wordnets
that depend on other lexicons, i.e., those created with the *expand*
methodology. These wordnets, such as many of those in the `Open
Multilingual Wordnet <https://github.com/omwn/>`_, do not include
synset relations on their own as they were built using the English
WordNet as their taxonomic scaffolding. Trying to load such a lexicon
when the lexicon it requires is not added to the database presents a
warning to the user:
>>> ja = wn.Wordnet('omw-ja:1.4')
[...] WnWarning: lexicon dependencies not available: omw-en:1.4
>>> ja.expanded_lexicons()
[]
.. warning::
Do not rely on the presence of a warning to determine if the
lexicon has its expand lexicon loaded. Python's default warning
filter may only show the warning the first time it is
encountered. Instead, inspect :meth:`wn.Wordnet.expanded_lexicons`
to see if it is non-empty.
When a dependency is unmet, Wn only issues a warning, not an error,
and you can continue to use the lexicon as it is, but it won't be
useful for exploring relations such as hypernyms and hyponyms:
>>> anzu = ja.synsets(ili='i77784')[0]
>>> anzu.lemmas()
['アンズ', 'アプリコット', '杏']
>>> anzu.hypernyms()
[]
One way to resolve this issue is to install the lexicon it requires:
>>> wn.download('omw-en:1.4')
>>> ja = wn.Wordnet('omw-ja:1.4') # no warning
>>> ja.expanded_lexicons()
[<Lexicon omw-en:1.4 [en]>]
Wn will detect the dependency and load ``omw-en:1.4`` as the *expand*
lexicon for ``omw-ja:1.4`` when the former is in the database. You may
also specify an expand lexicon manually, even one that isn't the
specified dependency:
>>> ja = wn.Wordnet('omw-ja:1.4', expand='oewn:2024') # no warning
>>> ja.expanded_lexicons()
[<Lexicon oewn:2024 [en]>]
In this case, the Open English WordNet is an actively-developed fork
of the lexicon that ``omw-ja:1.4`` depends on, and it should contain
all the relations, so you'll see little difference between using it
and ``omw-en:1.4``. This works because the relations are found using
ILIs and not synset offsets. You may still prefer to use the specified
dependency if you have strict compatibility needs, such as for
experiment reproducibility and/or compatibility with the `NLTK
<https://nltk.org>`_. Using some other lexicon as the expand lexicon
may yield very different results. For instance, ``odenet:1.4`` is much
smaller than the English wordnets and has fewer relations, so it would
not be a good substitute for ``omw-ja:1.4``'s expand lexicon.
When an appropriate expand lexicon is loaded, relations between
synsets, such as hypernyms, are more likely to be present:
>>> anzu = ja.synsets(ili='i77784')[0] # recreate the synset object
>>> anzu.hypernyms()
[Synset('omw-ja-07705931-n')]
>>> anzu.hypernyms()[0].lemmas()
['果物']
>>> anzu.hypernyms()[0].translate(lexicon='oewn:2024')[0].lemmas()
['edible fruit']
================================================
FILE: docs/guides/lemmatization.rst
================================================
Lemmatization and Normalization
===============================
Wn provides two methods for expanding queries: lemmatization_ and
normalization_\ . Wn also has a setting that allows `alternative forms
<alternative-forms_>`_ stored in the database to be included in
queries.
.. seealso::
The :mod:`wn.morphy` module is a basic English lemmatizer included
with Wn.
.. _lemmatization:
Lemmatization
-------------
When querying a wordnet with wordforms from natural language text, it
is important to be able to find entries for inflected forms as the
database generally contains only lemmatic forms, or *lemmas* (or
*lemmata*, if you prefer irregular plurals).
>>> import wn
>>> en = wn.Wordnet('oewn:2021')
>>> en.words('plurals')
[]
>>> en.words('plural')
[Word('oewn-plural-a'), Word('oewn-plural-n')]
Lemmas are sometimes called *citation forms* or *dictionary forms* as
they are often used as the head words in dictionary entries. In
Natural Language Processing (NLP), *lemmatization* is a technique
where a possibly inflected word form is transformed to yield a
lemma. In Wn, this concept is generalized somewhat to mean a
transformation that yields a form matching wordforms stored in the
database. For example, the English word *sparrows* is the plural
inflection of *sparrow*, while the word *leaves* is ambiguous between
the plural inflection of the nouns *leaf* and *leave* and the
3rd-person singular inflection of the verb *leave*.
For tasks where high accuracy is needed, wrapping the wordnet queries
with external tools that handle tokenization, lemmatization, and
part-of-speech tagging will likely yield the best results as this
method can make use of word context. That is, something like this:
.. code-block:: python
for lemma, pos in fancy_shmancy_analysis(corpus):
synsets = w.synsets(lemma, pos=pos)
For modest needs, however, Wn provides a way to integrate basic
lemmatization directly into the queries.
Lemmatization in Wn works as follows: if a :class:`wn.Wordnet` object
is instantiated with a *lemmatizer* argument, then queries involving
wordforms (e.g., :meth:`wn.Wordnet.words`, :meth:`wn.Wordnet.senses`,
:meth:`wn.Wordnet.synsets`) will first lemmatize the wordform and then
check all resulting wordforms and parts of speech against the
database as successive queries.
Lemmatization Functions
'''''''''''''''''''''''
The *lemmatizer* argument of :class:`wn.Wordnet` is a callable that
takes two string arguments: (1) the original wordform, and (2) a
part-of-speech or :python:`None`. It returns a dictionary mapping
parts-of-speech to sets of lemmatized wordforms. The signature is as
follows:
.. code-block:: python
lemmatizer(s: str, pos: str | None) -> Dict[str | None, Set[str]]
The part-of-speech may be used by the function to determine which
morphological rules to apply. If the given part-of-speech is
:python:`None`, then it is not specified and any rule may apply. A
lemmatizer that only deinflects should not change any specified
part-of-speech, but this is not a requirement, and a function could be
provided that undoes derivational morphology (e.g., *democratic* →
*democracy*).
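For illustration, a toy lemmatizer following this signature that only
strips a plural *-s* from nouns (a real lemmatizer, such as
:class:`wn.morphy.Morphy`, covers far more):

.. code-block:: python

   def toy_lemmatizer(s: str, pos: str | None) -> dict[str | None, set[str]]:
       """Map a wordform to candidate lemmas, keyed by part of speech."""
       forms = {s}
       if pos in (None, 'n') and s.endswith('s'):
           forms.add(s[:-1])  # naive plural stripping: 'sparrows' -> 'sparrow'
       return {pos: forms}

It could then be passed to :class:`wn.Wordnet` as
:python:`wn.Wordnet('oewn:2021', lemmatizer=toy_lemmatizer)`.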
Querying With Lemmatization
'''''''''''''''''''''''''''
As the needs of lemmatization differ from one language to another, Wn
does not provide a lemmatizer by default, and therefore it is
unavailable to the convenience functions :func:`wn.words`,
:func:`wn.senses`, and :func:`wn.synsets`. A lemmatizer can be added
to a :class:`wn.Wordnet` object. For example, using :mod:`wn.morphy`:
>>> import wn
>>> from wn.morphy import Morphy
>>> en = wn.Wordnet('oewn:2021', lemmatizer=Morphy())
>>> en.words('sparrows')
[Word('oewn-sparrow-n')]
>>> en.words('leaves')
[Word('oewn-leave-v'), Word('oewn-leaf-n'), Word('oewn-leave-n')]
Querying Without Lemmatization
''''''''''''''''''''''''''''''
When lemmatization is not used, inflected terms may not return any
results:
>>> en = wn.Wordnet('oewn:2021')
>>> en.words('sparrows')
[]
Depending on the lexicon, there may be situations where results are
returned for inflected lemmas, such as when the inflected form is
lexicalized as its own entry:
>>> en.words('glasses')
[Word('oewn-glasses-n')]
Or if the lexicon lists the inflected form as an alternative form. For
example, the English Wordnet lists irregular inflections as
alternative forms:
>>> en.words('lemmata')
[Word('oewn-lemma-n')]
See below for excluding alternative forms from such queries.
.. _alternative-forms:
Alternative Forms in the Database
---------------------------------
A lexicon may include alternative forms in addition to lemmas for each
word, and by default these are included in queries. What exactly is
included as an alternative form depends on the lexicon. The English
Wordnet, for example, adds irregular inflections (or "exceptional
forms"), while the Japanese Wordnet includes the same word in multiple
orthographies (original, hiragana, katakana, and two romanizations).
For the English Wordnet, this means that you might get basic
lemmatization for irregular forms only:
>>> en = wn.Wordnet('oewn:2021')
>>> en.words('learnt', pos='v')
[Word('oewn-learn-v')]
>>> en.words('learned', pos='v')
[]
If this is undesirable, the alternative forms can be excluded from
queries with the *search_all_forms* parameter:
>>> en = wn.Wordnet('oewn:2021', search_all_forms=False)
>>> en.words('learnt', pos='v')
[]
>>> en.words('learned', pos='v')
[]
.. _normalization:
Normalization
-------------
While lemmatization deals with morphological variants of words,
normalization handles minor orthographic variants. Normalized forms,
however, may be invalid as wordforms in the target language, and as
such they are only used behind the scenes for query expansion and not
presented to users. For instance, a user might attempt to look up
*résumé* in the English wordnet, but the wordnet only contains the
form without diacritics: *resume*. With strict string matching, the
entry would not be found using the wordform in the query. By
normalizing the query word, the entry can be found. Similarly in the
Spanish wordnet, *soñar* (to dream) and *sonar* (to ring) are two
different words. A user who types *soñar* likely does not want to get
results for *sonar*, but one who types *sonar* may be a non-Spanish
speaker who is unaware of the missing diacritic or does not have an
input method that allows them to type the diacritic, so this query
would return both entries by matching against the normalized forms in
the database. Wn handles all of these use cases.
When a lexicon is added to the database, potentially two wordforms are
inserted for every one in the lexicon: the original wordform and a
normalized form. When querying against the database, the original
query string is first compared with the original wordforms and, if
normalization is enabled, with the normalized forms in the database as
well. If this first attempt yields no results and if normalization is
enabled, the query string is normalized and tried again.
Normalization Functions
'''''''''''''''''''''''
The normalized form is obtained from a *normalizer* function, passed
as an argument to :class:`wn.Wordnet`, that takes a single string
argument and returns a string. That is, a function with the following
signature:
.. code-block:: python
normalizer(s: str) -> str
While custom *normalizer* functions could be used, in practice the
choice is either the default normalizer or :python:`None`. The default
normalizer works by downcasing the string and performing NFKD_
normalization to remove diacritics. If the normalized form is the same
as the original, only the original is inserted into the database.
.. table:: Examples of normalization
:align: center
============= ===============
Original Form Normalized Form
============= ===============
résumé resume
soñar sonar
San José san jose
ハラペーニョ ハラヘーニョ
============= ===============
.. _NFKD: https://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms
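The following sketch approximates the default normalizer's behavior
(the actual implementation is internal to Wn and may differ in
details):

.. code-block:: python

   import unicodedata

   def nfkd_normalizer(s: str) -> str:
       """Downcase and strip combining marks (diacritics) via NFKD."""
       decomposed = unicodedata.normalize('NFKD', s.lower())
       return ''.join(c for c in decomposed if not unicodedata.combining(c))

With this function, :python:`nfkd_normalizer('résumé')` returns
:python:`'resume'`, matching the table above.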
Querying With Normalization
'''''''''''''''''''''''''''
By default, normalization is enabled when a :class:`wn.Wordnet` is
created. Enabling normalization does two things: it allows queries to
check the original wordform in the query against the normalized forms
in the database and, if no results are returned in the first step, it
allows the queried wordform to be normalized as a back-off technique.
>>> en = wn.Wordnet('oewn:2021')
>>> en.words('résumé')
[Word('oewn-resume-n'), Word('oewn-resume-v')]
>>> es = wn.Wordnet('omw-es:1.4')
>>> es.words('soñar')
[Word('omw-es-soñar-v')]
>>> es.words('sonar')
[Word('omw-es-sonar-v'), Word('omw-es-soñar-v')]
.. note::
Users may supply a custom *normalizer* function to the
:class:`wn.Wordnet` object, but currently this is discouraged as
the result is unlikely to match normalized forms in the database
and there is not yet a way to customize the normalization of forms
added to the database.
Querying Without Normalization
''''''''''''''''''''''''''''''
Normalization can be disabled by passing :python:`None` as the
argument of the *normalizer* parameter of :class:`wn.Wordnet`. The
queried wordform will then not be checked against normalized forms in
the database, nor will it be normalized as a back-off technique.
>>> en = wn.Wordnet('oewn:2021', normalizer=None)
>>> en.words('résumé')
[]
>>> es = wn.Wordnet('omw-es:1.4', normalizer=None)
>>> es.words('soñar')
[Word('omw-es-soñar-v')]
>>> es.words('sonar')
[Word('omw-es-sonar-v')]
.. note::
It is not possible to disable normalization for the convenience
functions :func:`wn.words`, :func:`wn.senses`, and
:func:`wn.synsets`.
================================================
FILE: docs/guides/lexicons.rst
================================================
Working with Lexicons
=====================
Terminology
-----------
In Wn, the following terminology is used:
:lexicon: An inventory of words, senses, synsets, relations, etc. that
share a namespace (i.e., that can refer to each other).
:wordnet: A group of lexicons (but usually just one).
:resource: A file containing lexicons.
:package: A directory containing a resource and optionally some
metadata files.
:collection: A directory containing packages and optionally some
metadata files.
:project: A general term for a resource, package, or collection,
particularly pertaining to its creation, maintenance, and
distribution.
In general, each resource contains one lexicon. For large projects
like the `Open English WordNet`_, that lexicon is also a wordnet on
its own. For a collection like the `Open Multilingual Wordnet`_, most
lexicons do not include relations as they are instead expected to use
those from the OMW's included English wordnet, which is derived from
the `Princeton WordNet`_. As such, a wordnet for these sub-projects is
best thought of as the grouping of such a lexicon with the lexicon
that provides the relations.
.. _Open English WordNet: https://en-word.net
.. _Open Multilingual Wordnet: https://github.com/omwn/
.. _Princeton WordNet: https://wordnet.princeton.edu/
.. _lexicon-specifiers:
Lexicon and Project Specifiers
------------------------------
Wn uses *lexicon specifiers* to deal with the possibility of having
multiple lexicons and multiple versions of lexicons loaded in the same
database. The specifiers are the joining of a lexicon's name (ID) and
version, delimited by ``:``. Here are the possible forms:
.. code-block:: none
* -- any/all lexicons
id -- the most recently added lexicon with the given id
id:* -- all lexicons with the given id
id:version -- the lexicon with the given id and version
*:version -- all lexicons with the given version
For example, if ``ewn:2020`` was installed followed by ``ewn:2019``,
then ``ewn`` would specify the ``2019`` version, ``ewn:*`` would
specify both versions, and ``ewn:2020`` would specify the ``2020``
version.
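Continuing that example, the specifiers could be passed to
:func:`wn.lexicons` as follows (a sketch; what each call returns
depends on which lexicons are actually installed):

.. code-block:: python

   import wn

   wn.lexicons(lexicon="ewn")       # most recently added: ewn:2019
   wn.lexicons(lexicon="ewn:*")     # both ewn:2019 and ewn:2020
   wn.lexicons(lexicon="ewn:2020")  # only ewn:2020
   wn.lexicons(lexicon="*")         # every installed lexicon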
The same format is used for *project specifiers*, which refer to
projects as defined in Wn's index. In most cases the project specifier
is the same as the lexicon specifier (e.g., ``ewn:2020`` refers both
to the project to be downloaded and the lexicon that is installed),
but sometimes it is not. The 1.4 release of the `Open Multilingual
Wordnet`_, for instance, has the project specifier ``omw:1.4`` but it
installs a number of lexicons with their own lexicon specifiers
(``omw-zsm:1.4``, ``omw-cmn:1.4``, etc.). When only an id is given
(e.g., ``ewn``), a project specifier gets the *first* version listed
in the index (in the default index, conventionally, the first version
is the latest release).
.. _lexicon-filters:
Filtering Queries with Lexicons
-------------------------------
Queries against the database will search all installed lexicons unless
they are filtered by ``lang`` or ``lexicon`` arguments:
>>> import wn
>>> len(wn.words())
1538449
>>> len(wn.words(lang="en"))
318289
>>> len(wn.words(lexicon="oewn:2024"))
161705
The ``lexicon`` parameter can also take multiple space-delimited
specifiers, so you can include lexicon extensions or explicitly
combine multiple lexicons:
>>> len(wn.words(lexicon="oewn:2024 omw-en:1.4"))
318289
If a lexicon selected by the ``lexicon`` or ``lang`` arguments
specifies a dependency, the dependency is automatically added as an
*expand* lexicon. Explicitly set :python:`expand=''` to disable this
behavior:
>>> wn.lexicons(lexicon="omw-es:1.4")[0].requires() # omw-es requires omw-en
{'omw-en:1.4': <Lexicon omw-en:1.4 [en]>}
>>> es = wn.Wordnet("omw-es:1.4")
>>> es.lexicons()
[<Lexicon omw-es:1.4 [es]>]
>>> es.expanded_lexicons() # omw-en automatically added
[<Lexicon omw-en:1.4 [en]>]
>>> es_no_en = wn.Wordnet("omw-es:1.4", expand='')
>>> es_no_en.lexicons()
[<Lexicon omw-es:1.4 [es]>]
>>> es_no_en.expanded_lexicons() # no expand lexicons
[]
Also see :ref:`cross-lingual-relation-traversal` for
selecting expand lexicons for relations.
The objects returned by queries retain the "lexicon configuration"
used, which includes the lexicons and expand lexicons. This
configuration determines which lexicons are searched during secondary
queries. The lexicon configuration also records whether any lexicon
filters were used at all; when none were, secondary queries run in
:ref:`default mode <default-mode>`.
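For example, a synset obtained from a filtered query keeps that
query's configuration for its own relation lookups (a sketch assuming
``omw-es:1.4`` and its dependency are installed; the word choice is
illustrative):

.. code-block:: python

   import wn

   es = wn.Wordnet("omw-es:1.4")  # omw-en:1.4 auto-added as expand lexicon
   perro = es.synsets("perro")[0]
   # The synset carries the same lexicon configuration, so this relation
   # query expands through omw-en:1.4, not every installed lexicon
   perro.hypernyms()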
.. _default-mode:
Default Mode Queries
--------------------
A special "default mode" is activated when making a module-function
query (:func:`wn.words`, :func:`wn.synsets`, etc.) or instantiating a
:class:`wn.Wordnet` object with no ``lexicon`` or ``lang`` argument
(so-named because the mode is triggered by using the default values of
``lexicon`` and ``lang``):
>>> w = wn.Wordnet()
>>> wn.words("pineapple") # for example
Default-mode causes the following behavior:
1. Primary queries search any installed lexicon
2. Secondary queries only search the lexicon of the primary entity
(e.g., :meth:`Synset.words` only finds words from the same lexicon
as the synset). If the lexicon has any extensions or is itself an
extension, any extension/base lexicons are also included.
3. If the ``expand`` argument is :python:`None` (always true for
module functions like :func:`wn.synsets`), all installed lexicons
are used as expand lexicons for relations queries.
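The contrast can be sketched as follows (assuming ``oewn:2024`` is
among the installed lexicons):

.. code-block:: python

   import wn

   # Default mode: no lexicon or lang arguments
   ss = wn.synsets("pineapple")[0]  # primary query searches all installed lexicons
   ss.words()                       # secondary query stays in ss's own lexicon

   # Reproducible alternative: a fully specified lexicon configuration
   en = wn.Wordnet("oewn:2024", expand="")
   en.synsets("pineapple")[0].words()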
.. warning::
Default-mode queries are not reproducible as the results can change
as lexicons are added or removed from the database. For anything
more than a casual query, it is highly suggested to instead create
a :class:`wn.Wordnet` object with fully-specified ``lexicon`` and
``expand`` arguments.
Downloading Lexicons
--------------------
Use :py:func:`wn.download` to download lexicons from the web given
either an indexed project specifier or the URL of a resource, package,
or collection.
>>> import wn
>>> wn.download('odenet') # get the latest Open German WordNet
>>> wn.download('odenet:1.3') # get the 1.3 version
>>> # download from a URL
>>> wn.download('https://github.com/omwn/omw-data/releases/download/v1.4/omw-1.4.tar.xz')
The project specifier is only used to retrieve information from Wn's
index; it is the lexicon IDs of the downloaded resource files that are
stored in the database.
Adding Local Lexicons
---------------------
Lexicons can be added from local files with :py:func:`wn.add`:
>>> wn.add('~/data/omw-1.4/omw-nb/omw-nb.xml')
Or with the parent directory as a package:
>>> wn.add('~/data/omw-1.4/omw-nb/')
Or with the grandparent directory as a collection (installing all
packages contained by the collection):
>>> wn.add('~/data/omw-1.4/')
Or from a compressed archive of one of the above:
>>> wn.add('~/data/omw-1.4/omw-nb/omw-nb.xml.xz')
>>> wn.add('~/data/omw-1.4/omw-nb.tar.xz')
>>> wn.add('~/data/omw-1.4.tar.xz')
Listing Installed Lexicons
--------------------------
If you wish to see which lexicons have been added to the database,
:py:func:`wn.lexicons()` returns the list of :py:class:`wn.Lexicon`
objects that describe each one.
>>> for lex in wn.lexicons():
... print(f'{lex.id}:{lex.version}\t{lex.label}')
...
omw-en:1.4 OMW English Wordnet based on WordNet 3.0
omw-nb:1.4 Norwegian Wordnet (Bokmål)
odenet:1.3 Offenes Deutsches WordNet
ewn:2020 English WordNet
ewn:2019 English WordNet
Removing Lexicons
-----------------
Lexicons can be removed from the database with :py:func:`wn.remove`:
>>> wn.remove('omw-nb:1.4')
Note that this removes a single lexicon and not a project, so if, for
instance, you've installed a multi-lexicon project like ``omw``, you
will need to remove each lexicon individually or use a star specifier:
>>> wn.remove('omw-*:1.4')
WN-LMF Files, Packages, and Collections
---------------------------------------
Wn can handle projects with 3 levels of structure:
* WN-LMF XML files
* WN-LMF packages
* WN-LMF collections
WN-LMF XML Files
''''''''''''''''
A WN-LMF XML file is a file with a ``.xml`` extension that is valid
according to the `WN-LMF specification
<https://github.com/globalwordnet/schemas/>`_.
WN-LMF Packages
'''''''''''''''
If one needs to distribute metadata or additional files along with a
WN-LMF XML file, a WN-LMF package allows the files to be included in
a directory. The directory should contain exactly one ``.xml`` file,
which is the WN-LMF XML file. It may also contain other files, and Wn
will recognize three of them:

:``LICENSE`` (``.txt`` | ``.md`` | ``.rst``): the full text of the license
:``README`` (``.txt`` | ``.md`` | ``.rst``): the project README
:``citation.bib``: a BibTeX file containing academic citations for the project
.. code-block::
omw-sq/
├── omw-sq.xml
├── LICENSE.txt
└── README.md
WN-LMF Collections
''''''''''''''''''
In some cases a project may manage multiple resources and distribute
them as a collection. A collection is a directory containing
subdirectories which are WN-LMF packages. The collection may contain
its own README, LICENSE, and citation files which describe the project
as a whole.
.. code-block::
omw-1.4/
├── omw-sq
│ ├── omw-sq.xml
│ ├── LICENSE.txt
│ └── README.md
├── omw-lt
│ ├── citation.bib
│ ├── LICENSE
│ └── omw-lt.xml
├── ...
├── citation.bib
├── LICENSE
└── README.md
================================================
FILE: docs/guides/nltk-migration.rst
================================================
Migrating from the NLTK
=======================
This guide is for users of the `NLTK <https://www.nltk.org/>`_\ 's
``nltk.corpus.wordnet`` module who are migrating to Wn. It is not
guaranteed that Wn will produce the same results as the NLTK's module,
but with some care its behavior can be very similar.
Overview
--------
One important thing to note is that, by default, Wn searches all
wordnets in the database, whereas the NLTK only searches the English
wordnet.
>>> from nltk.corpus import wordnet as nltk_wn
>>> nltk_wn.synsets('chat') # only English
>>> nltk_wn.synsets('chat', lang='fra') # only French
>>> import wn
>>> wn.synsets('chat') # all wordnets
>>> wn.synsets('chat', lang='fr') # only French
With Wn it helps to create a :class:`wn.Wordnet` object to pre-filter
the results by language or lexicon.
>>> en = wn.Wordnet('omw-en:1.4')
>>> en.synsets('chat') # only the OMW English Wordnet
Equivalent Operations
---------------------
The following tables list equivalent API calls for the NLTK's wordnet
module and Wn, assuming the respective modules have been set up (in
separate Python sessions) as follows:
NLTK:
>>> from nltk.corpus import wordnet as wn
>>> ss = wn.synsets("chat", pos="v")[0]
Wn:
>>> import wn
>>> en = wn.Wordnet('omw-en:1.4')
>>> ss = en.synsets("chat", pos="v")[0]
.. default-role:: python
Primary Queries
'''''''''''''''
========================================= ===============================================
NLTK                                      Wn
========================================= ===============================================
`wn.langs()`                              `[lex.language for lex in wn.lexicons()]`
`wn.lemmas("chat")`                       --
--                                        `en.words("chat")`
--                                        `en.senses("chat")`
`wn.synsets("chat")`                      `en.synsets("chat")`
`wn.synsets("chat", pos="v")`             `en.synsets("chat", pos="v")`
`wn.all_synsets()`                        `en.synsets()`
`wn.all_synsets(pos="v")`                 `en.synsets(pos="v")`
========================================= ===============================================
Synsets -- Basic
''''''''''''''''
=================== =================
NLTK                Wn
=================== =================
`ss.lemmas()`       --
--                  `ss.senses()`
--                  `ss.words()`
`ss.lemma_names()`  `ss.lemmas()`
`ss.definition()`   `ss.definition()`
`ss.examples()`     `ss.examples()`
`ss.pos()`          `ss.pos`
=================== =================
Synsets -- Relations
''''''''''''''''''''
=========================================== =====================================
NLTK                                        Wn
=========================================== =====================================
`ss.hypernyms()`                            `ss.get_related("hypernym")`
`ss.instance_hypernyms()`                   `ss.get_related("instance_hypernym")`
`ss.hypernyms() + ss.instance_hypernyms()`  `ss.hypernyms()`
`ss.hyponyms()`                             `ss.get_related("hyponym")`
`ss.member_holonyms()`                      `ss.get_related("holo_member")`
`ss.member_meronyms()`                      `ss.get_related("mero_member")`
`ss.closure(lambda x: x.hypernyms())`       `ss.closure("hypernym")`
=========================================== =====================================
Synsets -- Taxonomic Structure
''''''''''''''''''''''''''''''
================================ =========================================================
NLTK                             Wn
================================ =========================================================
`ss.min_depth()`                 `ss.min_depth()`
`ss.max_depth()`                 `ss.max_depth()`
`ss.hypernym_paths()`            `[list(reversed([ss] + p)) for p in ss.hypernym_paths()]`
`ss.common_hypernyms(ss)`        `ss.common_hypernyms(ss)`
`ss.lowest_common_hypernyms(ss)` `ss.lowest_common_hypernyms(ss)`
`ss.shortest_path_distance(ss)`  `len(ss.shortest_path(ss))`
================================ =========================================================
.. reset default role
.. default-role::
(these tables are incomplete)
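As a worked example, here is one way to collect the lemmas of a
synset's hypernyms in each library. This is a sketch; it assumes the
NLTK ``wordnet`` corpus has been downloaded and that ``omw-en:1.4`` is
installed for Wn:

.. code-block:: python

   # NLTK
   from nltk.corpus import wordnet as nltk_wn
   nltk_ss = nltk_wn.synsets("chat", pos="v")[0]
   nltk_lemmas = {name for hyp in nltk_ss.hypernyms() for name in hyp.lemma_names()}

   # Wn
   import wn
   wn_ss = wn.Wordnet("omw-en:1.4").synsets("chat", pos="v")[0]
   wn_lemmas = {lem for hyp in wn_ss.get_related("hypernym") for lem in hyp.lemmas()}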
================================================
FILE: docs/guides/wordnet.rst
================================================
.. raw:: html
<style>.center {margin-left:20%}</style>
The Structure of a Wordnet
==========================
A **wordnet** is an online lexicon that is organized by concepts.
The basic unit of a wordnet is the synonym set (**synset**), a group of words that all refer to the
same concept. Words and synsets are linked by means of conceptual-semantic relations to form the
structure of the wordnet.
Words, Senses, and Synsets
--------------------------
**Words** are the basic building blocks of a language. A word pairs a form with a meaning, but in
natural languages form and meaning are not in a neat one-to-one match: one word form may be
connected to many different meanings. **Senses** therefore serve as the units of word meaning.
For example, the word *bank* has at least two senses:

1. bank\ :sup:`1`\: financial institution, like *City Bank*;
2. bank\ :sup:`2`\: sloping land, like *river bank*;

Since **synsets** are groups of words sharing the same concept, bank\ :sup:`1`\ and bank\ :sup:`2`\
are members of two different synsets, even though they have the same word form.
On the other hand, different word forms may also convey the same concept, such as *cab* and *taxi*;
word forms that share a concept are grouped together into one synset.
.. raw:: html
:file: images/word-sense-synset.svg
.. role:: center
:class: center
:center:`Figure: relations between words, senses and synsets`
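These three units map directly onto objects in Wn. A short sketch
(assuming an English lexicon such as ``oewn:2024`` is installed):

.. code-block:: python

   import wn

   en = wn.Wordnet("oewn:2024")
   word = en.words("bank", pos="n")[0]  # one word form (with part of speech)
   word.senses()   # one sense per meaning of the word
   word.synsets()  # the concepts (synonym sets) behind those senses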
Synset Relations
----------------
In a wordnet, synsets are linked to each other by various kinds of relations. For example, if
the concept expressed by one synset is more general than that of a given synset, then it is in a
*hypernym* relation with the given synset. As shown in the figure below, the synset with *car*, *auto*, and *automobile* as its
members is the *hypernym* of the synset with *cab*, *taxi*, and *hack*. Relations built on
the synset level like this are categorized as synset relations.
.. raw:: html
:file: images/synset-synset.svg
:center:`Figure: example of synset relations`
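In Wn, such relations can be queried from a synset directly, for
example (a sketch assuming ``oewn:2024``; the word mirrors the figure):

.. code-block:: python

   import wn

   taxi = wn.Wordnet("oewn:2024").synsets("taxi", pos="n")[0]
   taxi.get_related("hypernym")  # more general synsets, e.g. the car/auto synset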
Sense Relations
---------------
Some relations in a wordnet are instead built on the sense level. These can be further divided into
two types: relations that link a sense with another sense, and relations that link a sense with a synset.
.. note:: In wordnet, synset relation and sense relation can both employ a particular
relation type, such as `domain topic <https://globalwordnet.github.io/gwadoc/#domain_topic>`_.
**Sense-Sense**
Sense-to-sense relations emphasize the connections between particular senses, especially when dealing
with morphologically related words. For example, *behavioral* is the adjective for the noun *behavior*
and is said to be in the *pertainym* relation with it; however, no such relation holds between
*behavioral* and *conduct*, even though *conduct* is a synonym of *behavior* in the same synset.
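Such a relation could be queried in Wn as follows (a sketch assuming
``oewn:2024``; the results depend on the data):

.. code-block:: python

   import wn

   behavioral = wn.Wordnet("oewn:2024").senses("behavioral", pos="a")[0]
   behavioral.get_related("pertainym")  # senses of *behavior*, not of *conduct*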
================================================
SYMBOL INDEX (784 symbols across 47 files)
================================================
FILE: bench/conftest.py
function clean_db (line 12) | def clean_db():
function datadir (line 31) | def datadir():
function empty_db (line 36) | def empty_db(clean_db, tmp_path):
function mock_lmf (line 45) | def mock_lmf():
function mock_db_dir (line 67) | def mock_db_dir(mock_lmf, tmp_path_factory):
function mock_db (line 78) | def mock_db(monkeypatch, mock_db_dir):
function _make_synsets (line 85) | def _make_synsets(pos: str, n: int) -> list[lmf.Synset]:
function _words (line 120) | def _words() -> Iterator[str]:
function _make_entries (line 127) | def _make_entries(synsets: list[lmf.Synset]) -> list[lmf.LexicalEntry]:
FILE: bench/test_bench.py
function test_load (line 8) | def test_load(datadir, benchmark):
function test_add_lexical_resource (line 14) | def test_add_lexical_resource(mock_lmf, benchmark):
function test_add_lexical_resource_no_progress (line 28) | def test_add_lexical_resource_no_progress(mock_lmf, benchmark):
function test_synsets (line 43) | def test_synsets(benchmark):
function test_words (line 49) | def test_words(benchmark):
function test_word_senses_no_wordnet (line 55) | def test_word_senses_no_wordnet(benchmark):
function test_word_senses_with_wordnet (line 62) | def test_word_senses_with_wordnet(benchmark):
FILE: tests/_config_test.py
function test_envvar_data_dir (line 6) | def test_envvar_data_dir(monkeypatch, tmp_path):
FILE: tests/_util_test.py
function test_flatten (line 10) | def test_flatten():
function test_unique_list (line 20) | def test_unique_list():
function test_normalize_form (line 32) | def test_normalize_form():
function test_format_lexicon_specifier (line 39) | def test_format_lexicon_specifier():
function test_split_lexicon_specifier (line 46) | def test_split_lexicon_specifier():
FILE: tests/compat_sensekey_test.py
function test_unescape_oewn_sense_key (line 7) | def test_unescape_oewn_sense_key():
function test_escape_oewn_sense_key (line 36) | def test_escape_oewn_sense_key():
function test_unescape_oewn_v2_sense_key (line 58) | def test_unescape_oewn_v2_sense_key():
function test_escape_oewn_v2_sense_key (line 88) | def test_escape_oewn_v2_sense_key():
function test_sense_key_getter (line 112) | def test_sense_key_getter(datadir):
function test_sense_getter (line 138) | def test_sense_getter(datadir):
FILE: tests/conftest.py
function datadir (line 10) | def datadir():
function uninitialized_datadir (line 15) | def uninitialized_datadir(monkeypatch, tmp_path: Path):
function empty_db (line 22) | def empty_db(tmp_path_factory):
function mini_db_dir (line 34) | def mini_db_dir(datadir, tmp_path_factory):
function mini_lmf_compressed (line 46) | def mini_lmf_compressed(datadir, tmp_path):
function mini_db_1_1_dir (line 55) | def mini_db_1_1_dir(datadir, tmp_path_factory):
function mini_db_1_4_dir (line 67) | def mini_db_1_4_dir(datadir, tmp_path_factory):
function mini_db (line 79) | def mini_db(monkeypatch, mini_db_dir):
function mini_db_1_1 (line 87) | def mini_db_1_1(monkeypatch, mini_db_1_1_dir):
function mini_db_1_4 (line 95) | def mini_db_1_4(monkeypatch, mini_db_1_4_dir):
FILE: tests/db_test.py
function test_schema_compatibility (line 11) | def test_schema_compatibility():
function test_db_multithreading (line 18) | def test_db_multithreading():
function test_remove_extension (line 48) | def test_remove_extension(datadir, tmp_path):
function test_add_lexical_resource (line 68) | def test_add_lexical_resource(datadir, tmp_path):
function test_reset_database (line 82) | def test_reset_database(datadir):
FILE: tests/export_test.py
function test_export (line 9) | def test_export(datadir, tmp_path):
function test_export_1_1 (line 26) | def test_export_1_1(datadir, tmp_path):
function test_export_1_4 (line 47) | def test_export_1_4(datadir, tmp_path):
FILE: tests/ic_test.py
function test_compute_nodistribute_nosmoothing (line 43) | def test_compute_nodistribute_nosmoothing():
function test_compute_nodistribute_smoothing (line 66) | def test_compute_nodistribute_smoothing():
function test_compute_distribute_smoothing (line 89) | def test_compute_distribute_smoothing():
function test_load (line 112) | def test_load(tmp_path):
function test_information_content (line 134) | def test_information_content():
FILE: tests/ili_test.py
function test_is_ili_tsv (line 11) | def test_is_ili_tsv(datadir: Path) -> None:
function test_load_tsv (line 18) | def test_load_tsv(datadir: Path) -> None:
function test_get (line 32) | def test_get() -> None:
function test_get_proposed (line 60) | def test_get_proposed() -> None:
FILE: tests/lmf_test.py
function test_is_lmf (line 6) | def test_is_lmf(datadir):
function test_scan_lexicons (line 14) | def test_scan_lexicons(datadir):
function test_load_1_0 (line 49) | def test_load_1_0(datadir):
function test_load_1_1 (line 93) | def test_load_1_1(datadir):
function test_load_1_3 (line 112) | def test_load_1_3(datadir):
function test_load_1_4 (line 130) | def test_load_1_4(datadir):
function test_dump (line 160) | def test_dump(datadir, tmp_path):
FILE: tests/morphy_test.py
function test_morphy_uninitialized (line 7) | def test_morphy_uninitialized():
function test_morphy_initialized (line 31) | def test_morphy_initialized():
function test_issue_154 (line 47) | def test_issue_154():
FILE: tests/primary_query_test.py
function test_lexicons_uninitialized (line 7) | def test_lexicons_uninitialized():
function test_lexicons_empty (line 12) | def test_lexicons_empty():
function test_lexicons_mini (line 17) | def test_lexicons_mini():
function test_lexicons_unknown (line 53) | def test_lexicons_unknown():
function test_words_empty (line 61) | def test_words_empty():
function test_words_mini (line 66) | def test_words_mini():
function test_lemmas_empty (line 106) | def test_lemmas_empty():
function test_lemmas_mini_1_4 (line 111) | def test_lemmas_mini_1_4():
function test_word_empty (line 161) | def test_word_empty():
function test_word_mini (line 167) | def test_word_mini():
function test_senses_empty (line 182) | def test_senses_empty():
function test_senses_mini (line 187) | def test_senses_mini():
function test_sense_empty (line 224) | def test_sense_empty():
function test_sense_mini (line 230) | def test_sense_mini():
function test_synsets_empty (line 245) | def test_synsets_empty():
function test_synsets_mini (line 250) | def test_synsets_mini():
function test_synset_empty (line 288) | def test_synset_empty():
function test_synset_mini (line 294) | def test_synset_mini():
function test_mini_1_1 (line 309) | def test_mini_1_1():
function test_mini_1_1_lexicons (line 346) | def test_mini_1_1_lexicons():
function test_mini_1_4 (line 375) | def test_mini_1_4():
FILE: tests/project_test.py
function test_is_package_directory (line 4) | def test_is_package_directory(datadir):
function test_is_collection_directory (line 9) | def test_is_collection_directory(datadir):
function test_get_project (line 15) | def test_get_project(datadir):
function test_iterpackages (line 31) | def test_iterpackages(datadir):
function test_compressed_iterpackages (line 46) | def test_compressed_iterpackages(mini_lmf_compressed):
FILE: tests/relations_test.py
function test_word_derived_words (line 7) | def test_word_derived_words():
function test_synset_hypernyms (line 13) | def test_synset_hypernyms():
function test_synset_hypernyms_expand_default (line 19) | def test_synset_hypernyms_expand_default():
function test_synset_hypernyms_expand_empty (line 25) | def test_synset_hypernyms_expand_empty():
function test_synset_hypernyms_expand_specified (line 31) | def test_synset_hypernyms_expand_specified():
function test_synset_relations (line 37) | def test_synset_relations():
function test_sense_get_related (line 46) | def test_sense_get_related():
function test_sense_relations (line 54) | def test_sense_relations():
function test_extension_relations (line 62) | def test_extension_relations():
function test_sense_synset_issue_168 (line 96) | def test_sense_synset_issue_168():
function test_synset_relations_issue_169 (line 104) | def test_synset_relations_issue_169():
function test_synset_relations_issue_177 (line 113) | def test_synset_relations_issue_177():
function test_sense_relation_data_true (line 119) | def test_sense_relation_data_true():
function test_synset_relations_data_true (line 133) | def test_synset_relations_data_true():
FILE: tests/secondary_query_test.py
function test_word_senses (line 7) | def test_word_senses():
function test_word_synsets (line 13) | def test_word_synsets():
function test_word_translate (line 19) | def test_word_translate():
function test_word_translate_issue_316 (line 25) | def test_word_translate_issue_316():
function test_word_lemma_tags (line 35) | def test_word_lemma_tags():
function test_word_lemma_pronunciations (line 45) | def test_word_lemma_pronunciations():
function test_sense_word (line 56) | def test_sense_word():
function test_sense_synset (line 66) | def test_sense_synset():
function test_sense_issue_157 (line 76) | def test_sense_issue_157():
function test_sense_examples (line 86) | def test_sense_examples():
function test_sense_counts (line 92) | def test_sense_counts():
function test_sense_lexicalized (line 101) | def test_sense_lexicalized():
function test_sense_frames (line 107) | def test_sense_frames():
function test_sense_frames_issue_156 (line 116) | def test_sense_frames_issue_156():
function test_sense_translate (line 125) | def test_sense_translate():
function test_sense_translate_issue_316 (line 131) | def test_sense_translate_issue_316():
function test_synset_senses (line 139) | def test_synset_senses():
function test_synset_words (line 145) | def test_synset_words():
function test_synset_lemmas (line 151) | def test_synset_lemmas():
function test_synset_ili (line 157) | def test_synset_ili():
function test_synset_definition (line 163) | def test_synset_definition():
function test_synset_definitions (line 172) | def test_synset_definitions():
function test_synset_examples (line 180) | def test_synset_examples():
function test_synset_lexicalized (line 189) | def test_synset_lexicalized():
function test_synset_translate (line 195) | def test_synset_translate():
function test_synset_translate_issue_316 (line 201) | def test_synset_translate_issue_316():
function test_word_sense_order (line 212) | def test_word_sense_order(datadir):
function test_synset_member_order (line 225) | def test_synset_member_order(datadir):
function test_confidence (line 238) | def test_confidence():
FILE: tests/similarity_test.py
function get_synsets (line 11) | def get_synsets(w):
function test_path (line 48) | def test_path():
function test_wup (line 67) | def test_wup():
function test_lch (line 89) | def test_lch():
function test_res (line 117) | def test_res():
function test_jcn (line 142) | def test_jcn():
function test_lin (line 164) | def test_lin():
FILE: tests/taxonomy_test.py
function test_roots (line 16) | def test_roots():
function test_leaves (line 43) | def test_leaves():
function test_taxonomy_depth (line 57) | def test_taxonomy_depth():
function test_hypernym_paths (line 64) | def test_hypernym_paths():
function test_interlingual_hypernym_paths (line 76) | def test_interlingual_hypernym_paths():
function test_shortest_path (line 88) | def test_shortest_path():
function test_min_depth (line 111) | def test_min_depth():
function test_max_depth (line 119) | def test_max_depth():
FILE: tests/util_test.py
function test_synset_id_formatter (line 4) | def test_synset_id_formatter():
FILE: tests/validate_test.py
function test_validate (line 19) | def test_validate(datadir, code: str, i: int) -> None:
FILE: tests/wordnet_test.py
function test_wordnet_lexicons (line 9) | def test_wordnet_lexicons():
function test_wordnet_normalize (line 40) | def test_wordnet_normalize():
function test_wordnet_lemmatize (line 57) | def test_wordnet_lemmatize():
function test_portable_entities_issue_226 (line 86) | def test_portable_entities_issue_226(monkeypatch, tmp_path, datadir):
FILE: wn/__main__.py
function _download (line 14) | def _download(args: argparse.Namespace) -> None:
function _cache (line 21) | def _cache(args: argparse.Namespace) -> None:
function _lexicons (line 40) | def _lexicons(args: argparse.Namespace) -> None:
function _projects (line 45) | def _projects(args: argparse.Namespace) -> None:
function _validate (line 63) | def _validate(args: argparse.Namespace) -> None:
function _path_type (line 95) | def _path_type(arg):
function _file_path_type (line 99) | def _file_path_type(arg):
FILE: wn/_add.py
function add (line 85) | def add(
function _add_lmf (line 121) | def _add_lmf(
function add_lexical_resource (line 143) | def add_lexical_resource(
function _add_lexical_resource (line 180) | def _add_lexical_resource(
function _precheck (line 239) | def _precheck(
function _sum_counts (line 272) | def _sum_counts(lex: _AnyLexicon) -> int:
function _update_lookup_tables (line 308) | def _update_lookup_tables(lexicon: _AnyLexicon, cur: sqlite3.Cursor) -> ...
function _insert_lexicon (line 333) | def _insert_lexicon(
function _build_lexid_map (line 402) | def _build_lexid_map(lexicon: _AnyLexicon, lexid: int, extid: int) -> _L...
function _batch (line 422) | def _batch(sequence: Iterable[T]) -> Iterator[list[T]]:
function _insert_synsets (line 430) | def _insert_synsets(
function _insert_synset_definitions (line 509) | def _insert_synset_definitions(
function _insert_synset_relations (line 540) | def _insert_synset_relations(
function _insert_entries (line 570) | def _insert_entries(
function _insert_index (line 587) | def _insert_index(
function _insert_forms (line 609) | def _insert_forms(
function _insert_pronunciations (line 663) | def _insert_pronunciations(
function _insert_tags (line 730) | def _insert_tags(
function _insert_senses (line 776) | def _insert_senses(
function _insert_adjpositions (line 837) | def _insert_adjpositions(
function _insert_counts (line 855) | def _insert_counts(
function _collect_frames (line 880) | def _collect_frames(lexicon: _AnyLexicon) -> list[lmf.SyntacticBehaviour]:
function _insert_syntactic_behaviours (line 919) | def _insert_syntactic_behaviours(
function _insert_sense_relations (line 955) | def _insert_sense_relations(
function _insert_examples (line 1008) | def _insert_examples(
function _add_ili (line 1039) | def _add_ili(
function remove (line 1073) | def remove(lexicon: str, progress_handler: type[ProgressHandler] = Progr...
function _entries (line 1122) | def _entries(lex: _AnyLexicon) -> Sequence[_AnyEntry]:
function _forms (line 1126) | def _forms(e: _AnyEntry) -> Sequence[_AnyForm]:
function _senses (line 1130) | def _senses(e: _AnyEntry) -> Sequence[_AnySense]:
function _synsets (line 1134) | def _synsets(lex: _AnyLexicon) -> Sequence[_AnySynset]:
function _is_external (line 1138) | def _is_external(x: _AnyForm | _AnyLemma | _AnyEntry | _AnySense | _AnyS...
function _local_synsets (line 1142) | def _local_synsets(synsets: Sequence[_AnySynset]) -> Iterator[lmf.Synset]:
function _local_entries (line 1149) | def _local_entries(entries: Sequence[_AnyEntry]) -> Iterator[lmf.Lexical...
function _local_senses (line 1156) | def _local_senses(senses: Sequence[_AnySense]) -> Iterator[lmf.Sense]:
FILE: wn/_config.py
class ResourceType (line 36) | class ResourceType(str, Enum):
class VersionInfo (line 41) | class VersionInfo(TypedDict):
class ProjectInfo (line 47) | class ProjectInfo(TypedDict):
class ResolvedProjectInfo (line 56) | class ResolvedProjectInfo(TypedDict):
class CacheEntry (line 67) | class CacheEntry(TypedDict):
class WNConfig (line 74) | class WNConfig:
method __init__ (line 77) | def __init__(self):
method data_directory (line 84) | def data_directory(self) -> Path:
method data_directory (line 102) | def data_directory(self, path: AnyPath) -> None:
method database_path (line 110) | def database_path(self) -> Path:
method downloads_directory (line 120) | def downloads_directory(self) -> Path:
method index (line 132) | def index(self) -> dict[str, ProjectInfo]:
method add_project (line 136) | def add_project(
method add_project_version (line 169) | def add_project_version(
method get_project_info (line 203) | def get_project_info(self, arg: str) -> ResolvedProjectInfo:
method get_cache_path (line 251) | def get_cache_path(self, url: str) -> Path:
method list_cache_entries (line 261) | def list_cache_entries(self, arg: str = "*") -> list[CacheEntry]:
method update (line 297) | def update(self, data: dict[str, Any]) -> None:
method _update_index (line 319) | def _update_index(self, index: dict[str, Any]) -> None:
method load_index (line 350) | def load_index(self, path: AnyPath) -> None:
function _get_cache_path_for_urls (line 381) | def _get_cache_path_for_urls(
function _cache_map (line 392) | def _cache_map(config: WNConfig) -> dict[Path, tuple[str, str, str]]:
FILE: wn/_core.py
class _EntityType (line 49) | class _EntityType(str, enum.Enum):
class _LexiconDataElement (line 69) | class _LexiconDataElement(LexiconElementWithMetadata):
method __init__ (line 82) | def __init__(
method __eq__ (line 92) | def __eq__(self, other) -> bool:
method __hash__ (line 97) | def __hash__(self) -> int:
method _get_lexicons (line 100) | def _get_lexicons(self) -> tuple[str, ...]:
class Pronunciation (line 112) | class Pronunciation(LexiconElement):
class Tag (line 126) | class Tag(LexiconElement):
class Form (line 137) | class Form(LexiconElement):
method pronunciations (line 151) | def pronunciations(self) -> list[Pronunciation]:
method tags (line 154) | def tags(self) -> list[Tag]:
function _make_form (line 158) | def _make_form(
class Word (line 176) | class Word(_LexiconDataElement):
method __init__ (line 186) | def __init__(
method __repr__ (line 196) | def __repr__(self) -> str:
method lemma (line 200) | def lemma(self, *, data: Literal[False] = False) -> str: ...
method lemma (line 202) | def lemma(self, *, data: Literal[True] = True) -> Form: ...
method lemma (line 206) | def lemma(self, *, data: bool) -> str | Form: ...
method lemma (line 208) | def lemma(self, *, data: bool = False) -> str | Form:
method forms (line 231) | def forms(self, *, data: Literal[False] = False) -> list[str]: ...
method forms (line 233) | def forms(self, *, data: Literal[True] = True) -> list[Form]: ...
method forms (line 237) | def forms(self, *, data: bool) -> list[str] | list[Form]: ...
method forms (line 239) | def forms(self, *, data: bool = False) -> list[str] | list[Form]:
method senses (line 261) | def senses(self) -> list[Sense]:
method metadata (line 274) | def metadata(self) -> Metadata:
method synsets (line 278) | def synsets(self) -> list[Synset]:
method derived_words (line 289) | def derived_words(self) -> list[Word]:
method translate (line 304) | def translate(
class Relation (line 333) | class Relation(LexiconElementWithMetadata):
method __init__ (line 344) | def __init__(
method __repr__ (line 359) | def __repr__(self) -> str:
method __eq__ (line 365) | def __eq__(self, other) -> bool:
method __hash__ (line 376) | def __hash__(self) -> int:
method subtype (line 381) | def subtype(self) -> str | None:
class _Relatable (line 394) | class _Relatable(_LexiconDataElement):
method relations (line 396) | def relations(
method relations (line 400) | def relations(
method relations (line 406) | def relations(
method relations (line 410) | def relations(
method get_related (line 415) | def get_related(self: T, *args: str) -> list[T]:
method closure (line 418) | def closure(self: T, *args: str) -> Iterator[T]:
method relation_paths (line 428) | def relation_paths(self: T, *args: str, end: T | None = None) -> Itera...
class Example (line 454) | class Example(LexiconElementWithMetadata):
method metadata (line 464) | def metadata(self) -> Metadata:
class Definition (line 470) | class Definition(LexiconElementWithMetadata):
method metadata (line 481) | def metadata(self) -> Metadata:
class Synset (line 486) | class Synset(_Relatable):
method __init__ (line 497) | def __init__(
method empty (line 510) | def empty(
method __eq__ (line 519) | def __eq__(self, other) -> bool:
method __hash__ (line 529) | def __hash__(self) -> int:
method __repr__ (line 532) | def __repr__(self) -> str:
method ili (line 536) | def ili(self) -> str | None:
method definition (line 540) | def definition(self, *, data: Literal[False] = False) -> str | None: ...
method definition (line 542) | def definition(self, *, data: Literal[True] = True) -> Definition | No...
method definition (line 546) | def definition(self, *, data: bool) -> str | Definition | None: ...
method definition (line 548) | def definition(self, *, data: bool = False) -> str | Definition | None:
method definitions (line 580) | def definitions(self, *, data: Literal[False] = False) -> list[str]: ...
method definitions (line 582) | def definitions(self, *, data: Literal[True] = True) -> list[Definitio...
method definitions (line 586) | def definitions(self, *, data: bool) -> list[str] | list[Definition]: ...
method definitions (line 588) | def definitions(self, *, data: bool = False) -> list[str] | list[Defin...
method examples (line 621) | def examples(self, *, data: Literal[False] = False) -> list[str]: ...
method examples (line 623) | def examples(self, *, data: Literal[True] = True) -> list[Example]: ...
method examples (line 627) | def examples(self, *, data: bool) -> list[str] | list[Example]: ...
method examples (line 629) | def examples(self, *, data: bool = False) -> list[str] | list[Example]:
method senses (line 652) | def senses(self) -> list[Sense]:
method lexicalized (line 665) | def lexicalized(self) -> bool:
method lexfile (line 669) | def lexfile(self) -> str | None:
method metadata (line 673) | def metadata(self) -> Metadata:
method words (line 677) | def words(self) -> list[Word]:
method lemmas (line 689) | def lemmas(self, *, data: Literal[False] = False) -> list[str]: ...
method lemmas (line 691) | def lemmas(self, *, data: Literal[True] = True) -> list[Form]: ...
method lemmas (line 695) | def lemmas(self, *, data: bool) -> list[str] | list[Form]: ...
method lemmas (line 697) | def lemmas(self, *, data: bool = False) -> list[str] | list[Form]:
method relations (line 720) | def relations(
method relations (line 724) | def relations(
method relations (line 730) | def relations(
method relations (line 734) | def relations(
method get_related (line 772) | def get_related(self, *args: str) -> list[Synset]:
method _iter_relations (line 792) | def _iter_relations(self, *args: str) -> Iterator[tuple[Relation, Syns...
method _iter_local_relations (line 799) | def _iter_local_relations(
method _iter_expanded_relations (line 819) | def _iter_expanded_relations(
method hypernym_paths (line 846) | def hypernym_paths(self, simulate_root: bool = False) -> list[list[Syn...
method min_depth (line 850) | def min_depth(self, simulate_root: bool = False) -> int:
method max_depth (line 854) | def max_depth(self, simulate_root: bool = False) -> int:
method shortest_path (line 858) | def shortest_path(self, other: Synset, simulate_root: bool = False) ->...
method common_hypernyms (line 862) | def common_hypernyms(
method lowest_common_hypernyms (line 868) | def lowest_common_hypernyms(
method holonyms (line 876) | def holonyms(self) -> list[Synset]:
method meronyms (line 893) | def meronyms(self) -> list[Synset]:
method hypernyms (line 910) | def hypernyms(self) -> list[Synset]:
method hyponyms (line 919) | def hyponyms(self) -> list[Synset]:
method translate (line 928) | def translate(
class Count (line 959) | class Count(LexiconElementWithMetadata):
class Sense (line 969) | class Sense(_Relatable):
method __init__ (line 977) | def __init__(
method __repr__ (line 989) | def __repr__(self) -> str:
method word (line 992) | def word(self) -> Word:
method synset (line 1005) | def synset(self) -> Synset:
method examples (line 1019) | def examples(self, *, data: Literal[False] = False) -> list[str]: ...
method examples (line 1021) | def examples(self, *, data: Literal[True] = True) -> list[Example]: ...
method examples (line 1025) | def examples(self, *, data: bool) -> list[str] | list[Example]: ...
method examples (line 1027) | def examples(self, *, data: bool = False) -> list[str] | list[Example]:
method lexicalized (line 1044) | def lexicalized(self) -> bool:
method adjposition (line 1048) | def adjposition(self) -> str | None:
method frames (line 1061) | def frames(self) -> list[str]:
method counts (line 1067) | def counts(self, *, data: Literal[False] = False) -> list[int]: ...
method counts (line 1069) | def counts(self, *, data: Literal[True] = True) -> list[Count]: ...
method counts (line 1073) | def counts(self, *, data: bool) -> list[int] | list[Count]: ...
method counts (line 1075) | def counts(self, *, data: bool = False) -> list[int] | list[Count]:
method metadata (line 1087) | def metadata(self) -> Metadata:
method relations (line 1092) | def relations(
method relations (line 1096) | def relations(
method relations (line 1102) | def relations(
method relations (line 1106) | def relations(
method synset_relations (line 1137) | def synset_relations(
method synset_relations (line 1141) | def synset_relations(
method synset_relations (line 1147) | def synset_relations(
method synset_relations (line 1151) | def synset_relations(
method get_related (line 1181) | def get_related(self, *args: str) -> list[Sense]:
method get_related_synsets (line 1199) | def get_related_synsets(self, *args: str) -> list[Synset]:
method _iter_sense_relations (line 1205) | def _iter_sense_relations(self, *args: str) -> Iterator[tuple[Relation...
method _iter_sense_synset_relations (line 1213) | def _iter_sense_synset_relations(
method translate (line 1224) | def translate(
FILE: wn/_db.py
function _adapt_dict (line 44) | def _adapt_dict(d: dict) -> bytes:
function _convert_dict (line 48) | def _convert_dict(s: bytes) -> dict:
function _convert_boolean (line 52) | def _convert_boolean(s: bytes) -> bool:
function connect (line 69) | def connect(check_schema: bool = True) -> sqlite3.Connection:
function _init_db (line 94) | def _init_db(conn: sqlite3.Connection) -> None:
function _check_schema_compatibility (line 104) | def _check_schema_compatibility(conn: sqlite3.Connection, dbpath: Path) ...
function list_lexicons_safe (line 135) | def list_lexicons_safe(conn: sqlite3.Connection | None = None) -> list[s...
function schema_hash (line 146) | def schema_hash(conn: sqlite3.Connection) -> str:
function clear_connections (line 158) | def clear_connections() -> None:
FILE: wn/_download.py
function download (line 20) | def download(
function _get_cache_path_and_urls (line 81) | def _get_cache_path_and_urls(project_or_url: str) -> tuple[Path | None, ...
function _download (line 89) | def _download(urls: Sequence[str], progress: ProgressHandler) -> Path:
FILE: wn/_exceptions.py
class Error (line 1) | class Error(Exception):
class DatabaseError (line 8) | class DatabaseError(Error):
class ConfigurationError (line 14) | class ConfigurationError(Error):
class ProjectError (line 20) | class ProjectError(Error):
class WnWarning (line 26) | class WnWarning(Warning):
FILE: wn/_export.py
function export (line 42) | def export(
function _precheck (line 71) | def _precheck(lexicons: Sequence[Lexicon]) -> None:
class _LexSpecs (line 88) | class _LexSpecs(NamedTuple):
class _LMFExporter (line 93) | class _LMFExporter:
method __init__ (line 102) | def __init__(self, version: str) -> None:
method export (line 111) | def export(self, lexicon: Lexicon) -> lmf.Lexicon | lmf.LexiconExtension:
method _lexicon (line 124) | def _lexicon(self, lexicon: Lexicon) -> lmf.Lexicon:
method _requires (line 145) | def _requires(self) -> list[lmf.Dependency]:
method _dependency (line 152) | def _dependency(self, id: str, version: str, url: str | None) -> lmf.D...
method _entries (line 156) | def _entries(
method _entries (line 161) | def _entries(self, extension: Literal[False]) -> Iterator[lmf.LexicalE...
method _entries (line 163) | def _entries(
method _entry (line 174) | def _entry(self, id: str, pos: str) -> lmf.LexicalEntry:
method _lemma (line 197) | def _lemma(self, form: Form, pos: str) -> lmf.Lemma:
method _form (line 206) | def _form(self, form: Form) -> lmf.Form:
method _pronunciations (line 215) | def _pronunciations(self, prons: list[Pronunciation]) -> list[lmf.Pron...
method _tags (line 229) | def _tags(self, tags: list[Tag]) -> list[lmf.Tag]:
method _senses (line 238) | def _senses(
method _senses (line 243) | def _senses(
method _senses (line 247) | def _senses(
method _sense (line 259) | def _sense(self, sense: Sense, index: str | None, i: int) -> lmf.Sense:
method _sense_relations (line 276) | def _sense_relations(self, sense_id: str) -> list[lmf.Relation]:
method _examples (line 294) | def _examples(self, id: str, table: str) -> list[lmf.Example]:
method _counts (line 301) | def _counts(self, sense_id: str) -> list[lmf.Count]:
method _synsets (line 309) | def _synsets(
method _synsets (line 314) | def _synsets(self, extension: Literal[False]) -> Iterator[lmf.Synset]:...
method _synsets (line 316) | def _synsets(
method _synset (line 327) | def _synset(self, id: str, pos: str, ili: str) -> lmf.Synset:
method _definitions (line 350) | def _definitions(self, synset_id: str) -> list[lmf.Definition]:
method _ili_definition (line 364) | def _ili_definition(self, synset: str) -> lmf.ILIDefinition | None:
method _synset_relations (line 378) | def _synset_relations(
method _syntactic_behaviours_1_0 (line 391) | def _syntactic_behaviours_1_0(
method _syntactic_behaviours_1_1 (line 409) | def _syntactic_behaviours_1_1(self) -> list[lmf.SyntacticBehaviour]:
method _metadata (line 416) | def _metadata(self, id: str, table: str) -> lmf.Metadata:
method _lexicon_extension (line 421) | def _lexicon_extension(
method _ext_entry (line 448) | def _ext_entry(self, id: str) -> lmf.ExternalLexicalEntry | None:
method _ext_lemma (line 465) | def _ext_lemma(self, lemma: Form) -> lmf.ExternalLemma | None:
method _ext_forms (line 477) | def _ext_forms(self, forms: list[Form]) -> list[lmf.Form | lmf.Externa...
method _ext_form (line 487) | def _ext_form(self, form: Form) -> lmf.ExternalForm | None:
method _ext_sense (line 502) | def _ext_sense(self, id: str) -> lmf.ExternalSense | None:
method _ext_synset (line 516) | def _ext_synset(self, id: str) -> lmf.ExternalSynset | None:
function _build_sbmap (line 539) | def _build_sbmap(lexicons: Sequence[str]) -> _SBMap:
function _get_entry_forms (line 549) | def _get_entry_forms(id: str, lexicons: Sequence[str]) -> tuple[Form, li...
function _get_sense_n (line 555) | def _get_sense_n(id: str, lexspec: str, index: str | None, i: int) -> int:
function _get_external_sense_ids (line 568) | def _get_external_sense_ids(lexspecs: _LexSpecs) -> set[str]:
function _get_external_synset_ids (line 575) | def _get_external_synset_ids(lexspecs: _LexSpecs) -> set[str]:
FILE: wn/_lexicon.py
class Lexicon (line 31) | class Lexicon(HasMetadata):
method from_specifier (line 49) | def from_specifier(cls: type[Self], specifier: str) -> Self:
method __repr__ (line 66) | def __repr__(self):
method specifier (line 69) | def specifier(self) -> str:
method confidence (line 73) | def confidence(self) -> float:
method modified (line 80) | def modified(self) -> bool:
method requires (line 84) | def requires(self) -> dict[str, Lexicon | None]:
method extends (line 91) | def extends(self) -> Lexicon | None:
method extensions (line 101) | def extensions(self, depth: int = 1) -> list[Lexicon]:
method describe (line 117) | def describe(self, full: bool = True) -> str:
function _desc_counts (line 149) | def _desc_counts(query: Callable, lexspecs: Sequence[str]) -> str:
class LexiconElement (line 160) | class LexiconElement(Protocol):
method lexicon (line 165) | def lexicon(self) -> Lexicon:
class LexiconElementWithMetadata (line 170) | class LexiconElementWithMetadata(LexiconElement, HasMetadata, Protocol):
method confidence (line 173) | def confidence(self) -> float:
class LexiconConfiguration (line 185) | class LexiconConfiguration(NamedTuple):
FILE: wn/_metadata.py
class Metadata (line 4) | class Metadata(TypedDict, total=False):
class HasMetadata (line 26) | class HasMetadata(Protocol):
method _metadata (line 28) | def _metadata(self) -> Metadata | None:
method metadata (line 31) | def metadata(self) -> Metadata:
method confidence (line 35) | def confidence(self) -> float:
FILE: wn/_module_functions.py
function projects (line 13) | def projects() -> list[ResolvedProjectInfo]:
function lexicons (line 39) | def lexicons(*, lexicon: str | None = "*", lang: str | None = None) -> l...
function reset_database (line 56) | def reset_database(rebuild: bool = False) -> None:
function word (line 86) | def word(id: str, *, lexicon: str | None = None, lang: str | None = None...
function words (line 100) | def words(
function lemmas (line 125) | def lemmas(
function lemmas (line 136) | def lemmas(
function lemmas (line 147) | def lemmas(
function lemmas (line 157) | def lemmas(
function synset (line 186) | def synset(id: str, *, lexicon: str | None = None, lang: str | None = No...
function synsets (line 200) | def synsets(
function senses (line 223) | def senses(
function sense (line 245) | def sense(id: str, *, lexicon: str | None = None, lang: str | None = Non...
FILE: wn/_queries.py
function resolve_lexicon_specifiers (line 114) | def resolve_lexicon_specifiers(
function get_lexicon (line 139) | def get_lexicon(lexicon: str) -> _Lexicon:
function get_modified (line 152) | def get_modified(lexicon: str) -> bool:
function get_lexicon_dependencies (line 157) | def get_lexicon_dependencies(lexicon: str) -> list[tuple[str, str, bool]]:
function get_lexicon_extension_bases (line 170) | def get_lexicon_extension_bases(lexicon: str, depth: int = -1) -> list[s...
function get_lexicon_extensions (line 190) | def get_lexicon_extensions(lexicon: str, depth: int = -1) -> list[str]:
function get_ili (line 210) | def get_ili(id: str) -> _ExistingILI | None:
function find_ilis (line 221) | def find_ilis(
function find_proposed_ilis (line 253) | def find_proposed_ilis(
function find_entries (line 276) | def find_entries(
function _load_lemmas_with_details (line 310) | def _load_lemmas_with_details(
function find_lemmas (line 370) | def find_lemmas(
function find_senses (line 413) | def find_senses(
function find_synsets (line 464) | def find_synsets(
function get_entry_forms (line 529) | def get_entry_forms(id: str, lexicons: Sequence[str]) -> Iterator[Form]:
function get_synsets_for_ilis (line 565) | def get_synsets_for_ilis(
function get_synset_relations (line 583) | def get_synset_relations(
function get_expanded_synset_relations (line 638) | def get_expanded_synset_relations(
function get_definitions (line 684) | def get_definitions(
function get_examples (line 710) | def get_examples(
function find_syntactic_behaviours (line 730) | def find_syntactic_behaviours(
function get_syntactic_behaviours (line 761) | def get_syntactic_behaviours(
function _get_senses (line 779) | def _get_senses(
function get_entry_senses (line 807) | def get_entry_senses(
function get_synset_members (line 813) | def get_synset_members(
function get_sense_relations (line 819) | def get_sense_relations(
function get_sense_synset_relations (line 868) | def get_sense_synset_relations(
function get_relation_targets (line 916) | def get_relation_targets(
function get_metadata (line 965) | def get_metadata(id: str, lexicon: str, table: str) -> Metadata:
function get_ili_metadata (line 982) | def get_ili_metadata(id: str) -> Metadata:
function get_proposed_ili_metadata (line 990) | def get_proposed_ili_metadata(synset: str, lexicon: str) -> Metadata:
function get_lexicalized (line 1011) | def get_lexicalized(id: str, lexicon: str, table: str) -> bool:
function get_adjposition (line 1030) | def get_adjposition(sense_id: str, lexicon: str) -> str | None:
function get_sense_counts (line 1046) | def get_sense_counts(sense_id: str, lexicons: Sequence[str]) -> list[_Co...
function get_lexfile (line 1060) | def get_lexfile(synset_id: str, lexicon: str) -> str | None:
function get_entry_index (line 1076) | def get_entry_index(entry_id: str, lexicon: str) -> str | None:
function get_sense_n (line 1092) | def get_sense_n(sense_id: str, lexicon: str) -> int | None:
function _qs (line 1107) | def _qs(xs: Collection) -> str:
function _vs (line 1111) | def _vs(xs: Collection) -> str:
function _kws (line 1115) | def _kws(xs: Collection) -> str:
function _query_forms (line 1119) | def _query_forms(
function _build_entry_conditions (line 1147) | def _build_entry_conditions(
FILE: wn/_util.py
function version_info (line 12) | def version_info(version_string: str) -> VersionInfo:
function is_url (line 16) | def is_url(string: str) -> bool:
function is_gzip (line 22) | def is_gzip(path: Path) -> bool:
function is_lzma (line 27) | def is_lzma(path: Path) -> bool:
function is_xml (line 32) | def is_xml(path: Path) -> bool:
function is_str_key_dict (line 37) | def is_str_key_dict(obj: Any) -> TypeGuard[dict[str, Any]]:
function _inspect_file_signature (line 41) | def _inspect_file_signature(path: Path, signature: bytes) -> bool:
function short_hash (line 48) | def short_hash(string: str) -> str:
function flatten (line 58) | def flatten(iterable: Iterable[Iterable[T]]) -> list[T]:
function unique_list (line 65) | def unique_list(items: Iterable[H]) -> list[H]:
function normalize_form (line 71) | def normalize_form(s: str) -> str:
function format_lexicon_specifier (line 75) | def format_lexicon_specifier(id: str, version: str) -> str:
function split_lexicon_specifier (line 79) | def split_lexicon_specifier(lexicon: str) -> tuple[str, str]:
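These are small non-public helpers, so their API may change, but the signatures above are complete enough for a quick sketch:

```python
from wn._util import (  # non-public module; treat as internal
    flatten,
    format_lexicon_specifier,
    split_lexicon_specifier,
    unique_list,
)

spec = format_lexicon_specifier("oewn", "2024")    # "oewn:2024"
assert split_lexicon_specifier(spec) == ("oewn", "2024")
assert unique_list(["a", "b", "a"]) == ["a", "b"]  # order-preserving dedup
assert flatten([[1, 2], [3]]) == [1, 2, 3]
```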
FILE: wn/_wordnet.py
class Wordnet (line 27) | class Wordnet:
method __init__ (line 91) | def __init__(
method lexicons (line 129) | def lexicons(self) -> list[Lexicon]:
method expanded_lexicons (line 133) | def expanded_lexicons(self) -> list[Lexicon]:
method word (line 137) | def word(self, id: str) -> Word:
method words (line 146) | def words(self, form: str | None = None, pos: str | None = None) -> li...
method lemmas (line 158) | def lemmas(
method lemmas (line 166) | def lemmas(
method lemmas (line 176) | def lemmas(
method lemmas (line 180) | def lemmas(
method synset (line 222) | def synset(self, id: str) -> Synset:
method synsets (line 231) | def synsets(
method sense (line 247) | def sense(self, id: str) -> Sense:
method senses (line 256) | def senses(self, form: str | None = None, pos: str | None = None) -> l...
method describe (line 267) | def describe(self) -> str:
function _resolve_lexicon_dependencies (line 295) | def _resolve_lexicon_dependencies(
function _find_lemmas (line 320) | def _find_lemmas(
function _query_with_forms (line 354) | def _query_with_forms(
function _find_helper (line 379) | def _find_helper(
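The Wordnet class binds queries to a fixed set of lexicons so results do not mix across installed wordnets. A minimal sketch, again assuming a hypothetical installed lexicon "oewn:2024":

```python
import wn

ewn = wn.Wordnet("oewn:2024")  # hypothetical installed lexicon
print(ewn.describe())          # summary of the bound lexicons
for word in ewn.words("pencil"):
    for synset in word.synsets():
        print(word.lemma(), synset.definition())
```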
FILE: wn/compat/sensekey.py
function unescape (line 129) | def unescape(s: str, /, flavor: str = "oewn-v2") -> str:
function _unescape_oewn (line 161) | def _unescape_oewn(s: str, escape_sequences: list[tuple[str, str]]) -> str:
function escape (line 172) | def escape(sense_key: str, /, flavor: str = "oewn-v2") -> str:
function _escape_oewn (line 196) | def _escape_oewn(sense_key: str, escape_sequences: list[tuple[str, str]]...
function sense_key_getter (line 207) | def sense_key_getter(lexicon: str) -> SensekeyGetter:
function sense_getter (line 255) | def sense_getter(lexicon: str, wordnet: wn.Wordnet | None = None) -> Sen...
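sense_key_getter() and sense_getter() return callables that map between Sense objects and sense-key strings for a given lexicon. A sketch, assuming an installed OEWN-style lexicon; the key value shown is illustrative only:

```python
import wn
from wn.compat import sensekey

lexicon = "oewn:2024"  # hypothetical installed lexicon
get_key = sensekey.sense_key_getter(lexicon)
get_sense = sensekey.sense_getter(lexicon)

sense = wn.senses("drive", pos="v", lexicon=lexicon)[0]
key = get_key(sense)  # e.g. "drive%2:38:00::" (illustrative value)
roundtrip = get_sense(key)
assert roundtrip is not None and roundtrip.id == sense.id
```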
FILE: wn/ic.py
function information_content (line 22) | def information_content(synset: Synset, freq: Freq) -> float:
function synset_probability (line 32) | def synset_probability(synset: Synset, freq: Freq) -> float:
function _initialize (line 47) | def _initialize(
function compute (line 70) | def compute(
function load (line 155) | def load(
function _parse_ic_file (line 203) | def _parse_ic_file(icfile: TextIO) -> Iterator[tuple[int, str, float, bo...
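compute() builds a smoothed frequency table from a plain token corpus, and information_content() scores a synset against it. The keyword defaults are truncated in the index above, so treat them as assumptions in this sketch:

```python
import wn
from wn.ic import compute, information_content

ewn = wn.Wordnet("oewn:2024")          # hypothetical installed lexicon
corpus = ["dog", "cat", "dog", "run"]  # toy token list
freq = compute(corpus, ewn)            # smoothed frequencies per synset
dog = ewn.synsets("dog", pos="n")[0]
print(information_content(dog, freq))  # -log(synset probability)
```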
FILE: wn/ili.py
class ILIStatus (line 33) | class ILIStatus(str, Enum):
class ILIDefinition (line 43) | class ILIDefinition(HasMetadata):
method metadata (line 52) | def metadata(self) -> Metadata:
method confidence (line 56) | def confidence(self) -> float:
class ILIProtocol (line 69) | class ILIProtocol(Protocol):
method id (line 74) | def id(self) -> str | None:
method status (line 79) | def status(self) -> ILIStatus:
method definition (line 84) | def definition(self, *, data: Literal[False] = False) -> str | None: ...
method definition (line 86) | def definition(self, *, data: Literal[True] = True) -> ILIDefinition |...
method definition (line 90) | def definition(self, *, data: bool) -> str | ILIDefinition | None: ...
method definition (line 92) | def definition(self, *, data: bool = False) -> str | ILIDefinition | N...
class ILI (line 115) | class ILI(ILIProtocol):
class ProposedILI (line 133) | class ProposedILI(LexiconElementWithMetadata, ILIProtocol):
method id (line 146) | def id(self) -> Literal[None]:
method status (line 156) | def status(self) -> Literal[ILIStatus.PROPOSED]:
method synset (line 164) | def synset(self) -> Synset:
function get (line 169) | def get(id: str) -> ILI | None:
function get_all (line 198) | def get_all(
function get_proposed (line 232) | def get_proposed(synset: Synset) -> ProposedILI | None:
function get_all_proposed (line 262) | def get_all_proposed(lexicon: str | None = None) -> list[ProposedILI]:
function is_ili_tsv (line 282) | def is_ili_tsv(source: AnyPath) -> bool:
function load_tsv (line 300) | def load_tsv(source: AnyPath) -> Iterator[dict[str, str]]:
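get() looks up a concept in the interlingual index, and load_tsv() parses an ILI tab-separated file without touching the database. A sketch, assuming the CILI inventory has been added (e.g. with wn.download("cili")) and using a hypothetical local TSV path; the column names follow the mini-ili.tsv test data:

```python
from wn import ili

entry = ili.get("i67447")
if entry is not None:
    print(entry.id, entry.status, entry.definition())

for row in ili.load_tsv("mini-ili.tsv"):  # hypothetical local path
    print(row["ILI"], row.get("Definition", ""))
```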
FILE: wn/lmf.py
class LMFError (line 19) | class LMFError(Error):
class LMFWarning (line 23) | class LMFWarning(Warning):
class _HasId (line 171) | class _HasId(TypedDict):
class _HasILI (line 175) | class _HasILI(TypedDict):
class _HasSynset (line 179) | class _HasSynset(TypedDict):
class _MaybeId (line 183) | class _MaybeId(TypedDict, total=False):
class _HasText (line 187) | class _HasText(TypedDict):
class _MaybeScript (line 191) | class _MaybeScript(TypedDict, total=False):
class _HasMeta (line 195) | class _HasMeta(TypedDict, total=False):
class _External (line 199) | class _External(TypedDict):
class ILIDefinition (line 203) | class ILIDefinition(_HasText, _HasMeta): ...
class Definition (line 206) | class Definition(_HasText, _HasMeta, total=False):
class Relation (line 211) | class Relation(_HasMeta):
class Example (line 216) | class Example(_HasText, _HasMeta, total=False):
class Synset (line 220) | class Synset(_HasId, _HasILI, _HasMeta, total=False):
class ExternalSynset (line 231) | class ExternalSynset(_HasId, _External, total=False):
class Count (line 237) | class Count(_HasMeta):
class Sense (line 241) | class Sense(_HasId, _HasSynset, _HasMeta, total=False):
class ExternalSense (line 251) | class ExternalSense(_HasId, _External, total=False):
class Pronunciation (line 257) | class Pronunciation(_HasText, total=False):
class Tag (line 264) | class Tag(_HasText):
class _FormChildren (line 268) | class _FormChildren(TypedDict, total=False):
class Lemma (line 273) | class Lemma(_MaybeScript, _FormChildren):
class ExternalLemma (line 278) | class ExternalLemma(_FormChildren, _External): ...
class Form (line 281) | class Form(_MaybeId, _MaybeScript, _FormChildren):
class ExternalForm (line 285) | class ExternalForm(_HasId, _FormChildren, _External): ...
class _SyntacticBehaviourBase (line 288) | class _SyntacticBehaviourBase(_MaybeId):
class SyntacticBehaviour (line 292) | class SyntacticBehaviour(_SyntacticBehaviourBase, total=False):
class _LexicalEntryBase (line 296) | class _LexicalEntryBase(_HasId, _HasMeta, total=False):
class LexicalEntry (line 303) | class LexicalEntry(_LexicalEntryBase):
class ExternalLexicalEntry (line 307) | class ExternalLexicalEntry(_HasId, _External, total=False):
class LexiconSpecifier (line 313) | class LexiconSpecifier(_HasId): # public but not an LMF entry
class Dependency (line 317) | class Dependency(LexiconSpecifier, total=False):
class _LexiconRequired (line 321) | class _LexiconRequired(LexiconSpecifier, _HasMeta):
class _LexiconBase (line 328) | class _LexiconBase(_LexiconRequired, total=False):
class Lexicon (line 334) | class Lexicon(_LexiconBase, total=False):
class _LexiconExtensionBase (line 341) | class _LexiconExtensionBase(_LexiconBase):
class LexiconExtension (line 345) | class LexiconExtension(_LexiconExtensionBase, total=False):
class LexicalResource (line 352) | class LexicalResource(TypedDict):
function is_lmf (line 360) | def is_lmf(source: AnyPath) -> bool:
function _read_header (line 373) | def _read_header(fh: BinaryIO) -> str:
class ScanInfo (line 389) | class ScanInfo(LexiconSpecifier):
function scan_lexicons (line 394) | def scan_lexicons(source: AnyPath) -> list[ScanInfo]:
function load (line 438) | def load(
function _quick_scan (line 476) | def _quick_scan(source: Path) -> tuple[str, int]:
function _make_parser (line 485) | def _make_parser(root, version, progress): # noqa: C901
function _unexpected (line 540) | def _unexpected(name: str, p: xml.parsers.expat.XMLParserType) -> LMFError:
function _validate (line 547) | def _validate(elem: _Elem) -> Lexicon | LexiconExtension:
function _validate_lexicon (line 561) | def _validate_lexicon(elem: _Elem, extension: bool) -> None:
function _validate_entries (line 574) | def _validate_entries(elems: list[_Elem], extension: bool) -> None:
function _validate_forms (line 593) | def _validate_forms(elems: list[_Elem], extension: bool) -> None:
function _validate_senses (line 608) | def _validate_senses(elems: list[_Elem], extension: bool) -> None:
function _validate_frames (line 635) | def _validate_frames(elems: list[_Elem]) -> None:
function _validate_synsets (line 642) | def _validate_synsets(elems: list[_Elem], extension: bool) -> None:
function _validate_metadata (line 666) | def _validate_metadata(elem: _Elem) -> None:
function dump (line 674) | def dump(resource: LexicalResource, destination: AnyPath) -> None:
function _dump_lexicon (line 696) | def _dump_lexicon(
function _build_lexicon_attrib (line 728) | def _build_lexicon_attrib(
function _dump_dependency (line 749) | def _dump_dependency(
function _dump_lexical_entry (line 760) | def _dump_lexical_entry(
function _build_lemma (line 790) | def _build_lemma(lemma: Lemma | ExternalLemma, version: VersionInfo) -> ...
function _build_form (line 808) | def _build_form(form: Form | ExternalForm, version: VersionInfo) -> ET.E...
function _build_pronunciation (line 828) | def _build_pronunciation(pron: Pronunciation) -> ET.Element:
function _build_tag (line 843) | def _build_tag(tag: Tag) -> ET.Element:
function _build_sense (line 849) | def _build_sense(
function _build_example (line 877) | def _build_example(example: Example) -> ET.Element:
function _build_count (line 887) | def _build_count(count: Count) -> ET.Element:
function _dump_synset (line 893) | def _dump_synset(
function _build_definition (line 924) | def _build_definition(definition: Definition) -> ET.Element:
function _build_ili_definition (line 936) | def _build_ili_definition(ili_definition: ILIDefinition) -> ET.Element:
function _build_relation (line 942) | def _build_relation(relation: Relation, elemtype: str) -> ET.Element:
function _dump_syntactic_behaviour (line 948) | def _dump_syntactic_behaviour(
function _build_syntactic_behaviour (line 955) | def _build_syntactic_behaviour(
function _tostring (line 966) | def _tostring(elem: ET.Element, level: int, short_empty_elements: bool =...
function _indent (line 973) | def _indent(elem: ET.Element, level: int) -> None:
function _meta_dict (line 986) | def _meta_dict(meta: Metadata | None) -> dict[str, str]:
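load() parses a WN-LMF XML file into the plain TypedDict structures listed above, and dump() writes them back out. A round-trip sketch with a placeholder path; the dictionary keys follow those TypedDicts:

```python
from wn import lmf

path = "my-wordnet.xml"  # placeholder path
if lmf.is_lmf(path):
    resource = lmf.load(path)  # -> LexicalResource dict
    for lex in resource["lexicons"]:
        print(lex["id"], lex["version"], len(lex.get("entries", [])))
    lmf.dump(resource, "copy.xml")
```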
FILE: wn/metrics.py
function ambiguity (line 6) | def ambiguity(word: Word) -> int:
function average_ambiguity (line 10) | def average_ambiguity(synset: Synset) -> float:
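Both metrics are one-liners over the object API: ambiguity() counts the synsets a word belongs to, and average_ambiguity() averages that over a synset's member words. A sketch assuming an installed English lexicon:

```python
import wn
from wn.metrics import ambiguity, average_ambiguity

word = wn.words("bank")[0]  # assumes an installed English lexicon
print(ambiguity(word))      # number of synsets this word belongs to
print(average_ambiguity(word.synsets()[0]))
```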
FILE: wn/morphy.py
class _System (line 14) | class _System(Flag):
class Morphy (line 67) | class Morphy:
method __init__ (line 95) | def __init__(self, wordnet: wn.Wordnet | None = None):
method __call__ (line 121) | def __call__(self, form: str, pos: str | None = None) -> LemmatizeResult:
method _morphstr (line 141) | def _morphstr(self, form: str, pos: str) -> set[str]:
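Morphy strips known English suffixes; constructed with a Wordnet, it also filters candidates against the lexicon. The result maps a part of speech to a set of candidate lemmas. A sketch with an illustrative output:

```python
import wn
from wn.morphy import Morphy

morphy = Morphy(wn.Wordnet("oewn:2024"))  # hypothetical installed lexicon
print(morphy("axes", pos="n"))  # e.g. {'n': {'ax', 'axe', 'axis'}} (illustrative)
```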
FILE: wn/project.py
function is_package_directory (line 22) | def is_package_directory(path: AnyPath) -> bool:
function _package_directory_types (line 28) | def _package_directory_types(path: Path) -> list[tuple[Path, str]]:
function _resource_file_type (line 38) | def _resource_file_type(path: Path) -> str | None:
function is_collection_directory (line 46) | def is_collection_directory(path: AnyPath) -> bool:
class Project (line 54) | class Project:
method __init__ (line 59) | def __init__(self, path: AnyPath):
method path (line 63) | def path(self) -> Path:
method readme (line 73) | def readme(self) -> Path | None:
method license (line 77) | def license(self) -> Path | None:
method citation (line 81) | def citation(self) -> Path | None:
method _find_file (line 85) | def _find_file(self, base: Path, suffixes: tuple[str, ...]) -> Path | ...
class Package (line 93) | class Package(Project):
method type (line 102) | def type(self) -> str | None:
method resource_file (line 112) | def resource_file(self) -> Path:
class ResourceOnlyPackage (line 122) | class ResourceOnlyPackage(Package):
method resource_file (line 132) | def resource_file(self) -> Path:
method readme (line 135) | def readme(self):
method license (line 138) | def license(self):
method citation (line 141) | def citation(self):
class Collection (line 145) | class Collection(Project):
method packages (line 152) | def packages(self) -> list[Package]:
function get_project (line 159) | def get_project(
function _get_project_from_path (line 199) | def _get_project_from_path(
function iterpackages (line 237) | def iterpackages(path: AnyPath, delete: bool = True) -> Iterator[Package]:
function _get_decompressed (line 282) | def _get_decompressed(
function _check_tar (line 312) | def _check_tar(tar: tarfile.TarFile) -> None:
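iterpackages() walks an archive or directory and yields the installable packages inside. The index does not show decorators, so treating type as a property and resource_file() as a method is an assumption in this sketch; the archive path is a placeholder:

```python
from wn import project

for pkg in project.iterpackages("oewn.tar.xz"):  # placeholder path
    print(pkg.type, pkg.resource_file())
```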
FILE: wn/schema.sql
type ilis (line 4) | CREATE TABLE ilis (
type ili_id_index (line 12) | CREATE INDEX ili_id_index ON ilis (id)
type proposed_ilis (line 14) | CREATE TABLE proposed_ilis (
type proposed_ili_synset_rowid_index (line 21) | CREATE INDEX proposed_ili_synset_rowid_index ON proposed_ilis (synset_ro...
type lexicons (line 26) | CREATE TABLE lexicons (
type lexicon_specifier_index (line 43) | CREATE INDEX lexicon_specifier_index ON lexicons (specifier)
type lexicon_dependencies (line 45) | CREATE TABLE lexicon_dependencies (
type lexicon_dependent_index (line 52) | CREATE INDEX lexicon_dependent_index ON lexicon_dependencies(dependent_r...
type lexicon_extensions (line 54) | CREATE TABLE lexicon_extensions (
type lexicon_extension_index (line 62) | CREATE INDEX lexicon_extension_index ON lexicon_extensions(extension_rowid)
type entry_index (line 67) | CREATE TABLE entry_index (
type entry_index_entry_index (line 72) | CREATE INDEX entry_index_entry_index ON entry_index(entry_rowid)
type entry_index_lemma_index (line 73) | CREATE INDEX entry_index_lemma_index ON entry_index(lemma)
type entries (line 79) | CREATE TABLE entries (
type entry_id_index (line 87) | CREATE INDEX entry_id_index ON entries (id)
type forms (line 89) | CREATE TABLE forms (
type form_entry_index (line 100) | CREATE INDEX form_entry_index ON forms (entry_rowid)
type form_index (line 101) | CREATE INDEX form_index ON forms (form)
type form_norm_index (line 102) | CREATE INDEX form_norm_index ON forms (normalized_form)
type pronunciations (line 104) | CREATE TABLE pronunciations (
type pronunciation_form_index (line 113) | CREATE INDEX pronunciation_form_index ON pronunciations (form_rowid)
type tags (line 115) | CREATE TABLE tags (
type tag_form_index (line 121) | CREATE INDEX tag_form_index ON tags (form_rowid)
type synsets (line 126) | CREATE TABLE synsets (
type synset_id_index (line 135) | CREATE INDEX synset_id_index ON synsets (id)
type synset_ili_rowid_index (line 136) | CREATE INDEX synset_ili_rowid_index ON synsets (ili_rowid)
type unlexicalized_synsets (line 138) | CREATE TABLE unlexicalized_synsets (
type unlexicalized_synsets_index (line 141) | CREATE INDEX unlexicalized_synsets_index ON unlexicalized_synsets (synse...
type synset_relations (line 143) | CREATE TABLE synset_relations (
type synset_relation_source_index (line 151) | CREATE INDEX synset_relation_source_index ON synset_relations (source_ro...
type synset_relation_target_index (line 152) | CREATE INDEX synset_relation_target_index ON synset_relations (target_ro...
type definitions (line 154) | CREATE TABLE definitions (
type definition_rowid_index (line 163) | CREATE INDEX definition_rowid_index ON definitions (synset_rowid)
type definition_sense_index (line 164) | CREATE INDEX definition_sense_index ON definitions (sense_rowid)
type synset_examples (line 166) | CREATE TABLE synset_examples (
type synset_example_rowid_index (line 174) | CREATE INDEX synset_example_rowid_index ON synset_examples(synset_rowid)
type senses (line 179) | CREATE TABLE senses (
type sense_id_index (line 189) | CREATE INDEX sense_id_index ON senses(id)
type sense_entry_rowid_index (line 190) | CREATE INDEX sense_entry_rowid_index ON senses (entry_rowid)
type sense_synset_rowid_index (line 191) | CREATE INDEX sense_synset_rowid_index ON senses (synset_rowid)
type unlexicalized_senses (line 193) | CREATE TABLE unlexicalized_senses (
type unlexicalized_senses_index (line 196) | CREATE INDEX unlexicalized_senses_index ON unlexicalized_senses (sense_r...
type sense_relations (line 198) | CREATE TABLE sense_relations (
type sense_relation_source_index (line 206) | CREATE INDEX sense_relation_source_index ON sense_relations (source_rowid)
type sense_relation_target_index (line 207) | CREATE INDEX sense_relation_target_index ON sense_relations (target_rowid)
type sense_synset_relations (line 209) | CREATE TABLE sense_synset_relations (
type sense_synset_relation_source_index (line 217) | CREATE INDEX sense_synset_relation_source_index ON sense_synset_relation...
type sense_synset_relation_target_index (line 218) | CREATE INDEX sense_synset_relation_target_index ON sense_synset_relation...
type adjpositions (line 220) | CREATE TABLE adjpositions (
type adjposition_sense_index (line 224) | CREATE INDEX adjposition_sense_index ON adjpositions (sense_rowid)
type sense_examples (line 226) | CREATE TABLE sense_examples (
type sense_example_index (line 234) | CREATE INDEX sense_example_index ON sense_examples (sense_rowid)
type counts (line 236) | CREATE TABLE counts (
type count_index (line 243) | CREATE INDEX count_index ON counts(sense_rowid)
type syntactic_behaviours (line 248) | CREATE TABLE syntactic_behaviours (
type syntactic_behaviour_id_index (line 256) | CREATE INDEX syntactic_behaviour_id_index ON syntactic_behaviours (id)
type syntactic_behaviour_senses (line 258) | CREATE TABLE syntactic_behaviour_senses (
type syntactic_behaviour_sense_sb_index (line 262) | CREATE INDEX syntactic_behaviour_sense_sb_index
type syntactic_behaviour_sense_sense_index (line 264) | CREATE INDEX syntactic_behaviour_sense_sense_index
type relation_types (line 270) | CREATE TABLE relation_types (
type relation_type_index (line 275) | CREATE INDEX relation_type_index ON relation_types (type)
type ili_statuses (line 277) | CREATE TABLE ili_statuses (
type ili_status_index (line 282) | CREATE INDEX ili_status_index ON ili_statuses (status)
type lexfiles (line 284) | CREATE TABLE lexfiles (
type lexfile_index (line 289) | CREATE INDEX lexfile_index ON lexfiles (name)
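This schema backs Wn's SQLite database, so it can be inspected directly with read-only queries. A sketch, assuming wn.config.database_path (defined in wn/_config.py, not shown in this index) locates the database file:

```python
import sqlite3

import wn

# Read-only peek at the entries and forms tables defined above.
conn = sqlite3.connect(wn.config.database_path)  # assumed config attribute
rows = conn.execute(
    "SELECT e.id, f.form"
    " FROM entries AS e JOIN forms AS f ON f.entry_rowid = e.rowid"
    " LIMIT 5"
)
for entry_id, form in rows:
    print(entry_id, form)
conn.close()
```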
FILE: wn/similarity.py
function path (line 11) | def path(synset1: Synset, synset2: Synset, simulate_root: bool = False) ...
function wup (line 49) | def wup(synset1: Synset, synset2: Synset, simulate_root=False) -> float:
function lch (line 87) | def lch(
function res (line 125) | def res(synset1: Synset, synset2: Synset, ic: Freq) -> float:
function jcn (line 150) | def jcn(synset1: Synset, synset2: Synset, ic: Freq) -> float:
function lin (line 183) | def lin(synset1: Synset, synset2: Synset, ic: Freq) -> float:
function _least_common_subsumers (line 215) | def _least_common_subsumers(
function _most_informative_lcs (line 224) | def _most_informative_lcs(synset1: Synset, synset2: Synset, ic: Freq) ->...
function _check_if_pos_compatible (line 230) | def _check_if_pos_compatible(pos1: str, pos2: str) -> None:
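path(), wup(), and lch() measure taxonomy distance, while res(), jcn(), and lin() additionally need an information-content table. The lch() signature is truncated above; that it takes the taxonomy's maximum depth as a third argument is an assumption here:

```python
import wn
from wn import similarity
from wn.taxonomy import taxonomy_depth

ewn = wn.Wordnet("oewn:2024")     # hypothetical installed lexicon
dog = ewn.synsets("dog", pos="n")[0]
cat = ewn.synsets("cat", pos="n")[0]

print(similarity.path(dog, cat))  # 1 / (shortest path length + 1)
print(similarity.wup(dog, cat, simulate_root=True))
print(similarity.lch(dog, cat, taxonomy_depth(ewn, "n")))
```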
FILE: wn/taxonomy.py
function roots (line 12) | def roots(wordnet: wn.Wordnet, pos: str | None = None) -> list[wn.Synset]:
function leaves (line 34) | def leaves(wordnet: wn.Wordnet, pos: str | None = None) -> list[wn.Synset]:
function taxonomy_depth (line 55) | def taxonomy_depth(wordnet: wn.Wordnet, pos: str) -> int:
function _synsets_for_pos (line 86) | def _synsets_for_pos(wordnet: wn.Wordnet, pos: str | None) -> list[wn.Sy...
function _hypernym_paths (line 99) | def _hypernym_paths(
function hypernym_paths (line 115) | def hypernym_paths(
function min_depth (line 161) | def min_depth(synset: wn.Synset, simulate_root: bool = False) -> int:
function max_depth (line 185) | def max_depth(synset: wn.Synset, simulate_root: bool = False) -> int:
function _shortest_hyp_paths (line 209) | def _shortest_hyp_paths(
function shortest_path (line 253) | def shortest_path(
function common_hypernyms (line 286) | def common_hypernyms(
function lowest_common_hypernyms (line 324) | def lowest_common_hypernyms(
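These functions walk hypernym/hyponym edges, either over a whole Wordnet or upward from a single synset. A sketch with the same hypothetical lexicon:

```python
import wn
from wn import taxonomy

ewn = wn.Wordnet("oewn:2024")             # hypothetical installed lexicon
print(len(taxonomy.roots(ewn, pos="n")))  # synsets with no hypernyms
dog = ewn.synsets("dog", pos="n")[0]
for path in taxonomy.hypernym_paths(dog):
    print(" -> ".join(ss.id for ss in path))
print(taxonomy.min_depth(dog), taxonomy.max_depth(dog))
```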
FILE: wn/util.py
function synset_id_formatter (line 8) | def synset_id_formatter(fmt: str = "{prefix}-{offset:08}-{pos}", **kwarg...
class ProgressHandler (line 36) | class ProgressHandler:
method __init__ (line 57) | def __init__(
method update (line 79) | def update(self, n: int = 1, force: bool = False) -> None:
method set (line 92) | def set(self, **kwargs) -> None:
method flash (line 103) | def flash(self, message: str) -> None:
method close (line 112) | def close(self) -> None:
class ProgressBar (line 122) | class ProgressBar(ProgressHandler):
method update (line 138) | def update(self, n: int = 1, force: bool = False) -> None:
method format (line 149) | def format(self) -> str:
method flash (line 187) | def flash(self, message: str) -> None:
method close (line 191) | def close(self) -> None:
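synset_id_formatter() curries a format string into an ID builder, and ProgressHandler is the hook class that long-running operations report through. A sketch; the quiet handler is a hypothetical subclass:

```python
from wn.util import ProgressHandler, synset_id_formatter

# Build IDs in the default "{prefix}-{offset:08}-{pos}" shape.
fmt = synset_id_formatter(prefix="oewn")
assert fmt(offset=1740, pos="n") == "oewn-00001740-n"


class LogProgress(ProgressHandler):  # hypothetical subclass
    """Reduce progress output to one line per status flash."""

    def flash(self, message: str) -> None:
        print("status:", message)
```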
FILE: wn/validate.py
class _Check (line 54) | class _Check(TypedDict):
function _non_unique_id (line 62) | def _non_unique_id(lex: lmf.Lexicon, ids: _Ids) -> _Result:
function _has_no_senses (line 76) | def _has_no_senses(lex: lmf.Lexicon, ids: _Ids) -> _Result:
function _redundant_sense (line 81) | def _redundant_sense(lex: lmf.Lexicon, ids: _Ids) -> _Result:
function _redundant_entry (line 94) | def _redundant_entry(lex: lmf.Lexicon, ids: _Ids) -> _Result:
function _missing_synset (line 104) | def _missing_synset(lex: lmf.Lexicon, ids: _Ids) -> _Result:
function _empty_synset (line 115) | def _empty_synset(lex: lmf.Lexicon, ids: _Ids) -> _Result:
function _repeated_ili (line 121) | def _repeated_ili(lex: lmf.Lexicon, ids: _Ids) -> _Result:
function _missing_ili_definition (line 131) | def _missing_ili_definition(lex: lmf.Lexicon, ids: _Ids) -> _Result:
function _spurious_ili_definition (line 140) | def _spurious_ili_definition(lex: lmf.Lexicon, ids: _Ids) -> _Result:
function _blank_synset_definition (line 149) | def _blank_synset_definition(lex: lmf.Lexicon, ids: _Ids) -> _Result:
function _blank_synset_example (line 158) | def _blank_synset_example(lex: lmf.Lexicon, ids: _Ids) -> _Result:
function _repeated_synset_definition (line 167) | def _repeated_synset_definition(lex: lmf.Lexicon, ids: _Ids) -> _Result:
function _missing_relation_target (line 179) | def _missing_relation_target(lex: lmf.Lexicon, ids: _Ids) -> _Result:
function _invalid_relation_type (line 194) | def _invalid_relation_type(lex: lmf.Lexicon, ids: _Ids) -> _Result:
function _redundant_relation (line 210) | def _redundant_relation(lex: lmf.Lexicon, ids: _Ids) -> _Result:
function _missing_reverse_relation (line 230) | def _missing_reverse_relation(lex: lmf.Lexicon, ids: _Ids) -> _Result:
function _hypernym_wrong_pos (line 248) | def _hypernym_wrong_pos(lex: lmf.Lexicon, ids: _Ids) -> _Result:
function _self_loop (line 258) | def _self_loop(lex: lmf.Lexicon, ids: _Ids) -> _Result:
function _multiples (line 271) | def _multiples(iterable):
function _entries (line 276) | def _entries(lex: lmf.Lexicon) -> list[lmf.LexicalEntry]:
function _forms (line 280) | def _forms(e: lmf.LexicalEntry) -> list[lmf.Form]:
function _senses (line 284) | def _senses(e: lmf.LexicalEntry) -> list[lmf.Sense]:
function _synsets (line 288) | def _synsets(lex: lmf.Lexicon) -> list[lmf.Synset]:
function _sense_relations (line 292) | def _sense_relations(lex: lmf.Lexicon) -> Iterator[tuple[lmf.Sense, lmf....
function _synset_relations (line 299) | def _synset_relations(lex: lmf.Lexicon) -> Iterator[tuple[lmf.Synset, lm...
function _get_dc_type (line 305) | def _get_dc_type(r: lmf.Relation) -> str | None:
function _select_checks (line 348) | def _select_checks(select: Sequence[str]) -> list[tuple[str, _CheckFunct...
function validate (line 360) | def validate(
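validate() runs the private check functions above over a single lexicon dictionary from wn.lmf, not over the database. That the select values are check-code prefixes matching the E101/W305-style test data names is an assumption in this sketch; the path is a placeholder:

```python
from wn import lmf
from wn.validate import validate

resource = lmf.load("my-wordnet.xml")  # placeholder path
for lexicon in resource["lexicons"]:
    report = validate(lexicon, select=["E", "W"])  # select values assumed
    for code, result in report.items():
        print(code, result)  # result structure follows the _Check TypedDict
```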
Condensed preview — 118 files, each showing path, character count, and a content snippet (709K chars of structured content in full).
[
{
"path": ".github/ISSUE_TEMPLATE/bug_report.md",
"chars": 1098,
"preview": "---\nname: Bug report\nabout: Create a report to help us improve\ntitle: ''\nlabels: bug\nassignees: ''\n\n---\n\n**Describe the "
},
{
"path": ".github/ISSUE_TEMPLATE/data-issue.md",
"chars": 1117,
"preview": "---\nname: Data issue\nabout: Report an issue Wn's data index\ntitle: ''\nlabels: data\nassignees: ''\n\n---\n\n**If your issue i"
},
{
"path": ".github/ISSUE_TEMPLATE/feature_request.md",
"chars": 604,
"preview": "---\nname: Feature request\nabout: Suggest an idea for this project\ntitle: ''\nlabels: enhancement\nassignees: ''\n\n---\n\n**Is"
},
{
"path": ".github/workflows/checks.yml",
"chars": 961,
"preview": "name: tests\n\non:\n push:\n branches: [main]\n pull_request:\n branches: [main]\n\njobs:\n lint:\n runs-on: ubuntu-la"
},
{
"path": ".github/workflows/publish.yml",
"chars": 1800,
"preview": "name: Build and Publish to PyPI or TestPyPI\n\non: push\n\njobs:\n build:\n name: Build distribution\n runs-on: ubuntu-l"
},
{
"path": ".gitignore",
"chars": 795,
"preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packagi"
},
{
"path": "CHANGELOG.md",
"chars": 30562,
"preview": "# Change Log\n\n## [Unreleased][unreleased]\n\n\n## [v1.1.0]\n\n**Release date: 2026-03-21**\n\n### Added\n\n* `cache` subcommand ("
},
{
"path": "CITATION.cff",
"chars": 1204,
"preview": "cff-version: 1.2.0\ntitle: Wn\nmessage: >-\n Please cite this software using the metadata from\n 'preferred-citation'.\ntyp"
},
{
"path": "CONTRIBUTING.md",
"chars": 3207,
"preview": "# Contributing to Wn\n\nThanks for helping to make Wn better!\n\n**Quick Links:**\n\n- [Report a bug or request a features](ht"
},
{
"path": "LICENSE",
"chars": 1078,
"preview": "MIT License\n\nCopyright (c) 2020 Michael Wayne Goodman\n\nPermission is hereby granted, free of charge, to any person obtai"
},
{
"path": "README.md",
"chars": 14760,
"preview": "\n\n<p align=\"center\">\n <img src=\"https://raw.githubusercontent.com/goodmami/wn/main/docs/_static/wn-logo.svg\" alt=\"Wn lo"
},
{
"path": "bench/README.md",
"chars": 1172,
"preview": "# Wn Benchmarking\n\nThis directory contains code and data for running benchmarks for\nWn. The benchmarks are implemented u"
},
{
"path": "bench/conftest.py",
"chars": 4476,
"preview": "from collections.abc import Iterator\nfrom itertools import cycle, product\nfrom pathlib import Path\n\nimport pytest\n\nimpor"
},
{
"path": "bench/test_bench.py",
"chars": 1734,
"preview": "import pytest\n\nimport wn\nfrom wn import lmf\n\n\n@pytest.mark.benchmark(group=\"lmf.load\", warmup=True)\ndef test_load(datadi"
},
{
"path": "docs/.readthedocs.yaml",
"chars": 600,
"preview": "# .readthedocs.yaml\n# Read the Docs configuration file\n# See https://docs.readthedocs.io/en/stable/config-file/v2.html f"
},
{
"path": "docs/Makefile",
"chars": 634,
"preview": "# Minimal makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line, and also\n# from the "
},
{
"path": "docs/_static/css/svg.css",
"chars": 205,
"preview": "svg {\n width: 500px;\n height: 300px;\n\t\n position: relative;\n left: 20%;\n -webkit-transform: translateX(-2"
},
{
"path": "docs/_static/demo.ipynb",
"chars": 23057,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"metadata\": {},\n \"source\": [\n \":\n assert "
},
{
"path": "tests/_util_test.py",
"chars": 1812,
"preview": "from wn._util import (\n flatten,\n format_lexicon_specifier,\n normalize_form,\n split_lexicon_specifier,\n u"
},
{
"path": "tests/compat_sensekey_test.py",
"chars": 6296,
"preview": "import pytest\n\nimport wn\nfrom wn.compat import sensekey\n\n\ndef test_unescape_oewn_sense_key():\n def unescape(s: str) -"
},
{
"path": "tests/conftest.py",
"chars": 2685,
"preview": "import lzma\nfrom pathlib import Path\n\nimport pytest\n\nimport wn\n\n\n@pytest.fixture(scope=\"session\")\ndef datadir():\n ret"
},
{
"path": "tests/data/E101-0.xml",
"chars": 893,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1"
},
{
"path": "tests/data/E101-1.xml",
"chars": 835,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1"
},
{
"path": "tests/data/E101-2.xml",
"chars": 779,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1"
},
{
"path": "tests/data/E101-3.xml",
"chars": 729,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1"
},
{
"path": "tests/data/README.md",
"chars": 98,
"preview": "# Testing Data Directory\n\nThis directory is used to store data files used by the testing system.\n\n"
},
{
"path": "tests/data/W305-0.xml",
"chars": 777,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1"
},
{
"path": "tests/data/W306-0.xml",
"chars": 768,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1"
},
{
"path": "tests/data/W307-0.xml",
"chars": 945,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1"
},
{
"path": "tests/data/mini-ili-with-status.tsv",
"chars": 137,
"preview": "ILI\tDefinition\tStatus\ni1\ti1 definition\tactive\ni2\t\tdeprecated\ni67447\tknowledge acquired through study or experience or in"
},
{
"path": "tests/data/mini-ili.tsv",
"chars": 104,
"preview": "ILI\tDefinition\ni1\ti1 definition\ni2\ni67447\tknowledge acquired through study or experience or instruction\n"
},
{
"path": "tests/data/mini-lmf-1.0.xml",
"chars": 9188,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1"
},
{
"path": "tests/data/mini-lmf-1.1.xml",
"chars": 5739,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1"
},
{
"path": "tests/data/mini-lmf-1.3.xml",
"chars": 1616,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1"
},
{
"path": "tests/data/mini-lmf-1.4.xml",
"chars": 3199,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1"
},
{
"path": "tests/data/sense-key-variations.xml",
"chars": 1517,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1"
},
{
"path": "tests/data/sense-key-variations2.xml",
"chars": 785,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1"
},
{
"path": "tests/data/sense-member-order.xml",
"chars": 1172,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1"
},
{
"path": "tests/data/test-package/LICENSE",
"chars": 13,
"preview": "Test License\n"
},
{
"path": "tests/data/test-package/README.md",
"chars": 14,
"preview": "# Test README\n"
},
{
"path": "tests/data/test-package/citation.bib",
"chars": 11,
"preview": "% test bib\n"
},
{
"path": "tests/data/test-package/test-wn.xml",
"chars": 168,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE LexicalResource SYSTEM \"http://globalwordnet.github.io/schemas/WN-LMF-1"
},
{
"path": "tests/db_test.py",
"chars": 2629,
"preview": "import sqlite3\nimport threading\n\nimport pytest\n\nimport wn\nfrom wn import lmf\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef t"
},
{
"path": "tests/export_test.py",
"chars": 2090,
"preview": "from xml.etree import ElementTree as ET\n\nimport pytest\n\nimport wn\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_export("
},
{
"path": "tests/ic_test.py",
"chars": 3837,
"preview": "from math import log\n\nimport pytest\n\nimport wn\nimport wn.ic\nfrom wn.constants import ADJ, ADV, NOUN, VERB\nfrom wn.util i"
},
{
"path": "tests/ili_test.py",
"chars": 2691,
"preview": "from pathlib import Path\n\nimport pytest\n\nimport wn\nfrom wn import ili\n\nI67447_DEFN = \"knowledge acquired through study o"
},
{
"path": "tests/lmf_test.py",
"chars": 5994,
"preview": "from xml.etree import ElementTree as ET\n\nfrom wn import lmf\n\n\ndef test_is_lmf(datadir):\n assert lmf.is_lmf(datadir / "
},
{
"path": "tests/morphy_test.py",
"chars": 2084,
"preview": "import pytest\n\nimport wn\nfrom wn import morphy\n\n\ndef test_morphy_uninitialized():\n # An unintialized Morphy isn't ver"
},
{
"path": "tests/primary_query_test.py",
"chars": 13586,
"preview": "import pytest\n\nimport wn\n\n\n@pytest.mark.usefixtures(\"uninitialized_datadir\")\ndef test_lexicons_uninitialized():\n asse"
},
{
"path": "tests/project_test.py",
"chars": 2003,
"preview": "from wn import project\n\n\ndef test_is_package_directory(datadir):\n assert project.is_package_directory(datadir / \"test"
},
{
"path": "tests/relations_test.py",
"chars": 5913,
"preview": "import pytest\n\nimport wn\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_word_derived_words():\n assert len(wn.word(\"te"
},
{
"path": "tests/secondary_query_test.py",
"chars": 9497,
"preview": "import pytest\n\nimport wn\n\n\n@pytest.mark.usefixtures(\"mini_db\")\ndef test_word_senses():\n assert len(wn.word(\"test-en-i"
},
{
"path": "tests/similarity_test.py",
"chars": 6938,
"preview": "from math import log\n\nimport pytest\n\nimport wn\nfrom wn import similarity as sim\nfrom wn.ic import information_content as"
},
{
"path": "tests/taxonomy_test.py",
"chars": 4067,
"preview": "import pytest\n\nimport wn\nfrom wn.taxonomy import (\n hypernym_paths,\n leaves,\n max_depth,\n min_depth,\n roo"
},
{
"path": "tests/util_test.py",
"chars": 360,
"preview": "from wn import util\n\n\ndef test_synset_id_formatter():\n f = util.synset_id_formatter\n assert f()(prefix=\"xyz\", offs"
},
{
"path": "tests/validate_test.py",
"chars": 592,
"preview": "import pytest\n\nfrom wn import lmf\nfrom wn.validate import validate\n\ntests = [\n (\"E101\", 0),\n (\"E101\", 1),\n (\"E1"
},
{
"path": "tests/wordnet_test.py",
"chars": 3340,
"preview": "from pathlib import Path\n\nimport pytest\n\nimport wn\n\n\n@pytest.mark.usefixtures(\"mini_db_1_1\")\ndef test_wordnet_lexicons()"
},
{
"path": "wn/__init__.py",
"chars": 1248,
"preview": "\"\"\"\nWordnet Interface.\n\"\"\"\n\n__all__ = (\n \"ConfigurationError\",\n \"Count\",\n \"DatabaseError\",\n \"Definition\",\n "
},
{
"path": "wn/__main__.py",
"chars": 6290,
"preview": "import argparse\nimport json\nimport logging\nimport sys\nfrom pathlib import Path\n\nimport wn\nfrom wn import lmf\nfrom wn._ut"
},
{
"path": "wn/_add.py",
"chars": 37615,
"preview": "\"\"\"\nAdding and removing lexicons to/from the database.\n\"\"\"\n\nimport logging\nimport sqlite3\nfrom collections.abc import It"
},
{
"path": "wn/_config.py",
"chars": 13640,
"preview": "\"\"\"\nLocal configuration settings.\n\"\"\"\n\nimport os\nfrom collections.abc import Sequence\nfrom enum import Enum\nfrom fnmatch"
},
{
"path": "wn/_core.py",
"chars": 40676,
"preview": "from __future__ import annotations\n\nimport enum\nfrom dataclasses import dataclass, field\nfrom typing import TYPE_CHECKIN"
},
{
"path": "wn/_db.py",
"chars": 4843,
"preview": "\"\"\"\nStorage back-end interface.\n\"\"\"\n\nimport json\nimport logging\nimport sqlite3\nfrom importlib import resources\nfrom path"
},
{
"path": "wn/_download.py",
"chars": 4819,
"preview": "import logging\nfrom collections.abc import Sequence\nfrom pathlib import Path\n\nimport httpx\n\nfrom wn._add import add as a"
},
{
"path": "wn/_exceptions.py",
"chars": 675,
"preview": "class Error(Exception):\n \"\"\"Generic error class for invalid wordnet operations.\"\"\"\n\n # reset the module so the use"
},
{
"path": "wn/_export.py",
"chars": 21309,
"preview": "from collections.abc import Iterator, Sequence\nfrom typing import Literal, NamedTuple, overload\n\nfrom wn import lmf\nfrom"
},
{
"path": "wn/_lexicon.py",
"chars": 5759,
"preview": "from __future__ import annotations\n\nfrom dataclasses import dataclass, field\nfrom typing import TYPE_CHECKING, NamedTupl"
},
{
"path": "wn/_metadata.py",
"chars": 960,
"preview": "from typing import Protocol, TypedDict\n\n\nclass Metadata(TypedDict, total=False):\n # For these, see https://globalword"
},
{
"path": "wn/_module_functions.py",
"chars": 7239,
"preview": "from typing import Literal, overload\n\nfrom wn._config import ResolvedProjectInfo, config\nfrom wn._core import Form, Sens"
},
{
"path": "wn/_queries.py",
"chars": 38129,
"preview": "\"\"\"\nDatabase retrieval queries.\n\"\"\"\n\nimport itertools\nfrom collections.abc import Collection, Iterator, Sequence\nfrom ty"
},
{
"path": "wn/_types.py",
"chars": 1052,
"preview": "from collections.abc import Callable, Mapping, Sequence\nfrom pathlib import Path\nfrom typing import Any, TypeAlias\n\n# Fo"
},
{
"path": "wn/_util.py",
"chars": 2206,
"preview": "\"\"\"Non-public Wn utilities.\"\"\"\n\nimport hashlib\nfrom collections.abc import Hashable, Iterable\nfrom pathlib import Path\nf"
},
{
"path": "wn/_wordnet.py",
"chars": 16157,
"preview": "import textwrap\nimport warnings\nfrom collections.abc import Callable, Iterator, Sequence\nfrom typing import Literal, Typ"
},
{
"path": "wn/compat/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "wn/compat/sensekey.py",
"chars": 10163,
"preview": "\"\"\"Functions Related to Sense Keys\n\nSense keys are identifiers of senses that (mostly) persist across\nwordnet versions. "
},
{
"path": "wn/constants.py",
"chars": 8645,
"preview": "\"\"\"\nConstants and literals used in wordnets.\n\"\"\"\n\nSENSE_RELATIONS = frozenset(\n [\n \"antonym\",\n \"also\",\n"
},
{
"path": "wn/ic.py",
"chars": 6647,
"preview": "\"\"\"Information Content is a corpus-based metrics of synset or sense\nspecificity.\n\n\"\"\"\n\nfrom collections import Counter\nf"
},
{
"path": "wn/ili.py",
"chars": 9677,
"preview": "\"\"\"Interlingual Indices\n\nThis module provides classes and functions for inspecting Interlingual\nIndex (ILI) objects, bot"
},
{
"path": "wn/index.toml",
"chars": 21020,
"preview": "[cili]\n type = \"ili\"\n label = \"Collaborative Interlingual Index\"\n license = \"https://creativecommons.org/licenses/by/"
},
{
"path": "wn/lmf.py",
"chars": 31372,
"preview": "\"\"\"\nReader for the Lexical Markup Framework (LMF) format.\n\"\"\"\n\nimport re\nimport xml.etree.ElementTree as ET # for gener"
},
{
"path": "wn/metrics.py",
"chars": 267,
"preview": "from wn._core import Synset, Word\n\n# Word-based Metrics\n\n\ndef ambiguity(word: Word) -> int:\n return len(word.synsets("
},
{
"path": "wn/morphy.py",
"chars": 4885,
"preview": "\"\"\"A simple English lemmatizer that finds and removes known suffixes.\"\"\"\n\nfrom enum import Flag, auto\nfrom typing import"
},
{
"path": "wn/project.py",
"chars": 10701,
"preview": "\"\"\"\nWordnet and ILI Packages and Collections\n\"\"\"\n\nimport gzip\nimport lzma\nimport shutil\nimport tarfile\nimport tempfile\nf"
},
{
"path": "wn/py.typed",
"chars": 1,
"preview": "\n"
},
{
"path": "wn/schema.sql",
"chars": 10017,
"preview": "\n-- ILI : Interlingual Index\n\nCREATE TABLE ilis (\n rowid INTEGER PRIMARY KEY,\n id TEXT NOT NULL,\n status_rowid "
},
{
"path": "wn/similarity.py",
"chars": 8072,
"preview": "\"\"\"Synset similarity metrics.\"\"\"\n\nimport math\n\nimport wn\nfrom wn._core import Synset\nfrom wn.constants import ADJ, ADJ_S"
},
{
"path": "wn/taxonomy.py",
"chars": 11279,
"preview": "\"\"\"Functions for working with hypernym/hyponym taxonomies.\"\"\"\n\nfrom __future__ import annotations\n\nimport wn\nfrom wn._ut"
},
{
"path": "wn/util.py",
"chars": 6378,
"preview": "\"\"\"Wn utility classes.\"\"\"\n\nimport sys\nfrom collections.abc import Callable\nfrom typing import TextIO\n\n\ndef synset_id_for"
},
{
"path": "wn/validate.py",
"chars": 12310,
"preview": "\"\"\"Wordnet lexicon validation.\n\nThis module is for checking whether the the contents of a lexicon are\nvalid according to"
}
]
About this extraction
This document contains the full source code of the goodmami/wn GitHub repository, extracted as plain text: 118 files (651.7 KB, approximately 184.3k tokens) and a symbol index of 784 extracted functions, classes, methods, constants, and types.