Repository: chbrown/liwc-python
Branch: master
Commit: b8e158c84b2a
Files: 12
Total size: 9.9 KB
Directory structure:
gitextract__4myuiyb/
├── .github/
│ └── ISSUE_TEMPLATE.md
├── .gitignore
├── .travis.yml
├── LICENSE.txt
├── README.md
├── liwc/
│ ├── __init__.py
│ ├── dic.py
│ └── trie.py
├── setup.cfg
├── setup.py
└── test/
├── alpha.dic
└── test_alpha_dic.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .github/ISSUE_TEMPLATE.md
================================================
Please do not open an issue with the intent of subverting encryption implemented by the LIWC developers.
If the version of LIWC that you purchased (or otherwise legitimately obtained as a researcher at an academic institution) does not provide a machine-readable `*.dic` file, please contact the distributor directly.
================================================
FILE: .gitignore
================================================
build/
dist/
================================================
FILE: .travis.yml
================================================
language: python
python:
- "2.7"
- "3.6"
script:
- python setup.py test
================================================
FILE: LICENSE.txt
================================================
Copyright © 2012-2020 Christopher Brown <io@henrian.com>
MIT License
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
================================================
FILE: README.md
================================================
# `liwc`
[PyPI](https://pypi.org/project/liwc/)
[Travis CI](https://travis-ci.org/chbrown/liwc-python)
This repository contains a Python package that implements two basic functions:
1. Loading (parsing) a Linguistic Inquiry and Word Count (LIWC) dictionary from the `.dic` file format.
2. Using that dictionary to count category matches on provided texts.
This is not an official LIWC product nor is it in any way affiliated with the LIWC development team or Receptiviti.
## Obtaining LIWC
The LIWC lexicon is proprietary, so it is _not_ included in this repository.
The lexicon data can be acquired (purchased) from [liwc.net](http://liwc.net/).
* If you are a researcher at an academic institution, please contact [Dr. James W. Pennebaker](https://liberalarts.utexas.edu/psychology/faculty/pennebak) directly.
* For commercial use, contact [Receptiviti](https://www.receptiviti.com/), the company that holds the exclusive commercial license.
Finally, please do not open an issue in this repository with the intent of subverting encryption implemented by the LIWC developers.
If the version of LIWC that you purchased (or otherwise legitimately obtained as a researcher at an academic institution) does not provide a machine-readable `*.dic` file, please contact the distributor directly.
## Setup
Install from [PyPI](https://pypi.python.org/pypi/liwc):
    pip install liwc
## Example
This example reads the LIWC dictionary from a file named `LIWC2007_English100131.dic`, which looks like this:
    %
    1 funct
    2 pronoun
    [...]
    %
    a 1 10
    abdomen* 146 147
    about 1 16 17
    [...]
#### Loading the lexicon
```python
import liwc
parse, category_names = liwc.load_token_parser('LIWC2007_English100131.dic')
```
* `parse` is a function from a token of text (a string) to a list of matching LIWC categories (a list of strings)
* `category_names` is all LIWC categories in the lexicon (a list of strings)
#### Analyzing text
```python
import re
def tokenize(text):
    # you may want to use a smarter tokenizer
    for match in re.finditer(r'\w+', text, re.UNICODE):
        yield match.group(0)
gettysburg = '''Four score and seven years ago our fathers brought forth on
this continent a new nation, conceived in liberty, and dedicated to the
proposition that all men are created equal. Now we are engaged in a great
civil war, testing whether that nation, or any nation so conceived and so
dedicated, can long endure. We are met on a great battlefield of that war.
We have come to dedicate a portion of that field, as a final resting place
for those who here gave their lives that that nation might live. It is
altogether fitting and proper that we should do this.'''.lower()
gettysburg_tokens = tokenize(gettysburg)
```
Now, count all the categories in all of the tokens, and print the results:
```python
from collections import Counter
gettysburg_counts = Counter(category for token in gettysburg_tokens for category in parse(token))
print(gettysburg_counts)
#=> Counter({'funct': 58, 'pronoun': 18, 'cogmech': 17, ...})
```
### _N.B._:
* The LIWC lexicon only matches lowercase strings, so you will most likely want to lowercase your input text before passing it to `parse(...)`.
In the example above, I call `.lower()` on the entire string, but you could alternatively incorporate that into your tokenization process (e.g., by using [spaCy](https://spacy.io/api/token)'s `token.lower_`).
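If you prefer the second option without bringing in spaCy, the lowercasing can be folded into the tokenizer itself. A minimal sketch (the function name `tokenize_lower` is illustrative, not part of the package):

```python
import re

def tokenize_lower(text):
    """Yield lowercased word tokens, so no separate .lower() pass is needed."""
    for match in re.finditer(r"\w+", text, re.UNICODE):
        yield match.group(0).lower()

tokens = list(tokenize_lower("Four score and SEVEN years"))
# tokens == ["four", "score", "and", "seven", "years"]
```

Tokens produced this way can be passed straight to `parse(...)` with no extra normalization step.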
## License
Copyright (c) 2012-2020 Christopher Brown.
[MIT Licensed](LICENSE.txt).
================================================
FILE: liwc/__init__.py
================================================
from .dic import read_dic
from .trie import build_trie, search_trie
try:
    import pkg_resources

    __version__ = pkg_resources.get_distribution("liwc").version
except Exception:
    __version__ = None


def load_token_parser(filepath):
    """
    Reads a LIWC lexicon from a file in the .dic format, returning a tuple of
    (parse, category_names), where:
    * `parse` is a function from a token to a list of strings (potentially
      empty) of matching categories
    * `category_names` is a list of strings representing all LIWC categories in
      the lexicon
    """
    lexicon, category_names = read_dic(filepath)
    trie = build_trie(lexicon)

    def parse_token(token):
        for category_name in search_trie(trie, token):
            yield category_name

    return parse_token, category_names
================================================
FILE: liwc/dic.py
================================================
def _parse_categories(lines):
    """
    Read (category_id, category_name) pairs from the categories section.
    Each line consists of an integer followed by a tab and then the category name.
    This section is separated from the lexicon by a line consisting of a single "%".
    """
    for line in lines:
        line = line.strip()
        if line == "%":
            return
        # ignore non-matching groups of categories
        if "\t" in line:
            category_id, category_name = line.split("\t", 1)
            yield category_id, category_name


def _parse_lexicon(lines, category_mapping):
    """
    Read (match_expression, category_names) pairs from the lexicon section.
    Each line consists of a match expression followed by a tab and then one or more
    tab-separated integers, which are mapped to category names using `category_mapping`.
    """
    for line in lines:
        line = line.strip()
        parts = line.split("\t")
        yield parts[0], [category_mapping[category_id] for category_id in parts[1:]]


def read_dic(filepath):
    """
    Reads a LIWC lexicon from a file in the .dic format, returning a tuple of
    (lexicon, category_names), where:
    * `lexicon` is a dict mapping string patterns to lists of category names
    * `category_names` is a list of category names (as strings)
    """
    with open(filepath) as lines:
        # read up to first "%" (should be very first line of file)
        for line in lines:
            if line.strip() == "%":
                break
        # read categories (a mapping from integer string to category name)
        category_mapping = dict(_parse_categories(lines))
        # read lexicon (a mapping from matching string to a list of category names)
        lexicon = dict(_parse_lexicon(lines, category_mapping))
    return lexicon, list(category_mapping.values())
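To illustrate the two-section format this module handles, here is a hedged, file-free sketch that applies the same parsing logic to the sample content of `test/alpha.dic` held in a string (the helper `parse_dic` is a condensed illustrative variant; the real `read_dic` takes a filepath):

```python
# Sample .dic content: a categories section and a lexicon section, both
# terminated/separated by "%" lines, with tab-separated fields.
SAMPLE = "%\n1\tA\n2\tBravo\n%\na*\t1\nbravo\t2\n"

def parse_dic(lines):
    # skip the leading "%" that opens the categories section
    for line in lines:
        if line.strip() == "%":
            break
    # categories: "<id>\t<name>" pairs until the closing "%"
    category_mapping = {}
    for line in lines:
        line = line.strip()
        if line == "%":
            break
        if "\t" in line:
            category_id, category_name = line.split("\t", 1)
            category_mapping[category_id] = category_name
    # lexicon: "<pattern>\t<id> [<id> ...]", ids mapped to category names
    lexicon = {}
    for line in lines:
        parts = line.strip().split("\t")
        lexicon[parts[0]] = [category_mapping[i] for i in parts[1:]]
    return lexicon, list(category_mapping.values())

lexicon, names = parse_dic(iter(SAMPLE.splitlines()))
# lexicon == {"a*": ["A"], "bravo": ["Bravo"]}; names == ["A", "Bravo"]
```

Because all three passes share one iterator, each section picks up exactly where the previous one stopped, which is the same design `read_dic` uses with the open file handle.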
================================================
FILE: liwc/trie.py
================================================
def build_trie(lexicon):
    """
    Build a character-trie from the plain pattern_string -> categories_list
    mapping provided by `lexicon`.
    Some LIWC patterns end with a `*` to indicate a wildcard match.
    """
    trie = {}
    for pattern, category_names in lexicon.items():
        cursor = trie
        for char in pattern:
            if char == "*":
                cursor["*"] = category_names
                break
            if char not in cursor:
                cursor[char] = {}
            cursor = cursor[char]
        cursor["$"] = category_names
    return trie


def search_trie(trie, token, token_i=0):
    """
    Search the given character-trie for paths that match the `token` string.
    """
    if "*" in trie:
        return trie["*"]
    if "$" in trie and token_i == len(token):
        return trie["$"]
    if token_i < len(token):
        char = token[token_i]
        if char in trie:
            return search_trie(trie[char], token, token_i + 1)
    return []
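The wildcard behavior is easiest to see on the two-pattern sample lexicon. A self-contained sketch of the same trie technique (condensed slightly from the functions above; behavior is unchanged):

```python
def build_trie(lexicon):
    trie = {}
    for pattern, category_names in lexicon.items():
        cursor = trie
        for char in pattern:
            if char == "*":
                cursor["*"] = category_names  # wildcard: match any remaining suffix
                break
            cursor = cursor.setdefault(char, {})
        cursor["$"] = category_names  # "$" marks an exact-match terminal
    return trie

def search_trie(trie, token, token_i=0):
    if "*" in trie:
        return trie["*"]
    if "$" in trie and token_i == len(token):
        return trie["$"]
    if token_i < len(token) and token[token_i] in trie:
        return search_trie(trie[token[token_i]], token, token_i + 1)
    return []

trie = build_trie({"a*": ["A"], "bravo": ["Bravo"]})
# "a*" puts a wildcard entry one level down: trie["a"]["*"] == ["A"]
assert search_trie(trie, "alpha") == ["A"]      # prefix/wildcard match
assert search_trie(trie, "bravo") == ["Bravo"]  # exact match via "$"
assert search_trie(trie, "charlie") == []       # no path from the root
```

Note that `a*` also matches the bare token `a`: the search reaches the node `trie["a"]`, finds `"*"` there, and returns its categories before the exact-match check runs.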
================================================
FILE: setup.cfg
================================================
[metadata]
name = liwc
author = Christopher Brown
author_email = chrisbrown@utexas.edu
url = https://github.com/chbrown/liwc-python
description = Linguistic Inquiry and Word Count (LIWC) analyzer (proprietary data not included)
long_description = file: README.md
long_description_content_type = text/markdown
license = MIT
[options]
packages = find:
zip_safe = True
setup_requires =
pytest-runner
setuptools-scm
tests_require =
pytest
pytest-black
[aliases]
test = pytest
[tool:pytest]
addopts =
--black
[bdist_wheel]
universal = 1
================================================
FILE: setup.py
================================================
from setuptools import setup
setup(use_scm_version=True)
================================================
FILE: test/alpha.dic
================================================
%
1	A
2	Bravo
%
a*	1
bravo	2
================================================
FILE: test/test_alpha_dic.py
================================================
import os.path

import liwc

test_dir = os.path.dirname(__file__)


def test_category_names():
    _, category_names = liwc.load_token_parser(os.path.join(test_dir, "alpha.dic"))
    assert category_names == ["A", "Bravo"]


def test_parse():
    parse, _ = liwc.load_token_parser(os.path.join(test_dir, "alpha.dic"))
    sentence = "Any alpha a bravo charlie Bravo boy"
    tokens = sentence.split()
    matches = [category for token in tokens for category in parse(token)]
    # matching is case-sensitive, so the only matches are "alpha" (A), "a" (A) and "bravo" (Bravo)
    assert matches == ["A", "A", "Bravo"]
SYMBOL INDEX (8 symbols across 4 files)
FILE: liwc/__init__.py
  function load_token_parser (line 12) | def load_token_parser(filepath):
FILE: liwc/dic.py
  function _parse_categories (line 1) | def _parse_categories(lines):
  function _parse_lexicon (line 17) | def _parse_lexicon(lines, category_mapping):
  function read_dic (line 29) | def read_dic(filepath):
FILE: liwc/trie.py
  function build_trie (line 1) | def build_trie(lexicon):
  function search_trie (line 22) | def search_trie(trie, token, token_i=0):
FILE: test/test_alpha_dic.py
  function test_category_names (line 8) | def test_category_names():
  function test_parse (line 13) | def test_parse():
About this extraction
This page contains the full source code of the chbrown/liwc-python GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 12 files (9.9 KB), approximately 2.7k tokens, and a symbol index with 8 extracted functions, classes, methods, constants, and types.
Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.