Repository: closeio/redis-hashring
Branch: master
Commit: 5a64283df5de
Files: 10
Total size: 32.6 KB

Directory structure:
gitextract_e8xbh69s/

├── .github/
│   └── workflows/
│       └── test.yaml
├── .gitignore
├── LICENSE
├── README.md
├── example.py
├── pyproject.toml
├── redis_hashring/
│   └── __init__.py
├── requirements.txt
├── setup.py
└── tests.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/workflows/test.yaml
================================================
name: test-workflow

on: [push]

permissions:
  contents: read

jobs:
  lint:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.13"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Format
        run: |
          ruff format --check --no-cache
      - name: Lint
        run: |
          ruff check --no-cache

  test:
    runs-on: ubuntu-24.04
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10", "3.11", "3.12", "3.13"]
    services:
      redis:
        image: redis:7.2.4
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 6379:6379
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Test
        run: |
          pytest


================================================
FILE: .gitignore
================================================
*.pyc
venv/
*.egg
*.egg-info/


================================================
FILE: LICENSE
================================================
The MIT License (MIT)

Copyright (c) 2015-2024 Elastic Inc. (Close)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE


================================================
FILE: README.md
================================================
# `redis-hashring`

`redis-hashring` is a Python library that implements a consistent hash ring for
building distributed applications. The hash ring is stored in Redis.

## The problem

Let's assume you're building a distributed application that's responsible for
syncing accounts. Accounts are synced continuously, e.g. by keeping a
connection open. Given the large number of accounts, the application can't run
in one process and has to be distributed and split up into multiple processes.
Also, if one of the processes fails or crashes, other machines need to be able
to take over accounts quickly. The load should be balanced equally between the
machines.

## The solution

A solution to this problem is to use a consistent hash ring: different Python
instances ("nodes") are responsible for different sets of keys. In our account
example, the account IDs could be used as keys. A consistent hash ring is a
large (integer) space that wraps around to form a circle. Each node picks a few
random points ("replicas") on the hash ring when starting. Keys are hashed and
looked up on the hash ring: to find the node that's responsible for a given
key, we move backwards on the ring from the key's position until we reach the
nearest replica point that is less than or equal to it, wrapping around past
zero if necessary. The node that owns this replica is responsible for the key.
Having multiple replicas per node ensures a better distribution of the keys
amongst the nodes, and giving some nodes more replicas than others gives them
more weight. The ring is automatically rebalanced when a node enters or leaves
the ring: if a node crashes or shuts down, its replicas are removed from the
ring.
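The lookup rule can be sketched in a few lines of Python. This is a minimal, in-memory illustration rather than the library's code: the helper name `find_owner` and the sample ring are hypothetical, and the ring is assumed to be a list of `(start, node)` tuples sorted by `start`. (The library itself precomputes the ranges each node owns in `RingNode.update` and checks them in `contains`, which gives the same answer.)

```python
import bisect

RING_SIZE = 2**32


def find_owner(ring, point):
    """Return the node owning `point` on a ring of (start, node) tuples.

    `ring` must be sorted by start. The owner is the replica with the
    largest start <= point; if the point lies before the first replica,
    the lookup wraps around to the last replica on the ring.
    """
    if not ring:
        raise ValueError("the ring has no replicas")
    starts = [start for start, _ in ring]
    # Index of the last start <= point; -1 selects the last replica (wrap).
    index = bisect.bisect_right(starts, point % RING_SIZE) - 1
    return ring[index][1]


# Hypothetical ring: two nodes with one replica each.
ring = [(1_000_000_000, "node1"), (2_000_000_000, "node2")]
print(find_owner(ring, 1_500_000_000))  # node1
print(find_owner(ring, 500_000_000))  # node2 (wraps around)
```

`bisect_right` is used so that a point exactly equal to a replica's start belongs to that replica, matching the half-open `[start, next_start)` ranges.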

## How it works

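The ring is stored as a sorted set (ZSET) in Redis. Each replica is a member of
the set, scored by the timestamp of its last heartbeat. Each node needs to
periodically refresh the score of its replicas to stay on the ring; replicas
whose last heartbeat is older than the node timeout (`NODE_TIMEOUT`, 60
seconds) are treated as expired and are eventually removed.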
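To illustrate the expiry rule, here is an in-memory stand-in for the ZSET, modeled on `heartbeat` and `_fetch_ring` in `redis_hashring/__init__.py`: members are `"start:replica"` strings scored by the time of their last heartbeat, and members whose score is older than `NODE_TIMEOUT` are ignored. The `FakeRing` class is hypothetical and only exists for this sketch; the real library issues `ZADD` and `ZRANGEBYSCORE` against Redis.

```python
import time

NODE_TIMEOUT = 60  # Seconds without a heartbeat before a replica expires.


class FakeRing:
    """Hypothetical in-memory stand-in for the ring's Redis ZSET."""

    def __init__(self):
        self._scores = {}  # member ("start:replica") -> last heartbeat

    def heartbeat(self, replicas, now=None):
        """Refresh each replica's score, like ZADD with the current time."""
        now = time.time() if now is None else now
        for start, name in replicas:
            self._scores[f"{start}:{name}"] = now

    def active(self, now=None):
        """Return sorted (start, replica) tuples that have not expired."""
        now = time.time() if now is None else now
        expiry_time = now - NODE_TIMEOUT
        ring = []
        for member, score in self._scores.items():
            # Like ZRANGEBYSCORE with min=expiry_time (inclusive).
            if score >= expiry_time:
                start, replica = member.split(":", 1)
                ring.append((int(start), replica))
        return sorted(ring)


ring = FakeRing()
ring.heartbeat([(706234936, "host:1:aa")], now=1000)
ring.heartbeat([(2037338983, "host:2:bb")], now=1050)
print(ring.active(now=1055))  # both replicas are active
print(ring.active(now=1100))  # host:1:aa has expired
```

Splitting with `maxsplit=1` matters because replica names themselves contain colons (`hostname:pid:id`).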

The ring contains 2^32 points, and a replica is created by placing a point on
the ring at random. A replica is responsible for the range of points from its
randomly generated starting point up to (but not including) the starting point
of the next replica on the ring, regardless of which node that replica belongs
to.
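The range rule can be sketched as follows, mirroring the logic of `RingNode.update`: each of a node's replicas owns the half-open range from its start to the next replica's start, a range that wraps past `2^32 - 1` is split in two, and a lone replica owns the whole ring. The function name `owned_ranges` is hypothetical.

```python
RING_SIZE = 2**32


def owned_ranges(ring, own_replicas):
    """Return the half-open (start, end) ranges owned by `own_replicas`.

    `ring` is a list of (start, replica) tuples sorted by start, and
    `own_replicas` is the set of replica names belonging to one node.
    """
    ranges = []
    for n, (start, replica) in enumerate(ring):
        if replica not in own_replicas:
            continue
        # The range ends where the next replica (of any node) starts.
        end = ring[(n + 1) % len(ring)][0]
        if start < end:
            ranges.append((start, end))
        elif end < start:
            # Wraps around: split into [start, RING_SIZE) and [0, end).
            ranges.append((start, RING_SIZE))
            ranges.append((0, end))
        else:
            # Only replica on the ring: it owns everything.
            ranges.append((0, RING_SIZE))
    return ranges


ring = [(1_000_000_000, "a"), (2_000_000_000, "b")]
print(owned_ranges(ring, {"a"}))  # [(1000000000, 2000000000)]
print(owned_ranges(ring, {"b"}))  # [(2000000000, 4294967296), (0, 1000000000)]
```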

To check if a node is responsible for a given key, the key's position on the
ring is determined by hashing the key using xxHash (CRC-32 is also supported
for backwards-compatibility).
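A minimal sketch of mapping a key to a ring point, following `_hash_with_crc32` and `_hash_with_xxhash` in `redis_hashring/__init__.py`. CRC-32 comes from the standard library's `binascii`; the xxHash variant needs the optional `xxhash` package (`pip install redis-hashring[xxhash]`), so the sketch only uses it when it can be imported. Keys must be strings, because they are encoded to bytes before hashing.

```python
import binascii

try:
    import xxhash  # Optional dependency used by the default algorithm.
except ImportError:
    xxhash = None

RING_SIZE = 2**32


def ring_point_crc32(key: str) -> int:
    """Map `key` to a ring point with CRC-32 (for legacy rings)."""
    return binascii.crc32(key.encode()) % RING_SIZE


def ring_point_xxhash(key: str) -> int:
    """Map `key` to a ring point with 32-bit xxHash (the default)."""
    if xxhash is None:
        raise ImportError("install the optional `xxhash` package")
    return xxhash.xxh32(key.encode()).intdigest() % RING_SIZE


print(ring_point_crc32("hello"))
if xxhash is not None:
    print(ring_point_xxhash("hello"))
```

Both hashes already produce 32-bit values, so `% RING_SIZE` is a no-op while `RING_SIZE` is `2**32`; the library keeps it so that a smaller ring size would still work. The two algorithms place the same key at different points, which is why an existing CRC-32 ring must keep using `HashAlgorithm.CRC32`.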

For example, let's say there are two nodes, having one replica each. The first
node is at 1 000 000 000 (1e9), the second at 2e9. In this case, the first node
is responsible for the range [1e9, 2e9-1], and the second node is responsible
for [2e9, 2^32-1] and [0, 1e9-1], since the ring wraps around. To find out which
node the key *hello* belongs to, we compute its hash, which is 4 211 111 929;
the key therefore belongs to the second node.
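The range sizes in this example can be checked with a little arithmetic. The sketch below computes each replica's share as `(next_start - start) mod 2^32` divided by `2^32`, which is what the `Range` column of `debug_print` reports; treating a lone replica as owning the whole ring follows `RingNode.update`. The `replica_shares` name is hypothetical.

```python
RING_SIZE = 2**32


def replica_shares(starts):
    """Return each replica's share of the ring, in percent.

    `starts` is a sorted list of replica start points.
    """
    shares = []
    for n, start in enumerate(starts):
        next_start = starts[(n + 1) % len(starts)]
        # The modulo handles the wrap from the last replica to the first;
        # a lone replica (next_start == start) owns the whole ring.
        size = (next_start - start) % RING_SIZE or RING_SIZE
        shares.append(100.0 * size / RING_SIZE)
    return shares


first, second = replica_shares([1_000_000_000, 2_000_000_000])
print(f"{first:.2f}% {second:.2f}%")  # 23.28% 76.72%
```

So even with only two nodes, a single random replica each can leave one node with more than three times the load of the other.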

Since the replica points are picked randomly, it is recommended to give each
node multiple replicas on the ring to ensure a more even distribution of keys
among the nodes.
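To get a feel for why more replicas help, the following sketch places replicas at random points, as `RingNode.__init__` does with `random.randrange(2**32)`, and reports the smallest and largest share any node ends up with. It is an illustrative simulation with a hypothetical `node_shares` helper and a fixed seed, not part of the library; the exact numbers depend on the seed, but the gap between nodes typically shrinks as the number of replicas grows.

```python
import random

RING_SIZE = 2**32


def node_shares(n_nodes, n_replicas, seed=0):
    """Simulate a ring and return each node's share of it (0.0 to 1.0)."""
    rng = random.Random(seed)
    # Each replica gets a random start point, tagged with its node index.
    ring = sorted(
        (rng.randrange(RING_SIZE), node)
        for node in range(n_nodes)
        for _ in range(n_replicas)
    )
    shares = [0.0] * n_nodes
    for n, (start, node) in enumerate(ring):
        next_start = ring[(n + 1) % len(ring)][0]
        shares[node] += ((next_start - start) % RING_SIZE) / RING_SIZE
    return shares


for n_replicas in (1, 16, 256):
    shares = node_shares(n_nodes=4, n_replicas=n_replicas)
    print(
        f"{n_replicas:4} replicas: "
        f"min {min(shares):.1%}, max {max(shares):.1%}"
    )
```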

## Demo

As an example, let's assume you have a process that is responsible for syncing
accounts. In this example they are numbered from 0 to 99. Starting node 1 will
assign all accounts to node 1, since it's the only node on the ring.

We can see this by running the provided example script on node 1:

```
% python example.py
INFO:root:PID 80721, 100 keys ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
```

We can print the ring for debugging and see all the nodes and replicas on the
ring:

```
% python example.py --print
Hash ring "ring" replicas:
Start      Range  Delay   Node
 706234936  2.97%      0s mbp.local:80721:249d729d
 833679955  3.58%      0s mbp.local:80721:aa60d44c
 987624694 24.44%      0s mbp.local:80721:aa7d4433
2037338983  3.41%      0s mbp.local:80721:e810d068
2183761853  3.55%      0s mbp.local:80721:3917f572
2336151471  2.82%      0s mbp.local:80721:e42b1b46
2457297989  4.40%      0s mbp.local:80721:e6bd5726
2646391033  4.37%      0s mbp.local:80721:6de2fc22
2834073726  5.30%      0s mbp.local:80721:b6f950b2
3061910569  3.96%      0s mbp.local:80721:d176c9e2
3231812046  5.70%      0s mbp.local:80721:65432143
3476455773  5.71%      0s mbp.local:80721:f2b29682
3721589736  0.65%      0s mbp.local:80721:51d0cb09
3749333446  5.53%      0s mbp.local:80721:3572f718
3986767934  4.39%      0s mbp.local:80721:42147f45
4175523935 19.22%      0s mbp.local:80721:296c9522

Hash ring "ring" nodes:
Range    Replicas Delay   Hostname             PID
100.00%       16      0s mbp.local            80721
```

We can see that the node is responsible for the entire ring (range 100%) and
has 16 replicas on the ring.

Now let's start a second node by running the script again. It will add its
replicas to the ring and notify the other nodes, after which the keys are split
between the two processes. One process logs:

```
% python example.py
INFO:root:PID 80721, 51 keys ([1, 5, 8, 9, 10, 14, 17, 20, 21, 24, 25, 28, 30, 32, 33, 34, 36, 38, 41, 42, 45, 46, 49, 50, 52, 54, 56, 58, 59, 60, 61, 62, 65, 66, 68, 69, 71, 74, 75, 78, 79, 81, 82, 85, 86, 87, 88, 89, 92, 93, 96])
```

The other process logs the remaining keys:

```
INFO:root:PID 80808, 49 keys ([0, 2, 3, 4, 6, 7, 11, 12, 13, 15, 16, 18, 19, 22, 23, 26, 27, 29, 31, 35, 37, 39, 40, 43, 44, 47, 48, 51, 53, 55, 57, 63, 64, 67, 70, 72, 73, 76, 77, 80, 83, 84, 90, 91, 94, 95, 97, 98, 99])
```

We can inspect the ring:

```
% python example.py --print
Hash ring "ring" replicas:
Start      Range  Delay   Node
 204632062  1.06%      0s mbp.local:80808:f933c33c
 250215779  0.36%      0s mbp.local:80808:3b104c45
 265648189  1.15%      0s mbp.local:80808:84d71125
 315059885  2.77%      0s mbp.local:80808:bab5a03c
 434081415  6.34%      0s mbp.local:80808:6eec1b26
 706234936  2.97%      0s mbp.local:80721:249d729d
 833679955  1.59%      0s mbp.local:80721:aa60d44c
 901926411  2.00%      0s mbp.local:80808:bd6f3b27
 987624694  2.87%      0s mbp.local:80721:aa7d4433
1110943067  5.42%      0s mbp.local:80808:abfa5d78
1343923832  0.83%      0s mbp.local:80808:5261947f
1379658747  4.70%      0s mbp.local:80808:cb0904de
1581392642  1.06%      0s mbp.local:80808:3050daa3
1627017290  9.55%      0s mbp.local:80808:8e1cef12
2037338983  3.41%      0s mbp.local:80721:e810d068
2183761853  3.55%      0s mbp.local:80721:3917f572
2336151471  2.82%      0s mbp.local:80721:e42b1b46
2457297989  4.40%      0s mbp.local:80721:e6bd5726
2646391033  4.37%      0s mbp.local:80721:6de2fc22
2834073726  2.30%      0s mbp.local:80721:b6f950b2
2932842903  3.01%      0s mbp.local:80808:58f09769
3061910569  3.08%      0s mbp.local:80721:d176c9e2
3194206736  0.88%      0s mbp.local:80808:ce94a1cf
3231812046  5.70%      0s mbp.local:80721:65432143
3476455773  0.21%      0s mbp.local:80721:f2b29682
3485592199  5.49%      0s mbp.local:80808:6fc107a3
3721589736  0.65%      0s mbp.local:80721:51d0cb09
3749333446  0.68%      0s mbp.local:80721:3572f718
3778349273  4.85%      0s mbp.local:80808:e7cc7485
3986767934  1.29%      0s mbp.local:80721:42147f45
4042192844  3.10%      0s mbp.local:80808:001590b5
4175523935  7.55%      0s mbp.local:80721:296c9522

Hash ring "ring" nodes:
Range    Replicas Delay   Hostname             PID
47.42%       16      0s mbp.local            80721
52.58%       16      0s mbp.local            80808
```
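The node table aggregates the replica table: replicas are grouped by the `hostname:pid` prefix of their names, and a node's range is the sum of its replicas' ranges, as in `debug_print`. A small sketch using made-up replica data (not taken from the output above); the `node_ranges` helper is hypothetical and assumes the ring holds at least two replicas.

```python
import collections

RING_SIZE = 2**32


def node_ranges(ring):
    """Sum replica ranges per node; `ring` is sorted (start, replica)."""
    totals = collections.defaultdict(int)
    for n, (start, replica) in enumerate(ring):
        # Replica names look like "hostname:pid:random_id".
        hostname, pid, _ = replica.split(":")
        next_start = ring[(n + 1) % len(ring)][0]
        totals[f"{hostname}:{pid}"] += (next_start - start) % RING_SIZE
    return {node: 100.0 * size / RING_SIZE for node, size in totals.items()}


ring = [
    (0, "host-a:100:aaaa0001"),
    (RING_SIZE // 4, "host-b:200:bbbb0001"),
    (RING_SIZE // 2, "host-a:100:aaaa0002"),
]
print(node_ranges(ring))  # {'host-a:100': 75.0, 'host-b:200': 25.0}
```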

## `gevent` example

`redis-hashring` provides a `GeventRingNode` class for `gevent`-based
applications. The `GeventRingNode.start()` method spawns a greenlet that
initializes the ring and periodically updates the node's replicas.

An example app could look as follows:

```python
from redis import Redis
from redis_hashring import GeventRingNode

KEY = "example-ring"

redis = Redis()
node = GeventRingNode(redis, KEY)
node.start()


def get_items():
    """
    Implement this method and return items to be processed.
    """
    raise NotImplementedError()


def process_items(items):
    """
    Implement this method and process the given items.
    """
    raise NotImplementedError()


try:
    while True:
        # Only process items this node is responsible for. Items must be
        # strings, since `contains` encodes them before hashing.
        items = [item for item in get_items() if node.contains(item)]
        process_items(items)
except KeyboardInterrupt:
    pass

node.stop()
```

## Implementation considerations

When implementing a distributed application using `redis-hashring`, be aware of
the following:

- Locking

  When nodes are added to the ring, multiple nodes might assume they're
  responsible for the same key until they are notified about the new state of
  the ring. Depending on the application, locking may be necessary to avoid
  duplicate processing.

  For example, in the demo above the node could take a per-account-ID lock if
  an account should never be synced by multiple nodes at the same time. This can
  be done using redis-py's `Redis.lock()` or any other distributed lock.

- Limit

  It is recommended to add an upper limit to the number of keys a node can
  process to avoid overloading a node when there are few nodes on the ring or
  all nodes need to be restarted.

  For example, in the demo above we could implement a limit of 50 accounts if
  we know that a node may not be capable of syncing many more. In this case,
  multiple nodes would need to be running to sync all the accounts. Also note
  that the ring is usually not perfectly balanced, so running 2 nodes wouldn't
  be enough in this example.
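Both considerations can be combined in the processing loop. The sketch below is one assumption about how an application might do it, not part of `redis-hashring`: it keeps only the items this node owns, caps them at a hypothetical `MAX_ITEMS_PER_NODE`, and skips any item whose lock is currently held elsewhere. `conn.lock(name, timeout=...)` and `Lock.acquire(blocking=False)` are redis-py APIs; the `process_owned_items` function, the `sync-lock:` key prefix, and the `process_item` callback are hypothetical.

```python
MAX_ITEMS_PER_NODE = 50  # Hypothetical per-node capacity limit.
LOCK_TIMEOUT = 60  # Seconds; should exceed the time to process one item.


def process_owned_items(node, conn, items, process_item):
    """Process up to MAX_ITEMS_PER_NODE items owned by `node`.

    `node` is a RingNode, `conn` a redis-py client, and `process_item`
    a hypothetical callback supplied by the application. Returns the
    list of items that were processed.
    """
    # Limit: only take as many owned items as this node can handle.
    owned = [item for item in items if node.contains(item)]
    owned = owned[:MAX_ITEMS_PER_NODE]

    processed = []
    for item in owned:
        # Locking: while the ring rebalances, another node may briefly
        # think it owns the same item, so guard each item with a lock.
        lock = conn.lock(f"sync-lock:{item}", timeout=LOCK_TIMEOUT)
        if not lock.acquire(blocking=False):
            continue  # Another node is processing this item right now.
        try:
            process_item(item)
            processed.append(item)
        finally:
            # redis-py raises if the lock expired before being released.
            lock.release()
    return processed
```

Applying the cap before locking means a node never takes on more than its limit, at the cost of leaving some owned items unprocessed until more nodes join the ring.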


================================================
FILE: example.py
================================================
import argparse
import logging
import os
import sys
import time

import redis

from redis_hashring import RingNode

N_KEYS = 100

logging.basicConfig(level=logging.DEBUG)


def _parse_arguments():
    parser = argparse.ArgumentParser("Hash ring example.", add_help=False)
    parser.add_argument(
        "--host", "-h", default="localhost", help="Redis hostname."
    )
    parser.add_argument(
        "--port", "-p", type=int, default=6379, help="Redis port."
    )
    parser.add_argument(
        "--print",
        action="store_true",
        dest="print",
        help="Print the hash ring.",
    )
    parser.add_argument(
        "--help",
        action="help",
        default=argparse.SUPPRESS,
        help="Show this help message and exit.",
    )
    return parser.parse_args()


if __name__ == "__main__":
    args = _parse_arguments()

    print(f"Attempting to connect to Redis at {args.host}:{args.port}.")
    r = redis.Redis(args.host, args.port)
    try:
        r.ping()
    except redis.exceptions.ConnectionError:
        print("Failed to connect to Redis.")
        sys.exit(1)

    node = RingNode(r, "ring")

    if args.print:
        node.debug_print()
        sys.exit()

    pid = os.getpid()

    node.start()

    try:
        while True:
            keys = [key for key in range(N_KEYS) if node.contains(str(key))]
            logging.info("PID %d, %d keys (%s)", pid, len(keys), repr(keys))
            time.sleep(2)
    except KeyboardInterrupt:
        pass

    node.stop()


================================================
FILE: pyproject.toml
================================================
[tool.ruff]
target-version = "py38"
line-length = 79
exclude = [
    ".git",
    "venv",
    ".venv",
    "__pycache__",
]

[tool.ruff.lint]
extend-select = ["I"]
unfixable = [
    # Variable assigned but never used - automatically removing the assignment
    # is annoying when running autofix on work-in-progress code.
    "F841",
]

[tool.ruff.lint.flake8-tidy-imports]
ban-relative-imports = "all"

[tool.ruff.lint.flake8-tidy-imports.banned-api]
"functools.partial".msg = "Use a lambda or a named function instead. Partials don't type check correctly."
"datetime.datetime.utcnow".msg = "Use `datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)` instead."
"datetime.datetime.utcfromtimestamp".msg = "Use `datetime.datetime.fromtimestamp(timestamp, datetime.timezone.utc).replace(tzinfo=None)` instead."

[tool.ruff.lint.isort]
combine-as-imports = true
forced-separate = ["tests"]

[tool.ruff.lint.pydocstyle]
# https://google.github.io/styleguide/pyguide.html#383-functions-and-methods
convention = "google"

[tool.ruff.lint.flake8-annotations]
ignore-fully-untyped = true

[tool.pytest.ini_options]
timeout = 180
python_files = "tests.py"
testpaths = ["."]
xfail_strict = true


================================================
FILE: redis_hashring/__init__.py
================================================
import binascii
import collections
import enum
import operator
import os
import random
import select
import socket
import threading
import time

try:
    import xxhash
except ImportError:
    xxhash = None

# Number of points on the ring. Must not be higher than 2**32.
RING_SIZE = 2**32

# Default number of replicas per node.
RING_REPLICAS = 16


class HashAlgorithm(enum.Enum):
    CRC32 = "crc32"
    XXHASH = "xxhash"


# How often (in seconds) to update a node's heartbeat.
POLL_INTERVAL = 10

# After how many seconds without a heartbeat a node is considered dead.
NODE_TIMEOUT = 60

# How often (in seconds) expired nodes are cleaned up from the ring.
CLEANUP_INTERVAL = 120


def _hash_with_xxhash(key):
    return xxhash.xxh32(key.encode()).intdigest() % RING_SIZE


def _hash_with_crc32(key):
    return binascii.crc32(key.encode()) % RING_SIZE


def _decode(data):
    # Compatibility with different redis-py `decode_responses` settings.
    if isinstance(data, bytes):
        return data.decode()
    else:
        return data


class RingNode(object):
    """
    A node in a Redis hash ring.

    Each node may have multiple replicas on the ring for more balanced hashing.

    The ring is stored as follows in Redis:

    ZSET <key>
    Represents the ring in Redis. The members of this ZSET have the form
    "start:replica_name", where start is the start of the range for which the
    replica is responsible. Each member is scored by the timestamp of the
    replica's last heartbeat.

    CHANNEL <key>
    Represents a pubsub channel in Redis which receives a message every time
    the ring structure has changed.

    Simple usage example:

    ```
    node = RingNode(redis, key)
    node.start()

    while is_running:
        # Only process items this node is responsible for. `item` should be an
        # object that can be encoded to bytes by calling `item.encode()` on it,
        # like a `str`.
        items = [item for item in get_items() if node.contains(item)]
        process_items(items)

    node.stop()
    ```

    Using CRC-32 (if you need to support hashrings created before xxHash
    support was introduced):

    ```
    from redis_hashring import RingNode, HashAlgorithm

    node = RingNode(redis, key, hash_algorithm=HashAlgorithm.CRC32)
    node.start()
    ```

    As a context manager:

    ```
    with RingNode(redis, key) as node:
        while is_running:
            # Only process items this node is responsible for. `item` should be
            # an object that can be encoded to bytes by calling `item.encode()`
            # on it, like a `str`.
            items = [item for item in get_items() if node.contains(item)]
            process_items(items)
    ```
    """

    def __init__(
        self,
        conn,
        key,
        *,
        n_replicas=RING_REPLICAS,
        hash_algorithm=HashAlgorithm.XXHASH,
    ):
        """
        Initializes a Redis hash ring node.

        Args:
            conn: The Redis connection to use.
            key: The Redis key of the ring. It names both the ZSET that stores
                the ring and the pubsub channel used for change notifications.
            n_replicas: Number of replicas this node should have on the ring.
            hash_algorithm: Hash algorithm to use. It is recommended to use
                `HashAlgorithm.XXHASH` (the default) because it provides a more
                uniform distribution than CRC-32 with faster hashing. If you
                need to support hashrings created before we introduced support
                for xxHash, use `HashAlgorithm.CRC32`.
        """
        self._polling_thread = None
        self._stop_polling_fd_r = None
        self._stop_polling_fd_w = None

        self._conn = conn
        self._key = key

        if hash_algorithm is HashAlgorithm.XXHASH:
            if xxhash is None:
                raise ImportError(
                    "xxhash library is required for XXHASH algorithm. "
                    "Install with: pip install redis-hashring[xxhash]"
                )
            self._hash_function = _hash_with_xxhash
        elif hash_algorithm is HashAlgorithm.CRC32:
            self._hash_function = _hash_with_crc32
        else:
            raise ValueError("Unexpected hash algorithm requested")

        host = socket.gethostname()
        pid = os.getpid()

        # Create unique identifiers for the replicas.
        self._replicas = [
            (
                random.randrange(2**32),
                "{host}:{pid}:{id_}".format(
                    host=host,
                    pid=pid,
                    id_=binascii.hexlify(os.urandom(4)).decode(),
                ),
            )
            for _ in range(n_replicas)
        ]

        # Number of nodes currently active in the ring.
        self._node_count = 0
        # List of tuples of ranges this node is responsible for, where a tuple
        # (a, b) includes any N matching a <= N < b.
        self._ranges = []

        self._select = select.select

    def _fetch_ring(self):
        """
        Fetch the ring from Redis.

        The fetched ring only includes active nodes. Returns a list of tuples
        (start, replica) sorted by start (see `_fetch_ring_all` docs for more
        details).
        """
        expiry_time = time.time() - NODE_TIMEOUT
        data = self._conn.zrangebyscore(self._key, expiry_time, "INF")

        ring = []
        for replica_data in data:
            start, replica = _decode(replica_data).split(":", 1)
            ring.append((int(start), replica))
        return sorted(ring, key=operator.itemgetter(0))

    def _fetch_ring_all(self):
        """
        Fetch the ring from Redis.

        The fetched ring will include inactive nodes. Returns a list of tuples
        (start, replica, heartbeat, expired), where:
        * start: start of the range for which the replica is responsible.
        * replica: name of the replica.
        * heartbeat: timestamp of the last heartbeat.
        * expired: boolean denoting whether this replica is inactive.
        """
        expiry_time = time.time() - NODE_TIMEOUT
        data = self._conn.zrange(self._key, 0, -1, withscores=True)

        ring = []
        for replica_data, heartbeat in data:
            start, replica = _decode(replica_data).split(":", 1)
            ring.append(
                (int(start), replica, heartbeat, heartbeat < expiry_time)
            )
        return sorted(ring, key=operator.itemgetter(0))

    def debug_print(self):
        """
        Prints the ring for debugging purposes.
        """
        ring = self._fetch_ring_all()

        print('Hash ring "{key}" replicas:'.format(key=self._key))

        now = time.time()

        n_replicas = len(ring)
        if ring:
            print(
                "{:10} {:6} {:7} {}".format("Start", "Range", "Delay", "Node")
            )
        else:
            print("(no replicas)")

        nodes = collections.defaultdict(list)

        for n, (start, replica, heartbeat, expired) in enumerate(ring):
            hostname, pid, _ = replica.split(":")
            node = ":".join([hostname, pid])

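            if n_replicas == 1:
                # A lone replica owns the whole ring, as in `update`.
                abs_size = RING_SIZE
            else:
                next_start = ring[(n + 1) % n_replicas][0]
                abs_size = (next_start - start) % RING_SIZE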
            size = 100.0 / RING_SIZE * abs_size
            delay = int(now - heartbeat)
            expired_str = "(EXPIRED)" if expired else ""

            nodes[node].append((hostname, pid, abs_size, delay, expired))

            print(
                f"{start:10} {size:5.2f}% {delay:6}s {replica} {expired_str}"
            )

        print()
        print('Hash ring "{key}" nodes:'.format(key=self._key))

        if nodes:
            print(
                "{:8} {:8} {:7} {:20} {:5}".format(
                    "Range", "Replicas", "Delay", "Hostname", "PID"
                )
            )
        else:
            print("(no nodes)")

        for _, v in nodes.items():
            hostname, pid = v[0][0], v[0][1]
            abs_size = sum(replica[2] for replica in v)
            size = 100.0 / RING_SIZE * abs_size
            delay = max(replica[3] for replica in v)
            expired = any(replica[4] for replica in v)
            count = len(v)
            expired_str = "(EXPIRED)" if expired else ""
            print(
                f"{size:5.2f}% {count:8} {delay:6}s {hostname:20} {pid:5}"
                f" {expired_str}"
            )

    def heartbeat(self):
        """
        Add/update the node in Redis.

        Needs to be called regularly by the node.
        """
        pipeline = self._conn.pipeline()

        now = time.time()

        for replica in self._replicas:
            pipeline.zadd(self._key, {f"{replica[0]}:{replica[1]}": now})
        ret = pipeline.execute()

        # Only notify the other nodes if we're not in the ring yet.
        if any(ret):
            self._notify()

    def remove(self):
        """
        Remove the node from the ring.
        """
        pipeline = self._conn.pipeline()

        for replica in self._replicas:
            pipeline.zrem(self._key, f"{replica[0]}:{replica[1]}")
        pipeline.execute()

        # Make sure this node won't contain any items.
        self._node_count = 0
        self._ranges = []

        self._notify()

    def _notify(self):
        """
        Publish an update to the ring's activity channel.
        """
        self._conn.publish(self._key, "*")

    def cleanup(self):
        """
        Removes expired nodes from the ring.
        """
        expired = time.time() - NODE_TIMEOUT

        if self._conn.zremrangebyscore(self._key, 0, expired):
            self._notify()

    def update(self):
        """
        Fetches the updated ring from Redis and updates the current ranges.
        """
        ring = self._fetch_ring()
        nodes = set()
        n_replicas = len(ring)

        own_replicas = {r[1] for r in self._replicas}

        self._ranges = []
        for n, (start, replica) in enumerate(ring):
            host, pid, _ = replica.split(":")
            node = ":".join([host, pid])
            nodes.add(node)

            if replica in own_replicas:
                end = ring[(n + 1) % n_replicas][0] % RING_SIZE
                if start < end:
                    self._ranges.append((start, end))
                elif end < start:
                    self._ranges.append((start, RING_SIZE))
                    self._ranges.append((0, end))
                else:
                    self._ranges.append((0, RING_SIZE))

        self._node_count = len(nodes)

    def get_ranges(self):
        """
        Return the hash ring ranges that this node owns.
        """
        return self._ranges

    def get_node_count(self):
        """
        Return the number of active nodes in the ring.
        """
        return self._node_count

    def contains(self, key):
        """
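        Check whether this node is responsible for the item.

        `key` must be a `str` (or another object whose `encode()` method
        returns bytes), since it is encoded before being hashed.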
        """
        return self._contains_ring_point(self.key_as_ring_point(key))

    def key_as_ring_point(self, key):
        """Turn a key into a point on a hash ring."""
        return self._hash_function(key)

    def _contains_ring_point(self, n):
        """
        Check whether this node is responsible for the ring point.
        """
        for start, end in self._ranges:
            if start <= n < end:
                return True
        return False

    def poll(self):
        """
        Keep a node in the hash ring.

        This should be kept running for as long as the node needs to stay in
        the ring. Can be run in a separate thread or in a greenlet. This takes
        care of:
        * Updating the heartbeat.
        * Checking for ring updates.
        * Cleaning up expired nodes periodically.
        """
        pubsub = self._conn.pubsub()
        pubsub.subscribe(self._key)
        pubsub_fd = pubsub.connection._sock.fileno()

        last_heartbeat = time.time()
        self.heartbeat()

        last_cleanup = time.time()
        self.cleanup()

        self._stop_polling_fd_r, self._stop_polling_fd_w = os.pipe()

        try:
            while True:
                # Since Redis' `listen` method blocks, we use `select` to
                # inspect the underlying socket to see if there is activity.
                timeout = max(
                    0.0, POLL_INTERVAL - (time.time() - last_heartbeat)
                )
                r, _, _ = self._select(
                    [self._stop_polling_fd_r, pubsub_fd], [], [], timeout
                )

                if self._stop_polling_fd_r in r:
                    os.close(self._stop_polling_fd_r)
                    os.close(self._stop_polling_fd_w)
                    self._stop_polling_fd_r = None
                    self._stop_polling_fd_w = None
                    break

                if pubsub_fd in r:
                    while pubsub.get_message():
                        pass
                    self.update()

                last_heartbeat = time.time()
                self.heartbeat()

                now = time.time()
                if now - last_cleanup > CLEANUP_INTERVAL:
                    last_cleanup = now
                    self.cleanup()
        finally:
            pubsub.close()

    def start(self):
        """
        Start the node for threads-based applications.
        """
        self._polling_thread = threading.Thread(target=self.poll, daemon=True)
        self._polling_thread.start()

    def stop(self):
        """
        Stop the node for threads-based applications.
        """
        if self._polling_thread:
            while not self._stop_polling_fd_w:
                # Let's give the thread some time to create the fd.
                time.sleep(0.1)
            os.write(self._stop_polling_fd_w, b"1")
            self._polling_thread.join()
            self._polling_thread = None
        self.remove()

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, *args, **kwargs):
        self.stop()


class GeventRingNode(RingNode):
    """
    A node in a Redis hash ring.

    This works exactly the same as `RingNode`, except that `start` spawns a
    gevent greenlet (instead of a thread) to keep the node information up to
    date with the hash ring, and `stop` stops that greenlet.

    For a usage example, see the documentation for `RingNode`.
    """

    def __init__(self, *args, **kwargs):
        self._polling_greenlet = None
        super().__init__(*args, **kwargs)

    def start(self):
        """
        Start the node for gevent-based applications.
        """
        import gevent
        import gevent.select

        self._select = gevent.select.select
        self._polling_greenlet = gevent.spawn(self.poll)

        # Even though `self.poll` runs `self.heartbeat` as soon as it starts
        # (and `self.update` once the resulting notification arrives), this is
        # gevent and `self.poll` may not run for a while, depending on how long
        # the greenlet that creates the node takes to yield. So we run these
        # functions here to make sure the node is up to date immediately.
        self.heartbeat()
        self.update()

    def stop(self):
        """
        Stop the node for gevent-based applications.
        """
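        import gevent

        if self._polling_greenlet:
            while not self._stop_polling_fd_w:
                # Let's give the greenlet some time to create the fd. Use
                # `gevent.sleep` so the polling greenlet gets to run; without
                # monkey-patching, `time.sleep` would block the whole hub.
                gevent.sleep(0.1)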
            os.write(self._stop_polling_fd_w, b"1")
            self._polling_greenlet.join()
            self._polling_greenlet = None
        self.remove()
        self._select = select.select


================================================
FILE: requirements.txt
================================================
pytest==7.2.2
redis==4.6.0
ruff==0.4.3
xxhash==3.5.0


================================================
FILE: setup.py
================================================
from setuptools import setup

setup(
    name="redis-hashring",
    version="0.6.0",
    author="Close Engineering",
    author_email="engineering@close.com",
    url="https://github.com/closeio/redis-hashring",
    license="MIT",
    description=(
        "Python library for distributed applications using a Redis hash ring"
    ),
    install_requires=["redis>=3"],
    extras_require={
        "xxhash": ["xxhash>=3.5.0"],
    },
    platforms="any",
    classifiers=[
        "Intended Audience :: Developers",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
        "Topic :: Software Development :: Libraries :: Python Modules",
        "Programming Language :: Python",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3 :: Only",
        "Programming Language :: Python :: 3.10",
        "Programming Language :: Python :: 3.11",
        "Programming Language :: Python :: 3.12",
        "Programming Language :: Python :: 3.13",
    ],
    packages=["redis_hashring"],
)
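
`setup.py` only requires `redis`; `xxhash` is an optional extra, installed with `pip install "redis-hashring[xxhash]"`. The hashing code is not shown in this section (the symbol index lists `_hash_with_xxhash` and `_hash_with_crc32`), so the sketch below only illustrates the usual optional-import pattern. The helper names, the use of `xxhash.xxh32_intdigest`, the 32-bit ring size and the fallback to CRC32 are all assumptions rather than the library's behavior; the library may instead raise an error when `HashAlgorithm.XXHASH` is requested without `xxhash` installed.

```python
import binascii

try:
    import xxhash  # optional extra: pip install "redis-hashring[xxhash]"
except ImportError:
    xxhash = None  # the extra is not installed

# Assumption: ring points are unsigned 32-bit integers.
RING_SIZE = 2**32


def hash_crc32(key: str) -> int:
    # binascii.crc32 returns an unsigned 32-bit value on Python 3.
    return binascii.crc32(key.encode("utf-8"))


def hash_xxh32(key: str) -> int:
    # Hypothetical helper; requires the optional xxhash package.
    if xxhash is None:
        raise RuntimeError("xxhash is not installed")
    return xxhash.xxh32_intdigest(key.encode("utf-8"))


def ring_point_for(key: str, prefer_xxhash: bool = True) -> int:
    """Map a key to a ring point, preferring xxhash when it is available."""
    if prefer_xxhash and xxhash is not None:
        return hash_xxh32(key)
    return hash_crc32(key)


if __name__ == "__main__":
    print(ring_point_for("item"))
    print(ring_point_for("item", prefer_xxhash=False))
```

Keeping the import optional means the package installs with `redis` as its only hard dependency, while CRC32 from the standard library is always available.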


================================================
FILE: tests.py
================================================
import socket
from unittest.mock import patch

import pytest
from redis import Redis

from redis_hashring import HashAlgorithm, RingNode

TEST_KEY = "hashring-test"


@pytest.fixture
def redis():
    redis = Redis()
    yield redis
    redis.delete(TEST_KEY)


def get_node(redis, n_replicas, total_replicas, hash_algorithm):
    node = RingNode(
        redis, TEST_KEY, n_replicas=n_replicas, hash_algorithm=hash_algorithm
    )

    assert len(node._replicas) == n_replicas
    assert redis.zcard(TEST_KEY) == total_replicas - n_replicas

    node.heartbeat()

    assert redis.zcard(TEST_KEY) == total_replicas
    assert len(node.get_ranges()) == 0

    return node


def test_node(redis):
    with patch.object(socket, "gethostname", return_value="host1"):
        node1 = get_node(redis, 1, 1, HashAlgorithm.XXHASH)
    node1.update()
    assert len(node1.get_ranges()) == 1
    assert node1.get_node_count() == 1

    with patch.object(socket, "gethostname", return_value="host2"):
        node2 = get_node(redis, 1, 2, HashAlgorithm.XXHASH)
    node1.update()
    node2.update()
    assert len(node1.get_ranges()) + len(node2.get_ranges()) == 3
    assert node1.get_node_count() == 2
    assert node2.get_node_count() == 2

    with patch.object(socket, "gethostname", return_value="host3"):
        node3 = get_node(redis, 2, 4, HashAlgorithm.XXHASH)
    node1.update()
    node2.update()
    node3.update()
    assert (
        len(node1.get_ranges())
        + len(node2.get_ranges())
        + len(node3.get_ranges())
        == 5
    )
    assert node1.get_node_count() == 3
    assert node2.get_node_count() == 3
    assert node3.get_node_count() == 3

    node1.remove()
    node2.update()
    node3.update()
    assert len(node1.get_ranges()) == 0
    assert node1.get_node_count() == 0
    assert len(node2.get_ranges()) + len(node3.get_ranges()) == 4
    assert node2.get_node_count() == 2
    assert node3.get_node_count() == 2


@pytest.mark.parametrize(
    "hash_algorithm", [HashAlgorithm.CRC32, HashAlgorithm.XXHASH]
)
def test_contains(redis, hash_algorithm):
    node1 = get_node(redis, 1, 1, hash_algorithm=hash_algorithm)
    node1.update()
    assert node1.contains("item") is True

    node1.remove()
    assert node1.contains("item") is False
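
The range counts asserted in `test_node` (1 for a single replica, then 3, 5 and 4 for two, four and three replicas) follow from how a sorted ring is cut into arcs. `update()` and `get_ranges()` are not shown in this section, so the sketch below is a hypothetical model chosen because it reproduces those counts, not the library's code. It assumes each replica owns the half-open arc from its point to the next point, the arc that wraps past the top of the ring is split into two ranges, and a lone replica owns the whole ring as one range. `compute_ranges`, `contains_point` and the 32-bit `RING_SIZE` are illustrative names and values.

```python
# Assumption: ring points are unsigned 32-bit integers.
RING_SIZE = 2**32


def compute_ranges(ring, my_replicas):
    """Return the half-open (start, end) ranges owned by `my_replicas`.

    `ring` is a list of (point, replica_id) pairs sorted by point.
    """
    if len(ring) == 1:
        # A lone replica owns the whole ring as a single range.
        _, replica = ring[0]
        return [(0, RING_SIZE)] if replica in my_replicas else []

    ranges = []
    for i, (start, replica) in enumerate(ring):
        if replica not in my_replicas:
            continue
        # Each replica owns the arc up to the next point on the ring.
        end = ring[(i + 1) % len(ring)][0]
        if start < end:
            ranges.append((start, end))
        elif end < start:
            # The last arc wraps past the top of the ring: split it in two.
            ranges.append((start, RING_SIZE))
            if end > 0:
                ranges.append((0, end))
        # start == end: two replicas hashed to the same point, empty arc.
    return ranges


def contains_point(ranges, point):
    """True if `point` falls inside any of the half-open ranges."""
    return any(start <= point < end for start, end in ranges)


if __name__ == "__main__":
    ring = [(100, "host1"), (200, "host2")]
    print(compute_ranges(ring, {"host1"}))  # [(100, 200)]
    print(compute_ranges(ring, {"host2"}))  # [(200, 4294967296), (0, 100)]
```

Under this model any ring with k ≥ 2 distinct, nonzero points yields k + 1 ranges in total, which matches the totals the tests check after each node joins or leaves.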
SYMBOL INDEX (33 symbols across 3 files)

FILE: example.py
  function _parse_arguments (line 16) | def _parse_arguments():

FILE: redis_hashring/__init__.py
  class HashAlgorithm (line 24) | class HashAlgorithm(enum.Enum):
  function _hash_with_xxhash (line 39) | def _hash_with_xxhash(key):
  function _hash_with_crc32 (line 43) | def _hash_with_crc32(key):
  function _decode (line 47) | def _decode(data):
  class RingNode (line 55) | class RingNode(object):
    method __init__ (line 111) | def __init__(
    method _fetch_ring (line 175) | def _fetch_ring(self):
    method _fetch_ring_all (line 191) | def _fetch_ring_all(self):
    method debug_print (line 213) | def debug_print(self):
    method heartbeat (line 273) | def heartbeat(self):
    method remove (line 291) | def remove(self):
    method _notify (line 307) | def _notify(self):
    method cleanup (line 313) | def cleanup(self):
    method update (line 322) | def update(self):
    method get_ranges (line 350) | def get_ranges(self):
    method get_node_count (line 356) | def get_node_count(self):
    method contains (line 362) | def contains(self, key):
    method key_as_ring_point (line 368) | def key_as_ring_point(self, key):
    method _contains_ring_point (line 372) | def _contains_ring_point(self, n):
    method poll (line 381) | def poll(self):
    method start (line 437) | def start(self):
    method stop (line 444) | def stop(self):
    method __enter__ (line 457) | def __enter__(self):
    method __exit__ (line 461) | def __exit__(self, *args, **kwargs):
  class GeventRingNode (line 465) | class GeventRingNode(RingNode):
    method __init__ (line 476) | def __init__(self, *args, **kwargs):
    method start (line 480) | def start(self):
    method stop (line 498) | def stop(self):

FILE: tests.py
  function redis (line 13) | def redis():
  function get_node (line 19) | def get_node(redis, n_replicas, total_replicas, hash_algorithm):
  function test_node (line 35) | def test_node(redis):
  function test_contains (line 78) | def test_contains(redis, hash_algorithm):

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
