Repository: wenkesj/holdem
Branch: master
Commit: b2089d54122c
Files: 8
Total size: 33.8 KB

Directory structure:
gitextract_e7zk95ig/

├── .gitignore
├── README.md
├── example.py
├── holdem/
│   ├── __init__.py
│   ├── env.py
│   ├── player.py
│   └── utils.py
└── setup.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
/*.egg-info
/dist
/build
/test
/examples
__pycache__
*.DS_Store


================================================
FILE: README.md
================================================
# holdem

:warning: **This is an experimental API, it will most definitely contain bugs, but that's why you are here!**

```sh
pip install holdem
```

Afaik, this is the first [OpenAI Gym](https://github.com/openai/gym) _No-Limit Texas Hold'em_* (NLTH)
environment written in Python. It's an experiment to build a Gym environment that is synchronous and
can support any number of players but also appeal to the general public that wants to learn how to
"solve" NLTH.

*Python 3 supports arbitrary length integers :money_with_wings:

Right now, this is a work in progress, but I believe the API is mature enough for some preliminary
experiments. Join me in making some interesting progress on multi-agent Gym environments.

# Usage

There is limited documentation at the moment. I'll try to make this less painful to understand.

## `env = holdem.TexasHoldemEnv(n_seats, max_limit=100000, debug=False)`

Creates a Gym environment representing an NLTH table from the parameters:

+ `n_seats` - number of seats at the table. No players are initially allocated
  to the table. You must call `env.add_player(seat_id, ...)` to populate the table.
+ `max_limit` - upper bound used to define the `gym.spaces` for the class (for example,
  `gym.spaces.Discrete`). It does not impose any betting limit on the game.
+ `debug` - print debug statements during play; will probably be removed in the future.

### `env.add_player(seat_id, stack=2000)`

Adds a player to the table according to the specified seat (`seat_id`) and the initial amount of
chips allocated to the player's `stack`. If the table does not have enough seats according to the
`n_seats` used by the constructor, a `gym.error.Error` will be raised.

### `(player_states, community_states) = env.reset()`

Calling `env.reset` resets the NLTH table to a new hand state. It does not reset the players'
stacks or the blinds. That behavior is reserved for a future portion of the API; it is not
standard in Gym environments and is a work in progress.

The observation returned is a `tuple` of the following by index:

0. `player_states` - a `tuple` where each entry is `tuple(player_info, player_hand)`. All states
   and hands can be gathered with `(player_infos, player_hands) = zip(*player_states)`.
   + `player_infos` - is a `list` of `int` features describing the individual player. It contains
     the following by index:
     0. `[0, 1]` - `0` - seat is occupied, `1` - seat is empty (the value is `int(player.emptyplayer)`).
     1. `[0, n_seats - 1]` - player's id, where they are sitting.
     2. `[0, inf]` - player's current stack.
     3. `[0, 1]` - player is playing the current hand.
     4. `[0, inf]` - the player's current handrank according to `treys.Evaluator.evaluate(hand, community)`.
     5. `[0, 1]` - `0` - player has not played this round, `1` - player has played this round.
     6. `[0, 1]` - `0` - player is currently not betting, `1` - player is betting.
     7. `[0, 1]` - `0` - player is currently not all-in, `1` - player is all-in.
     8. `[0, inf]` - player's last sidepot.
   + `player_hands` - is a `list` of `int` features describing the cards in the player's pocket.
     The values are encoded based on the `treys.Card` integer representation.
1. `community_states` - a `tuple(community_infos, community_cards)` where:
   + `community_infos` - a `list` by index:
     0. `[0, n_seats - 1]` - location of the dealer button, where big blind is posted.
     1. `[0, inf]` - the current small blind amount.
     2. `[0, inf]` - the current big blind amount.
     3. `[0, inf]` - the current total amount in the community pot.
     4. `[0, inf]` - the last posted raise amount.
     5. `[0, inf]` - minimum required raise amount, if above 0.
     6. `[0, inf]` - the amount required to call.
     7. `[0, n_seats - 1]` - the current player required to take an action.
   + `community_cards` - is a `list` of `int` features describing the cards in the community.
     The values are encoded based on the `treys.Card` integer representation. There are 5 `int` in
     the list, where `-1` represents that there is no card present.
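
As a rough illustration of the layout above, the sketch below unpacks a hand-written sample
observation without importing `holdem`. The `PlayerInfo` and `CommunityInfo` names and their field
names are assumptions made for this example and are not part of the package; the card values are
placeholder integers rather than real `treys` encodings.

```python
from collections import namedtuple

# Field order follows the index lists documented above. The names are
# illustrative only; holdem itself returns plain lists of ints.
PlayerInfo = namedtuple('PlayerInfo', [
  'seat_status', 'seat_id', 'stack', 'playing_hand', 'handrank',
  'played_this_round', 'betting', 'all_in', 'last_sidepot'])
CommunityInfo = namedtuple('CommunityInfo', [
  'button', 'small_blind', 'big_blind', 'total_pot', 'last_raise',
  'min_raise', 'to_call', 'current_player'])


def unpack_observation(observation):
  """Split an observation into named player infos, community info and dealt cards."""
  player_states, (community_infos, community_cards) = observation
  players = [(PlayerInfo(*info), list(hand)) for info, hand in player_states]
  community = CommunityInfo(*community_infos)
  # -1 marks a community card that has not been dealt yet
  dealt = [card for card in community_cards if card != -1]
  return players, community, dealt


# Hand-written sample for a 2-seat table before the flop (not real env output).
observation = (
  (
    ([0, 0, 1990, 1, -1, 0, 0, 0, 0], [101, 102]),
    ([0, 1, 1975, 1, -1, 0, 0, 0, 0], [103, 104]),
  ),
  ([1, 10, 25, 35, 25, 50, 15, 0], [-1, -1, -1, -1, -1]),
)

players, community, dealt = unpack_observation(observation)
print(community.to_call, community.current_player, len(dealt))
```

Named tuples keep the index bookkeeping in one place, so agent code can read
`community.to_call` instead of `community_infos[6]`.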

# Example

```python
import gym
import holdem

def play_out_hand(env, n_seats):
  # reset environment, gather relevant observations
  (player_states, (community_infos, community_cards)) = env.reset()
  (player_infos, player_hands) = zip(*player_states)

  # display the table, cards and all
  env.render(mode='human')

  terminal = False
  while not terminal:
    # play safe actions: check when no one else has raised, call when raised.
    actions = holdem.safe_actions(community_infos, n_seats=n_seats)
    (player_states, (community_infos, community_cards)), rews, terminal, info = env.step(actions)
    env.render(mode='human')

env = gym.make('TexasHoldem-v1') # registered as holdem.TexasHoldemEnv(4)

# start with 2 players
env.add_player(0, stack=2000) # add a player to seat 0 with 2000 "chips"
env.add_player(1, stack=2000) # add another player to seat 1 with 2000 "chips"
# play out a hand
play_out_hand(env, env.n_seats)

# add one more player
env.add_player(2, stack=2000) # add another player to seat 2 with 2000 "chips"
# play out another hand
play_out_hand(env, env.n_seats)
```
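
`env.step` expects one `[action_id, raise_amount]` pair per seat and, in the current
`holdem/env.py`, only reads the entry of the player whose turn it is. Below is a minimal sketch of
building such a list by hand, assuming the action ids from `holdem.utils.action_table`.
`build_actions` and `call_or_check` are hypothetical helpers written for this example, not part of
the package.

```python
# Action ids as defined by holdem.utils.action_table in this repository.
CHECK, CALL, RAISE, FOLD = 0, 1, 2, 3


def build_actions(n_seats, current_player, action_id, raise_amount=0):
  """Return one [action_id, raise_amount] pair per seat.

  Every seat other than current_player gets a placeholder [CHECK, 0] entry.
  """
  actions = [[CHECK, 0] for _ in range(n_seats)]
  actions[current_player] = [action_id, raise_amount]
  return actions


def call_or_check(community_infos, n_seats):
  """Call when chips are owed, otherwise check (same idea as safe_actions)."""
  to_call = community_infos[6]         # amount required to call
  current_player = community_infos[7]  # seat that must act next
  action_id = CALL if to_call > 0 else CHECK
  return build_actions(n_seats, current_player, action_id)


# Seat 1 raises by 50 at a 3-seat table.
print(build_actions(3, 1, RAISE, 50))
```

A raise amount below the minimum raise in `community_infos[5]` or above the player's stack is
rejected by `Player.player_move` with a `gym.error.Error`.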


================================================
FILE: example.py
================================================
# -*- coding: utf-8 -*-
#
# Copyright (c) 2018 Sam Wenke (samwenke@gmail.com)
#
# Permission is hereby granted, free of charge, to any person obtaining
# a copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
import gym
import holdem

def play_out_hand(env, n_seats):
  # reset environment, gather relevant observations
  (player_states, (community_infos, community_cards)) = env.reset()
  (player_infos, player_hands) = zip(*player_states)

  # display the table, cards and all
  env.render(mode='human')

  terminal = False
  while not terminal:
    # play safe actions: check when no one else has raised, call when raised.
    actions = holdem.safe_actions(community_infos, n_seats=n_seats)
    (player_states, (community_infos, community_cards)), rews, terminal, info = env.step(actions)
    env.render(mode='human')

env = gym.make('TexasHoldem-v1') # registered as holdem.TexasHoldemEnv(4)

# start with 2 players
env.add_player(0, stack=2000) # add a player to seat 0 with 2000 "chips"
env.add_player(1, stack=2000) # add another player to seat 1 with 2000 "chips"
# play out a hand
play_out_hand(env, env.n_seats)

# add one more player
env.add_player(2, stack=2000) # add another player to seat 2 with 2000 "chips"
# play out another hand
play_out_hand(env, env.n_seats)


================================================
FILE: holdem/__init__.py
================================================
# -*- coding: utf-8 -*-
from gym.envs.registration import register

from .env import TexasHoldemEnv
from .utils import card_to_str, hand_to_str, safe_actions, action_table

register(
  id='TexasHoldem-v0',
  entry_point='holdem.env:TexasHoldemEnv',
  kwargs={'n_seats': 2, 'debug': False},
)

register(
  id='TexasHoldem-v1',
  entry_point='holdem.env:TexasHoldemEnv',
  kwargs={'n_seats': 4, 'debug': False},
)

register(
  id='TexasHoldem-v2',
  entry_point='holdem.env:TexasHoldemEnv',
  kwargs={'n_seats': 8, 'debug': False},
)


================================================
FILE: holdem/env.py
================================================
# -*- coding: utf-8 -*-
#
# Copyright (c) 2016 Aleksander Beloi (beloi.alex@gmail.com)
# Copyright (c) 2018 Sam Wenke (samwenke@gmail.com)
#
# Permission is hereby granted, free of charge, to any person obtaining
# a copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
from gym import Env, error, spaces, utils
from gym.utils import seeding

from treys import Card, Deck, Evaluator

from .player import Player
from .utils import hand_to_str, format_action


class TexasHoldemEnv(Env, utils.EzPickle):
  BLIND_INCREMENTS = [[10,25], [25,50], [50,100], [75,150], [100,200],
                      [150,300], [200,400], [300,600], [400,800], [500,1000],
                      [600,1200], [800,1600], [1000,2000]]

  def __init__(self, n_seats, max_limit=100000, debug=False):
    n_suits = 4                     # s,h,d,c
    n_ranks = 13                    # 2,3,4,5,6,7,8,9,T,J,Q,K,A
    n_community_cards = 5           # flop, turn, river
    n_pocket_cards = 2
    n_stud = 5

    self.n_seats = n_seats

    self._blind_index = 0
    [self._smallblind, self._bigblind] = TexasHoldemEnv.BLIND_INCREMENTS[0]
    self._deck = Deck()
    self._evaluator = Evaluator()

    self.community = []
    self._round = 0
    self._button = 0
    self._discard = []

    self._side_pots = [0] * n_seats
    self._current_sidepot = 0 # index of _side_pots
    self._totalpot = 0
    self._tocall = 0
    self._lastraise = 0
    self._number_of_hands = 0

    # fill seats with dummy players
    self._seats = [Player(i, stack=0, emptyplayer=True) for i in range(n_seats)]
    self.emptyseats = n_seats
    self._player_dict = {}
    self._current_player = None
    self._debug = debug
    self._last_player = None
    self._last_actions = None

    self.observation_space = spaces.Tuple([
      spaces.Tuple([                # players
        spaces.MultiDiscrete([
          1,                   # emptyplayer
          n_seats - 1,         # seat
          max_limit,           # stack
          1,                   # is_playing_hand
          max_limit,           # handrank
          1,                   # playedthisround
          1,                   # is_betting
          1,                   # isallin
          max_limit,           # last side pot
        ]),
        spaces.Tuple([
          spaces.MultiDiscrete([    # hand
            n_suits,          # suit, can be negative one if it's not available.
            n_ranks,          # rank, can be negative one if it's not available.
          ])
        ] * n_pocket_cards)
      ] * n_seats),
      spaces.Tuple([
        spaces.Discrete(n_seats - 1), # big blind location
        spaces.Discrete(max_limit),   # small blind
        spaces.Discrete(max_limit),   # big blind
        spaces.Discrete(max_limit),   # pot amount
        spaces.Discrete(max_limit),   # last raise
        spaces.Discrete(max_limit),   # minimum amount to raise
        spaces.Discrete(max_limit),   # how much needed to call by current player.
        spaces.Discrete(n_seats - 1), # current player seat location.
        spaces.MultiDiscrete([        # community cards
          n_suits - 1,          # suit
          n_ranks - 1,          # rank
          1,                     # is_flopped
        ]),
      ] * n_stud),
    ])

    self.action_space = spaces.Tuple([
      spaces.MultiDiscrete([
        3,                     # action_id
        max_limit,             # raise_amount
      ]),
    ] * n_seats)

  def seed(self, seed=None):
    _, seed = seeding.np_random(seed)
    return [seed]

  def add_player(self, seat_id, stack=2000):
    """Add a player to the environment seat with the given stack (chipcount)"""
    player_id = seat_id
    if player_id not in self._player_dict:
      new_player = Player(player_id, stack=stack, emptyplayer=False)
      if self._seats[player_id].emptyplayer:
        self._seats[player_id] = new_player
        new_player.set_seat(player_id)
      else:
        raise error.Error('Seat already taken.')
      self._player_dict[player_id] = new_player
      self.emptyseats -= 1

  def remove_player(self, seat_id):
    """Remove a player from the environment seat."""
    player_id = seat_id
    try:
      idx = self._seats.index(self._player_dict[player_id])
      # keep the placeholder's id consistent with its seat index
      self._seats[idx] = Player(idx, stack=0, emptyplayer=True)
      del self._player_dict[player_id]
      self.emptyseats += 1
    except (KeyError, ValueError):
      # nothing to remove: unknown player id or player not seated
      pass

  def reset(self):
    self._reset_game()
    self._ready_players()
    self._number_of_hands = 1
    [self._smallblind, self._bigblind] = TexasHoldemEnv.BLIND_INCREMENTS[0]
    if (self.emptyseats < len(self._seats) - 1):
      players = [p for p in self._seats if p.playing_hand]
      self._new_round()
      self._round = 0
      self._current_player = self._first_to_act(players)
      self._post_smallblind(self._current_player)
      self._current_player = self._next(players, self._current_player)
      self._post_bigblind(self._current_player)
      self._current_player = self._next(players, self._current_player)
      self._tocall = self._bigblind
      self._round = 0
      self._deal_next_round()
      self._folded_players = []
    return self._get_current_reset_returns()

  def step(self, actions):
    """
    CHECK = 0
    CALL = 1
    RAISE = 2
    FOLD = 3

    RAISE_AMT = [0, minraise]
    """
    if len(actions) != len(self._seats):
      raise error.Error('actions must be same shape as number of seats.')

    if self._current_player is None:
      raise error.Error('Round cannot be played without 2 or more players.')

    if self._round == 4:
      raise error.Error('Rounds already finished, needs to be reset.')

    players = [p for p in self._seats if p.playing_hand]
    if len(players) == 1:
      raise error.Error('Round cannot be played with one player.')

    self._last_player = self._current_player
    self._last_actions = actions

    if not self._current_player.playedthisround and len([p for p in players if not p.isallin]) >= 1:
      if self._current_player.isallin:
        self._current_player = self._next(players, self._current_player)
        return self._get_current_step_returns(False)

      move = self._current_player.player_move(
          self._output_state(self._current_player), actions[self._current_player.player_id])

      if move[0] == 'call':
        self._player_bet(self._current_player, self._tocall)
        if self._debug:
          print('Player', self._current_player.player_id, move)
        self._current_player = self._next(players, self._current_player)
      elif move[0] == 'check':
        self._player_bet(self._current_player, self._current_player.currentbet)
        if self._debug:
          print('Player', self._current_player.player_id, move)
        self._current_player = self._next(players, self._current_player)
      elif move[0] == 'raise':
        self._player_bet(self._current_player, move[1]+self._current_player.currentbet)
        if self._debug:
          print('Player', self._current_player.player_id, move)
        for p in players:
          if p != self._current_player:
            p.playedthisround = False
        self._current_player = self._next(players, self._current_player)
      elif move[0] == 'fold':
        self._current_player.playing_hand = False
        folded_player = self._current_player
        if self._debug:
          print('Player', self._current_player.player_id, move)
        self._current_player = self._next(players, self._current_player)
        players.remove(folded_player)
        self._folded_players.append(folded_player)
        # break if a single player left
        if len(players) == 1:
          self._resolve(players)
    if all([player.playedthisround for player in players]):
      self._resolve(players)

    terminal = False
    if all([player.isallin for player in players]):
      while self._round < 4:
        self._deal_next_round()
        self._round += 1
    if self._round == 4 or len(players) == 1:
      terminal = True
      self._resolve_round(players)
    return self._get_current_step_returns(terminal)

  def render(self, mode='human', close=False):
    print('total pot: {}'.format(self._totalpot))
    if self._last_actions is not None:
      pid = self._last_player.player_id
      print('last action by player {}:'.format(pid))
      print(format_action(self._last_player, self._last_actions[pid]))

    (player_states, community_states) = self._get_current_state()
    (player_infos, player_hands) = zip(*player_states)
    (community_infos, community_cards) = community_states

    print('community:')
    print('-' + hand_to_str(community_cards))
    print('players:')
    for idx, hand in enumerate(player_hands):
      print('{}{}stack: {}'.format(idx, hand_to_str(hand), self._seats[idx].stack))

  def _resolve(self, players):
    self._current_player = self._first_to_act(players)
    self._resolve_sidepots(players + self._folded_players)
    self._new_round()
    self._deal_next_round()
    if self._debug:
      print('totalpot', self._totalpot)

  def _deal_next_round(self):
    if self._round == 0:
      self._deal()
    elif self._round == 1:
      self._flop()
    elif self._round == 2:
      self._turn()
    elif self._round == 3:
      self._river()

  def _increment_blinds(self):
    self._blind_index = min(self._blind_index + 1, len(TexasHoldemEnv.BLIND_INCREMENTS) - 1)
    [self._smallblind, self._bigblind] = TexasHoldemEnv.BLIND_INCREMENTS[self._blind_index]

  def _post_smallblind(self, player):
    if self._debug:
      print('player ', player.player_id, 'small blind', self._smallblind)
    self._player_bet(player, self._smallblind)
    player.playedthisround = False

  def _post_bigblind(self, player):
    if self._debug:
      print('player ', player.player_id, 'big blind', self._bigblind)
    self._player_bet(player, self._bigblind)
    player.playedthisround = False
    self._lastraise = self._bigblind

  def _player_bet(self, player, total_bet):
    # relative_bet is how much _additional_ money is the player betting this turn,
    # on top of what they have already contributed
    # total_bet is the total contribution by player to pot in this round
    relative_bet = min(player.stack, total_bet - player.currentbet)
    player.bet(relative_bet + player.currentbet)

    self._totalpot += relative_bet
    self._tocall = max(self._tocall, total_bet)
    if self._tocall > 0:
      self._tocall = max(self._tocall, self._bigblind)
    self._lastraise = max(self._lastraise, relative_bet - self._lastraise)

  def _first_to_act(self, players):
    if self._round == 0 and len(players) == 2:
      return self._next(sorted(
          players + [self._seats[self._button]], key=lambda x:x.get_seat()),
          self._seats[self._button])
    try:
      first = [player for player in players if player.get_seat() > self._button][0]
    except IndexError:
      first = players[0]
    return first

  def _next(self, players, current_player):
    idx = players.index(current_player)
    return players[(idx+1) % len(players)]

  def _deal(self):
    for player in self._seats:
      if player.playing_hand:
        player.hand = self._deck.draw(2)

  def _flop(self):
    self._discard.append(self._deck.draw(1)) #burn
    self.community = self._deck.draw(3)

  def _turn(self):
    self._discard.append(self._deck.draw(1)) #burn
    self.community.append(self._deck.draw(1))

  def _river(self):
    self._discard.append(self._deck.draw(1)) #burn
    self.community.append(self._deck.draw(1))

  def _ready_players(self):
    for p in self._seats:
      if not p.emptyplayer and p.sitting_out:
        p.sitting_out = False
        p.playing_hand = True

  def _resolve_sidepots(self, players_playing):
    players = [p for p in players_playing if p.currentbet]
    if self._debug:
      print('current bets: ', [p.currentbet for p in players])
      print('playing hand: ', [p.playing_hand for p in players])
    if not players:
      return
    try:
      smallest_bet = min([p.currentbet for p in players if p.playing_hand])
    except ValueError:
      for p in players:
        self._side_pots[self._current_sidepot] += p.currentbet
        p.currentbet = 0
      return

    smallest_players_allin = [p for p, bet in zip(players, [p.currentbet for p in players]) if bet == smallest_bet and p.isallin]

    for p in players:
      self._side_pots[self._current_sidepot] += min(smallest_bet, p.currentbet)
      p.currentbet -= min(smallest_bet, p.currentbet)
      p.lastsidepot = self._current_sidepot

    if smallest_players_allin:
      self._current_sidepot += 1
      self._resolve_sidepots(players)
    if self._debug:
      print('sidepots: ', self._side_pots)

  def _new_round(self):
    for player in self._player_dict.values():
      player.currentbet = 0
      player.playedthisround = False
    self._round += 1
    self._tocall = 0
    self._lastraise = 0

  def _resolve_round(self, players):
    if len(players) == 1:
      players[0].refund(sum(self._side_pots))
      self._totalpot = 0
    else:
      # compute hand ranks
      for player in players:
        player.handrank = self._evaluator.evaluate(player.hand, self.community)

      # trim side_pots to only include the non-empty side pots
      temp_pots = [pot for pot in self._side_pots if pot > 0]

      # compute who wins each side pot and pay winners
      for pot_idx,_ in enumerate(temp_pots):
        # find players involved in given side_pot, compute the winner(s)
        pot_contributors = [p for p in players if p.lastsidepot >= pot_idx]
        winning_rank = min([p.handrank for p in pot_contributors])
        winning_players = [p for p in pot_contributors if p.handrank == winning_rank]

        for player in winning_players:
          split_amount = int(self._side_pots[pot_idx]/len(winning_players))
          if self._debug:
            print('Player', player.player_id, 'wins side pot (', int(self._side_pots[pot_idx]/len(winning_players)), ')')
          player.refund(split_amount)
          self._side_pots[pot_idx] -= split_amount

        # any remaining chips after splitting go to the winner in the earliest position
        if self._side_pots[pot_idx]:
          earliest = self._first_to_act([player for player in winning_players])
          earliest.refund(self._side_pots[pot_idx])

  def _reset_game(self):
    playing = 0
    for player in self._seats:
      if not player.emptyplayer and not player.sitting_out:
        player.reset_hand()
        playing += 1
    self.community = []
    self._current_sidepot = 0
    self._totalpot = 0
    self._side_pots = [0] * len(self._seats)
    self._deck.shuffle()

    if playing:
      self._button = (self._button + 1) % len(self._seats)
      while not self._seats[self._button].playing_hand:
        self._button = (self._button + 1) % len(self._seats)

  def _output_state(self, current_player):
    return {
      'players': [player.player_state() for player in self._seats],
      'community': self.community,
      'my_seat': current_player.get_seat(),
      'pocket_cards': current_player.hand,
      'pot': self._totalpot,
      'button': self._button,
      'tocall': (self._tocall - current_player.currentbet),
      'stack': current_player.stack,
      'bigblind': self._bigblind,
      'player_id': current_player.player_id,
      'lastraise': self._lastraise,
      'minraise': max(self._bigblind, self._lastraise + self._tocall),
    }

  def _pad(self, l, n, v):
    if (not l) or (l is None):
      l = []
    return l + [v] * (n - len(l))

  def _get_current_state(self):
    player_states = []
    for player in self._seats:
      player_features = [
        int(player.emptyplayer),
        int(player.get_seat()),
        int(player.stack),
        int(player.playing_hand),
        int(player.handrank),
        int(player.playedthisround),
        int(player.betting),
        int(player.isallin),
        int(player.lastsidepot),
      ]
      player_states.append((player_features, self._pad(player.hand, 2, -1)))
    community_states = ([
      int(self._button),
      int(self._smallblind),
      int(self._bigblind),
      int(self._totalpot),
      int(self._lastraise),
      int(max(self._bigblind, self._lastraise + self._tocall)),
      int(self._tocall - self._current_player.currentbet),
      int(self._current_player.player_id),
    ], self._pad(self.community, 5, -1))
    return (tuple(player_states), community_states)

  def _get_current_reset_returns(self):
    return self._get_current_state()

  def _get_current_step_returns(self, terminal):
    obs = self._get_current_state()
    # TODO, make this something else?
    rew = [player.stack for player in self._seats]
    return obs, rew, terminal, [] # TODO, return some info?


================================================
FILE: holdem/player.py
================================================
# -*- coding: utf-8 -*-
#
# Copyright (c) 2016 Aleksander Beloi (beloi.alex@gmail.com)
# Copyright (c) 2018 Sam Wenke (samwenke@gmail.com)
#
# Permission is hereby granted, free of charge, to any person obtaining
# a copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
from random import randint

from gym import error

from treys import Card


class Player(object):

  CHECK = 0
  CALL = 1
  RAISE = 2
  FOLD = 3

  def __init__(self, player_id, stack=2000, emptyplayer=False):
    self.player_id = player_id

    self.hand = []
    self.stack = stack
    self.currentbet = 0
    self.lastsidepot = 0
    self._seat = -1
    self.handrank = -1

    # flags for table management
    self.emptyplayer = emptyplayer
    self.betting = False
    self.isallin = False
    self.playing_hand = False
    self.playedthisround = False
    self.sitting_out = True

  def get_seat(self):
    return self._seat

  def set_seat(self, value):
    self._seat = value

  def reset_hand(self):
    self.hand = []
    self.playedthisround = False
    self.betting = False
    self.isallin = False
    self.currentbet = 0
    self.lastsidepot = 0
    self.playing_hand = (self.stack != 0)

  def bet(self, bet_size):
    self.playedthisround = True
    if not bet_size:
      return
    self.stack -= (bet_size - self.currentbet)
    self.currentbet = bet_size
    if self.stack == 0:
      self.isallin = True

  def refund(self, amount):
    self.stack += amount

  def player_state(self):
    return (self.get_seat(), self.stack, self.playing_hand, self.betting, self.player_id)

  def reset_stack(self):
    self.stack = 2000

  def update_localstate(self, table_state):
    self.stack = table_state.get('stack')
    self.hand = table_state.get('pocket_cards')

  # cleanup
  def player_move(self, table_state, action):
    self.update_localstate(table_state)
    bigblind = table_state.get('bigblind')
    tocall = min(table_state.get('tocall', 0), self.stack)
    minraise = table_state.get('minraise', 0)

    [action_idx, raise_amount] = action
    raise_amount = int(raise_amount)
    action_idx = int(action_idx)

    if tocall == 0:
      assert action_idx in [Player.CHECK, Player.RAISE]
      if action_idx == Player.RAISE:
        if raise_amount < minraise:
          raise error.Error('raise must be greater than minraise {}'.format(minraise))
        if raise_amount > self.stack:
          raise error.Error('raise must be less than maxraise {}'.format(self.stack))
        move_tuple = ('raise', raise_amount)
      elif action_idx == Player.CHECK:
        move_tuple = ('check', 0)
      else:
        raise error.Error('invalid action ({}) must be check (0) or raise (2)'.format(action_idx))
    else:
      if action_idx not in [Player.RAISE, Player.CALL, Player.FOLD]:
        raise error.Error('invalid action ({}) must be raise (2), call (1), or fold (3)'.format(action_idx))
      if action_idx == Player.RAISE:
        if raise_amount < minraise:
          raise error.Error('raise must be greater than minraise {}'.format(minraise))
        if raise_amount > self.stack:
          raise error.Error('raise must be less than maxraise {}'.format(self.stack))
        move_tuple = ('raise', raise_amount)
      elif action_idx == Player.CALL:
        move_tuple = ('call', tocall)
      elif action_idx == Player.FOLD:
        move_tuple = ('fold', -1)
      else:
        raise error.Error('invalid action ({}) must be raise (2), call (1), or fold (3)'.format(action_idx))
    return move_tuple


================================================
FILE: holdem/utils.py
================================================
# -*- coding: utf-8 -*-
#
# Copyright (c) 2018 Sam Wenke (samwenke@gmail.com)
#
# Permission is hereby granted, free of charge, to any person obtaining
# a copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
from treys import Card


class action_table:
  CHECK = 0
  CALL = 1
  RAISE = 2
  FOLD = 3
  NA = 0


def format_action(player, action):
  color = False
  try:
    from termcolor import colored
    # for mac, linux: http://pypi.python.org/pypi/termcolor
    # can use for windows: http://pypi.python.org/pypi/colorama
    color = True
  except ImportError:
    pass
  [aid, raise_amt] = action
  if aid == action_table.CHECK:
    text = '_ check'
    if color:
      text = colored(text, 'white')
    return text
  if aid == action_table.CALL:
    text = '- call, current bet: {}'.format(player.currentbet)
    if color:
      text = colored(text, 'yellow')
    return text
  if aid == action_table.RAISE:
    text = '^ raise, current bet: {}'.format(raise_amt)
    if color:
      text = colored(text, 'green')
    return text
  if aid == action_table.FOLD:
    text = 'x fold'
    if color:
      text = colored(text, 'red')
    return text


def card_to_str(card):
  if card == -1:
    return ''
  return Card.int_to_pretty_str(card)


def hand_to_str(hand):
  output = " "
  for i in range(len(hand)):
    c = hand[i]
    if c == -1:
      if i != len(hand) - 1:
        output += '[  ],'
      else:
        output += '[  ] '
      continue
    if i != len(hand) - 1:
      output += str(Card.int_to_pretty_str(c)) + ','
    else:
      output += str(Card.int_to_pretty_str(c)) + ' '
  return output


def safe_actions(community_infos, n_seats):
  current_player = community_infos[-1]
  to_call = community_infos[-2]
  actions = [[action_table.CHECK, action_table.NA]] * n_seats
  if to_call > 0:
    actions[current_player] = [action_table.CALL, action_table.NA]
  return actions


================================================
FILE: setup.py
================================================
# -*- coding: utf-8 -*-
#
# Copyright (c) 2018 Sam Wenke (samwenke@gmail.com)
#
# Permission is hereby granted, free of charge, to any person obtaining
# a copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
from setuptools import setup, find_packages

with open("README.md") as readme:
  long_description = readme.read()

setup(
  name='holdem',
  version='1.0.0',
  long_description=long_description,
  url='https://github.com/wenkesj/holdem',
  author='Sam Wenke',
  author_email='samwenke@gmail.com',
  license='MIT',
  description=('OpenAI Gym No-Limit Texas Holdem Environment.'),
  packages=find_packages(exclude=['test', 'examples']),
  install_requires=['treys', 'gym'],
  platforms='any',
)