
Commit 5fc22e2

Update docs and examples
Former-commit-id: 3ee441d
1 parent ea2e041 commit 5fc22e2

10 files changed (+377 additions, -975 deletions)

CONTRIBUTING.md

Lines changed: 0 additions & 1 deletion
@@ -6,7 +6,6 @@ Contribution to this project is greatly appreciated! If you find any bugs or hav
 * **Game Specific Configurations.** Now we plan to gradually support game specific configurations. Currently we only support specifying the number of players in Blackjack
 * **Rule-based Agent and Pre-trained Models.** Provide more rule-based agents and pre-trained models to benchmark the evaluation. We currently have several models in `/models`.
 * **More Games and Algorithms.** Develop more games and algorithms.
-* **Keras Implementation** Provide Keras Implementation of the algorithms.
 * **Hyperparameter Search** Search hyperparameters for each environment and update the best one in the example.

 ## How to Create a Pull Request

README.md

Lines changed: 68 additions & 45 deletions
@@ -7,9 +7,9 @@
 [![Downloads](https://pepy.tech/badge/rlcard)](https://pepy.tech/project/rlcard)
 [![Downloads](https://pepy.tech/badge/rlcard/month)](https://pepy.tech/project/rlcard)

-RLCard is a toolkit for Reinforcement Learning (RL) in card games. It supports multiple card environments with easy-to-use interfaces. The goal of RLCard is to bridge reinforcement learning and imperfect information games. RLCard is developed by [DATA Lab](http://faculty.cs.tamu.edu/xiahu/) at Texas A&M University and community contributors.
+RLCard is a toolkit for Reinforcement Learning (RL) in card games. It supports multiple card environments with easy-to-use interfaces for implementing various reinforcement learning and searching algorithms. The goal of RLCard is to bridge reinforcement learning and imperfect information games. RLCard is developed by [DATA Lab](http://faculty.cs.tamu.edu/xiahu/) at Texas A&M University and community contributors.

-* Official Website: [http://www.rlcard.org](http://www.rlcard.org)
+* Official Website: [https://www.rlcard.org](https://www.rlcard.org)
 * Tutorial in Jupyter Notebook: [https://github.com/datamllab/rlcard-tutorial](https://github.com/datamllab/rlcard-tutorial)
 * Paper: [https://arxiv.org/abs/1910.04376](https://arxiv.org/abs/1910.04376)
 * GUI: [RLCard-Showdown](https://github.com/datamllab/rlcard-showdown)
@@ -32,71 +32,82 @@ RLCard is a toolkit for Reinforcement Learning (RL) in card games. It supports m

 ## Cite this work
 If you find this repo useful, you may cite:
+
+Zha, Daochen, et al. "RLCard: A Platform for Reinforcement Learning in Card Games." IJCAI. 2020.
+
 ```bibtex
-@article{zha2019rlcard,
-  title={RLCard: A Toolkit for Reinforcement Learning in Card Games},
-  author={Zha, Daochen and Lai, Kwei-Herng and Cao, Yuanpu and Huang, Songyi and Wei, Ruzhe and Guo, Junyu and Hu, Xia},
-  journal={arXiv preprint arXiv:1910.04376},
-  year={2019}
+@inproceedings{DBLP:conf/ijcai/ZhaLHCRVNWGH20,
+  author    = {Daochen Zha and
+               Kwei{-}Herng Lai and
+               Songyi Huang and
+               Yuanpu Cao and
+               Keerthana Reddy and
+               Juan Vargas and
+               Alex Nguyen and
+               Ruzhe Wei and
+               Junyu Guo and
+               Xia Hu},
+  title     = {RLCard: {A} Platform for Reinforcement Learning in Card Games},
+  booktitle = {{IJCAI}},
+  pages     = {5264--5266},
+  publisher = {ijcai.org},
+  year      = {2020}
 }
 ```

 ## Installation
-Make sure that you have **Python 3.5+** and **pip** installed. We recommend installing the latest version of `rlcard` with `pip`:
+Make sure that you have **Python 3.6+** and **pip** installed. We recommend installing the stable version of `rlcard` with `pip`:

 ```
-git clone https://github.com/datamllab/rlcard.git
-cd rlcard
-pip install -e .
-```
-Alternatively, you can install the latest stable version with:
-```
-pip install rlcard
+pip3 install rlcard
 ```
-The default installation will only include the card environments. To use Tensorflow implementation of the example algorithms, install the supported verison of Tensorflow with:
+Alternatively, you can install the latest version with:
 ```
-pip install rlcard[tensorflow]
+git clone https://github.com/datamllab/rlcard.git
+cd rlcard
+pip3 install -e .
 ```
-To try PyTorch implementations, please run:
+The default installation will only include the card environments. To use the PyTorch implementation of the training algorithms, run
 ```
-pip install rlcard[torch]
+pip3 install rlcard[training]
 ```
-If you meet any problems when installing PyTorch with the command above, you may follow the instructions on [PyTorch official website](https://pytorch.org/get-started/locally/) to manually install PyTorch.

 We also provide [**conda** installation method](https://anaconda.org/toubun/rlcard):

 ```
 conda install -c toubun rlcard
 ```

-Conda installation only provides the card environments, you need to manually install Tensorflow or Pytorch on your demands.
+Conda installation only provides the card environments; you need to manually install PyTorch as needed.

 ## Examples
-Please refer to [examples/](examples). A **short example** is as below.
+A **short example** is given below.

 ```python
 import rlcard
 from rlcard.agents import RandomAgent

 env = rlcard.make('blackjack')
-env.set_agents([RandomAgent(action_num=env.action_num)])
+env.set_agents([RandomAgent(num_actions=env.num_actions)])
+
+print(env.num_actions) # 2
+print(env.num_players) # 1
+print(env.state_shape) # [[2]]
+print(env.action_shape) # [None]

 trajectories, payoffs = env.run()
 ```

-We also recommend the following **toy examples** in Python.
+RLCard can be flexibly connected to various algorithms. See the following examples:

 * [Playing with random agents](docs/toy-examples.md#playing-with-random-agents)
 * [Deep-Q learning on Blackjack](docs/toy-examples.md#deep-q-learning-on-blackjack)
-* [Running multiple processes](docs/toy-examples.md#running-multiple-processes)
 * [Training CFR (chance sampling) on Leduc Hold'em](docs/toy-examples.md#training-cfr-on-leduc-holdem)
 * [Having fun with pretrained Leduc model](docs/toy-examples.md#having-fun-with-pretrained-leduc-model)
-* [Leduc Hold'em as single-agent environment](docs/toy-examples.md#leduc-holdem-as-single-agent-environment)
-
-R examples can be found [here](docs/toy-examples-r.md).
+* [Training DMC on Dou Dizhu](docs/toy-examples.md#training-dmc-on-dou-dizhu)

 ## Demo
-Run `examples/leduc_holdem_human.py` to play with the pre-trained Leduc Hold'em model. Leduc Hold'em is a simplified version of Texas Hold'em. Rules can be found [here](docs/games.md#leduc-holdem).
+Run `examples/human/leduc_holdem_human.py` to play with the pre-trained Leduc Hold'em model. Leduc Hold'em is a simplified version of Texas Hold'em. Rules can be found [here](docs/games.md#leduc-holdem).

 ```
 >> Leduc Hold'em pre-trained model
@@ -146,33 +157,48 @@ We provide a complexity estimation for the games on several aspects. **InfoSet N
 | Leduc Hold'em ([paper](http://poker.cs.ualberta.ca/publications/UAI05.pdf)) | 10^2 | 10^2 | 10^0 | leduc-holdem | [doc](docs/games.md#leduc-holdem), [example](examples/leduc_holdem_random.py) |
 | Limit Texas Hold'em ([wiki](https://en.wikipedia.org/wiki/Texas_hold_%27em), [baike](https://baike.baidu.com/item/%E5%BE%B7%E5%85%8B%E8%90%A8%E6%96%AF%E6%89%91%E5%85%8B/83440?fr=aladdin)) | 10^14 | 10^3 | 10^0 | limit-holdem | [doc](docs/games.md#limit-texas-holdem), [example](examples/limit_holdem_random.py) |
 | Dou Dizhu ([wiki](https://en.wikipedia.org/wiki/Dou_dizhu), [baike](https://baike.baidu.com/item/%E6%96%97%E5%9C%B0%E4%B8%BB/177997?fr=aladdin)) | 10^53 ~ 10^83 | 10^23 | 10^4 | doudizhu | [doc](docs/games.md#dou-dizhu), [example](examples/doudizhu_random.py) |
-| Simple Dou Dizhu ([wiki](https://en.wikipedia.org/wiki/Dou_dizhu), [baike](https://baike.baidu.com/item/%E6%96%97%E5%9C%B0%E4%B8%BB/177997?fr=aladdin)) | - | - | - | simple-doudizhu | [doc](docs/games.md#simple-dou-dizhu), [example](examples/simple_doudizhu_random.py) |
 | Mahjong ([wiki](https://en.wikipedia.org/wiki/Competition_Mahjong_scoring_rules), [baike](https://baike.baidu.com/item/%E9%BA%BB%E5%B0%86/215)) | 10^121 | 10^48 | 10^2 | mahjong | [doc](docs/games.md#mahjong), [example](examples/mahjong_random.py) |
 | No-limit Texas Hold'em ([wiki](https://en.wikipedia.org/wiki/Texas_hold_%27em), [baike](https://baike.baidu.com/item/%E5%BE%B7%E5%85%8B%E8%90%A8%E6%96%AF%E6%89%91%E5%85%8B/83440?fr=aladdin)) | 10^162 | 10^3 | 10^4 | no-limit-holdem | [doc](docs/games.md#no-limit-texas-holdem), [example](examples/nolimit_holdem_random.py) |
 | UNO ([wiki](https://en.wikipedia.org/wiki/Uno_\(card_game\)), [baike](https://baike.baidu.com/item/UNO%E7%89%8C/2249587)) | 10^163 | 10^10 | 10^1 | uno | [doc](docs/games.md#uno), [example](examples/uno_random.py) |
 | Gin Rummy ([wiki](https://en.wikipedia.org/wiki/Gin_rummy), [baike](https://baike.baidu.com/item/%E9%87%91%E6%8B%89%E7%B1%B3/3471710)) | 10^52 | - | - | gin-rummy | [doc](docs/games.md#gin-rummy), [example](examples/gin_rummy_random.py) |

+## Supported Algorithms
+| Algorithm | example | reference |
+| :--------------------------------------: | :-----------------------------------------: | :------------------------------------------------------------------------------------------------------: |
+| Deep Monte-Carlo (DMC) | [examples/run\_dmc.py](examples/run_dmc.py) | |
+| Deep Q-Learning (DQN) | [examples/run\_rl.py](examples/run_rl.py) | [[paper]](https://arxiv.org/abs/1312.5602) |
+| Neural Fictitious Self-Play (NFSP) | [examples/run\_rl.py](examples/run_rl.py) | [[paper]](https://arxiv.org/abs/1603.01121) |
+| Counterfactual Regret Minimization (CFR) | [examples/run\_cfr.py](examples/run_cfr.py) | [[paper]](http://papers.nips.cc/paper/3306-regret-minimization-in-games-with-incomplete-information.pdf) |
+
+## Pre-trained and Rule-based Models
+We provide a [model zoo](rlcard/models) to serve as the baselines.
+
+| Model | Explanation |
+| :--------------------------------------: | :------------------------------------------------------: |
+| leduc-holdem-cfr | Pre-trained CFR (chance sampling) model on Leduc Hold'em |
+| leduc-holdem-rule-v1 | Rule-based model for Leduc Hold'em, v1 |
+| leduc-holdem-rule-v2 | Rule-based model for Leduc Hold'em, v2 |
+| uno-rule-v1 | Rule-based model for UNO, v1 |
+| limit-holdem-rule-v1 | Rule-based model for Limit Texas Hold'em, v1 |
+| doudizhu-rule-v1 | Rule-based model for Dou Dizhu, v1 |
+| gin-rummy-novice-rule | Gin Rummy novice rule model |
+
 ## API Cheat Sheet
 ### How to create an environment
 You can use the following interface to make an environment. You may optionally specify some configurations with a dictionary.
 * **env = rlcard.make(env_id, config={})**: Make an environment. `env_id` is a string of an environment; `config` is a dictionary that specifies some environment configurations, which are as follows.
     * `seed`: Default `None`. Set an environment local random seed for reproducing the results.
-    * `env_num`: Default `1`. It specifies how many environments running in parallel. If the number is larger than 1, then the tasks will be assigned to multiple processes for acceleration.
     * `allow_step_back`: Default `False`. `True` if allowing `step_back` function to traverse backward in the tree.
-    * `allow_raw_data`: Default `False`. `True` if allowing raw data in the `state`.
-    * `single_agent_mode`: Default `False`. `True` if using single agent mode, i.e., Gym style interface with other players as pretrained/rule models.
-    * `active_player`: Defualt `0`. If `single_agent_mode` is `True`, `active_player` will specify operating on which player in single agent mode.
-    * `record_action`: Default `False`. If `True`, a field of `action_record` will be in the `state` to record the historical actions. This may be used for human-agent play.
-    * Game specific configurations: These fields start with `game_`. Currently, we only support `game_player_num` in Blackjack.
+    * Game specific configurations: These fields start with `game_`. Currently, we only support `game_num_players` in Blackjack.

 Once the environment is made, we can access some information of the game.
-* **env.action_num**: The number of actions.
-* **env.player_num**: The number of players.
-* **env.state_space**: Ther state space of the observations.
-* **env.timestep**: The number of timesteps stepped by the environment.
+* **env.num_actions**: The number of actions.
+* **env.num_players**: The number of players.
+* **env.state_shape**: The shape of the state space of the observations.
+* **env.action_shape**: The shape of the action features (Dou Dizhu's actions can be encoded as features).

 ### What is state in RLCard
-State is a Python dictionary. It will always have observation `state['obs']` and legal actions `state['legal_actions']`. If `allow_raw_data` is `True`, state will also have raw observation `state['raw_obs']` and raw legal actions `state['raw_legal_actions']`.
+State is a Python dictionary. It consists of observation `state['obs']`, legal actions `state['legal_actions']`, raw observation `state['raw_obs']` and raw legal actions `state['raw_legal_actions']`.

 ### Basic interfaces
 The following interfaces provide a basic usage. It is easy to use but it has assumptions on the agent. The agent must follow [agent template](docs/developping-algorithms.md).
@@ -190,9 +216,6 @@ For advanced usage, the following interfaces allow flexible operations on the ga
 * **env.get_payoffs()**: In the end of the game, return a list of payoffs for all the players.
 * **env.get_perfect_information()**: (Currently only support some of the games) Obtain the perfect information at the current state.

-### Running with multiple processes
-RLCard now supports acceleration with multiple processes. Simply change `env_num` when making the environment to indicate how many processes would be used. Currenly we only support `run()` function with multiple processes. An example is [DQN on blackjack](docs/toy-examples.md#running-multiple-processes)
-
 ## Library Structure
 The purposes of the main modules are listed as below:

@@ -208,7 +231,7 @@ The purposes of the main modules are listed as below:
 For more documentation, please refer to the [Documents](docs/README.md) for general introductions. API documents are available at our [website](http://www.rlcard.org).

 ## Contributing
-Contribution to this project is greatly appreciated! Please create an issue for feedbacks/bugs. If you want to contribute codes, please refer to [Contributing Guide](./CONTRIBUTING.md).
+Contribution to this project is greatly appreciated! Please create an issue for feedback/bugs. If you want to contribute code, please refer to [Contributing Guide](./CONTRIBUTING.md). If you have any questions, please contact [Daochen Zha](https://github.com/daochenzha) at [[email protected]](mailto:[email protected]).

 ## Acknowledgements
 We would like to thank JJ World Network Technology Co.,LTD for the generous support and all the contributions from the community contributors.
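
The renamed interfaces in the README diff above fit together as in the following minimal sketch. Only names that appear in this diff are used (`rlcard.make`, the `seed` and `game_num_players` config keys, `num_actions`, `num_players`, `state_shape`, `action_shape`, `set_agents`, `run`); the specific config values are illustrative, not prescribed by the commit.

```python
import rlcard
from rlcard.agents import RandomAgent

# Blackjack is currently the only game that accepts `game_num_players`.
env = rlcard.make('blackjack', config={'seed': 42, 'game_num_players': 2})

print(env.num_actions)   # number of actions
print(env.num_players)   # number of players
print(env.state_shape)   # observation shape per player
print(env.action_shape)  # action-feature shape ([None] for simple games)

# One random agent per player, then roll out a full game.
env.set_agents([RandomAgent(num_actions=env.num_actions)
                for _ in range(env.num_players)])
trajectories, payoffs = env.run()
print(payoffs)           # one payoff per player at the end of the game
```

With `game_num_players` set to 2, `env.num_players` is 2, so one agent is supplied per player.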

docs/adding-new-environments.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Adding New Environments
 To add a new environment to the toolkit, generally you should take the following steps:
-* **Implement a game.** Card games usually have similar structures so that they can be implemented with `Game`, `Round`, `Dealer`, `Judger`, `Player`, as in existing games. The easiest way is to inherit the classed in [rlcard/core.py](../rlcard/core.py) and implement the functions.
+* **Implement a game.** Card games usually have similar structures so that they can be implemented with `Game`, `Round`, `Dealer`, `Judger`, `Player`, as in existing games. The easiest way is to inherit the classes in [rlcard/games/base.py](../rlcard/games/base.py) and implement the functions.
 * **Wrap the game with an environment.** The easiest way is to inherit `Env` in [rlcard/envs/env.py](../rlcard/env/env.py). You need to implement `_extract_state` which encodes the state, `_decode_action` which decodes actions from the id to the text string, and `get_payoffs` which calculates payoffs of the players.
 * **Register the game.** Now it is time to tell the toolkit where to locate the new environment. Go to [rlcard/envs/\_\_init\_\_.py](../rlcard/envs/__init__.py), and indicate the name of the game and its entry point.
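
A rough sketch of steps 2 and 3 above, for orientation. Only the base class location (`rlcard/envs/env.py`) and the three methods `_extract_state`, `_decode_action` and `get_payoffs` come from this document; the constructor details, the hypothetical `MyGame` class and the `register()` call are assumptions about how existing environments are wired together.

```python
import numpy as np
from rlcard.envs.env import Env                    # base class named in step 2
from rlcard.envs.registration import register     # assumed registration helper (step 3)


class MyGameEnv(Env):
    """Hypothetical environment wrapping a custom `MyGame` built in step 1."""

    def __init__(self, config):
        self.name = 'my-game'      # hypothetical environment id
        self.game = MyGame()       # your Game/Round/Dealer/Judger/Player implementation
        super().__init__(config)

    def _extract_state(self, state):
        # Encode the raw game state into the dictionary agents consume.
        return {
            'obs': np.array(state['hand']),
            'legal_actions': state['legal_actions'],
            'raw_obs': state,
            'raw_legal_actions': state['legal_actions'],
        }

    def _decode_action(self, action_id):
        # Map an action id back to the game's own action representation.
        return self.game.get_legal_actions()[action_id]

    def get_payoffs(self):
        # One payoff per player at the end of the game.
        return self.game.get_payoffs()


# Step 3: registration, normally placed in rlcard/envs/__init__.py.
register(env_id='my-game', entry_point='my_package.my_game_env:MyGameEnv')
```

After registration, `rlcard.make('my-game')` would locate the class through the entry point string.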

docs/algorithms.md

Lines changed: 5 additions & 9 deletions
@@ -1,9 +1,12 @@
 # Index

+* [DMC](algorithms.md#deep-monte-carlo)
 * [Deep-Q Learning](algorithms.md#deep-q-learning)
 * [NFSP](algorithms.md#nfsp)
-* [CFR (chance sampling)](docs/algorithms.md#cfr)
-* [DeepCFR](docs/algorithms.md#deepcfr)
+* [CFR (chance sampling)](algorithms.md#cfr)
+
+## Deep Monte-Carlo
+Deep Monte-Carlo (DMC) is a very effective algorithm for card games. This is the only algorithm that shows human-level performance on complex games such as Dou Dizhu.

 ## Deep-Q Learning
 Deep-Q Learning (DQN) [[paper]](https://arxiv.org/abs/1312.5602) is a basic reinforcement learning (RL) algorithm. We wrap DQN as an example to show how RL algorithms can be connected to the environments. In the DQN agent, the following classes are implemented:
@@ -17,10 +20,3 @@ Neural Fictitious Self-Play (NFSP) [[paper]](https://arxiv.org/abs/1603.01121) e

 ## CFR (chance sampling)
 Counterfactual Regret Minimization (CFR) [[paper]](http://papers.nips.cc/paper/3306-regret-minimization-in-games-with-incomplete-information.pdf) is a regret minimization method for solving imperfect information games.
-
-## DeepCFR
-Deep Counterfactual Regret Minimization (DeepCFR) [[paper]](https://arxiv.org/abs/1811.00164) is a state-of-the-art framework for solving imperfect-information games.
-We wrap DeepCFR as an example to show how state-of-the-art framework can be connected to the environments. In the DeepCFR, the following classes are implemented:
-
-* `DeepCFR`: The DeepCFR class that interacts with the environment.
-* `Fixed Size Ring Buffer`: A memory buffer that manages the storing and sampling of transitions.
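
A hedged sketch of how the DQN agent described above might be connected to an environment, loosely following the `examples/run_rl.py` entry listed in the README diff. The `DQNAgent` constructor arguments, the `is_training` flag and the `reorganize` helper are assumptions about the library API rather than something stated in this commit.

```python
import rlcard
from rlcard.agents import DQNAgent          # assumed import path
from rlcard.utils import reorganize         # assumed helper pairing transitions with payoffs

env = rlcard.make('blackjack', config={'seed': 0})

# Constructor arguments below are assumptions; consult the DQNAgent docstring.
agent = DQNAgent(num_actions=env.num_actions,
                 state_shape=env.state_shape[0],
                 mlp_layers=[64, 64])
env.set_agents([agent])

for episode in range(1000):
    # Roll out one game in training mode, then push every transition
    # of player 0 into the agent's replay memory.
    trajectories, payoffs = env.run(is_training=True)
    trajectories = reorganize(trajectories, payoffs)
    for transition in trajectories[0]:
        agent.feed(transition)
```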
