`CONTRIBUTING.md`
* **Game Specific Configurations.** Now we plan to gradually support game specific configurations. Currently we only support specifying the number of players in Blackjack.
* **Rule-based Agent and Pre-trained Models.** Provide more rule-based agents and pre-trained models to benchmark the evaluation. We currently have several models in `/models`.
* **More Games and Algorithms.** Develop more games and algorithms.
* **Hyperparameter Search.** Search hyperparameters for each environment and update the best ones in the examples.
RLCard is a toolkit for Reinforcement Learning (RL) in card games. It supports multiple card environments with easy-to-use interfaces for implementing various reinforcement learning and searching algorithms. The goal of RLCard is to bridge reinforcement learning and imperfect information games. RLCard is developed by [DATA Lab](http://faculty.cs.tamu.edu/xiahu/) at Texas A&M University and community contributors.

* Official Website: [https://www.rlcard.org](https://www.rlcard.org)
* Tutorial in Jupyter Notebook: [https://github.com/datamllab/rlcard-tutorial](https://github.com/datamllab/rlcard-tutorial)
## Cite this work
If you find this repo useful, you may cite:

Zha, Daochen, et al. "RLCard: A Platform for Reinforcement Learning in Card Games." IJCAI. 2020.

```bibtex
@inproceedings{DBLP:conf/ijcai/ZhaLHCRVNWGH20,
  author    = {Daochen Zha and
               Kwei{-}Herng Lai and
               Songyi Huang and
               Yuanpu Cao and
               Keerthana Reddy and
               Juan Vargas and
               Alex Nguyen and
               Ruzhe Wei and
               Junyu Guo and
               Xia Hu},
  title     = {RLCard: {A} Platform for Reinforcement Learning in Card Games},
  booktitle = {{IJCAI}},
  pages     = {5264--5266},
  publisher = {ijcai.org},
  year      = {2020}
}
```
## Installation
Make sure that you have **Python 3.6+** and **pip** installed. We recommend installing the stable version of `rlcard` with `pip`:

```
pip3 install rlcard
```
Alternatively, you can install the latest version with:

```
git clone https://github.com/datamllab/rlcard.git
cd rlcard
pip3 install -e .
```
The default installation will only include the card environments. To use the PyTorch implementation of the training algorithms, run:

```
pip3 install rlcard[training]
```
We also provide a [**conda** installation method](https://anaconda.org/toubun/rlcard):

```
conda install -c toubun rlcard
```

The conda installation only provides the card environments; you need to install PyTorch manually as needed.
## Examples
Please refer to [examples/](examples). A **short example** is as below.
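For instance, a minimal sketch of running random agents on Blackjack (the exact constructor arguments may differ slightly across RLCard versions):

```python
import rlcard
from rlcard.agents import RandomAgent

# Make a Blackjack environment and attach one random agent per player
env = rlcard.make('blackjack')
env.set_agents([RandomAgent(num_actions=env.num_actions) for _ in range(env.num_players)])

# Play one complete game; `trajectories` holds the transitions, `payoffs` the final rewards
trajectories, payoffs = env.run(is_training=False)
print(payoffs)
```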
* [Training CFR (chance sampling) on Leduc Hold'em](docs/toy-examples.md#training-cfr-on-leduc-holdem)
* [Having fun with pretrained Leduc model](docs/toy-examples.md#having-fun-with-pretrained-leduc-model)
* [Training DMC on Dou Dizhu](docs/toy-examples.md#training-dmc-on-dou-dizhu)
## Demo
Run `examples/human/leduc_holdem_human.py` to play with the pre-trained Leduc Hold'em model. Leduc Hold'em is a simplified version of Texas Hold'em. Rules can be found [here](docs/games.md#leduc-holdem).
```
>> Leduc Hold'em pre-trained model
```
The following pre-trained and rule-based models are available:

| Model | Description |
| ----- | ----------- |
| leduc-holdem-cfr | Pre-trained CFR (chance sampling) model on Leduc Hold'em |
| leduc-holdem-rule-v1 | Rule-based model for Leduc Hold'em, v1 |
| leduc-holdem-rule-v2 | Rule-based model for Leduc Hold'em, v2 |
| uno-rule-v1 | Rule-based model for UNO, v1 |
| limit-holdem-rule-v1 | Rule-based model for Limit Texas Hold'em, v1 |
| doudizhu-rule-v1 | Rule-based model for Dou Dizhu, v1 |
| gin-rummy-novice-rule | Gin Rummy novice rule model |
## API Cheat Sheet
### How to create an environment
You can use the following interface to make an environment. You may optionally specify some configurations with a dictionary.
* **env = rlcard.make(env_id, config={})**: Make an environment. `env_id` is a string of an environment; `config` is a dictionary that specifies some environment configurations, which are as follows (see the sketch after this list).
* `seed`: Default `None`. Set an environment-local random seed for reproducing the results.
* `allow_step_back`: Default `False`. `True` if allowing the `step_back` function to traverse backward in the tree.
* Game specific configurations: These fields start with `game_`. Currently, we only support `game_num_players` in Blackjack.
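For example, here is a minimal sketch of making a configured Blackjack environment (the specific values are illustrative):

```python
import rlcard

# Make a two-player Blackjack environment with a fixed seed.
# `game_num_players` is the Blackjack-specific field described above.
env = rlcard.make(
    'blackjack',
    config={
        'seed': 42,
        'allow_step_back': False,
        'game_num_players': 2,
    },
)
```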
Once the environment is made, we can access some information of the game.
* **env.num_actions**: The number of actions.
* **env.num_players**: The number of players.
* **env.state_shape**: The shape of the state space of the observations.
* **env.action_shape**: The shape of the action features (Dou Dizhu's actions can be encoded as features).
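For instance, a quick sketch of inspecting these attributes (the printed values depend on the game):

```python
import rlcard

env = rlcard.make('blackjack')
print(env.num_actions)   # number of actions, e.g. hit and stand in Blackjack
print(env.num_players)
print(env.state_shape)   # observation shape(s)
print(env.action_shape)  # action feature shape (only meaningful for games like Dou Dizhu)
```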
### What is state in RLCard
State is a Python dictionary. It consists of the observation `state['obs']`, the legal actions `state['legal_actions']`, the raw observation `state['raw_obs']`, and the raw legal actions `state['raw_legal_actions']`.
### Basic interfaces
The following interfaces provide a basic usage. It is easy to use but it has assumptions on the agent. The agent must follow the [agent template](docs/developping-algorithms.md).
For advanced usage, the following interfaces allow flexible operations on the game:
* **env.get_payoffs()**: At the end of the game, return a list of payoffs for all the players.
* **env.get_perfect_information()**: (Currently only supported by some of the games) Obtain the perfect information at the current state.
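For instance, a rough sketch of stepping through a game manually and reading the payoffs at the end (this assumes `env.reset()` returns the initial state and current player id, and that `state['legal_actions']` can be iterated for action ids; both may differ across versions):

```python
import random
import rlcard

env = rlcard.make('leduc-holdem')
state, player_id = env.reset()

# Step through the game by always picking a random legal action
while not env.is_over():
    action = random.choice(list(state['legal_actions']))
    state, player_id = env.step(action)

print(env.get_payoffs())               # list of payoffs, one per player
print(env.get_perfect_information())   # perfect information of the final state
```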
## Library Structure
The purposes of the main modules are listed below:
For more documentation, please refer to the [Documents](docs/README.md) for general introductions. API documents are available at our [website](http://www.rlcard.org).
## Contributing
Contribution to this project is greatly appreciated! Please create an issue for feedback or bugs. If you want to contribute code, please refer to the [Contributing Guide](./CONTRIBUTING.md). If you have any questions, please contact [Daochen Zha](https://github.com/daochenzha) at [[email protected]](mailto:[email protected]).
## Acknowledgements
We would like to thank JJ World Network Technology Co., LTD for the generous support and all the contributions from the community contributors.
`docs/adding-new-environments.md`
# Adding New Environments
To add a new environment to the toolkit, generally you should take the following steps (a rough skeleton is sketched after the list):
* **Implement a game.** Card games usually have similar structures so that they can be implemented with `Game`, `Round`, `Dealer`, `Judger`, `Player`, as in existing games. The easiest way is to inherit the classes in [rlcard/games/base.py](../rlcard/games/base.py) and implement the functions.
* **Wrap the game with an environment.** The easiest way is to inherit `Env` in [rlcard/envs/env.py](../rlcard/env/env.py). You need to implement `_extract_state`, which encodes the state, `_decode_action`, which decodes actions from the id to the text string, and `get_payoffs`, which calculates the payoffs of the players.
* **Register the game.** Now it is time to tell the toolkit where to locate the new environment. Go to [rlcard/envs/\_\_init\_\_.py](../rlcard/envs/__init__.py), and indicate the name of the game and its entry point.
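Putting the steps above together, here is a rough, illustrative skeleton (the class, module, and entry-point names are hypothetical, not part of the RLCard source):

```python
from rlcard.envs.env import Env


class MyCardGameEnv(Env):
    """Wraps the Game implemented in the first step with the RLCard Env interface."""

    def __init__(self, config):
        self.name = 'my-card-game'
        self.game = MyCardGame()  # hypothetical Game subclass from the first step
        super().__init__(config)

    def _extract_state(self, state):
        # Encode the raw game state into the observation the agents will see
        raise NotImplementedError

    def _decode_action(self, action_id):
        # Map an action id back to the game's action string
        raise NotImplementedError

    def get_payoffs(self):
        # Return a list with one payoff per player at the end of the game
        return self.game.get_payoffs()


# Registration would go in rlcard/envs/__init__.py; the exact register call below
# is an assumption based on how existing environments are registered:
# register(env_id='my-card-game', entry_point='rlcard.envs.my_card_game:MyCardGameEnv')
```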
## Deep Monte-Carlo (DMC)

Deep Monte-Carlo (DMC) is a very effective algorithm for card games. This is the only algorithm that shows human-level performance on complex games such as Dou Dizhu.
## Deep-Q Learning
Deep-Q Learning (DQN) [[paper]](https://arxiv.org/abs/1312.5602) is a basic reinforcement learning (RL) algorithm. We wrap DQN as an example to show how RL algorithms can be connected to the environments. In the DQN agent, the following classes are implemented:
## CFR (chance sampling)
Counterfactual Regret Minimization (CFR) [[paper]](http://papers.nips.cc/paper/3306-regret-minimization-in-games-with-incomplete-information.pdf) is a regret minimization method for solving imperfect information games.
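As an illustration, a minimal sketch of training the bundled CFR agent on Leduc Hold'em (the constructor and `train()` call follow the toy examples; treat the details as assumptions that may vary by version):

```python
import rlcard
from rlcard.agents import CFRAgent

# CFR traverses the game tree, so the environment must allow step_back
env = rlcard.make('leduc-holdem', config={'allow_step_back': True})
agent = CFRAgent(env)

for _ in range(100):
    agent.train()  # one iteration of chance-sampling CFR
```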