datamllab
diff --git a/.gitignore b/.gitignore
Lines changed: 1 addition & 0 deletions
diff --git a/.travis.yml b/.travis.yml
Lines changed: 0 additions & 2 deletions
diff --git a/README.md b/README.md
Lines changed: 88 additions & 30 deletions
diff --git a/docs/README.md b/docs/README.md
Lines changed: 21 additions & 17 deletions
diff --git a/docs/adding-models.md b/docs/adding-models.md
Lines changed: 7 additions & 0 deletions
diff --git a/docs/adding-new-environments.md b/docs/adding-new-environments.md
Lines changed: 4 additions & 4 deletions
@@ -15,3 +15,4 @@ docs/rst
 docs/sphinx
 experiments/
 newtest/
+dist/
@@ -2,8 +2,6 @@ language: python
 install: 
   - pip install -e .
 before_script:
-  - pip install matplotlib
-  - pip install dm-sonnet
   - pip install python-coveralls
   - pip install pytest-cover
 script: 
 
@@ -1,52 +1,110 @@
 # RLCard: A Toolkit for Reinforcement Learning in Card Games
 [![Build Status](https://travis-ci.org/datamllab/RLCard.svg?branch=master)](https://travis-ci.org/datamllab/RLCard)
 [![Codacy Badge](https://api.codacy.com/project/badge/Grade/248eb15c086748a4bcc830755f1bd798)](https://www.codacy.com/manual/daochenzha/rlcard?utm_source=github.com&amp;utm_medium=referral&amp;utm_content=datamllab/rlcard&amp;utm_campaign=Badge_Grade)
-[![Coverage Status](https://coveralls.io/repos/github/datamllab/rlcard/badge.svg?branch=master)](https://coveralls.io/github/datamllab/rlcard?branch=master)
+[![Coverage Status](https://coveralls.io/repos/github/datamllab/rlcard/badge.svg)](https://coveralls.io/github/datamllab/rlcard?branch=master)
 
-RLCard is a opensource toolkit for developing Reinforcement Learning (RL) algorithms in card games. It supports multiple challenging card game environments with common and easy-to-use interfaces. The  goal  of  the  toolkit  is  to  enable  more  people  to  study  game  AI  and  push  forward  the  research of imperfect information games. RLCard is developed by [DATA Lab](http://faculty.cs.tamu.edu/xiahu/) at Texas A&M University. **NOTE: The project is still in final testing!**
+RLCard is a toolkit for Reinforcement Learning (RL) in card games. It supports multiple card environments with easy-to-use interfaces. The goal of RLCard is to bridge reinforcement learning and imperfect information games, and to push forward research on reinforcement learning in domains with multiple agents, large state and action spaces, and sparse rewards. RLCard is developed by [DATA Lab](http://faculty.cs.tamu.edu/xiahu/) at Texas A&M University.
+
+*   Official Website: [http://www.rlcard.org](http://www.rlcard.org)
 
 ## Installation
-Make sure that you have **Python 3.5+** and **pip** installed. You can install `rlcard` with `pip` as follow:
-```console
+Make sure that you have **Python 3.5+** and **pip** installed. We recommend installing `rlcard` with `pip` as follows:
+
+```
 git clone https://github.com/datamllab/rlcard.git
 cd rlcard
 pip install -e .
 ```
-To check whether it is intalled correctly, try the example with random agents:
-```console
-python examples/blackjack_random.py
+
+Alternatively, you can install the package directly from PyPI:
+
+```
+pip install rlcard
 ```
 
-## Getting Started
-The interfaces generally follow [OpenAI gym](https://github.com/openai/gym) style. We recommend starting with the following **toy examples**.
-* [Playing with random agents](docs/toy-examples.md#playing-with-random-agents)
-* [Deep-Q learning on Blackjack](docs/toy-examples.md#deep-q-learning-on-blackjack)
-* [DeepCFR on Blackjack](docs/toy-examples.md#deepcfr-on-blackjack)
+## Examples
+Please refer to [examples/](examples) for more examples. A **short example** is shown below.
+
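+```python
+import rlcard
+from rlcard.agents.random_agent import RandomAgent
+
+# Create the Blackjack environment
+env = rlcard.make('blackjack')
+# Blackjack has a single player, so a single agent is set
+env.set_agents([RandomAgent(action_num=env.action_num)])
+
+# Play one complete game and collect the trajectories and payoffs
+trajectories, payoffs = env.run()
+```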
 
-For more examples, please refer to [examples/](examples).
+We also recommend the following **toy examples**.
+
+*   [Playing with random agents](docs/toy-examples.md#playing-with-random-agents)
+*   [Deep-Q learning on Blackjack](docs/toy-examples.md#deep-q-learning-on-blackjack)
+*   [Running multiple processes](docs/toy-examples.md#running-multiple-processes)
+*   [Having fun with pretrained Leduc model](docs/toy-examples.md#having-fun-with-pretrained-leduc-model)
+*   [Leduc Hold'em as single-agent environment](docs/toy-examples.md#leduc-holdem-as-single-agent-environment)
+*   [Training CFR on Leduc Hold'em](docs/toy-examples.md#training-cfr-on-leduc-holdem)
+
+## Demo
+Run `examples/leduc_holdem_human.py` to play with the pre-trained Leduc Hold'em model:
+
+```
+>> Leduc Hold'em pre-trained model
+
+>> Start a new game!
+>> Agent 1 chooses raise
+
+=============== Community Card ===============
+┌─────────┐
+│░░░░░░░░░│
+│░░░░░░░░░│
+│░░░░░░░░░│
+│░░░░░░░░░│
+│░░░░░░░░░│
+│░░░░░░░░░│
+│░░░░░░░░░│
+└─────────┘
+===============   Your Hand    ===============
+┌─────────┐
+│J        │
+│         │
+│         │
+│    ♥    │
+│         │
+│         │
+│        J│
+└─────────┘
+===============     Chips      ===============
+Yours:   +
+Agent 1: +++
+=========== Actions You Can Choose ===========
+0: call, 1: raise, 2: fold
+
+>> You choose action (integer):
+```
 
 ## Documents
-Please refer to the [Documents](docs/README.md) for general concepts introduction. API documents are available at our [github page](https://rlcard.github.io/index.html).
+Please refer to the [Documents](docs/README.md) for general introductions. API documents are available at our [website](http://www.rlcard.org).
 
 ## Available Environments
-The table below shows the environments that are (or will be soon) available in RLCard. We provide a complexity estimation for the games on several aspects. **InfoSet Number:** the number of information set; **Avg. InfoSet Size:** the average number of states in a single information set; **Action Size:** the size of the action space. For some of the complex card games, we can only provide a range of estimation. **Name** is the name that should be passed to `env.make` to create the game environment.
-
-| Game                                                                                                                                                                                           | InfoSet Number  | Avg. InfoSet Size | Action Size | Name            | Status    |
-| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------: | :---------------: | :---------: | :-------------: | :-------: |
-| Blackjack ([wiki](https://en.wikipedia.org/wiki/Blackjack), [baike](https://baike.baidu.com/item/21%E7%82%B9/5481683?fr=aladdin))                                                              | 10^3            | 10^1              | 10^0        | blackjack       | Available |
-| Limit Texas Hold'em ([wiki](https://en.wikipedia.org/wiki/Texas_hold_%27em), [baike](https://baike.baidu.com/item/%E5%BE%B7%E5%85%8B%E8%90%A8%E6%96%AF%E6%89%91%E5%85%8B/83440?fr=aladdin))    | 10^14           | 10^3              | 10^0        | limit-holdem    | Available |
-| Dou Dizhu ([wiki](https://en.wikipedia.org/wiki/Dou_dizhu), [baike](https://baike.baidu.com/item/%E6%96%97%E5%9C%B0%E4%B8%BB/177997?fr=aladdin))                                               | 10^53 ~ 10^83   | 10^23             | 10^4        | doudizhu        | Available |
-| Mahjong ([wiki](https://en.wikipedia.org/wiki/Competition_Mahjong_scoring_rules), [baike](https://baike.baidu.com/item/%E9%BA%BB%E5%B0%86/215))                                                | 10^121          | 10^48             | 10^2        | -               | Come soon | 
-| No-limit Texas Hold'em ([wiki](https://en.wikipedia.org/wiki/Texas_hold_%27em), [baike](https://baike.baidu.com/item/%E5%BE%B7%E5%85%8B%E8%90%A8%E6%96%AF%E6%89%91%E5%85%8B/83440?fr=aladdin)) | 10^162          | 10^3              | 10^4        | no-limit-holdem | Available |
-| UNO ([wiki](https://en.wikipedia.org/wiki/Uno_\(card_game), [baike](https://baike.baidu.com/item/UNO%E7%89%8C/2249587))                                                                        |  10^163         | 10^10             | 10^1        | -               | Come soon |
-| Sheng Ji ([wiki](https://en.wikipedia.org/wiki/Sheng_ji), [baike](https://baike.baidu.com/item/%E5%8D%87%E7%BA%A7/3563150))                                                                    | 10^157 ~ 10^165 | 10^61             | 10^13       | -               | Come soon |
+The table below estimates the complexity of each game in several aspects. **InfoSet Number:** the number of information sets; **Avg. InfoSet Size:** the average number of states in a single information set; **Action Size:** the size of the action space. **Name:** the name that should be passed to `rlcard.make` to create the game environment.
+
+| Game                                                                                                                                                                                           | InfoSet Number  | Avg. InfoSet Size | Action Size | Name            | Status     |
+| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------: | :---------------: | :---------: | :-------------: | :--------: |
+| Blackjack ([wiki](https://en.wikipedia.org/wiki/Blackjack), [baike](https://baike.baidu.com/item/21%E7%82%B9/5481683?fr=aladdin))                                                              | 10^3            | 10^1              | 10^0        | blackjack       | Available  |
+| Leduc Hold’em                                                                                                                                                                                  | 10^2            | 10^2              | 10^0        | leduc-holdem    | Available  |
+| Limit Texas Hold'em ([wiki](https://en.wikipedia.org/wiki/Texas_hold_%27em), [baike](https://baike.baidu.com/item/%E5%BE%B7%E5%85%8B%E8%90%A8%E6%96%AF%E6%89%91%E5%85%8B/83440?fr=aladdin))    | 10^14           | 10^3              | 10^0        | limit-holdem    | Available  |
+| Dou Dizhu ([wiki](https://en.wikipedia.org/wiki/Dou_dizhu), [baike](https://baike.baidu.com/item/%E6%96%97%E5%9C%B0%E4%B8%BB/177997?fr=aladdin))                                               | 10^53 ~ 10^83   | 10^23             | 10^4        | doudizhu        | Available  |
+| Mahjong ([wiki](https://en.wikipedia.org/wiki/Competition_Mahjong_scoring_rules), [baike](https://baike.baidu.com/item/%E9%BA%BB%E5%B0%86/215))                                                | 10^121          | 10^48             | 10^2        | mahjong         | Available  | 
+| No-limit Texas Hold'em ([wiki](https://en.wikipedia.org/wiki/Texas_hold_%27em), [baike](https://baike.baidu.com/item/%E5%BE%B7%E5%85%8B%E8%90%A8%E6%96%AF%E6%89%91%E5%85%8B/83440?fr=aladdin)) | 10^162          | 10^3              | 10^4        | no-limit-holdem | Available  |
+| UNO ([wiki](https://en.wikipedia.org/wiki/Uno_\(card_game), [baike](https://baike.baidu.com/item/UNO%E7%89%8C/2249587))                                                                        |  10^163         | 10^10             | 10^1        | uno             | Available  |
+| Sheng Ji ([wiki](https://en.wikipedia.org/wiki/Sheng_ji), [baike](https://baike.baidu.com/item/%E5%8D%87%E7%BA%A7/3563150))                                                                    | 10^157 ~ 10^165 | 10^61             | 10^11       | -               | Developing |
 
 ## Evaluation
-We wrap a `Logger` that conveniently saves/plots the results. Example outputs are as follows:
-![Learning Curves](docs/imgs/curves.png "Learning Curves")
+The performance is measured by winning rates in tournaments. Example outputs are as follows:
+![Learning Curves](http://rlcard.org/imgs/curves.png "Learning Curves")
 
-## Disclaimer
-Please note that this is a **pre-release** version of the RLCard. The toolkit is provided "**as is**," without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement.
+## Contributing
+Contributions to this project are greatly appreciated! Please create an issue for feedback or bug reports. If you want to contribute code, please contact [[email protected]](mailto:[email protected]) or [[email protected]](mailto:[email protected]).
 
 ## Acknowledgements
-We would like to thank JJ World Network Technology Co.,LTD for technical the support.
+We would like to thank JJ World Network Technology Co.,LTD for the generous support.
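The Evaluation section of the README above measures performance by winning rates in tournaments. The sketch below illustrates that idea in plain Python under stated assumptions: `tournament` and `play_game` are hypothetical names, not RLCard's API; `play_game` is assumed to play one complete game and return one payoff per player (like the `payoffs` returned by `env.run()` in the README example); and a game counts as a win for a player when that player's payoff is positive.

```python
from typing import Callable, List, Sequence, Tuple


def tournament(play_game: Callable[[], Sequence[float]],
               num_games: int) -> Tuple[List[float], List[float]]:
    """Play num_games games and return (average payoffs, winning rates) per player.

    play_game is a hypothetical callable that plays one complete game and
    returns one payoff per player. A positive payoff counts as a win.
    """
    if num_games <= 0:
        raise ValueError("num_games must be positive")
    totals = []  # type: List[float]
    wins = []  # type: List[int]
    for _ in range(num_games):
        payoffs = play_game()
        if not totals:
            # Size the accumulators from the number of players in the first game.
            totals = [0.0] * len(payoffs)
            wins = [0] * len(payoffs)
        for i, payoff in enumerate(payoffs):
            totals[i] += payoff
            if payoff > 0:
                wins[i] += 1
    averages = [total / num_games for total in totals]
    win_rates = [count / num_games for count in wins]
    return averages, win_rates
```

With agents already set on an environment, something like `tournament(lambda: env.run()[1], 1000)` would average over 1000 games, assuming `env.run()` returns `(trajectories, payoffs)` as in the README example.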
@@ -1,20 +1,24 @@
-# Overview
-The toolkit wraps each game by `Env` with easy-to-use interfaces. The goal of this toolkit is to enable the users to focus on algorithm design on challenging card games instead of developping game engines. The following design principles are applied:
-* **Simple.** We make the interfaces straightforward and simple. Users can easily run one game and obtain the statistics of the game.
-* **Consistent.** All the games are implemented following the same logical pattern. The main classes/functions of each game share the same class/function name. Users can easily understand each game and modify the rules for research purpose.
-* **Reproducible.** The results can be seeded for reproducibility purpose.
-* **Minimum Dependency.** We minimize the dependencies used in the toolkit so that the codes are easy to modify or migrate.
-* **Scalable.** New card environments can be added conveniently into RLCard with the above design principles.
+# Documents of RLCard
 
-# User Guide
-* [Toy examples](toy-examples.md)
-* [RLCard high-level design](high-level-design.md)
-* [Games in RLCard](games.md)
-* [Algorithms in RLCard](algorithms.md)
-* [Developping new algorithms](developping-algorithms.md)
+## Overview
+The toolkit wraps each game in an `Env` class with easy-to-use interfaces. The goal of this toolkit is to let users focus on algorithm development without worrying about the environment. The following design principles are applied when developing the toolkit:
+*   **Reproducible.** Results on the environments can be reproduced: the same random seed should produce the same result in different runs.
+*   **Accessible.** Experiences are collected and well organized after each game with easy-to-use interfaces. Users can conveniently configure state representation, action encoding, reward design, or even the game rules.
+*   **Scalable.** New card environments can be added conveniently into the toolkit with the above design principles. We also try to minimize the dependencies in the toolkit so that the code can be easily maintained.
 
-# Developer Guide
-* [Adding new environments](adding-new-environments.md)
+## User Guide
 
-# Application Programming Interface (API)
-The API documents are and available in [github page](https://rlcard.github.io/index.html).
+*   [Toy examples](toy-examples.md)
+*   [RLCard high-level design](high-level-design.md)
+*   [Games in RLCard](games.md)
+*   [Algorithms in RLCard](algorithms.md)
+
+## Developer Guide
+
+*   [Developing new algorithms](developping-algorithms.md)
+*   [Adding new environments](adding-new-environments.md)
+*   [Customizing environments](customizing-environments.md)
+*   [Adding pre-trained/rule-based models](adding-models.md)
+
+## Application Programming Interface (API)
+The API documents are available at the [Official Website](http://www.rlcard.org).
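The **Reproducible** principle in the docs overview says the same seed should give the same result across runs. A minimal sketch of the usual technique follows; it assumes nothing about how RLCard seeds its environments internally. Each game keeps a private `random.Random(seed)` generator, so shuffling and dealing depend only on the seed. The `deal` function and the two-character card encoding are hypothetical.

```python
import random
from typing import List


def deal(seed: int, num_players: int = 2, cards_per_player: int = 2) -> List[List[str]]:
    """Shuffle a standard 52-card deck with a seeded generator and deal hands."""
    ranks = "A23456789TJQK"
    suits = "SHDC"
    deck = [rank + suit for suit in suits for rank in ranks]
    # A private generator is unaffected by any other code calling the random module.
    rng = random.Random(seed)
    rng.shuffle(deck)
    return [deck[i * cards_per_player:(i + 1) * cards_per_player]
            for i in range(num_players)]
```

Using a private generator instead of the module-level `random.seed` means unrelated random draws elsewhere in a program cannot shift the deal between runs.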
@@ -0,0 +1,7 @@
+# Adding Pre-trained/Rule-based Models
+You can add your own pre-trained/rule-based models to the toolkit by following several steps:
+
+*   **Develop models.** You can either design a rule-based model or save a neural network model. For each game, you need to develop models for all the players at the same time. You need to wrap each model as a class and make sure that `step` and `eval_step` work correctly.
+*   **Wrap models.** You need to inherit the `Model` class in `rlcard/models/model.py`. Then put all the models for the players into a list, and override the `get_agent` function so that it returns this list.
+*   **Register the model.** Register the model in `rlcard/models/__init__.py`.
+*   **Load the model in the environment.** To load the model, modify `load_pretrained_models` in the corresponding game environment in `rlcard/envs`. Use the registered name to load the model.
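The steps for adding a model can be sketched as follows. This is a self-contained illustration, not RLCard's code: `Model` here is a stand-in for the class in `rlcard/models/model.py`, and `RuleAgent`, `FirstActionModel`, `register`, `load`, and the `legal_actions` state key are hypothetical. It shows one agent per player exposing `step` and `eval_step`, a model whose `get_agent` returns the list of agents, and a registry from which the model is loaded by name.

```python
from typing import Callable, Dict, List


class RuleAgent:
    """Hypothetical rule-based agent: always picks the first legal action."""

    def step(self, state: dict) -> int:
        # Used when generating data or training.
        return state["legal_actions"][0]

    def eval_step(self, state: dict) -> int:
        # Used during evaluation; a fixed rule behaves the same way.
        return self.step(state)


class Model:
    """Stand-in for the Model base class described above (not RLCard's code)."""

    def get_agent(self) -> List[object]:
        raise NotImplementedError


class FirstActionModel(Model):
    """Wraps one agent per player and returns them as a list."""

    def __init__(self, num_players: int = 2) -> None:
        self.agents = [RuleAgent() for _ in range(num_players)]

    def get_agent(self) -> List[object]:
        return self.agents


# Hypothetical registry: register a name first, then load the model by that name.
MODEL_REGISTRY = {}  # type: Dict[str, Callable[[], Model]]


def register(name: str, entry_point: Callable[[], Model]) -> None:
    if name in MODEL_REGISTRY:
        raise ValueError("model %r is already registered" % name)
    MODEL_REGISTRY[name] = entry_point


def load(name: str) -> Model:
    return MODEL_REGISTRY[name]()


register("toy-first-action", FirstActionModel)
```

Keeping all players' agents inside one model object matches the requirement above that models for every player are developed and loaded together.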
@@ -1,11 +1,11 @@
 # Adding New Environments
 To add a new environment to the toolkit, generally you should take the following steps:
-* **Implement a game.** Card games usually have similar structures so that they can be implemented with `Game`, `Round`, `Dealer`, `Judger`, `Player` as in existing games. The easiest way is to inherit the classed in [rlcard/core.py](rlcard/core.py) and implement the functions.
-* **Wrap the game with an environment.** The easiest way is to inherit `Env` in [rlcard/envs/env.py](rlcard/env/env.py). You need to implement `extract_state` which encodes the state, `decode_action` which decode actions from the id to the text string, and `get_payoffs` which calculate payoffs of the players.
-* **Register the game.** Now it is time to tell the toolkit where to locate the new environment. Go to [rlcard/envs/__init__.py](rlcard/envs/__init__.py), and indicate the name of the game and its entry point.
+*   **Implement a game.** Card games usually have similar structures, so they can be implemented with `Game`, `Round`, `Dealer`, `Judger`, and `Player` classes, as in existing games. The easiest way is to inherit the classes in [rlcard/core.py](../rlcard/core.py) and implement the functions.
+*   **Wrap the game with an environment.** The easiest way is to inherit `Env` in [rlcard/envs/env.py](../rlcard/envs/env.py). You need to implement `extract_state`, which encodes the state; `decode_action`, which decodes actions from the id to the text string; and `get_payoffs`, which calculates the payoffs of the players.
+*   **Register the game.** Now it is time to tell the toolkit where to locate the new environment. Go to [rlcard/envs/\_\_init\_\_.py](../rlcard/envs/__init__.py), and indicate the name of the game and its entry point.
 
 To test whether the new environment is set up successfully:
 ```python
 import rlcard
-env.make(#the new evironment#)
+env = rlcard.make('your-env-name')  # replace with the name registered for the new environment
 ```
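The three steps for adding an environment can be illustrated with a toy game. The sketch below is hypothetical and does not import RLCard: `Env` is a minimal stand-in for the base class in `rlcard/envs/env.py`, `HighCardEnv` is an invented two-player game, and `register`/`make` only imitate the idea of registering a name with an entry point. Check `rlcard/envs/__init__.py` for the actual registration format.

```python
import random
from typing import Callable, Dict, List


class Env:
    """Minimal stand-in base class (hypothetical; not RLCard's actual Env)."""

    def extract_state(self, state: dict) -> dict:
        raise NotImplementedError

    def decode_action(self, action_id: int) -> str:
        raise NotImplementedError

    def get_payoffs(self) -> List[float]:
        raise NotImplementedError


class HighCardEnv(Env):
    """Toy game: each of two players gets one card; a player may fold,
    otherwise the higher card wins."""

    ACTIONS = ["play", "fold"]

    def __init__(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hands = rng.sample(range(2, 15), 2)  # two distinct ranks, so no ties
        self.folded = [False, False]

    def extract_state(self, state: dict) -> dict:
        # Encode the raw state into an observation plus the legal action ids.
        return {"obs": [state["hand"]], "legal_actions": list(range(len(self.ACTIONS)))}

    def decode_action(self, action_id: int) -> str:
        # Map an action id to its text string.
        return self.ACTIONS[action_id]

    def step(self, player: int, action_id: int) -> None:
        if self.decode_action(action_id) == "fold":
            self.folded[player] = True

    def get_payoffs(self) -> List[float]:
        if self.folded[0] != self.folded[1]:
            # Exactly one player folded, so the other player wins.
            return [-1.0, 1.0] if self.folded[0] else [1.0, -1.0]
        if all(self.folded):
            return [0.0, 0.0]
        return [1.0, -1.0] if self.hands[0] > self.hands[1] else [-1.0, 1.0]


# Hypothetical registry mapping a game name to its entry point.
ENV_REGISTRY = {}  # type: Dict[str, Callable[..., Env]]


def register(env_id: str, entry_point: Callable[..., Env]) -> None:
    ENV_REGISTRY[env_id] = entry_point


def make(env_id: str, **kwargs) -> Env:
    if env_id not in ENV_REGISTRY:
        raise ValueError("unknown environment: %s" % env_id)
    return ENV_REGISTRY[env_id](**kwargs)


register("high-card", HighCardEnv)
```

Here `make('high-card', seed=3)` plays the role that `rlcard.make` plays in the test snippet of the docs, and an unknown name fails loudly instead of returning `None`.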