### 🚀 Feature
Add a double variant of the DQN algorithm (Double DQN).
### Motivation
It is listed in the roadmap (#1).
### Pitch
I suggest we go from:
```python
with th.no_grad():
    # Compute the next Q-values using the target network
    next_q_values = self.q_net_target(replay_data.next_observations)
    # Follow greedy policy: use the one with the highest value
    next_q_values, _ = next_q_values.max(dim=1)
```
to:
```python
with th.no_grad():
    # Compute the next Q-values using the target network
    next_q_values = self.q_net_target(replay_data.next_observations)
    if self.double_dqn:
        # Use the current (online) model to select the action with the maximal Q-value
        max_actions = th.argmax(self.q_net(replay_data.next_observations), dim=1)
        # Evaluate the Q-value of that action using the fixed target network,
        # then squeeze so the shape matches the greedy branch below
        next_q_values = th.gather(next_q_values, dim=1, index=max_actions.unsqueeze(-1)).squeeze(dim=1)
    else:
        # Follow greedy policy: use the one with the highest value
        next_q_values, _ = next_q_values.max(dim=1)
```
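For reference, this corresponds to the Double DQN target of van Hasselt et al. (2016): the online network selects the greedy action while the target network evaluates it,

```math
y = r + \gamma \, (1 - d) \, Q_{\theta^-}\!\left(s',\ \operatorname*{arg\,max}_{a'} Q_{\theta}(s', a')\right)
```

where `Q_θ` is the online network (`self.q_net`) and `Q_{θ⁻}` is the target network (`self.q_net_target`).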
The new behaviour would be controlled by a `double_dqn` flag passed as an additional argument to the `DQN` constructor.
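Not the official API, just a minimal sketch of how the flag could be prototyped today via a subclass (the `DoubleDQN` name is only illustrative; the actual proposal is to add `double_dqn` to `DQN.__init__` itself):

```python
from stable_baselines3 import DQN


class DoubleDQN(DQN):
    def __init__(self, *args, double_dqn: bool = True, **kwargs):
        super().__init__(*args, **kwargs)
        # Read inside train() to switch between the vanilla and double targets
        self.double_dqn = double_dqn


# Example usage (CartPole as a stand-in environment)
model = DoubleDQN("MlpPolicy", "CartPole-v1", double_dqn=True, verbose=0)
```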
### Checklist
- [x] I have checked that there is no similar issue in the repo (required)