Possible error in critic update in SAC-AE algorithm

In the SAC-AE algorithm, critics 1 and 2 are updated as follows:
```
target_q = tf.stop_gradient(
    rewards + not_dones * self.discount * (min_next_target_q - self.alpha * next_logps))

obs_features = self._encoder(obses, stop_q_grad=self._stop_q_grad)
current_q1 = self.qf1(obs_features, actions)
current_q2 = self.qf2(obs_features, actions)
td_loss_q1 = tf.reduce_mean((target_q - current_q1) ** 2)
td_loss_q2 = tf.reduce_mean((target_q - current_q2) ** 2)  # Eq.(6)

q1_grad = tape.gradient(td_loss_q1, self._encoder.trainable_variables + self.qf1.trainable_variables)
self.qf1_optimizer.apply_gradients(
    zip(q1_grad, self._encoder.trainable_variables + self.qf1.trainable_variables))
q2_grad = tape.gradient(td_loss_q2, self._encoder.trainable_variables + self.qf2.trainable_variables)
self.qf2_optimizer.apply_gradients(
    zip(q2_grad, self._encoder.trainable_variables + self.qf2.trainable_variables))
```

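However, because the encoder is optimized together with qf1 before the qf2 + encoder optimization step, td_loss_q2 and q2_grad are inconsistent: td_loss_q2 was computed with the encoder weights from before the qf1 step. Thus, I believe q2_grad has to be calculated before optimizing qf1 and the encoder.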
