- It’s a model-free, off-policy reinforcement learning algorithm designed to have probably approximately correct (PAC) guarantees.
- Unlike standard Q-learning, it doesn’t update after every single step. Instead, it batches experience and delays each update until it has enough samples that the new estimate differs from the old one by more than a fixed threshold.
- This makes it more sample-efficient and statistically robust, avoiding noisy updates.
- Opposition-based learning is the idea of simultaneously considering a guess and its “opposite” in the search space, to accelerate convergence toward the optimal solution.
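The delayed-update idea can be sketched as follows. This is a simplified, illustrative implementation, not the exact published algorithm: the class name `DelayedQAgent` and the parameters `m` (batch size) and `eps1` (improvement threshold) are my own naming. It accumulates `m` one-step targets for a state–action pair and only commits an update when the averaged new estimate improves on the old value by at least `eps1`.

```python
from collections import defaultdict

class DelayedQAgent:
    """Sketch of delayed Q-learning: batch m samples, update only on clear improvement."""

    def __init__(self, actions, gamma=0.9, m=5, eps1=0.01, q_init=1.0):
        self.actions = list(actions)
        self.gamma, self.m, self.eps1 = gamma, m, eps1
        # optimistic initialization encourages exploration of untried pairs
        self.Q = defaultdict(lambda: q_init)
        self.acc = defaultdict(float)   # accumulated update targets per (s, a)
        self.count = defaultdict(int)   # samples collected per (s, a)

    def act(self, s):
        # greedy with respect to the (optimistic) current estimates
        return max(self.actions, key=lambda a: self.Q[(s, a)])

    def observe(self, s, a, r, s_next):
        # accumulate the one-step target instead of updating immediately
        target = r + self.gamma * max(self.Q[(s_next, b)] for b in self.actions)
        self.acc[(s, a)] += target
        self.count[(s, a)] += 1
        if self.count[(s, a)] == self.m:
            new_estimate = self.acc[(s, a)] / self.m
            # commit only if the drop is large enough to be statistically meaningful
            if self.Q[(s, a)] - new_estimate >= self.eps1:
                self.Q[(s, a)] = new_estimate + self.eps1
            # reset the batch either way
            self.acc[(s, a)] = 0.0
            self.count[(s, a)] = 0
```

Because values start optimistically high and can only be lowered by a margin of at least `eps1`, each pair is updated a bounded number of times, which is the intuition behind the PAC-style sample-complexity guarantee.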
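A minimal sketch of opposition-based sampling, assuming a bounded 1-D search space: the opposite of a point `x` in `[lo, hi]` is `lo + hi - x`, and each random guess is evaluated alongside its opposite, keeping the better of the pair. The objective `f` and function names here are illustrative.

```python
import random

def opposite(x, lo, hi):
    """Mirror image of x within the bounds [lo, hi]."""
    return lo + hi - x

def opposition_search(f, lo, hi, n_candidates=10, rng=None):
    """Evaluate each random guess and its opposite; return the best point found (maximizing f)."""
    rng = rng or random.Random(0)
    best = None
    for _ in range(n_candidates):
        x = rng.uniform(lo, hi)
        x_opp = opposite(x, lo, hi)
        cand = max((x, x_opp), key=f)   # keep the better of the pair
        if best is None or f(cand) > f(best):
            best = cand
    return best
```

Evaluating both a guess and its opposite costs two function evaluations per draw, but on average one of the pair lands closer to the optimum than a single uniform draw would, which is where the convergence speed-up comes from.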