Differential TD
Cross-source consensus on Differential TD from 1 source and 5 claims.
Highlighted claims
- Differential TD replaces the discounted TD error with an error that subtracts the current average-reward estimate. — On the Divergence of Differential Temporal Difference Learning without Local Clocks
- Differential TD was introduced for off-policy average-reward policy evaluation. — On the Divergence of Differential Temporal Difference Learning without Local Clocks
- Global-clock differential TD is not guaranteed to converge for every positive eta. — On the Divergence of Differential Temporal Difference Learning without Local Clocks
- Local-clock differential TD is stable for every positive eta under the cited Wan et al. conditions. — On the Divergence of Differential Temporal Difference Learning without Local Clocks
- Differential TD is designed for the average-reward setting and incorporates a centering effect similar to reference-state centering. — On the Divergence of Differential Temporal Difference Learning without Local Clocks
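To make the first claim concrete, here is a minimal tabular sketch of one differential TD update, assuming the commonly used form in which the same error drives both the value estimate and the average-reward estimate. The function names, the list-based value table, and the importance ratio argument `rho` are illustrative assumptions, not taken from the source. How `alpha` is scheduled (by a global time step or by a per-state visit count, the global-clock versus local-clock distinction above) is left to the caller.

```python
def differential_td_error(reward, avg_reward, v_next, v_curr):
    """Differential TD error: no discount factor; the current
    average-reward estimate is subtracted from the reward."""
    return reward - avg_reward + v_next - v_curr


def differential_td_step(values, avg_reward, s, r, s_next, alpha, eta, rho=1.0):
    """One tabular update (hypothetical helper, assumed update form).

    values     : list of state-value estimates, indexed by state
    avg_reward : current average-reward estimate
    alpha      : step size chosen by the caller (global or local clock)
    eta        : positive scale for the average-reward step size
    rho        : importance sampling ratio (1.0 when on-policy)
    """
    delta = differential_td_error(r, avg_reward, values[s_next], values[s])
    new_values = list(values)  # copy so the caller's table is not mutated
    new_values[s] += alpha * rho * delta
    new_avg_reward = avg_reward + eta * alpha * rho * delta
    return new_values, new_avg_reward, delta


if __name__ == "__main__":
    v, r_bar, delta = differential_td_step(
        values=[0.0, 0.0], avg_reward=0.0, s=0, r=1.0, s_next=1,
        alpha=0.5, eta=2.0,
    )
    print(v, r_bar, delta)
```

Because the average-reward estimate moves by `eta * alpha * rho * delta`, the parameter eta controls how strongly the two estimates are coupled, which is why the convergence claims above are stated in terms of eta.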