Clock-State Q-Learning

Consensus on Clock-State Q-Learning, drawn from 1 source and 7 claims.

Highlighted claims

The clock-state Q-learning agent uses only elapsed blank time as its internal state, discarding all other odor history. — Clock-state olfactory search in turbulent flows using Q-learning: The geometry of plume recovery
The clock-state Q-learning agent substantially outperforms the optimized cast-and-surge heuristic in all four sparsity environments. — Clock-state olfactory search in turbulent flows using Q-learning: The geometry of plume recovery
A scalar clock state resets to zero upon odor detection and increments by one on each blank step, with the Q-matrix encoding a fixed deterministic sequence of moves in response to plume loss. — Clock-state olfactory search in turbulent flows using Q-learning: The geometry of plume recovery
Success rates for the clock-state agent are at or above 90% in all environments, approaching 100% in denser plume conditions. — Clock-state olfactory search in turbulent flows using Q-learning: The geometry of plume recovery
The best single-Q agent across an ensemble of 20 training runs approaches the performance of the quasi-optimal Bayesian POMDP agent despite using far simpler memory. — Clock-state olfactory search in turbulent flows using Q-learning: The geometry of plume recovery
The hypothesis driving this study is that plume recovery is the dominant challenge in turbulent olfactory search and that elapsed blank time captures the most relevant information for solving it. — Clock-state olfactory search in turbulent flows using Q-learning: The geometry of plume recovery
High policy variability across 20 training runs reflects a broad manifold of near-equivalent local optima rather than training instability. — Clock-state olfactory search in turbulent flows using Q-learning: The geometry of plume recovery
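
The clock-state mechanism described in the claims above can be sketched as follows. This is a minimal illustration, not the source paper's implementation: the action names, the clock cap `MAX_CLOCK`, and the learning parameters `alpha` and `gamma` are all assumptions introduced here. The update rule is the standard tabular Q-learning rule.

```python
import numpy as np

# Hypothetical action set and clock cap; the source does not specify either.
ACTIONS = ("upwind", "downwind", "crosswind_left", "crosswind_right")
MAX_CLOCK = 50


def next_clock(clock: int, detected: bool, max_clock: int = MAX_CLOCK) -> int:
    """Reset the clock to zero on odor detection, else add one blank step."""
    if detected:
        return 0
    # Capping the clock keeps the Q-matrix finite (an assumption of this sketch).
    return min(clock + 1, max_clock)


def greedy_action(q: np.ndarray, clock: int) -> int:
    """Pick the highest-valued action for the current clock state."""
    return int(np.argmax(q[clock]))


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Standard tabular Q-learning update, applied in place."""
    td_target = reward + gamma * np.max(q[next_state])
    q[state, action] += alpha * (td_target - q[state, action])


def recovery_sequence(q: np.ndarray, n_blank_steps: int) -> list:
    """Moves a greedy agent makes over consecutive blank steps after a detection.

    Because the state is only the elapsed blank time, a fixed Q-matrix
    yields the same move sequence every time the plume is lost.
    """
    clock = 0
    moves = []
    for _ in range(n_blank_steps):
        clock = next_clock(clock, detected=False, max_clock=q.shape[0] - 1)
        moves.append(ACTIONS[greedy_action(q, clock)])
    return moves
```

Calling `recovery_sequence` twice on the same Q-matrix returns the same list of moves, which is what makes the learned response to plume loss deterministic in this sketch.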