Two-Q Architecture

Cross-source consensus on Two-Q Architecture from 1 source and 5 claims.

How it works

The two-Q agent selects between two Q-matrices based on whether the duration of the most recently completed blank interval exceeds a threshold, encoding a coarse local-density classifier. — Clock-state olfactory search in turbulent flows using Q-learning: The geometry of plume recovery
Q-minus specializes in preventing overshooting by initiating downwind return much sooner and exhibiting a more prominent initial surge. — Clock-state olfactory search in turbulent flows using Q-learning: The geometry of plume recovery
Q-plus specializes in escaping sparse rear regions by performing far more upwind search before returning downwind. — Clock-state olfactory search in turbulent flows using Q-learning: The geometry of plume recovery
The heuristic 2Qh variant, assembled without retraining by combining a dense-trained and a sparse-trained single-Q agent, matches or outperforms the best single-Q agent in most conditions. — Clock-state olfactory search in turbulent flows using Q-learning: The geometry of plume recovery
The performance benefit of the two-Q architecture arises from the functional complementarity of the two programs, not from jointly optimized training. — Clock-state olfactory search in turbulent flows using Q-learning: The geometry of plume recovery
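
The selection rule and the retraining-free 2Qh assembly described above can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the source: the class and function names (TwoQAgent, assemble_2qh), the Q-matrix layout (rows indexed by steps since the last detection, columns by action), the threshold expressed in time steps, and the mapping in which Q-plus is used when the last blank exceeds the threshold and the dense-trained matrix plays the role of Q-minus.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class TwoQAgent:
    """Illustrative two-Q agent; names and layout are hypothetical."""

    q_minus: np.ndarray  # assumed shape: (n_clock_states, n_actions)
    q_plus: np.ndarray   # same assumed shape as q_minus
    threshold: int       # assumed unit: time steps
    last_blank: int = 0  # duration of the most recently completed blank
    clock: int = 0       # steps since the last odor detection

    def observe(self, detected: bool) -> None:
        """Update the clock and, when a blank ends, record its duration."""
        if detected:
            if self.clock > 0:
                # A detection closes the current blank interval.
                self.last_blank = self.clock
            self.clock = 0
        else:
            self.clock += 1

    def active_q(self) -> np.ndarray:
        """Coarse density classifier: long last blank -> Q-plus (assumed mapping)."""
        if self.last_blank > self.threshold:
            return self.q_plus
        return self.q_minus

    def act(self) -> int:
        """Greedy action from the active Q-matrix at the current clock state."""
        q = self.active_q()
        row = min(self.clock, q.shape[0] - 1)  # clamp to the last clock state
        return int(np.argmax(q[row]))


def assemble_2qh(q_dense: np.ndarray, q_sparse: np.ndarray, threshold: int) -> TwoQAgent:
    """Combine two already-trained single-Q matrices without any retraining."""
    return TwoQAgent(q_minus=q_dense.copy(), q_plus=q_sparse.copy(), threshold=threshold)
```

In this sketch the active matrix changes only when a blank interval completes, so the agent keeps following the same program throughout an ongoing blank. Because assemble_2qh only copies the two matrices, any benefit of the combined agent would come from how the two programs complement each other rather than from joint optimization, which is the point the last claim makes.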