REINFORCE Policy
Cross-source consensus on the REINFORCE policy, drawn from 1 source and 4 claims.
Highlighted claims
- The policy used four parameters for per-gene scores and a fifth parameter to control the feature retention fraction.
- The learned retention fraction was bounded between 0.25 and 0.90 and initialized at 0.575.
- The REINFORCE component mainly adjusted the retention fraction rather than materially using per-gene penalty modulation.
- The model's gene-selection gradient assumes independent Bernoulli outcomes, which is an approximation because ElasticNet selections are correlated.

All claims from: StackFeat RL: Reinforcement Learning over Iterative Dual Criterion Feature Selection for Stable Biomarker Discovery.