Causal Estimation Benchmarks
Cross-source consensus on Causal Estimation Benchmarks from 1 source and 7 claims.
Highlighted claims (all from "Joint Treatment Effect Estimation from Incomplete Healthcare Data: Temporal Causal Normalizing Flows with LLM-driven Evolutionary MNAR Imputation"):
- CausalFlow-T was benchmarked on four complete synthetic datasets with known counterfactuals.
- Causal reliability metrics included subgroup calibration, arm reconstruction error, tail variance ratio, hazard ratio (HR) recovery, and stability.
- On CVD Risk Toy, CausalFlow-T recovered a hazard ratio of 0.786 ± 0.051 against a true protective hazard ratio of 0.831.
- On LDL Toy, TARNet had the lowest absolute MAE but showed systematic error and variance collapse.
- On Cox Survival, CVAE achieved better MAE, but CausalFlow-T had the best arm-1 reconstruction error and the closest hazard ratio recovery.
- On the FIRE semi-synthetic oracle, CausalFlow-T and GNN-CVAE were the only models passing the bias threshold.
- The findings support evaluating longitudinal causal models with criteria beyond factual MAE.
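Two of the reliability metrics named above can be sketched in code. This is a minimal, hypothetical Python sketch: the source does not give the exact formulas, so both function definitions (log-scale HR error, upper-tail variance ratio) are assumptions chosen to illustrate why criteria beyond factual MAE matter.

```python
import math

def hr_recovery_error(est_hr: float, true_hr: float) -> float:
    """Absolute error between estimated and true hazard ratios,
    measured on the log scale (an assumed convention; the paper's
    exact definition may differ)."""
    return abs(math.log(est_hr) - math.log(true_hr))

def tail_variance_ratio(pred: list, true: list, q: float = 0.8) -> float:
    """Ratio of predicted to true variance in the upper tail
    (above the q-quantile). Values far below 1 would indicate
    variance collapse of the kind reported for TARNet on LDL Toy."""
    k = int(len(pred) * q)
    tail_pred = sorted(pred)[k:]
    tail_true = sorted(true)[k:]
    def var(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)
    return var(tail_pred) / var(tail_true)

# CVD Risk Toy numbers quoted above: estimated HR 0.786, true HR 0.831.
err = hr_recovery_error(0.786, 0.831)
```

A model can have low factual MAE while scoring poorly on both of these checks, which is the paper's argument for reporting them alongside MAE.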