Causal Estimation Benchmarks
Cross-source consensus on Causal Estimation Benchmarks from 1 source and 7 claims.
Highlighted claims (all from "Joint Treatment Effect Estimation from Incomplete Healthcare Data: Temporal Causal Normalizing Flows with LLM-driven Evolutionary MNAR Imputation"):
- CausalFlow-T was benchmarked on four complete synthetic datasets with known counterfactuals.
- Causal reliability metrics included subgroup calibration, arm reconstruction error, tail variance ratio, hazard ratio (HR) recovery, and stability.
- On CVD Risk Toy, CausalFlow-T recovered a hazard ratio of 0.786 ± 0.051 against a true protective hazard ratio of 0.831.
- On LDL Toy, TARNet had the lowest absolute MAE but showed systematic error and variance collapse.
- On Cox Survival, CVAE achieved better MAE, but CausalFlow-T had the best arm-1 reconstruction error and the closest hazard ratio recovery.
- On the FIRE semi-synthetic oracle, CausalFlow-T and GNN-CVAE were the only models passing the bias threshold.
- The findings support evaluating longitudinal causal models with criteria beyond factual MAE.
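Two of the reliability metrics named above can be sketched in code. This is a minimal, hypothetical Python sketch: the source does not give the exact formulas, so both function definitions (log-scale HR error, upper-tail variance ratio) are assumptions chosen to illustrate why criteria beyond factual MAE matter.

```python
import math

def hr_recovery_error(est_hr: float, true_hr: float) -> float:
    """Absolute error between estimated and true hazard ratios,
    measured on the log scale (an assumed convention; the paper's
    exact definition may differ)."""
    return abs(math.log(est_hr) - math.log(true_hr))

def tail_variance_ratio(pred: list, true: list, q: float = 0.8) -> float:
    """Ratio of predicted to true variance in the upper tail
    (above the q-quantile). Values far below 1 would indicate
    variance collapse of the kind reported for TARNet on LDL Toy."""
    k = int(len(pred) * q)
    tail_pred = sorted(pred)[k:]
    tail_true = sorted(true)[k:]
    def var(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)
    return var(tail_pred) / var(tail_true)

# CVD Risk Toy numbers quoted above: estimated HR 0.786, true HR 0.831.
err = hr_recovery_error(0.786, 0.831)
```

A model can have low factual MAE while scoring poorly on both of these checks, which is the paper's argument for reporting them alongside MAE.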