Glycaemic control, vascular brain injury & cognition in type 2 diabetes
This project works through a small, messy clinical dataset end to end: 88 subjects, 1,111 columns, an analysis plan fixed in advance, transparent accounting of how the sample shrinks, small-sample-robust inference, and a machine-learning layer that declines to predict what n=44 cannot support. No exposure showed a robust association with cognition; the contribution is showing exactly what that null can and cannot rule out.
Among older adults with type 2 diabetes, is poorer current glycaemic control (higher HbA1c, a roughly 3-month marker) associated with poorer global cognition after adjustment for age, sex and BMI? Secondary, exploratory exposures were white-matter-hyperintensity burden normalised by intracranial volume (WMH/ICV), gait speed and global cerebral vasoreactivity. The analysis is deliberately narrow: it answers what the released table can support rather than reconstructing the original study's full mechanistic aims.
The central constraint is not model choice; it is usable data. The released summary table contains 75 people with diabetes and only 13 controls, so the inferential analysis is diabetes-only. Cognitive testing is almost all-or-nothing: 44 diabetic participants have all five cognitive tests, 30 have none, and one has only two. MRI and vasoreactivity are sparser still.
| Stage | n |
|---|---|
| All summary rows | 88 |
| Diabetes (DM) subjects | 75 |
| Cognitive composite | 44 |
| Vascular model with WMH/ICV | 41 |
| Vasoreactivity model | 36 |
The analysis is a protocol-completer story: cognition, MRI and haemodynamic measures co-occur in a much smaller subset than the released table initially suggests.
Figure 1 Selected-variable missingness in the diabetes analysis base. The heatmap makes the effective sample size visible: cognition is concentrated in the same protocol-completer subset that drives the inferential models.
| Design element | Choice |
|---|---|
| Outcome | Five-test cognitive composite; Trail-Making Test Part B reverse-scored; higher values mean better cognition. |
| Primary model | Cognitive composite ~ HbA1c + age + sex + BMI. |
| Inference | OLS with HC3 heteroskedasticity-robust standard errors, chosen for their better small-sample behaviour. |
| Precision checks | Percentile bootstrap CIs and a post-hoc TOST equivalence test against a +/-0.5 SD bound. |
| Robustness | Leave-one-out refits and Cook's distance. |
| Confounding transparency | Education unavailable in the released table; GDS sensitivity model added; diabetes duration, hypertension and stroke/TIA tabulated. |
| Machine learning | Elastic net under leakage-safe nested cross-validation, framed as feature prioritisation rather than clinical prediction. |
| Exposure label | Refined from "glycaemic burden" to "current glycaemic control" because a single HbA1c is a roughly 3-month marker. |
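The outcome and primary model described above can be sketched in a few lines of NumPy: per-test z-scores averaged into a composite (the Trail-Making B column sign-flipped because longer completion times mean worse performance), then OLS with HC3 standard errors computed directly from the sandwich formula. This is a minimal sketch on simulated data, not the project's analysis script; the column order is hypothetical and the 95% interval uses a normal approximation.

```python
import numpy as np


def cognitive_composite(tests: np.ndarray, reverse_cols: list[int]) -> np.ndarray:
    """Mean of per-test z-scores; reversed columns are sign-flipped so higher = better."""
    z = (tests - tests.mean(axis=0)) / tests.std(axis=0, ddof=1)
    z[:, reverse_cols] *= -1.0
    return z.mean(axis=1)


def ols_hc3(X: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """OLS coefficients and HC3 robust standard errors (X must include an intercept column)."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # Leverage h_ii: diagonal of the hat matrix X (X'X)^-1 X'.
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    # HC3 weights each squared residual by 1 / (1 - h_ii)^2.
    omega = (resid / (1.0 - h)) ** 2
    meat = (X * omega[:, None]).T @ X
    cov = xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(cov))


# Simulated stand-in for the n=44 analytic sample (no true HbA1c effect).
rng = np.random.default_rng(0)
n = 44
tests = rng.normal(size=(n, 5))  # column 4 plays Trail-Making B (time: higher = worse)
y = cognitive_composite(tests, reverse_cols=[4])
hba1c = rng.normal(6.8, 0.9, n)
age = rng.normal(66.0, 7.0, n)
sex = rng.integers(0, 2, n).astype(float)
bmi = rng.normal(30.0, 5.0, n)
X = np.column_stack([np.ones(n), hba1c, age, sex, bmi])

beta, se = ols_hc3(X, y)
ci95 = beta[1] + np.array([-1.96, 1.96]) * se[1]  # normal-approximation 95% CI
beta_std = beta[1] * hba1c.std(ddof=1) / y.std(ddof=1)  # per SD HbA1c, in outcome-SD units
print(f"HbA1c beta={beta[1]:.3f}, HC3 95% CI {ci95[0]:.3f} to {ci95[1]:.3f}, beta*={beta_std:.3f}")
```

A GDS sensitivity model in this setup would simply append one more column to `X`; the HC3 machinery is unchanged.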
All four intervals cross zero. WMH/ICV has the largest negative point estimate but also the widest and least stable interval. Gait speed and vasoreactivity point positive but remain imprecise and exploratory.
Figure 2 Main exposure coefficients standardised per SD exposure, in composite-SD outcome units, with HC3 95% confidence intervals.
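The three robustness checks used on the WMH/ICV estimate (percentile bootstrap, leave-one-out refits, Cook's distance) can be sketched with plain least squares. The simulated data plant one extreme, high-leverage participant to show how the checks react; the 2,000 resamples and the 4/n Cook's cut-off are common conventions assumed here, not settings taken from the project.

```python
import numpy as np


def fit(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def percentile_bootstrap_ci(X, y, coef, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for one coefficient, resampling participants with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = fit(X[idx], y[idx])[coef]
    return np.percentile(draws, [100 * alpha / 2, 100 * (1 - alpha / 2)])


def leave_one_out(X, y, coef):
    """Coefficient refitted with each participant dropped in turn."""
    return np.array([fit(np.delete(X, i, axis=0), np.delete(y, i))[coef] for i in range(len(y))])


def cooks_distance(X, y):
    """Cook's D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2."""
    n, p = X.shape
    resid = y - X @ fit(X, y)
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    s2 = resid @ resid / (n - p)
    return resid**2 / (p * s2) * h / (1 - h) ** 2


# Simulated data with one planted high-leverage, high-residual participant.
rng = np.random.default_rng(1)
n = 41
wmh = rng.normal(size=n)
age = rng.normal(66.0, 7.0, n)
y = rng.normal(size=n)
wmh[0], y[0] = 8.0, -10.0
X = np.column_stack([np.ones(n), wmh, age])

ci = percentile_bootstrap_ci(X, y, coef=1)
loo = leave_one_out(X, y, coef=1)
D = cooks_distance(X, y)
print("bootstrap 95% CI:", ci.round(3))
print("LOO coefficient range:", loo.min().round(3), "to", loo.max().round(3))
print("flagged (D > 4/n):", np.flatnonzero(D > 4 / n))
```

When one participant dominates, the leave-one-out range and the gap between bootstrap and HC3 intervals widen together, which is the kind of method dependence described for WMH/ICV.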
The primary HbA1c model was null: beta = -0.052 per 1% HbA1c (HC3 95% CI -0.279 to +0.176; p=0.65). On a fully standardised scale, beta* = -0.093. On that scale, a post-hoc TOST equivalence analysis rules out HbA1c-cognition associations of +/-0.5 SD or larger in this sample (TOST p=0.025; 90% CI for beta* -0.43 to +0.25), but associations smaller than 0.5 SD can be neither confirmed nor excluded.
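The TOST figures can be checked by hand. Assuming a normal approximation, the standard error can be backed out of the reported 90% CI for beta*, and the two one-sided p-values follow; the TOST p-value is the larger of the two. This is an arithmetic check on the reported numbers, not the project's code.

```python
from statistics import NormalDist


def tost_from_ci90(estimate: float, low: float, high: float, bound: float):
    """Two one-sided z-tests against +/-bound, with the SE backed out of a 90% CI."""
    z = NormalDist()
    se = (high - low) / (2 * z.inv_cdf(0.95))
    p_lower = 1 - z.cdf((estimate + bound) / se)  # H0: true effect <= -bound
    p_upper = z.cdf((estimate - bound) / se)  # H0: true effect >= +bound
    return se, p_lower, p_upper, max(p_lower, p_upper)


# Reported standardised HbA1c estimate and its 90% CI.
se, p_lower, p_upper, p_tost = tost_from_ci90(-0.093, -0.43, 0.25, bound=0.5)
print(f"SE={se:.3f}, p_lower={p_lower:.4f}, p_upper={p_upper:.4f}, TOST p={p_tost:.3f}")
```

With these rounded inputs the result lands close to the reported TOST p=0.025, and the binding side is the lower bound, because the estimate sits nearer -0.5 than +0.5. Equivalently, at alpha=0.05 the TOST rejects exactly when the 90% CI lies inside +/-0.5, as it does here.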
Figure 3 HbA1c versus cognitive composite; unadjusted descriptive scatter and bivariate fit. The scatter is descriptive, not adjusted. It shows why the result is visually unsurprising: most participants cluster around HbA1c 6-7%, with a thin higher-HbA1c tail and no clear cognitive trend.
Under repeated nested cross-validation, the elastic net did not beat a mean-only baseline and its full-fit coefficients shrank to zero. A random forest performed worse. The disciplined conclusion is that no prediction model is warranted at n=44; the value is a leakage-safe workflow that reports its own uncertainty and declines to manufacture a result.
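A leakage-safe comparison against a mean-only baseline can be sketched with scikit-learn: scaling and the elastic-net hyperparameter search sit inside each outer training fold, so no held-out participant influences preprocessing or tuning. The alpha grid, l1 ratios, fold counts, repeat count and pure-noise data are assumptions for illustration, not the project's settings.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold, RepeatedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def nested_cv_rmse(X: np.ndarray, y: np.ndarray, n_repeats: int = 3, seed: int = 0) -> dict:
    """Outer-fold RMSE for a tuned elastic net and a mean-only baseline."""
    inner = KFold(n_splits=5, shuffle=True, random_state=seed)
    outer = RepeatedKFold(n_splits=5, n_repeats=n_repeats, random_state=seed)
    # Scaling and tuning are refitted inside every outer training fold,
    # so held-out participants never inform preprocessing or hyperparameters.
    enet = GridSearchCV(
        make_pipeline(StandardScaler(), ElasticNet(max_iter=10_000)),
        param_grid={
            "elasticnet__alpha": np.logspace(-2, 1, 8),
            "elasticnet__l1_ratio": [0.2, 0.5, 0.8],
        },
        cv=inner,
        scoring="neg_root_mean_squared_error",
    )
    models = {"elastic net": enet, "mean-only": DummyRegressor(strategy="mean")}
    # A fixed integer random_state gives both models identical outer splits.
    return {
        name: -cross_val_score(model, X, y, cv=outer, scoring="neg_root_mean_squared_error")
        for name, model in models.items()
    }


# Pure-noise stand-in: 44 participants, 8 candidate features, no real signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(44, 8))
y = rng.normal(size=44)
scores = nested_cv_rmse(X, y)
for name, rmse in scores.items():
    print(f"{name}: mean outer RMSE {rmse.mean():.3f} (sd {rmse.std():.3f})")
```

Because both models see the same outer splits, their fold-wise RMSEs can be compared directly; if the elastic net does not beat the mean-only row, the sketch supports the same conclusion as the text.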
Can say
- The released GE-75 summary table does not show a robust association between the cognitive composite and any of HbA1c, WMH/ICV, gait speed or vasoreactivity in the analytic diabetes sample.
- For HbA1c, the data can reject a large standardised association of +/-0.5 SD or larger.
- The WMH/ICV estimate is fragile and method-dependent: the HC3 interval includes zero, the bootstrap interval marginally excludes it, and Cook's distance identifies one high-influence observation.
- A leakage-safe ML workflow does not support prediction at this sample size.
Cannot say
- It cannot prove no glycaemic-cognition relationship exists.
- It cannot estimate small effects precisely.
- It cannot make causal claims.
- It cannot make clinical prediction claims.
- It cannot fully address residual confounding because education is effectively unavailable in the released summary table.
- It relies on a complete-case assumption; comparisons of completers with excluded participants are consistent with that assumption but cannot prove it.
The project is scripted end to end: one audit script, one analysis script and one machine-learning script regenerate the tables and figures used in the report. Verification in the current environment used Python 3.14, pandas 3.0, NumPy 2.4 and scikit-learn 1.9.
Verified in the current environment; scripts and generated tables are available on request.
Data: Novak V, Quispe R, Saunders C. Cerebral perfusion and cognitive decline in type 2 diabetes (version 1.0.1). PhysioNet, 2022. CC BY 4.0. https://doi.org/10.13026/whjz-e968