Robust regression for noisy biological data

Jaron Arbet

9/2/24

Motivation

  • Ordinal or numeric outcome (continuous or discrete)
  • May have outliers or be non-normally distributed (e.g. skewed, heavy tails)
  • Examples in our lab:
    • Normalized RNA or protein abundance
    • Copy Number
    • DNA methylation beta or m-values
  • Goal: perform robust statistical tests comparing noisy features between groups, potentially adjusting for covariates
  • Robust?
    • Minimal assumptions on the data distribution
    • Controls the false-positive rate for a wide variety of data types
    • Insensitive to outliers

But first… the Wilcoxon/Mann-Whitney U-test

  • Compares an ordinal/numeric outcome between 2 unpaired groups
  • Robust to outliers, with no assumptions on the data distribution
  • Example:
    • NHANES diabetes data (source)
    • Compare HbA1c between 914 patients diagnosed with diabetes (or pre-diabetes) vs. 5881 undiagnosed

Questions:

  1. What is the Null and Alternative hypothesis?
  2. What is the effect size?

How it works

  • Analyzes the ranked order of the data rather than the raw values:
  1. Combine the 2 groups and rank from smallest to largest:
Example of 8 patients:

diabetes    HbA1c    rank
No            5.4     1.0
No            5.6     2.0
Yes           5.8     3.5
No            5.8     3.5
Yes           6.4     5.0
Yes           6.9     6.0
No            7.0     7.0
Yes           9.3     8.0
  2. Compute the Wilcoxon test statistic:

\[\begin{equation} W = R^*_1 - \frac{n_1(n_1 + 1)}{2} \end{equation}\]

  • \(R^*_1\) = sum of ranks in group 1
  • \(n_1\) = sample size in group 1
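
For the 8-patient example above, \(W\) can be computed directly in R as a quick check (group 1 = 'No'):

hba1c <- c(5.4, 5.6, 5.8, 5.8, 6.4, 6.9, 7.0, 9.3);
diabetes <- c('No', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes');
ranks <- rank(hba1c);                                # midranks handle the tie at 5.8
sum.ranks.group.1 <- sum(ranks[diabetes == 'No']);   # 13.5
n1 <- sum(diabetes == 'No');                         # 4
sum.ranks.group.1 - n1 * (n1 + 1) / 2;               # W = 3.5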

Wilcoxon test using the full data:

wilcox <- wilcox.test(
    formula = HbA1c ~ diabetes,
    data = nhgh
    );
wilcox$statistic;
       W 
754851.5 
wilcox$p.value;
[1] 4.413508e-270
  • Notice \(W=\) 754851.5, which we can manually verify:
nhgh$rank <- rank(nhgh$HbA1c);
sum.ranks.group.1 <- sum(nhgh$rank[nhgh$diabetes == 'No']);
n1 <- sum(nhgh$diabetes == 'No');
sum.ranks.group.1 - n1*(n1 + 1) / 2;
[1] 754851.5

Hypothesis test

  • Null: The 2 distributions are the same
  • Alt: The 2 distributions differ, i.e. one distribution tends to produce larger/smaller values than the other.

Notice that this is much more general than the t-test:

  • Null: Both groups have the same mean

  • Alt: Both groups have different means

  • Exact and asymptotic distributions of \(W\) have been derived and can be used to calculate the p-value

Effect size

  • Often only the p-value is reported, and software packages are inconsistent about which effect size they report
  • c-index (concordance probability, probability index):
    • \(c = Pr (Y_2 > Y_1)\): the probability that a value in group 2 is larger than a value in group 1
    • Under the null, \(c = 0.5\)
    • For a continuous outcome, \(c\) = AUROC

\[\begin{equation} c = \frac{\bar{R}_2 - \frac{n_2+1}{2}}{n_1} \end{equation}\]

  • \(\bar{R}_2\): mean of ranks in group 2

  • \(n_1, n_2\): sample sizes in groups 1, 2

  • asht::wmwTest: Estimate of \(c\) and Confidence Interval (Fay and Malinovsky 2018)

    • Unlike other effect sizes used (e.g. difference in medians), this one is consistent with the Wilcoxon p-value

Diabetes example

wilcox <- asht::wmwTest(
    formula = HbA1c ~ diabetes,
    data = nhgh
    );
sprintf('c-index = %s', as.numeric(round(wilcox$estimate, 2)));
[1] "c-index = 0.86"
round(wilcox$conf.int, 2);
[1] 0.85 0.87
attr(,"conf.level")
[1] 0.95
wilcox$p.value;
[1] 4.413508e-270

There is a 0.86 probability that a randomly chosen diabetes patient has a higher HbA1c than a randomly chosen undiagnosed patient.
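
We can also verify the c-index by hand using the rank formula above (group 2 = diagnosed patients):

ranks <- rank(nhgh$HbA1c);
n1 <- sum(nhgh$diabetes == 'No');
n2 <- sum(nhgh$diabetes == 'Yes');
mean.rank.group.2 <- mean(ranks[nhgh$diabetes == 'Yes']);
round((mean.rank.group.2 - (n2 + 1) / 2) / n1, 2);  # 0.86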

Wilcoxon vs. other methods

  • The table below compares the power of the t-test vs. the Wilcoxon test
  • Values > 1 mean the Wilcoxon test is more powerful

  • The Wilcoxon test controls the FPR better than DESeq2, edgeR, limma-voom, NOISeq, and dearseq (Li et al. 2022)
    • The authors recommend n \(\ge\) 8 per group

Adjusting Wilcoxon test for covariates

  • Many people mistakenly think the Wilcoxon test and similar rank-based methods cannot adjust for covariates; however:
  • The Wilcoxon test statistic \(W\) is equivalent to the score test statistic from a proportional odds (PO) regression model
  • Thus the PO model can be used to perform Wilcoxon-like tests while adjusting for covariates, as sketched below.
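
For example, a minimal sketch of an age-adjusted comparison (assuming nhgh also contains an age column):

fit.adjusted <- rms::orm(
    formula = HbA1c ~ diabetes + age,
    data = nhgh
    );
anova(fit.adjusted);  # Wald test for diabetes, adjusted for age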

Proportional Odds (PO) Model

  • Also called ordered logit or ordinal logistic regression

Let \(Y = 1, 2, ..., k\) index the \(k\) unique ordered values of \(Y\):

\[\begin{equation} Pr(Y \ge j | X) = \frac{1}{1 + exp[-(\alpha_j + X\beta)]} = expit(\alpha_j + X\beta) \end{equation}\]

  • \(j = 1,2, ..., k - 1\)
  • \(\alpha_j\) is the logit of \(Pr(Y \ge j | X = 0)\)

Interpretation of \(\beta\):

\[\begin{equation} exp(\beta) = \frac{\text{Odds } Y \ge j | X = x + 1}{\text{Odds } Y \ge j | X = x} \end{equation}\]

  • “Odds of having a larger value of \(Y\) given a 1 unit increase in \(X\)”
  • “Group A has __ times greater odds of having a larger \(Y\) compared to group B”
  • Parameters \(\beta, \alpha_j\) are estimated using MLE or Bayesian methods
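
A toy numeric check of this odds-ratio interpretation (the values of \(\alpha_j\) and \(\beta\) below are made up, not estimates from the data):

expit <- function(x) 1 / (1 + exp(-x));
alpha.j <- -1;   # intercept for threshold j
beta <- 0.8;     # log odds ratio for a 1-unit increase in X

p1 <- expit(alpha.j + beta * 1);    # Pr(Y >= j | X = 1)
p0 <- expit(alpha.j + beta * 0);    # Pr(Y >= j | X = 0)
(p1 / (1 - p1)) / (p0 / (1 - p0));  # equals exp(beta) = 2.23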

“Proportional odds”?

  • \(\beta\) is constant for all values of \(Y\), similar to the Cox PH model where the HR is constant over time.

  • On the log-odds scale, \(log[\frac{Pr(Y > j | X)}{1 - Pr(Y > j | X)}]\), the curves for the different thresholds \(j\) are parallel lines
  • The effect of the predictor (slope \(\beta\)) on \(Y\) is the same at every threshold \(j\)

What if non-PO?

  • The true lines would not be parallel.
  • The PO model’s \(\hat{\beta}\) still has a meaningful interpretation: roughly the average \(\beta\) across all lines, i.e. the average effect of \(X\) on \(Y\)

Diabetes example

fit <- rms::orm(
    formula = HbA1c ~ diabetes,
    data = nhgh
    );
odds.ratios <- exp(coef(fit));
ci <- exp(confint(fit));

round(odds.ratios['diabetes=Yes'], 1);
diabetes=Yes 
        21.4 
round(ci['diabetes=Yes',], 1);
 2.5 % 97.5 % 
  18.3   24.9 

Diabetes patients have 21.4 times greater odds of having a higher HbA1c than undiagnosed patients.

c-index for PO model

  • Recall \(c = Pr (Y_2 > Y_1)\) is an effect size for Wilcoxon test
  • You can approximate the c-index from the PO odds ratio:

\[\begin{equation} c = \frac{\text{OR}^{0.6453}}{1 + \text{OR}^{0.6453}} \end{equation}\]

  • For our example: \(c = \frac{21.4^{0.6453}}{1 + 21.4^{0.6453}} = 0.878\), which is close to the c-index from the Wilcoxon test (0.86)
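
This is simple arithmetic and easy to check:

OR <- 21.4;
round(OR^0.6453 / (1 + OR^0.6453), 3);  # 0.878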

Other effect sizes for PO model

Besides the odds ratio and c-index, the PO model also supports several other useful estimators (see the sketch after this list):

  • Quantiles (e.g. 25th, median, 75th) - see rms::Quantile.orm
    • Difference in quantiles between groups
  • Mean - see rms::Mean
  • \(Pr(Y > y | X)\) - “exceedance probability”
    • Difference in exceedance probabilities between groups
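
A minimal sketch of how these can be obtained from the diabetes fit above (assumes the rms package; datadist is needed for rms::Predict):

dd <- rms::datadist(nhgh);
options(datadist = 'dd');
fit <- rms::orm(HbA1c ~ diabetes, data = nhgh);

M <- rms::Mean(fit);             # function of the linear predictor
qu <- rms::Quantile(fit);        # function of (probability, linear predictor)
med <- function(lp) qu(0.5, lp);

rms::Predict(fit, diabetes, fun = M);    # estimated mean HbA1c per group
rms::Predict(fit, diabetes, fun = med);  # estimated median HbA1c per group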

Simulation study to assess robustness of PO model

  • Compare the 2-sample t-test, Wilcoxon test, and PO model for testing the difference in HbA1c between diabetes and undiagnosed patients.
  • Goal: determine which methods control false-positive-rate
  • Approach (sketched in R after this list):
  1. Randomly sample 200 patients from the nhgh diabetes dataset
  2. Randomly permute the group labels (diabetes vs. undiagnosed) to create a dataset where \(H_0\) is true
  3. Record whether each test’s p-value is less than \(\alpha = 0.05\)
  4. Repeat steps 1-3 10,000 times
  5. Calculate FPR as the proportion of datasets where \(p < \alpha\)
  6. A method controls the FPR iff FPR \(\le \alpha\)
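
A minimal sketch of this simulation in R (fewer replicates for speed; assumes the nhgh data from earlier, and uses anova() on the orm fit as the PO model test, which is an assumption rather than the exact statistic used in the original simulation):

set.seed(123);
n.reps <- 1000;
alpha <- 0.05;
rejections <- replicate(n.reps, {
    sampled <- nhgh[sample(nrow(nhgh), 200), ];
    sampled$diabetes <- sample(sampled$diabetes);  # permute labels so H0 is true
    p.t <- t.test(HbA1c ~ diabetes, data = sampled)$p.value;
    p.wilcox <- wilcox.test(HbA1c ~ diabetes, data = sampled)$p.value;
    p.po <- anova(rms::orm(HbA1c ~ diabetes, data = sampled))['diabetes', 'P'];
    c(t.test = p.t, wilcoxon = p.wilcox, po = p.po) < alpha;
    });
rowMeans(rejections);  # estimated false-positive rate for each method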

PO model is more robust than t-test

Extensions

Limitations

  1. Proportional odds assumption

…violations of the proportional odds assumption usually do not prevent the PO model from providing a reasonable treatment effect assessment. - Violation of Proportional Odds is Not Fatal

  • If you only have 1 predictor, the PO model will still control the FPR, since the PO assumption is guaranteed to hold under \(H_0\)

Simulations show robustness of the PO model under non-PO.

Resources for assessing impact of PO assumption

  2. Sample size for continuous outcomes
  • For discrete outcomes, large sample sizes are not a problem
  • For continuous outcomes, rms::orm claims to efficiently handle “over 6000 distinct values”; I haven’t tested it on anything larger
  3. Assumes linear and additive effects by default
  • You can manually investigate adding non-linear or interaction terms
  • It would be interesting to develop a MARS version of the PO model to automate this

Summary

  • The Proportional Odds model extends the Wilcoxon rank-sum test to adjust for covariates
  • “If You Like the Wilcoxon test You Must Like the Proportional Odds Model”
  • Biological data are often non-normally distributed and contain outliers; the PO model is a robust choice here compared to the t-test/linear regression
  • The PO model estimates an odds ratio (odds of having a larger value of \(Y\) given a 1 unit increase in \(X\)), but can also estimate the c-index, quantiles, and means
  • Some simulation studies show robustness to non-PO, but more research needed here

References

Intro to PO model:

  • https://www.fharrell.com/post/rpo/
  • French, Benjamin, and Matthew S. Shotwell. 2022. “Regression Models for Ordinal Outcomes.” JAMA 328 (8): 772–73.

Wilcoxon test:

PO model for continuous outcome:

  • Liu, Qi, et al. 2017. “Modeling Continuous Response Variables Using Ordinal Regression.” Statistics in Medicine 36 (27): 4316–35.
  • https://hbiostat.org/rmsc/cony

Other refs in presentation:

Fay, Michael P, and Yaakov Malinovsky. 2018. “Confidence Intervals of the Mann-Whitney Parameter That Are Compatible with the Wilcoxon-Mann-Whitney Test.” Statistics in Medicine 37 (27): 3991–4006.
Li, Yumei, Xinzhou Ge, Fanglue Peng, Wei Li, and Jingyi Jessica Li. 2022. “Exaggerated False Positives by Popular Differential Expression Methods When Analyzing Human Population Samples.” Genome Biology 23 (1): 79.