Robust regression for noisy biological data

Jaron Arbet

9/2/24

Motivation

  • Ordinal or numeric outcome (continuous or discrete)
  • May have outliers or be non-normally distributed (e.g. skewed, heavy tails)
  • Examples in our lab:
    • Normalized RNA or protein abundance
    • Copy Number
    • DNA methylation beta or m-values
  • Goal: perform robust statistical tests comparing noisy features between groups, potentially adjusting for covariates
  • Robust?
    • Minimal assumptions on the data distribution
    • Controls the false-positive rate for a wide variety of data types
    • Insensitive to outliers

But first… the Wilcoxon/Mann-Whitney U-test

  • Compares an ordinal/numeric outcome between 2 unpaired groups
  • Robust to outliers, with no assumptions on the data distribution
  • Example:
    • NHANES diabetes data (source)
    • Compare HbA1c between 914 patients diagnosed with diabetes (or pre-diabetes) vs. 5881 undiagnosed

Questions:

  1. What is the Null and Alternative hypothesis?
  2. What is the effect size?

How it works

  • Analyzes the ranked order of the data rather than the raw values:
  1. Combine the 2 groups and rank from smallest to largest:
Example of 8 patients:

diabetes    HbA1c    rank
No            5.4     1.0
No            5.6     2.0
Yes           5.8     3.5
No            5.8     3.5
Yes           6.4     5.0
Yes           6.9     6.0
No            7.0     7.0
Yes           9.3     8.0
  2. Compute the Wilcoxon test statistic:

\[\begin{equation} W = R^*_1 - \frac{n_1(n_1 + 1)}{2} \end{equation}\]

  • \(R^*_1\) = sum of ranks in group 1
  • \(n_1\) = sample size in group 1
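
For the 8-patient example above, \(W\) can be computed directly in R as a quick check (group 1 = 'No'):

hba1c <- c(5.4, 5.6, 5.8, 5.8, 6.4, 6.9, 7.0, 9.3);
diabetes <- c('No', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes');
ranks <- rank(hba1c);                                # midranks handle the tie at 5.8
sum.ranks.group.1 <- sum(ranks[diabetes == 'No']);   # 13.5
n1 <- sum(diabetes == 'No');                         # 4
sum.ranks.group.1 - n1 * (n1 + 1) / 2;               # W = 3.5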

Wilcoxon test using the full data:

wilcox <- wilcox.test(
    formula = HbA1c ~ diabetes,
    data = nhgh
    );
wilcox$statistic;
       W 
754851.5 
wilcox$p.value;
[1] 4.413508e-270
  • Notice \(W=\) 754851.5, which we can manually verify:
nhgh$rank <- rank(nhgh$HbA1c);
sum.ranks.group.1 <- sum(nhgh$rank[nhgh$diabetes == 'No']);
n1 <- sum(nhgh$diabetes == 'No');
sum.ranks.group.1 - n1*(n1 + 1) / 2;
[1] 754851.5

Hypothesis test

  • Null: The 2 distributions are the same
  • Alt: The 2 distributions differ, i.e. one distribution tends to produce larger/smaller values than the other.

Notice that this is much more general than the t-test:

  • Null: Both groups have the same mean

  • Alt: Both groups have different means

  • Exact and asymptotic distributions of \(W\) have been derived and can be used to calculate the p-value

Effect size

  • Often only the p-value is reported, and software packages are inconsistent about which effect size they report
  • c-index (concordance probability, probability index):
    • \(c = Pr (Y_2 > Y_1)\): the probability that a value in group 2 is larger than a value in group 1
    • Under the null, \(c = 0.5\)
    • For a continuous outcome, \(c\) = AUROC

\[\begin{equation} c = \frac{\bar{R}_2 - \frac{n_2+1}{2}}{n_1} \end{equation}\]

  • \(\bar{R}_2\): mean of ranks in group 2

  • \(n_1, n_2\): sample sizes in groups 1, 2

  • asht::wmwTest: Estimate of \(c\) and Confidence Interval (Fay and Malinovsky 2018)

    • Unlike other effect sizes used (e.g. difference in medians), this one is consistent with the Wilcoxon p-value

Diabetes example

wilcox <- asht::wmwTest(
    formula = HbA1c ~ diabetes,
    data = nhgh
    );
sprintf('c-index = %s', as.numeric(round(wilcox$estimate, 2)));
[1] "c-index = 0.86"
round(wilcox$conf.int, 2);
[1] 0.85 0.87
attr(,"conf.level")
[1] 0.95
wilcox$p.value;
[1] 4.413508e-270

There is a 0.86 probability that a randomly chosen diabetes patient has a higher HbA1c than a randomly chosen undiagnosed patient.
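
We can also verify the c-index by hand using the rank formula above (group 2 = diagnosed patients):

ranks <- rank(nhgh$HbA1c);
n1 <- sum(nhgh$diabetes == 'No');
n2 <- sum(nhgh$diabetes == 'Yes');
mean.rank.group.2 <- mean(ranks[nhgh$diabetes == 'Yes']);
round((mean.rank.group.2 - (n2 + 1) / 2) / n1, 2);  # 0.86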

Wilcoxon vs. other methods

  • The table below compares the power of the t-test vs. the Wilcoxon test
  • Values > 1 mean the Wilcoxon test is more powerful

  • The Wilcoxon test controls the FPR better than DESeq2, edgeR, limma-voom, NOISeq, and dearseq (Li et al. 2022)
    • The authors recommend n \(\ge\) 8 per group

Adjusting Wilcoxon test for covariates

  • Many people mistakenly think the Wilcoxon test and similar rank-based methods cannot adjust for covariates; however:
  • The Wilcoxon test statistic \(W\) is equivalent to the score test statistic from a proportional odds (PO) regression model
  • Thus the PO model can be used to perform Wilcoxon-like tests while adjusting for covariates, as sketched below.
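
For example, a minimal sketch of an age-adjusted comparison (assuming nhgh also contains an age column):

fit.adjusted <- rms::orm(
    formula = HbA1c ~ diabetes + age,
    data = nhgh
    );
anova(fit.adjusted);  # Wald test for diabetes, adjusted for age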

Proportional Odds (PO) Model

  • Also called ordered logit or ordinal logistic regression

Let \(Y = 1, 2, ..., k\) index the \(k\) unique ordered values of \(Y\):

\[\begin{equation} Pr(Y \ge j | X) = \frac{1}{1 + exp[-(\alpha_j + X\beta)]} = expit(\alpha_j + X\beta) \end{equation}\]

  • \(j = 1,2, ..., k - 1\)
  • \(\alpha_j\) is the logit of \(Pr(Y \ge j | X = 0)\)

Interpretation of \(\beta\):

\[\begin{equation} exp(\beta) = \frac{\text{Odds } Y \ge j | X = x + 1}{\text{Odds } Y \ge j | X = x} \end{equation}\]

  • “Odds of having a larger value of \(Y\) given a 1 unit increase in \(X\)”
  • “Group A has __ times greater odds of having a larger \(Y\) compared to group B”
  • Parameters \(\beta, \alpha_j\) are estimated using MLE or Bayesian methods
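
A toy numeric check of this odds-ratio interpretation (the values of \(\alpha_j\) and \(\beta\) below are made up, not estimates from the data):

expit <- function(x) 1 / (1 + exp(-x));
alpha.j <- -1;   # intercept for threshold j
beta <- 0.8;     # log odds ratio for a 1-unit increase in X

p1 <- expit(alpha.j + beta * 1);    # Pr(Y >= j | X = 1)
p0 <- expit(alpha.j + beta * 0);    # Pr(Y >= j | X = 0)
(p1 / (1 - p1)) / (p0 / (1 - p0));  # equals exp(beta) = 2.23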

“Proportional odds”?

  • \(\beta\) is constant for all values of \(Y\), similar to the Cox PH model where the HR is constant over time.

  • On the log-odds scale, \(log[\frac{Pr(Y > j | X)}{1 - Pr(Y > j | X)}]\), the curves for the different thresholds \(j\) are parallel lines
  • The effect of the predictor (slope \(\beta\)) on \(Y\) is the same at every threshold \(j\)

What if non-PO?

  • The true lines would not be parallel.
  • The PO model’s \(\hat{\beta}\) still has a meaningful interpretation: roughly the average \(\beta\) across all lines, i.e. the average effect of \(X\) on \(Y\)

Diabetes example

fit <- rms::orm(
    formula = HbA1c ~ diabetes,
    data = nhgh
    );
odds.ratios <- exp(coef(fit));
ci <- exp(confint(fit));

round(odds.ratios['diabetes=Yes'], 1);
diabetes=Yes 
        21.4 
round(ci['diabetes=Yes',], 1);
 2.5 % 97.5 % 
  18.3   24.9 

Diabetes patients have 21.4 times greater odds of having a higher HbA1c than undiagnosed patients.

c-index for PO model

  • Recall \(c = Pr (Y_2 > Y_1)\) is an effect size for Wilcoxon test
  • You can approximate the c-index from the PO odds ratio:

\[\begin{equation} c = \frac{\text{OR}^{0.6453}}{1 + \text{OR}^{0.6453}} \end{equation}\]

  • For our example: \(c = \frac{21.4^{0.6453}}{1 + 21.4^{0.6453}} = 0.878\), which is close to the c-index from the Wilcoxon test (0.86)
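
This is simple arithmetic and easy to check:

OR <- 21.4;
round(OR^0.6453 / (1 + OR^0.6453), 3);  # 0.878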

Other effect sizes for PO model

Besides the odds ratio and c-index, the PO model also supports several other useful estimators (see the sketch after this list):

  • Quantiles (e.g. 25th, median, 75th) - see rms::Quantile.orm
    • Difference in quantiles between groups
  • Mean - see rms::Mean
  • \(Pr(Y > y | X)\) - “exceedance probability”
    • Difference in exceedance probabilities between groups
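
A minimal sketch of how these can be obtained from the diabetes fit above (assumes the rms package; datadist is needed for rms::Predict):

dd <- rms::datadist(nhgh);
options(datadist = 'dd');
fit <- rms::orm(HbA1c ~ diabetes, data = nhgh);

M <- rms::Mean(fit);             # function of the linear predictor
qu <- rms::Quantile(fit);        # function of (probability, linear predictor)
med <- function(lp) qu(0.5, lp);

rms::Predict(fit, diabetes, fun = M);    # estimated mean HbA1c per group
rms::Predict(fit, diabetes, fun = med);  # estimated median HbA1c per group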

Simulation study to assess robustness of PO model

  • Compare the 2-sample t-test, Wilcoxon test, and PO model for testing the difference in HbA1c between diabetes and undiagnosed patients.
  • Goal: determine which methods control false-positive-rate
  • Approach (sketched in R after this list):
  1. Randomly sample 200 patients from the nhgh diabetes dataset
  2. Randomly permute the group labels (diabetes vs. undiagnosed) to create a dataset where \(H_0\) is true
  3. Record whether each test’s p-value is less than \(\alpha = 0.05\)
  4. Repeat steps 1-3 10,000 times
  5. Calculate FPR as the proportion of datasets where \(p < \alpha\)
  6. A method controls the FPR iff FPR \(\le \alpha\)
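
A minimal sketch of this simulation in R (fewer replicates for speed; assumes the nhgh data from earlier, and uses anova() on the orm fit as the PO model test, which is an assumption rather than the exact statistic used in the original simulation):

set.seed(123);
n.reps <- 1000;
alpha <- 0.05;
rejections <- replicate(n.reps, {
    sampled <- nhgh[sample(nrow(nhgh), 200), ];
    sampled$diabetes <- sample(sampled$diabetes);  # permute labels so H0 is true
    p.t <- t.test(HbA1c ~ diabetes, data = sampled)$p.value;
    p.wilcox <- wilcox.test(HbA1c ~ diabetes, data = sampled)$p.value;
    p.po <- anova(rms::orm(HbA1c ~ diabetes, data = sampled))['diabetes', 'P'];
    c(t.test = p.t, wilcoxon = p.wilcox, po = p.po) < alpha;
    });
rowMeans(rejections);  # estimated false-positive rate for each method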

PO model is more robust than t-test

Extensions

Limitations

  1. Proportional odds assumption

…violations of the proportional odds assumption usually do not prevent the PO model from providing a reasonable treatment effect assessment. - Violation of Proportional Odds is Not Fatal

  • If you only have 1 predictor, the PO model will still control the FPR, since the PO assumption is guaranteed to hold under \(H_0\)

Simulations show robustness of the PO model under non-PO.

Resources for assessing impact of PO assumption

  2. Sample size for continuous outcomes
  • For discrete outcomes, large sample sizes are not a problem
  • For continuous outcomes, rms::orm claims to efficiently handle “over 6000 distinct values”; I haven’t tested it on anything larger
  3. Assumes linear and additive effects by default
  • You can manually investigate adding non-linear or interaction terms
  • It would be interesting to develop a MARS version of the PO model to automate this

Summary

  • The Proportional Odds model extends the Wilcoxon rank-sum test to adjust for covariates
  • “If You Like the Wilcoxon test You Must Like the Proportional Odds Model”
  • Biological data are often non-normally distributed and contain outliers; the PO model is a robust choice here compared to the t-test/linear regression
  • The PO model estimates an odds ratio (odds of having a larger value of \(Y\) given a 1 unit increase in \(X\)), but can also estimate the c-index, quantiles, and means
  • Some simulation studies show robustness to non-PO, but more research needed here

References

Intro to PO model:

  • https://www.fharrell.com/post/rpo/
  • French, Benjamin, and Matthew S. Shotwell. 2022. “Regression Models for Ordinal Outcomes.” JAMA 328 (8): 772–73.

Wilcoxon test:

PO model for continuous outcome:

  • Liu, Qi, et al. 2017. “Modeling Continuous Response Variables Using Ordinal Regression.” Statistics in Medicine 36 (27): 4316–35.
  • https://hbiostat.org/rmsc/cony

Other refs in presentation:

Fay, Michael P, and Yaakov Malinovsky. 2018. “Confidence Intervals of the Mann-Whitney Parameter That Are Compatible with the Wilcoxon-Mann-Whitney Test.” Statistics in Medicine 37 (27): 3991–4006.
Li, Yumei, Xinzhou Ge, Fanglue Peng, Wei Li, and Jingyi Jessica Li. 2022. “Exaggerated False Positives by Popular Differential Expression Methods When Analyzing Human Population Samples.” Genome Biology 23 (1): 79.