P-value histograms

Is there any signal in your study?
Or problems with your analysis?

Jaron Arbet

2025-01-23

Motivation

Project with large number of hypothesis tests
Histogram of p-values gives a lot of info about your study
Any signal in your study?
Quality control: any problems with study design or analysis?
Summarize the key results of (Breheny 2018)

Hypothesis testing

Null hypothesis (\(H_0\))

No difference between groups
No relationship between variables
\(\theta = \theta_0\)

Alternative hypothesis (\(H_A\))

The groups differ
There is a relationship between variables
\(\theta \neq \theta_0\)

What is a p-value?

When \(H_0\) is always true

Flat/uniform distribution:

When \(H_A\) is sometimes true

Decreasing slope from left to right:
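Both shapes can be reproduced with a small simulation. The sketch below is illustrative only: the number of tests, the sample size, the effect size, and the 20% share of true alternatives are all assumed values, and `simulate.pvalue` is a hypothetical helper.

```r
set.seed(1);
n.tests <- 2000; # assumed number of hypothesis tests
n.per.group <- 20; # assumed sample size per group

# hypothetical helper: run one two-sample t-test and return its p-value
simulate.pvalue <- function(effect) {
    x <- rnorm(n.per.group, mean = 0);
    y <- rnorm(n.per.group, mean = effect);
    t.test(x, y)$p.value;
    }

# H0 always true: every test has zero mean difference
p.null <- sapply(rep(0, n.tests), simulate.pvalue);

# HA sometimes true: 20% of tests have a true mean difference of 1
effects <- c(rep(1, 0.2 * n.tests), rep(0, 0.8 * n.tests));
p.mixed <- sapply(effects, simulate.pvalue);

# side-by-side histograms with the same 0.05-wide bins
par(mfrow = c(1, 2));
hist(p.null, breaks = seq(0, 1, by = 0.05), main = 'H0 always true', xlab = 'pvalue');
hist(p.mixed, breaks = seq(0, 1, by = 0.05), main = 'HA sometimes true', xlab = 'pvalue');
```

The first histogram should look roughly flat, while the second should show an excess of small p-values in the left-most bins.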

Is there any signal in the data?

(Rogier et al. 2014) mouse study with 201 genes
No hypothesis rejected at 10% false discovery rate level

pvalue histogram suggests signal, but study was underpowered

“Regular” pvalue histogram

(Breheny 2018) define a regular pvalue histogram as the 2 scenarios we’ve seen so far:

Flat/uniform (\(H_0\) is always true)
Slopes down left-to-right (\(H_A\) is sometimes true)

A regular pvalue histogram suggests no errors in your study/analysis, although you might still be underpowered

“Irregular” pvalue histogram

(Breheny 2018) define an irregular pvalue histogram as any other shape, for example in (Fischl et al. 2014):

Suggests a problem in your study or analysis:

Measurement error
Parametric assumptions wrong
Correlated pvalues (not a major problem, but needs special consideration, which we’ll come back to)

Formal test for signal

Let \(\tau\) be the observed number of pvalues < 0.05
1-sided Binomial test: is \(\tau\) greater than what’s expected assuming \(H_0\) is always true?
\(m\) = number of tests
\(b\) = bin width of left-most bin (e.g. b = 0.05)
Then 95th percentile of \(Bin(m, b)\) is cutoff for the test
Thus if \(\tau > Bin_{.95}(m, b)\), we have evidence for a signal

Example

Recall (Rogier et al. 2014) 201 genes with min FDR \(> 0.10\)
bin-width: \(b = 0.05\)
\(m\) = 201 tests
Then \(Bin_{.95}(m, b)\) =

qbinom(p = 0.95, size = 201, prob = 0.05);

[1] 15

The study observed 27 p < 0.05, which exceeds the null cutoff, thus giving evidence of signal:
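The comparison against the cutoff can also be written as an exact one-sided Binomial p-value. This is a minimal sketch using the numbers quoted above (201 tests, 27 p-values below 0.05) and the same independence assumption; the variable names are illustrative.

```r
m <- 201; # number of tests
b <- 0.05; # bin width, also the null probability that p < 0.05
tau <- 27; # observed number of p < 0.05

# cutoff: 95th percentile of Bin(m, b)
cutoff <- qbinom(p = 0.95, size = m, prob = b);

# one-sided p-value: P(X >= tau) for X ~ Bin(m, b)
signal.pvalue <- pbinom(q = tau - 1, size = m, prob = b, lower.tail = FALSE);

# the same test through binom.test
signal.test <- binom.test(x = tau, n = m, p = b, alternative = 'greater');

print(c(cutoff = cutoff, tau = tau));
print(signal.pvalue);
print(signal.test$p.value);
```

Reporting the p-value alongside the cutoff shows how far the left-most bin is from what \(H_0\) predicts, not only whether it crosses the threshold.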

QC test for irregular pvalue histogram

The same idea can be used to test departures from uniformity anywhere between 0 and 1, not only near 0
A binwidth of 0.05 gives 20 bins, and Bonferroni corrected \(\alpha= 0.05/20 = 0.0025\), or the \(Bin_{.9975}(m, b)\) percentile

Example:

Recall the study of (Fischl et al. 2014)
\(m\) = 23,332 tests
bin-width: \(b = 0.05\)
Bonferroni corrected null threshold is:

qbinom(p = 0.9975, size = 23332, prob = 0.05);

[1] 1261
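To apply the QC test to a vector of p-values, each bin count can be compared with the Bonferroni-corrected cutoff. The sketch below assumes independent p-values; `qc.test` is a hypothetical helper and the example data are simulated, not taken from (Fischl et al. 2014).

```r
# hypothetical helper: flag histogram bins whose count exceeds the QC cutoff
qc.test <- function(pvalues, b = 0.05, alpha = 0.05) {
    pvalues <- pvalues[!is.na(pvalues)];
    m <- length(pvalues);
    n.bins <- round(1 / b);
    breaks <- seq(0, 1, length.out = n.bins + 1);
    # count the p-values falling in each bin
    counts <- as.vector(table(cut(pvalues, breaks = breaks, include.lowest = TRUE)));
    # Bonferroni correction over the n.bins bins
    cutoff <- qbinom(p = 1 - alpha / n.bins, size = m, prob = 1 / n.bins);
    list(counts = counts, cutoff = cutoff, flagged.bins = which(counts > cutoff));
    }

# simulated example: uniform p-values plus an artificial bump between 0.50 and 0.55
set.seed(123);
pvals.bump <- c(runif(900), runif(100, min = 0.5, max = 0.55));
qc.result <- qc.test(pvals.bump);
print(qc.result$cutoff);
print(qc.result$flagged.bins);
```

Bin 11 covers (0.50, 0.55], so the artificial bump should be flagged there.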

Correlated pvalues

All previous results assume the pvalues are independent
Rarely true for cancer ’omic studies (e.g. correlated genes)

(Breheny 2018) propose a permutation method for the previous signal and QC tests that accounts for correlation:

For example, test association between outcome \(Y\) with gene expression matrix \(X\)

1. Permute \(Y\) to remove the relationship between \(Y\) and \(X\) while preserving the correlation structure of \(X\)
2. Rerun all tests on the permuted dataset and record the pvalues
- Obtains p-values from the null distribution without assuming independence
3. Record the count in the bin with the highest count from (2)
4. Repeat (1-3) 1000 times
5. The permutation-corrected QC cutoff is the 95th percentile of the distribution in (4)
- Similarly, for the permutation-corrected signal cutoff, record the number of p < 0.05 in each permuted dataset, then use the 95th percentile of this distribution.
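The permutation steps above can be sketched as follows. This is an illustration under stated assumptions, not the implementation from (Breheny 2018): the data are simulated, the per-gene test is a Pearson correlation test, `get.pvalues` is a hypothetical helper, and only 200 permutations are used to keep the run short.

```r
set.seed(42);
n <- 40; # assumed number of samples
n.genes <- 200; # assumed number of genes
b <- 0.05; # bin width
n.perm <- 200; # reduced from 1000 to keep the sketch fast

# simulate correlated genes: a shared factor plus gene-specific noise
shared <- rnorm(n);
X <- sapply(seq_len(n.genes), function(j) 0.7 * shared + rnorm(n));
Y <- rnorm(n); # outcome simulated with no relationship to X

# hypothetical helper: Pearson correlation test p-value for each column of x
get.pvalues <- function(y, x) {
    r <- as.vector(cor(x, y));
    t.stat <- r * sqrt((nrow(x) - 2) / (1 - r^2));
    2 * pt(-abs(t.stat), df = nrow(x) - 2);
    }

breaks <- seq(0, 1, length.out = round(1 / b) + 1);
max.count <- numeric(n.perm);
signal.count <- numeric(n.perm);
for (i in seq_len(n.perm)) {
    y.perm <- sample(Y); # permute Y, leaving X untouched
    p.perm <- get.pvalues(y.perm, X); # rerun all tests
    counts <- table(cut(p.perm, breaks = breaks, include.lowest = TRUE));
    max.count[i] <- max(counts); # largest bin count, for the QC cutoff
    signal.count[i] <- sum(p.perm < 0.05); # for the signal cutoff
    }

# permutation-corrected cutoffs
qc.cutoff.perm <- quantile(max.count, probs = 0.95);
signal.cutoff.perm <- quantile(signal.count, probs = 0.95);

# cutoffs that assume independence, for comparison
qc.cutoff.indep <- qbinom(p = 1 - 0.05 / (1 / b), size = n.genes, prob = b);
signal.cutoff.indep <- qbinom(p = 0.95, size = n.genes, prob = b);
print(c(qc.perm = unname(qc.cutoff.perm), qc.indep = qc.cutoff.indep));
print(c(signal.perm = unname(signal.cutoff.perm), signal.indep = signal.cutoff.indep));
```

Permuting only \(Y\) keeps every column of \(X\) intact, which is what preserves the gene-gene correlation in each permuted dataset.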

Example:

Unpublished gene expression study

Histogram suggests problem, but genes are correlated:

Permutation-corrected QC cutoff suggests no problem:

Summary

Flat pvalue histogram: \(H_0\) always true
Slopes down left-to-right: \(H_A\) sometimes true
- Binomial test for signal: does the far-left bin deviate from \(H_0\)?
- If no tests significant after multiple testing correction, but the Binomial test is significant for an overall signal, this suggests your study was underpowered.
Irregular histogram: problem with analysis/study
- QC binomial test: does any bin of the histogram deviate from \(H_0\)?
- Try a robust nonparametric method instead
- Check measurement error or problems in study design
In practice, apply the signal and QC tests assuming independence. If you exceed either threshold, try the permutation method to confirm.

R function

pvalue.histogram <- function(
    pvalues, # vector of pvalues; NAs are dropped
    b = 0.05, # width of each bin in histogram
    alpha = 0.05, # significance level of signal test
    ... # other args to create.histogram
    ) {
    stopifnot(is.numeric(pvalues));
    stopifnot(length(b) == 1 && is.numeric(b) && b > 0 && b <= 0.2);
    # drop NAs before checking the range, so missing values do not fail the check
    pvalues <- pvalues[!is.na(pvalues)];
    stopifnot(all(pvalues >= 0) && all(pvalues <= 1));

    m <- length(pvalues);
    # signal cutoff: (1 - alpha) percentile of Bin(m, b)
    signal.cutoff <- qbinom(
        p = 1 - alpha,
        size = m,
        prob = b
        );
    # QC cutoff: Bonferroni correction over the 1 / b bins
    qc.cutoff <- qbinom(
        p = 1 - alpha / (1 / b),
        size = m,
        prob = b
        );
    BoutrosLab.plotting.general::create.histogram(
        x = pvalues,
        ylab.label = 'Frequency',
        xlab.label = 'pvalues',
        breaks = seq(0, 1, by = b),
        type = 'count',
        abline.h = c(signal.cutoff, qc.cutoff),
        abline.col = c('red', 'blue'),
        abline.lwd = 3,
        ...
        );
    }

# Example:
set.seed(123);
pvals <- c(
    runif(20, 0, 0.001),
    runif(80)
    );
pvalue.histogram(pvals);

Signal and QC thresholds

References

Breheny, Patrick et al. 2018. “P-Value Histograms: Inference and Diagnostics.” High-Throughput 7 (3): 23.

Fischl, Adrian M, Paula M Heron, Arnold J Stromberg, and Timothy S McClintock. 2014. “Activity-Dependent Genes in Mouse Olfactory Sensory Neurons.” Chemical Senses 39 (5): 439–49.

Rogier, Eric W, Aubrey L Frantz, Maria EC Bruno, Leia Wedlund, Donald A Cohen, Arnold J Stromberg, and Charlotte S Kaetzel. 2014. “Secretory Antibodies in Breast Milk Promote Long-Term Intestinal Homeostasis by Regulating the Gut Microbiota and Host Gene Expression.” Proceedings of the National Academy of Sciences 111 (8): 3074–79.