## Reference ranges, prediction intervals [GxP / QC / QA]

Hi ElMaestro,

» We have n observations from a group of people we think are normal. We define the reference range from them. Then we sample the next subjects and check if they are within the interval of observations defining normal ("standard"), too. And so forth. That interval defining normal builds the sample mean and a sampling variance from the normal ones into the prediction. Sounds right. At least to me when I read it. To you, too?

Yep, provided we agree that $$\small{x\sim\mathcal{N}(\mu,\sigma^2)}$$.
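Under that assumption the interval is the classical prediction interval for a single future observation. Since $$x_{\text{new}}-\bar{x}\sim\mathcal{N}\big(0,\sigma^2(1+1/n)\big)$$ and the sample standard deviation $$s$$ is independent of $$\bar{x}$$ and $$x_{\text{new}}$$, the standardized difference follows Student's $$t$$ with $$n-1$$ degrees of freedom. This is the formula `pred.p` evaluates in the script below (with $$\alpha=0.05$$):

$$
\frac{x_{\text{new}}-\bar{x}}{s\sqrt{1+\tfrac{1}{n}}}\sim t_{n-1}
\quad\Rightarrow\quad
\bar{x}\pm t_{1-\alpha/2,\,n-1}\;s\,\sqrt{1+\tfrac{1}{n}}
$$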

» […] CROs or path labs sample some subjects and take the 2.5th percentile and the 97.5th percentile to define reference ranges. I have generally accepted that. I have generally accepted anything as long as the CRO or path lab has given the ref. range some consideration.

Nonparametrics are always fine for me.

» […] I have not ever seen any CRO use the prediction interval approach to define ref. ranges.

» Should they?

Yes.

» What is your opinion here?

Both parametric and nonparametric are fine. In any case, the number of subjects should be reasonably large.
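Some back-of-the-envelope numbers on what "reasonably large" means (standard results, assuming a continuous distribution and, for the parametric case, normality). Parametric: the multiplier of $$s$$ is $$k_n=t_{0.975,\,n-1}\sqrt{1+1/n}$$, which shrinks towards the known-parameter value $$z_{0.975}$$ as $$n$$ grows. Nonparametric: for order statistics $$x_{(r)}<x_{(s)}$$, $$F(x_{(k)})$$ follows a Beta$$(k,\,n-k+1)$$ distribution with mean $$k/(n+1)$$, so the expected coverage of $$[x_{(r)},\,x_{(s)}]$$ is $$(s-r)/(n+1)$$. A symmetric 95% range needs $$r=0.025\,(n+1)\ge 1$$:

$$
\begin{aligned}
k_{20} &= t_{0.975,\,19}\sqrt{1+\tfrac{1}{20}} \approx 2.093\times1.025 \approx 2.145\\
k_{100} &= t_{0.975,\,99}\sqrt{1+\tfrac{1}{100}} \approx 1.984\times1.005 \approx 1.994\\
k_{\infty} &= z_{0.975} \approx 1.960\\[4pt]
\mathrm{E}\big[F(x_{(s)})-F(x_{(r)})\big] &= \frac{s-r}{n+1},\qquad 0.025\,(n+1)\ge 1 \;\Rightarrow\; n\ge 39
\end{aligned}
$$

So the parametric interval is about 9% wider than the known-parameter one at $$n=20$$ but only about 2% wider at $$n=100$$. With $$n=39$$ the nonparametric 95% range is simply $$[x_{(1)},\,x_{(39)}]$$, i.e., the sample minimum and maximum, with expected coverage $$38/40=0.95$$; with fewer subjects even the full observed range covers less than 95% on average.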

```r
set.seed(123456)
n       <- 100
mean    <- 20
sd      <- 5
x       <- rnorm(n = n, mean = mean, sd = sd)
# parametric 95% prediction interval (assumes normality)
pred.p  <- mean(x) + c(-1, +1) * qt(1 - 0.05 / 2, n - 1) * sd(x) * sqrt(1 + 1 / n)
# nonparametric: 2.5th and 97.5th percentiles of the sample
pred.np <- as.numeric(quantile(x, p = c(0.025, 0.975)))
ref     <- data.frame(method   = c("parametric", "nonparametric"),
                      location = c(mean(x), median(x)),
                      pred.lo  = c(pred.p[1], pred.np[1]),
                      pred.hi  = c(pred.p[2], pred.np[2]))
col     <- c("blue", "red")
tmp.x   <- seq(min(x), max(x), length.out = 201)
tmp.y   <- dnorm(tmp.x, mean = ref[1, 2], sd = sd(x))
h       <- hist(x, breaks = "FD", plot = FALSE)
ylim    <- range(c(tmp.y, h$density))
plot(h, freq = FALSE, col = "bisque", border = "darkgrey",
     ylim = ylim, las = 1, font.main = 1)
lines(tmp.x, tmp.y, col = "blue")
# limits: parametric in blue, nonparametric in red
abline(v = c(pred.p, pred.np), lwd = 2, col = rep(col, each = 2))
rug(x, ticksize = 0.015); box()
legend("topright", bg = "white", title = "95% prediction intervals",
       legend = ref[, 1], lwd = 2, col = col, cex = 0.9)
# check the reference ranges with new subjects from the same population
subj    <- 2500
smpl    <- rnorm(n = subj, mean = mean, sd = sd)
comp    <- data.frame(method       = c("parametric", "nonparametric"),
                      within.range = c(mean(smpl >= ref[1, 3] & smpl <= ref[1, 4]),
                                       mean(smpl >= ref[2, 3] & smpl <= ref[2, 4])))
cat("Reference ranges based on", n, "subjects:\n"); print(ref, row.names = FALSE)
cat("Fraction of", subj, "tested subjects within",
    "reference ranges:\n"); print(comp, row.names = FALSE)
```

```
Reference ranges based on 100 subjects:
        method location  pred.lo  pred.hi
    parametric 20.08410 10.17835 29.98985
 nonparametric 20.23954 11.06342 28.14009
Fraction of 2500 tested subjects within reference ranges:
        method within.range
    parametric       0.9560
 nonparametric       0.9136
```
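A rough way to see why the nonparametric range covered fewer new subjects, assuming only a continuous distribution: R's `quantile()` defaults to `type = 7`, which for probability $$p$$ interpolates at the order-statistic rank $$h=(n-1)\,p+1$$. Plugging these ranks into the order-statistic result $$\mathrm{E}[\text{coverage}]=(s-r)/(n+1)$$, and treating the interpolated ranks as if they were integers (an approximation):

$$
h_{0.025}=99\times0.025+1=3.475,\qquad
h_{0.975}=99\times0.975+1=97.525,\qquad
\frac{97.525-3.475}{101}=\frac{94.05}{101}\approx0.931
$$

So on average this percentile range is expected to cover roughly 93% rather than 95% of future subjects, and 0.9136 is one realization around that value. The parametric interval has an expected coverage of exactly 95% over repeated reference samples if the data are normal (0.9560 here). Under the same approximation, `type = 6` ($$h=(n+1)\,p$$, i.e., 2.525 and 98.475) would give $$(98.475-2.525)/101=0.95$$.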

In Austria, samples are sent to labs for mandatory annual interlaboratory comparisons (“Ringversuche”). Funny: the lab gets a certificate of attendance, but if a result is off, it only receives a separate notification (the certificate doesn't mention it). IMHO, reference ranges come in two flavors: (semi-)official ones, based on the outcome of the interlaboratory comparisons, and the labs' own ones. The former are often wider than the latter because different methods come into play. AFAIK, when reagents are modified, vendors sometimes notify labs that results may change (and hence, the reference range has to be adapted). See this post on what to do if that happens during a study.

» Have you seen the approach with prediction intervals in use in any operations relating to BE?

Not my cup of tea, sorry.

Dif-tor heh smusma 🖖
Helmut Schütz
The quality of responses received is directly proportional to the quality of the question asked.