Broken-stick regression [Bioanalytics]

posted by Helmut Homepage – Vienna, Austria, 2020-08-04 18:04 (84 d 17:02 ago) – Posting: # 21822
Views: 1,893

Hi Shuanghe,

» I sent 2 emails with csv and excel file, the latter has much more columns in case you needed it.

THX!

» » An idea: Set limits based on the one batch of the method validation which was performed in the anticipated maximum size of a run.
»
» Damn, that's a nice idea. I didn't thought about it during discussion with CRO about the modification of their SOP.

We can perform a [image] broken-stick regression:

library(segmented)
data <- NULL # nullify eventual previous one
data <- read.csv("https://bebac.at/downloads/Post21469.csv", sep = ",", dec = ".",
                 stringsAsFactors = TRUE)
data <- data[!is.na(data$IS.area), ] # remove ones without IS
attach(data)
set.seed(123456)
mod <- lm(IS.area ~ injection, data = data)
seg <- segmented(mod)
con <- predict(seg, se.fit = TRUE, interval = "confidence")
pre <- suppressWarnings(predict(seg, se.fit = TRUE, interval = "prediction"))
summary(seg); confint(seg); slope(seg, digits = 9); intercept(seg, digits = 9)$injection
if (is.null(attr(dev.list(), "names"))) windows(width = 6, height = 5)
op  <- par(no.readonly = TRUE)
par(mar=c(4, 4, 0, 0.3) + 0.1, cex.axis = 1, cex.lab = 1.25)
col <- c("blue", "red", "darkgreen", "magenta")
bg  <- c("#0000FF80", "#FF000080", "#00900080", "#FF00FF80")
plot(seg, ylim = range(IS.area, na.rm = TRUE),
     xlab = "Injection Number", ylab = "IS Area", lwd = 3, col = "blue")
grid()
abline(v = confint(seg), col = "blue", lty = c(1, 3, 3))
text(confint(seg)[2], max(IS.area, na.rm = TRUE), labels = round(confint(seg)[2]),
     pos = 2, cex = 0.9)
text(confint(seg)[1], min(IS.area, na.rm = TRUE), labels = round(confint(seg)[1]),
     pos = 4, cex = 0.9)
text(confint(seg)[3], max(IS.area, na.rm = TRUE), labels = round(confint(seg)[3]),
     pos = 4, cex = 0.9)
lines(as.numeric(dimnames(pre$fit)[[1]]), con$fit[, "lwr"], col = "blue")
lines(as.numeric(dimnames(pre$fit)[[1]]), con$fit[, "upr"], col = "blue")
lines(as.numeric(dimnames(pre$fit)[[1]]), pre$fit[, "lwr"], col = "blue", lty = 2)
lines(as.numeric(dimnames(pre$fit)[[1]]), pre$fit[, "upr"], col = "blue", lty = 2)
points(injection[type == "C"], IS.area[type == "C"], pch = 21, col = col[1], bg = bg[1],
       cex = 1.35)
points(injection[type == "B"], IS.area[type == "B"], pch = 21, col = col[2], bg = bg[2],
       cex = 1.35)
points(injection[type == "Q"], IS.area[type == "Q"], pch = 21, col = col[3], bg = bg[3],
       cex = 1.35)
points(injection[type == "U"], IS.area[type == "U"], pch = 21, col = col[4], bg = bg[4],
       cex = 1.35)
par(op)


Which gives

        ***Regression Model with Segmented Relationship(s)***

Call:
segmented.lm(obj = mod)

Estimated Break-Point(s):
                   Est. St.Err
psi1.injection 107.392  1.054

Meaningful coefficients of the linear terms:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept)  458563.17    3100.28 147.910  < 2e-16 ***
injection       151.47      49.54   3.058  0.00269 **
U1.injection   5439.53     221.22  24.589       NA   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 15150 on 135 degrees of freedom
Multiple R-Squared: 0.9462,  Adjusted R-squared: 0.945

Convergence attained in 2 iter. (rel. change 0)
                  Est. CI(95%).low CI(95%).up
psi1.injection 107.392     105.306    109.477
$injection
            Est.   St.Err.   t value  CI(95%).l CI(95%).u
slope1  151.4664  49.53669  3.057662   53.49811  249.4348
slope2 5591.0012 215.60080 25.932191 5164.60917 6017.3932
                Est.
intercept1  458563.2
intercept2 -125596.8


The fit is fine, you can try

psi   <- round(confint(seg)[1])
limb1 <- data[(data$injection <= psi), ]
limb2 <- data[(data$injection > psi), ]
st.r  <- rstudent(seg)
inj   <- as.numeric(names(st.r))
ylim  <- c(-1, 1)*max(abs(st.r)) # symmetric
col   <- c("blue", "red")
bg    <- c("#0000FF80", "#FF000080")
plot(inj, st.r, type = "n", ylim = ylim, xlab = "Injection",
     ylab = "Studentized residual")
grid(); box(); abline(h = 0)
rug(inj, ticksize = 0.015); rug(st.r, side = 2, ticksize = 0.015)
points(inj[inj <= psi], st.r[inj <= psi], pch = 21, col = col[1],
       bg = bg[1], cex = 1.35)
points(inj[inj > psi], st.rs[inj > psi], pch = 21, col = col[2],
       bg = bg[2], cex = 1.35)
plot(seg$fitted, st.r, type = "n", ylim = ylim, xlab = "Fitted",
     ylab = "Studentized residual")
grid(); box(); abline(h = 0)
rug(seg$fitted, ticksize = 0.015); rug(st.r, side = 2, ticksize = 0.015)
points(seg$fitted[inj <= psi], st.r[inj <= psi], pch = 21, col = col[1],
       bg = bg[1], cex = 1.35)
points(seg$fitted[inj > psi], st.r[inj > psi], pch = 21, col = col[2],
       bg = bg[2], cex = 1.35)


[image]

12 |stud. res| > 2 (8.6%) and 1 > 3 (0.7%). Doesn’t worry me.
1st residual plot: No trend, no funnel shape, similar spread in the two limbs (in the second a bit larger).
2nd residual plot: The two clusters are evident which, of course, we know from looking at the entire batch (fit, 95% confidence and prediction intervals, breakpoint and its 95% confidence interval):

[image]

It doesn’t need a spectacled rocket scientist to see that the slopes are different.
Well, we can test that.* H0 is that the difference of slopes = 0.

pscore.test(seg, k = nrow(data), alternative = "two.sided")

Gives

        Score test for one/two changes in the slope

data:  formula = IS.area ~ injection
breakpoint for variable = injection
model = gaussian , link = identity , method = segmented.lm
observed value = 26.288, n.points = 139,
p-value < 2.2e-16
alternative hypothesis: two.sided   (1 breakpoint)

Gotcha!

» » If you have data of this batch, please give it as well.
» Sorry, I don't have it.

I will try to simulate some stuff based on the first part. Then we could set control limits based on Bland-Altman.


Edit: I was asking myself how sensitive this approach is, i.e., how much can the slopes differ to get a significant p-value. Crude method: I fitted the 2nd limb and gradually decreased the slope until it was equal to the one of the 1st limb.

psi     <- round(confint(seg)[1])
slope   <- as.numeric(slope(seg, digits=15)$injection[, 1])
intcpt  <- as.numeric(intercept(seg, digits=15)$injection[, 1])
limb1   <- data[(data$injection <= psi), c(1, 4)]
limb2   <- data[(data$injection > psi), c(1, 4)]
fit1    <- intcpt[1]+slope[2]*limb1$injection
fit2    <- intcpt[2]+slope[2]*limb2$injection
sims    <- 50
slp.new <- seq(slope[2], slope[1], length.out = sims)
int.new <- seq(intcpt[2], intcpt[1], length.out = sims)
noise   <- intcpt[2]+slope[2]*limb2$injection - limb2$IS.area
res     <- data.frame(slope = rep(slp.new, each = nrow(limb2)),
                      intercept = rep(int.new, each = nrow(limb2)),
                      inj = limb2$injection, fit = NA,
                      noise = rep(noise, length(slp.new)),
                      IS.area = NA)
for (j in 1:nrow(res)) {
  res$fit[j]     <- res$intercept[j]+res$slope[j]*res$inj[j]
  res$IS.area[j] <- res$fit[j]+res$noise[j]
}
names(res)[3] <- "injection"
sens <- data.frame(slope.ratio = slp.new/slope[1],
                   p.value = NA, sign = FALSE)
for (j in 1:sims) {
  tmp <- res[, c(1, 3, 6)]   sel <- which(tmp$slope == slp.new[j])
  tmp <- rbind(limb1, tmp[sel, 2:3])
  m1  <- lm(IS.area ~ injection, data = tmp)
  s1  <- segmented(m1)
  sens$p.value[j] <- pscore.test(s1, k = nrow(tmp),
                                 alternative = "two.sided")$p.value
  if (sens$p.value[j] < 0.05) sens$sign[j] <- TRUE
}
sens[, 1:2] <- signif(sens[, 1:2], 4)
print(sens, row.names = FALSE)

Gives

 slope.ratio   p.value  sign
      36.910 3.408e-55  TRUE
      36.180 3.607e-54  TRUE
      35.450 3.946e-53  TRUE
      34.710 4.468e-52  TRUE
      33.980 5.237e-51  TRUE
      33.250 6.357e-50  TRUE
      32.520 7.998e-49  TRUE
      31.780 1.043e-47  TRUE
      31.050 1.412e-46  TRUE
      30.320 1.982e-45  TRUE
      29.580 2.889e-44  TRUE
      28.850 4.372e-43  TRUE
      28.120 6.871e-42  TRUE
      27.380 1.121e-40  TRUE
      26.650 1.900e-39  TRUE
      25.920 3.343e-38  TRUE
      25.190 6.102e-37  TRUE
      24.450 1.155e-35  TRUE
      23.720 2.266e-34  TRUE
      22.990 4.599e-33  TRUE
      22.250 9.649e-32  TRUE
      21.520 2.088e-30  TRUE
      20.790 4.650e-29  TRUE
      20.060 1.063e-27  TRUE
      19.320 2.486e-26  TRUE
      18.590 5.924e-25  TRUE
      17.860 1.432e-23  TRUE
      17.120 3.492e-22  TRUE
      16.390 8.538e-21  TRUE
      15.660 2.078e-19  TRUE
      14.930 4.994e-18  TRUE
      14.190 1.174e-16  TRUE
      13.460 2.670e-15  TRUE
      12.730 5.811e-14  TRUE
      11.990 1.194e-12  TRUE
      11.260 2.283e-11  TRUE
      10.530 4.000e-10  TRUE
       9.795 6.318e-09  TRUE
       9.062 8.835e-08  TRUE
       8.329 1.074e-06  TRUE
       7.596 1.115e-05  TRUE
       6.863 9.702e-05  TRUE
       6.130 5.533e-04  TRUE
       5.397 3.312e-03  TRUE
       4.665 1.615e-02  TRUE
       3.932 6.266e-02 FALSE
       3.199 1.904e-01 FALSE
       2.466 4.514e-01 FALSE
       1.733 8.453e-01 FALSE
       1.000 7.228e-01 FALSE

The slope of the 2nd limb can be ~4 times higher than the 1st till it gets detected.

[image]


TODO: Scale the noise (which was higher in the 2nd than in the 1st limb.



Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

Complete thread:

Activity
 Admin contact
21,179 posts in 4,414 threads, 1,474 registered users;
online 8 (1 registered, 7 guests [including 2 identified bots]).
Forum time: Wednesday 10:07 CET (Europe/Vienna)

Actually, science starts to become interesting
only where it ends.    Justus von Liebig

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz
HTML5