Broken-stick regression [Bioanalytics]

posted by Helmut Homepage – Vienna, Austria, 2020-08-04 16:04 (112 d 18:51 ago) – Posting: # 21822
Views: 1,971

Hi Shuanghe,

» I sent 2 emails with csv and excel file, the latter has much more columns in case you needed it.

THX!

» » An idea: Set limits based on the one batch of the method validation which was performed in the anticipated maximum size of a run.
»
» Damn, that's a nice idea. I didn't thought about it during discussion with CRO about the modification of their SOP.

We can perform a [image] broken-stick regression:

library(segmented)
data <- NULL # nullify eventual previous one
data <- read.csv("https://bebac.at/downloads/Post21469.csv", sep = ",", dec = ".",
                 stringsAsFactors = TRUE)
data <- data[!is.na(data$IS.area), ] # remove ones without IS
attach(data)
set.seed(123456)
mod <- lm(IS.area ~ injection, data = data)
seg <- segmented(mod)
con <- predict(seg, se.fit = TRUE, interval = "confidence")
pre <- suppressWarnings(predict(seg, se.fit = TRUE, interval = "prediction"))
summary(seg); confint(seg); slope(seg, digits = 9); intercept(seg, digits = 9)$injection
if (is.null(attr(dev.list(), "names"))) windows(width = 6, height = 5)
op  <- par(no.readonly = TRUE)
par(mar=c(4, 4, 0, 0.3) + 0.1, cex.axis = 1, cex.lab = 1.25)
col <- c("blue", "red", "darkgreen", "magenta")
bg  <- c("#0000FF80", "#FF000080", "#00900080", "#FF00FF80")
plot(seg, ylim = range(IS.area, na.rm = TRUE),
     xlab = "Injection Number", ylab = "IS Area", lwd = 3, col = "blue")
grid()
abline(v = confint(seg), col = "blue", lty = c(1, 3, 3))
text(confint(seg)[2], max(IS.area, na.rm = TRUE), labels = round(confint(seg)[2]),
     pos = 2, cex = 0.9)
text(confint(seg)[1], min(IS.area, na.rm = TRUE), labels = round(confint(seg)[1]),
     pos = 4, cex = 0.9)
text(confint(seg)[3], max(IS.area, na.rm = TRUE), labels = round(confint(seg)[3]),
     pos = 4, cex = 0.9)
lines(as.numeric(dimnames(pre$fit)[[1]]), con$fit[, "lwr"], col = "blue")
lines(as.numeric(dimnames(pre$fit)[[1]]), con$fit[, "upr"], col = "blue")
lines(as.numeric(dimnames(pre$fit)[[1]]), pre$fit[, "lwr"], col = "blue", lty = 2)
lines(as.numeric(dimnames(pre$fit)[[1]]), pre$fit[, "upr"], col = "blue", lty = 2)
points(injection[type == "C"], IS.area[type == "C"], pch = 21, col = col[1], bg = bg[1],
       cex = 1.35)
points(injection[type == "B"], IS.area[type == "B"], pch = 21, col = col[2], bg = bg[2],
       cex = 1.35)
points(injection[type == "Q"], IS.area[type == "Q"], pch = 21, col = col[3], bg = bg[3],
       cex = 1.35)
points(injection[type == "U"], IS.area[type == "U"], pch = 21, col = col[4], bg = bg[4],
       cex = 1.35)
par(op)


Which gives

        ***Regression Model with Segmented Relationship(s)***

Call:
segmented.lm(obj = mod)

Estimated Break-Point(s):
                   Est. St.Err
psi1.injection 107.392  1.054

Meaningful coefficients of the linear terms:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept)  458563.17    3100.28 147.910  < 2e-16 ***
injection       151.47      49.54   3.058  0.00269 **
U1.injection   5439.53     221.22  24.589       NA   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 15150 on 135 degrees of freedom
Multiple R-Squared: 0.9462,  Adjusted R-squared: 0.945

Convergence attained in 2 iter. (rel. change 0)
                  Est. CI(95%).low CI(95%).up
psi1.injection 107.392     105.306    109.477
$injection
            Est.   St.Err.   t value  CI(95%).l CI(95%).u
slope1  151.4664  49.53669  3.057662   53.49811  249.4348
slope2 5591.0012 215.60080 25.932191 5164.60917 6017.3932
                Est.
intercept1  458563.2
intercept2 -125596.8


The fit is fine, you can try

psi   <- round(confint(seg)[1])
limb1 <- data[(data$injection <= psi), ]
limb2 <- data[(data$injection > psi), ]
st.r  <- rstudent(seg)
inj   <- as.numeric(names(st.r))
ylim  <- c(-1, 1)*max(abs(st.r)) # symmetric
col   <- c("blue", "red")
bg    <- c("#0000FF80", "#FF000080")
plot(inj, st.r, type = "n", ylim = ylim, xlab = "Injection",
     ylab = "Studentized residual")
grid(); box(); abline(h = 0)
rug(inj, ticksize = 0.015); rug(st.r, side = 2, ticksize = 0.015)
points(inj[inj <= psi], st.r[inj <= psi], pch = 21, col = col[1],
       bg = bg[1], cex = 1.35)
points(inj[inj > psi], st.rs[inj > psi], pch = 21, col = col[2],
       bg = bg[2], cex = 1.35)
plot(seg$fitted, st.r, type = "n", ylim = ylim, xlab = "Fitted",
     ylab = "Studentized residual")
grid(); box(); abline(h = 0)
rug(seg$fitted, ticksize = 0.015); rug(st.r, side = 2, ticksize = 0.015)
points(seg$fitted[inj <= psi], st.r[inj <= psi], pch = 21, col = col[1],
       bg = bg[1], cex = 1.35)
points(seg$fitted[inj > psi], st.r[inj > psi], pch = 21, col = col[2],
       bg = bg[2], cex = 1.35)


[image]

12 |stud. res| > 2 (8.6%) and 1 > 3 (0.7%). Doesn’t worry me.
1st residual plot: No trend, no funnel shape, similar spread in the two limbs (in the second a bit larger).
2nd residual plot: The two clusters are evident which, of course, we know from looking at the entire batch (fit, 95% confidence and prediction intervals, breakpoint and its 95% confidence interval):

[image]

It doesn’t need a spectacled rocket scientist to see that the slopes are different.
Well, we can test that.* H0 is that the difference of slopes = 0.

pscore.test(seg, k = nrow(data), alternative = "two.sided")

Gives

        Score test for one/two changes in the slope

data:  formula = IS.area ~ injection
breakpoint for variable = injection
model = gaussian , link = identity , method = segmented.lm
observed value = 26.288, n.points = 139,
p-value < 2.2e-16
alternative hypothesis: two.sided   (1 breakpoint)

Gotcha!

» » If you have data of this batch, please give it as well.
» Sorry, I don't have it.

I will try to simulate some stuff based on the first part. Then we could set control limits based on Bland-Altman.


Edit: I was asking myself how sensitive this approach is, i.e., how much can the slopes differ to get a significant p-value. Crude method: I fitted the 2nd limb and gradually decreased the slope until it was equal to the one of the 1st limb.

psi     <- round(confint(seg)[1])
slope   <- as.numeric(slope(seg, digits=15)$injection[, 1])
intcpt  <- as.numeric(intercept(seg, digits=15)$injection[, 1])
limb1   <- data[(data$injection <= psi), c(1, 4)]
limb2   <- data[(data$injection > psi), c(1, 4)]
fit1    <- intcpt[1]+slope[2]*limb1$injection
fit2    <- intcpt[2]+slope[2]*limb2$injection
sims    <- 50
slp.new <- seq(slope[2], slope[1], length.out = sims)
int.new <- seq(intcpt[2], intcpt[1], length.out = sims)
noise   <- intcpt[2]+slope[2]*limb2$injection - limb2$IS.area
res     <- data.frame(slope = rep(slp.new, each = nrow(limb2)),
                      intercept = rep(int.new, each = nrow(limb2)),
                      inj = limb2$injection, fit = NA,
                      noise = rep(noise, length(slp.new)),
                      IS.area = NA)
for (j in 1:nrow(res)) {
  res$fit[j]     <- res$intercept[j]+res$slope[j]*res$inj[j]
  res$IS.area[j] <- res$fit[j]+res$noise[j]
}
names(res)[3] <- "injection"
sens <- data.frame(slope.ratio = slp.new/slope[1],
                   p.value = NA, sign = FALSE)
for (j in 1:sims) {
  tmp <- res[, c(1, 3, 6)]   sel <- which(tmp$slope == slp.new[j])
  tmp <- rbind(limb1, tmp[sel, 2:3])
  m1  <- lm(IS.area ~ injection, data = tmp)
  s1  <- segmented(m1)
  sens$p.value[j] <- pscore.test(s1, k = nrow(tmp),
                                 alternative = "two.sided")$p.value
  if (sens$p.value[j] < 0.05) sens$sign[j] <- TRUE
}
sens[, 1:2] <- signif(sens[, 1:2], 4)
print(sens, row.names = FALSE)

Gives

 slope.ratio   p.value  sign
      36.910 3.408e-55  TRUE
      36.180 3.607e-54  TRUE
      35.450 3.946e-53  TRUE
      34.710 4.468e-52  TRUE
      33.980 5.237e-51  TRUE
      33.250 6.357e-50  TRUE
      32.520 7.998e-49  TRUE
      31.780 1.043e-47  TRUE
      31.050 1.412e-46  TRUE
      30.320 1.982e-45  TRUE
      29.580 2.889e-44  TRUE
      28.850 4.372e-43  TRUE
      28.120 6.871e-42  TRUE
      27.380 1.121e-40  TRUE
      26.650 1.900e-39  TRUE
      25.920 3.343e-38  TRUE
      25.190 6.102e-37  TRUE
      24.450 1.155e-35  TRUE
      23.720 2.266e-34  TRUE
      22.990 4.599e-33  TRUE
      22.250 9.649e-32  TRUE
      21.520 2.088e-30  TRUE
      20.790 4.650e-29  TRUE
      20.060 1.063e-27  TRUE
      19.320 2.486e-26  TRUE
      18.590 5.924e-25  TRUE
      17.860 1.432e-23  TRUE
      17.120 3.492e-22  TRUE
      16.390 8.538e-21  TRUE
      15.660 2.078e-19  TRUE
      14.930 4.994e-18  TRUE
      14.190 1.174e-16  TRUE
      13.460 2.670e-15  TRUE
      12.730 5.811e-14  TRUE
      11.990 1.194e-12  TRUE
      11.260 2.283e-11  TRUE
      10.530 4.000e-10  TRUE
       9.795 6.318e-09  TRUE
       9.062 8.835e-08  TRUE
       8.329 1.074e-06  TRUE
       7.596 1.115e-05  TRUE
       6.863 9.702e-05  TRUE
       6.130 5.533e-04  TRUE
       5.397 3.312e-03  TRUE
       4.665 1.615e-02  TRUE
       3.932 6.266e-02 FALSE
       3.199 1.904e-01 FALSE
       2.466 4.514e-01 FALSE
       1.733 8.453e-01 FALSE
       1.000 7.228e-01 FALSE

The slope of the 2nd limb can be ~4 times higher than the 1st till it gets detected.

[image]


TODO: Scale the noise (which was higher in the 2nd than in the 1st limb.



Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

Complete thread:

Activity
 Admin contact
21,209 posts in 4,426 threads, 1,481 registered users;
online 21 (1 registered, 20 guests [including 13 identified bots]).
Forum time: Wednesday 10:55 UTC (Europe/Vienna)

You can’t fix by analysis
what you bungled by design.    Richard J. Light, Judith D. Singer, John B. Willett

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz
HTML5