Jackknife [Bioanalytics]

posted by Helmut Homepage – Vienna, Austria, 2010-11-18 19:35 (5307 d 12:01 ago) – Posting: # 6170
Views: 10,944

Hi Marko!

Another reply; will be lenghty…

Let’s dive in. Bootstrapping is nice, but as a random resampling procedure it will not give you the same results if repeated (I would guess, regulators would hate that). Below five runs (calibrators) of 50,000 samples each:

           Estimate S.e.(boot)       50%       2.5%    97.5%
Intercept 0.1857501 0.183498823 0.1754573 -0.1034715  0.5112881
Intercept 0.1857501 0.177899951 0.1757267 -0.1050423  0.5080163
Intercept 0.1857501 0.181154692 0.1758359 -0.09813111 0.5226411
Intercept 0.1857501 0.174302130 0.1757267 -0.1005911  0.5080163
Intercept 0.1857501 0.171805180 0.1757267 -0.1050744  0.5083477

Slope     0.9891061 0.005718894 0.9886411  0.9818633  1.0044841
Slope     0.9891061 0.005606322 0.9886513  0.9819443  1.0044762
Slope     0.9891061 0.005654949 0.9885763  0.98194865 1.0043886
Slope     0.9891061 0.005613795 0.9886046  0.9819592  1.0045131
Slope     0.9891061 0.005652954 0.9886552  0.9819592  1.0044587


You see that standard errors – and thus the CI – are similar, but not identical (although the decision would always be the same).

An alternative would be the Jackknife, because you will always get the same results. Unfortunally there is no package readily available, but I found some code by Terry Therneau in the R-help archives.

# Generalized Deming regression, based on Ripley, Analyst, 1987:377-383.
#
deming <- function(x, y, xstd, ystd, jackknife=TRUE, dfbeta=FALSE,
                   scale=TRUE) {
    Call <- match.call()
    n <- length(x)
    if (length(y) !=n) stop("x and y must be the same length")
    if (length(xstd) != length(ystd))
        stop("xstd and ystd must be the same length")

    # Do missing value processing
    nafun <- get(options()$na.action)
    if (length(xstd)==n) {
        tdata <- nafun(data.frame(x=x, y=y, xstd=xstd, ystd=ystd))
        x <- tdata$x
        y <- tdata$y
        xstd <- tdata$xstd
        ystd <- tdata$ystd
        }
    else {
        tdata <- nafun(data.frame(x=x, y=y))
        x <- tdata$x
        y <- tdata$y
        if (length(xstd) !=2) stop("Wrong length for std specification")
        xstd <- xstd[1] + xstd[2]*x
        ystd <- ystd[1] + ystd[2]*y
        }

    if (any(xstd <=0) || any(ystd <=0)) stop("Std must be positive")

    minfun <- function(beta, x, y, xv, yv) {
        w <- 1/(yv + beta^2*xv)
        alphahat <- sum(w * (y - beta*x))/ sum(w)
        sum(w*(y-(alphahat + beta*x))^2)
        }

    minfun0 <- function(beta, x, y, xv, yv) {
        w <- 1/(yv + beta^2*xv)
        alphahat <- 0  #constrain to zero
        sum(w*(y-(alphahat + beta*x))^2)
        }

    afun <-function(beta, x, y, xv, yv) {
        w <- 1/(yv + beta^2*xv)
        sum(w * (y - beta*x))/ sum(w)
        }

    fit <- optimize(minfun, c(.1, 10), x=x, y=y, xv=xstd^2, yv=ystd^2)
    coef <- c(intercept=afun(fit$minimum, x, y, xstd^2, ystd^2),
               slope=fit$minimum)
    fit0 <- optimize(minfun0, coef[2]*c(.5, 1.5), x=x, y=y,
                     xv=xstd^2, yv=ystd^2)

    w <- 1/(ystd^2 + (coef[2]*xstd)^2) #weights
    u <- w*(ystd^2*x + xstd^2*coef[2]*(y-coef[1])) #imputed "true" value
    if (is.logical(scale) && scale) {
        err1 <- (x-u)/ xstd
        err2 <- (y - (coef[1] + coef[2]*u))/ystd
        sigma <- sum(err1^2 + err2^2)/(n-2)
        # Ripley's paper has err = [y - (a + b*x)] * sqrt(w); gives the same SS
        }
    else sigma <- scale^2
   
    test1 <- (coef[2] -1)*sqrt(sum(w *(x-u)^2)/sigma) #test for beta=1
    test2 <- coef[1]*sqrt(sum(w*x^2)/sum(w*(x-u)^2) /sigma) #test for a=0
                     
    rlist <- list(coefficient=coef, test1=test1, test0=test2, scale=sigma,
                  err1=err1, err2=err2, u=u)

    if (jackknife) {
        delta <- matrix(0., nrow=n, ncol=2)
        for (i in 1:n) {
            fit <- optimize(minfun, c(.5, 1.5)*coef[2],
                            x=x[-i], y=y[-i], xv=xstd[-i]^2, yv=ystd[-i]^2)
            ahat <- afun(fit$minimum, x[-i], y[-i], xstd[-i]^2, ystd[-i]^2)
            delta[i,] <- coef - c(ahat, fit$minimum)
            }
        rlist$variance <- t(delta) %*% delta
        if (dfbeta) rlist$dfbeta <- delta
        }

    rlist$call <- Call
    class(rlist) <- 'deming'
    rlist
    }

print.deming <- function(x, ...) {
    cat("\nCall:\n", deparse(x$call), "\n\n", sep = "")
    if (is.null(x$variance)) {
        table <- matrix(0., nrow=2, ncol=3)
        table[,1] <- x$coefficient
        table[,2] <- c(x$test0, x$test1)
        table[,3] <- pnorm(-2*abs(table[,2]))
        dimnames(table) <- list(c("Intercept", "Slope"),
                                c("Coef", "z", "p"))
        }
    else {
        table <- matrix(0., nrow=2, ncol=4)
        table[,1] <- x$coefficient
        table[,2] <- sqrt(diag(x$variance))
        table[,3] <- c(x$test0, x$test1)
        table[,4] <- pnorm(-2*abs(table[,3]))
        dimnames(table) <- list(c("Intercept", "Slope"),
                                c("Coef", "se(coef)", "z", "p"))
        }
    print(table, ...)
    cat("\n   Scale=", format(x$scale, ...), "\n")
    invisible(x)
    }


For your calibration data:

fit <- deming(x,y, xstd=c(1,0), ystd=c(1,0), jackknife=TRUE)
print(fit)

I got:
Call:
deming(x = x, y = y, xstd = c(1, 0), ystd = c(1, 0), jackknife = TRUE)

               Coef   se(coef)            z         p
Intercept 0.1865192 0.14052612 124.48408471 0.0000000
Slope     0.9890887 0.00546163  -0.01336272 0.4893394


Hhm, strange. Intercept and slope are not the same as with the other procedures?! Ah, some weighting is used…

Let’s calculate the 95% confidence interval based on the t-distribution anyhow:
df    <- length(x)-2
t     <- qt(1-0.05/2, df=df)
CL1.a <- fit$coefficient[[1]] - t*sqrt(fit$variance[1, 1])
CL2.a <- fit$coefficient[[1]] + t*sqrt(fit$variance[1, 1])
CL1.b <- fit$coefficient[[2]] - t*sqrt(fit$variance[2, 2])
CL2.b <- fit$coefficient[[2]] + t*sqrt(fit$variance[2, 2])
cat(paste("Intercept (95% CI):",
  signif(min(CL1.a, CL2.a), 7), signif(max(CL1.a, CL2.a), 7),
  "\nSlope     (95% CI):",
  signif(min(CL1.b, CL2.b), 7), signif(max(CL1.b, CL2.b), 7), "\n"))


I got:
Intercept (95% CI): -0.1573358 0.5303742
Slope     (95% CI):  0.9757246 1.002453

Again no constant and/or proportional bias.

If I have time (haha), I will check at R-help about the weighting. If I set all weights=1, I get still different results…

Dif-tor heh smusma 🖖🏼 Довге життя Україна! [image]
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

Complete thread:

UA Flag
Activity
 Admin contact
23,424 posts in 4,927 threads, 1,682 registered users;
75 visitors (0 registered, 75 guests [including 46 identified bots]).
Forum time: 08:36 CEST (Europe/Vienna)

Nerds don’t just happen to dress informally.
They do it too consistently.
Consciously or not, they dress informally
as a prophylactic measure against stupidity.    Paul Graham

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz
HTML5