Bioequivalence and Bioavailability Forum • Jackknife

Jackknife [Bioanalytics]

posted by Helmut – Vienna, Austria, 2010-11-18 19:35 (5307 d 12:01 ago) – Posting: # 6170
Views: 10,944

Hi Marko!

Another reply; will be lenghty…

Let’s dive in. Bootstrapping is nice, but as a random resampling procedure it will not give you the same results if repeated (I would guess, regulators would hate that). Below five runs (calibrators) of 50,000 samples each:

           Estimate S.e.(boot)       50%       2.5%    97.5%

Intercept 0.1857501 0.183498823 0.1754573 -0.1034715  0.5112881

Intercept 0.1857501 0.177899951 0.1757267 -0.1050423  0.5080163

Intercept 0.1857501 0.181154692 0.1758359 -0.09813111 0.5226411

Intercept 0.1857501 0.174302130 0.1757267 -0.1005911  0.5080163

Intercept 0.1857501 0.171805180 0.1757267 -0.1050744  0.5083477



Slope     0.9891061 0.005718894 0.9886411  0.9818633  1.0044841

Slope     0.9891061 0.005606322 0.9886513  0.9819443  1.0044762

Slope     0.9891061 0.005654949 0.9885763  0.98194865 1.0043886

Slope     0.9891061 0.005613795 0.9886046  0.9819592  1.0045131

Slope     0.9891061 0.005652954 0.9886552  0.9819592  1.0044587

You see that standard errors – and thus the CI – are similar, but not identical (although the decision would always be the same).

An alternative would be the Jackknife, because you will always get the same results. Unfortunally there is no package readily available, but I found some code by Terry Therneau in the R-help archives.

# Generalized Deming regression, based on Ripley, Analyst, 1987:377-383.

#

deming <- function(x, y, xstd, ystd, jackknife=TRUE, dfbeta=FALSE,

                   scale=TRUE) {

    Call <- match.call()

    n <- length(x)

    if (length(y) !=n) stop("x and y must be the same length")

    if (length(xstd) != length(ystd)) 

        stop("xstd and ystd must be the same length") 



    # Do missing value processing

    nafun <- get(options()$na.action)

    if (length(xstd)==n) {

        tdata <- nafun(data.frame(x=x, y=y, xstd=xstd, ystd=ystd))

        x <- tdata$x

        y <- tdata$y

        xstd <- tdata$xstd

        ystd <- tdata$ystd

        }

    else {

        tdata <- nafun(data.frame(x=x, y=y))

        x <- tdata$x

        y <- tdata$y

        if (length(xstd) !=2) stop("Wrong length for std specification")

        xstd <- xstd[1] + xstd[2]*x

        ystd <- ystd[1] + ystd[2]*y

        }



    if (any(xstd <=0) || any(ystd <=0)) stop("Std must be positive")



    minfun <- function(beta, x, y, xv, yv) {

        w <- 1/(yv + beta^2*xv)

        alphahat <- sum(w * (y - beta*x))/ sum(w)

        sum(w*(y-(alphahat + beta*x))^2)

        }



    minfun0 <- function(beta, x, y, xv, yv) {

        w <- 1/(yv + beta^2*xv)

        alphahat <- 0  #constrain to zero

        sum(w*(y-(alphahat + beta*x))^2)

        }



    afun <-function(beta, x, y, xv, yv) {

        w <- 1/(yv + beta^2*xv)

        sum(w * (y - beta*x))/ sum(w)

        }



    fit <- optimize(minfun, c(.1, 10), x=x, y=y, xv=xstd^2, yv=ystd^2)

    coef <- c(intercept=afun(fit$minimum, x, y, xstd^2, ystd^2), 

               slope=fit$minimum)

    fit0 <- optimize(minfun0, coef[2]*c(.5, 1.5), x=x, y=y, 

                     xv=xstd^2, yv=ystd^2)



    w <- 1/(ystd^2 + (coef[2]*xstd)^2) #weights

    u <- w*(ystd^2*x + xstd^2*coef[2]*(y-coef[1])) #imputed "true" value

    if (is.logical(scale) && scale) {

        err1 <- (x-u)/ xstd

        err2 <- (y - (coef[1] + coef[2]*u))/ystd

        sigma <- sum(err1^2 + err2^2)/(n-2)

        # Ripley's paper has err = [y - (a + b*x)] * sqrt(w); gives the same SS

        }

    else sigma <- scale^2

    

    test1 <- (coef[2] -1)*sqrt(sum(w *(x-u)^2)/sigma) #test for beta=1

    test2 <- coef[1]*sqrt(sum(w*x^2)/sum(w*(x-u)^2) /sigma) #test for a=0

                      

    rlist <- list(coefficient=coef, test1=test1, test0=test2, scale=sigma,

                  err1=err1, err2=err2, u=u)



    if (jackknife) {

        delta <- matrix(0., nrow=n, ncol=2)

        for (i in 1:n) {

            fit <- optimize(minfun, c(.5, 1.5)*coef[2], 

                            x=x[-i], y=y[-i], xv=xstd[-i]^2, yv=ystd[-i]^2)

            ahat <- afun(fit$minimum, x[-i], y[-i], xstd[-i]^2, ystd[-i]^2)

            delta[i,] <- coef - c(ahat, fit$minimum)

            }

        rlist$variance <- t(delta) %*% delta

        if (dfbeta) rlist$dfbeta <- delta

        }



    rlist$call <- Call

    class(rlist) <- 'deming'

    rlist

    }



print.deming <- function(x, ...) {

    cat("\nCall:\n", deparse(x$call), "\n\n", sep = "")

    if (is.null(x$variance)) {

        table <- matrix(0., nrow=2, ncol=3)

        table[,1] <- x$coefficient

        table[,2] <- c(x$test0, x$test1)

        table[,3] <- pnorm(-2*abs(table[,2]))

        dimnames(table) <- list(c("Intercept", "Slope"),

                                c("Coef", "z", "p"))

        }

    else {

        table <- matrix(0., nrow=2, ncol=4)

        table[,1] <- x$coefficient

        table[,2] <- sqrt(diag(x$variance))

        table[,3] <- c(x$test0, x$test1)

        table[,4] <- pnorm(-2*abs(table[,3]))

        dimnames(table) <- list(c("Intercept", "Slope"),

                                c("Coef", "se(coef)", "z", "p"))

        }

    print(table, ...)

    cat("\n   Scale=", format(x$scale, ...), "\n")

    invisible(x)

    }

For your calibration data:

fit <- deming(x,y, xstd=c(1,0), ystd=c(1,0), jackknife=TRUE)

print(fit)

I got:

Call:

deming(x = x, y = y, xstd = c(1, 0), ystd = c(1, 0), jackknife = TRUE)



               Coef   se(coef)            z         p

Intercept 0.1865192 0.14052612 124.48408471 0.0000000

Slope     0.9890887 0.00546163  -0.01336272 0.4893394

Hhm, strange. Intercept and slope are not the same as with the other procedures?! Ah, some weighting is used…

Let’s calculate the 95% confidence interval based on the t-distribution anyhow:

df    <- length(x)-2

t     <- qt(1-0.05/2, df=df)

CL1.a <- fit$coefficient[[1]] - t*sqrt(fit$variance[1, 1])

CL2.a <- fit$coefficient[[1]] + t*sqrt(fit$variance[1, 1])

CL1.b <- fit$coefficient[[2]] - t*sqrt(fit$variance[2, 2])

CL2.b <- fit$coefficient[[2]] + t*sqrt(fit$variance[2, 2])

cat(paste("Intercept (95% CI):",

  signif(min(CL1.a, CL2.a), 7), signif(max(CL1.a, CL2.a), 7),

  "\nSlope     (95% CI):",

  signif(min(CL1.b, CL2.b), 7), signif(max(CL1.b, CL2.b), 7), "\n"))

I got:

Intercept (95% CI): -0.1573358 0.5303742

Slope     (95% CI):  0.9757246 1.002453

Again no constant and/or proportional bias.

If I have time (haha), I will check at R-help about the weighting. If I set all weights=1, I get still different results…

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

Complete thread:

Integration - smoothing moblak 2010-11-17 11:46 [Bioanalytics]
- Integration - smoothing Helmut 2010-11-17 14:32
  - Integration - smoothing moblak 2010-11-17 15:26
    - Integration - smoothing Helmut 2010-11-17 16:31
      - Integration - smoothing moblak 2010-11-18 14:39
        
        Deming regression: example Helmut 2010-11-18 15:33
        JackknifeHelmut 2010-11-18 18:35
        
        Torture moblak 2010-11-19 09:25
        
        Fun! Helmut 2010-11-19 13:47
        
        Sharpen the Jackknife d_labes 2010-11-19 16:00
        
        Sharpen the Jackknife Helmut 2010-11-19 16:19
- Integration - smoothing Ohlbe 2010-11-18 01:39
  - Integration - smoothing moblak 2010-11-18 10:57
    - ANVISA Ohlbe 2010-11-18 13:58