Bioequivalence and Bioavailability Forum

Bootstrapping BE: An attempt in 🇷 [Study Assessment]

posted by Helmut – Vienna, Austria, 2021-07-08 15:50 (1457 d 23:32 ago) – Posting: # 22463
Views: 8,420

Hi Weidson,

❝ I wasn't able to find any SAS code disponible for boostrap of the crossover experiments.

No idea about SAS; a lengthy (157 lines) [image]

-script at the end.
I simulated a failed study (underpowered). Output:

Simulation goalposts n : 12 theta0: 95.00% CV : 22.00% Power : 45.78% n.tgt : 22 Simulated data set PE : 95.76% 90% CI: 79.49% – 115.37% CV : 25.58% Power : 29.84% Sample size estimation based on simulated data set n : 28 Power : 81.73% 10,000 bootstrapped studies (each n = 22) PE (median) : 95.55% 95% CI : 83.71% – 110.84% lower CL (median): 84.43% 95% CI : 75.52% – 97.12% upper CL (median): 108.07% 95% CI : 92.31% – 126.57% CV (median) : 23.31% 95% CI : 16.89% – 28.05% Empiric power : 78.07%

Note that we would need 22 subjects based on the simulation goalposts but 28 would be required based on the simulated data set. Though the PE was slightly ‘better’ (95.76% instead of 95%), the CV was ‘worse’ (25.58% instead of 22%). Hence, the bootstrapped result (with n = 22) is expected to be slightly underpowered (empiric power is the number of passing studies / the number of simulations).

Show the first six rows of the simulated data and the bootstrapped studies as well as the structure of the data.frames:

print(head(data), row.names = FALSE); str(data) subject period sequence treatment PK 1 1 RT R 0.987956758728 1 2 RT T 1.138788480791 2 1 RT R 1.290885185744 2 2 RT T 0.894663486095 3 1 RT R 1.257311382884 3 2 RT T 0.879437588333 'data.frame': 24 obs. of 5 variables: $ subject : Factor w/ 12 levels "1","2","3","4",..: 1 1 2 2 3 3 4 4 5 5 ... $ period : Factor w/ 2 levels "1","2": 1 2 1 2 1 2 1 2 1 2 ... $ sequence : Factor w/ 2 levels "RT","TR": 1 1 1 1 1 1 1 1 1 1 ... $ treatment: Factor w/ 2 levels "R","T": 1 2 1 2 1 2 1 2 1 2 ... $ PK : num 0.988 1.139 1.291 0.895 1.257 ... print(head(res), row.names = FALSE); str(res) PE lower upper BE CV 84.56 75.84 94.27 FALSE 0.203303322180 90.14 80.44 101.01 TRUE 0.217749281620 88.38 78.42 99.60 FALSE 0.232894145006 121.50 108.45 136.13 FALSE 0.221183626922 84.70 77.56 92.50 FALSE 0.169932184060 95.79 84.69 108.33 TRUE 0.239046540681 'data.frame': 10000 obs. of 5 variables: $ PE : num 84.6 90.1 88.4 121.5 84.7 ... $ lower: num 75.8 80.4 78.4 108.5 77.6 ... $ upper: num 94.3 101 99.6 136.1 92.5 ... $ BE : logi FALSE TRUE FALSE FALSE FALSE TRUE ... $ CV : num 0.203 0.218 0.233 0.221 0.17 ...

Duno why we need bootstrapping at all. We could directly use the results of the data set:

simple <- 100 * CI.BE(pe = PE / 100, CV = CVw, n = 22) cat(sprintf("%.2f%% (%.2f%% \u2013 %.2f%%)", PE, simple[1], simple[2]), "\n") 95.76% (84.01% – 109.15%)

IMHO, that’s sufficiently close to the bootstrap’s 95.55% (84.43% – 108.07%). Given, if real-world data deviates from lognormal (e.g., contains discordant outliers) the bootstrapped results (since nonparametric) may be more reliable.

❝ I am intending to use it to verify the robustness of the conclusion of bioinequivalence. :ok:

Not sure what you mean by that. For any failed study the bloody post hoc power will be <50%. When you give the PE and CV in the function sampleN.TOST() you get immediately to number of subjects you would have needed to demonstrate BE (see the example above and lines 91–92 of the script).

❝ Is there any member of this forum have experience with bootstrap methods?

ElMaestro. ;-)

❝ Does We have any code can be aplied for failed crossovers design? :confused:

I hope the one at the end helps (never done that before, bugs are possible). If you want to use your own data, ignore lines 19–44 and provide a data.frame of results (named data). The column names are mandatory: subject (integer), period (integer), sequence ("RT" or "TR"), treatment ("R" or "T"), PK (untransformed).
Once you have that, use this code

cols <- c("subject", "period", "sequence", "treatment") data[cols] <- lapply(data[cols], factor)

to convert the columns to factors needed in the model.

library(PowerTOST)

empty <- function(n, design = "2x2x2") {

  # returns empty factorized data.frame

  if (!design %in% c("2x2x2", "2x2"))

    stop("Only simple crossover implemented.")

  data <- data.frame(subject   = rep(1:n, each = 2),

                     period    = rep(1:2L, n),

                     sequence  = c(rep(c("RT"), n),

                                   rep(c("TR"), n)),

                     treatment = c(rep(c("R", "T"), n/2),

                                   rep(c("T", "R"), n/2)),

                     PK = NA_real_)

  cols       <- c("subject", "period", "sequence", "treatment")

  data[cols] <- lapply(data[cols], factor)

  return(data)

}



alpha   <- 0.05 # guess 

CV      <- 0.22 # within-subject CV

theta0  <- 0.95 # T/R-ratio

n       <- 12L  # for these values underpowered (22 needed for >=80% power)

n.new   <- 22L  # bootstrap sample size

nsims   <- 1e4L # number of simulations

setseed <- TRUE # for reproducibility (FALSE for other ones)

data    <- empty(n = n) # gets the structure (balanced sequences)

if (setseed) {

  set.seed(123456) # fixed seed

} else {

  set.seed(NULL)   # random seed

}

# PK responses in raw scale based on the lognormal distribution

sd      <- CV2se(CV)

T       <- data.frame(subject = 1:n, PK = exp(rnorm(n = n,

                                                    mean = log(theta0),

                                                    sd = sd)))

R       <- data.frame(subject = 1:n, PK = exp(rnorm(n = n,

                                                    mean = log(1),

                                                    sd = sd)))

for (j in 1:nrow(data)) { # assign responses

  if (data$treatment[j] == "R")

    data$PK[j] <- R$PK[which(R$subject == data$subject[j])]

  if (data$treatment[j] == "T")

    data$PK[j] <- T$PK[which(T$subject == data$subject[j])]

}

ow    <- options("digits")

options(digits = 12) # more digits for ANOVA

on.exit(ow)          # ensure that options are reset if an error occurs

# keep it simple (nested model with "subject %in% sequence" is bogus)

orig  <- lm(log(PK) ~ sequence + subject + period + treatment,

                      data = data)

CVw   <- mse2CV(anova(orig)["Residuals", "Mean Sq"])

PE    <- 100 * exp(coef(orig)[["treatmentT"]])

CI    <- 100 * exp(confint(orig, "treatmentT",

                           level = 1 - 2 * alpha))

res   <- data.frame(PE = rep(NA_real_, nsims),

                    lower = NA_real_, upper = NA_real_,

                    BE = FALSE, CV = NA_real_)

pb    <- txtProgressBar(0, 1, 0, char = "\u2588", width = NA, style = 3)

dummy <- data.frame(subject   = factor(NA_integer_, levels = 1:n.new),

                    period    = factor(NA_integer_, levels = c(1, 2)),

                    sequence  = factor(NA_character_, levels = c("RT", "TR")),

                    treatment = factor(NA_character_, levels = c("R", "T")),

                    PK = NA_real_)

for (j in 1L:nsims) {

  boot <- dummy

  idx  <- sample(1:n, size = n.new, replace = TRUE) # the workhorse

  for (k in 1:n.new) {

    tmp         <- data[data$subject == idx[k], ]

    tmp$subject <- k

    boot        <- rbind(boot, tmp)    # append values

  }

  boot <- boot[!is.na(boot$subject), ] # remove first row with NAs

  options(digits = 12)

  on.exit(ow)

  mod         <- lm(log(PK) ~ sequence + subject + period + treatment,

                              data = boot)

  res[j, 5]   <- mse2CV(anova(mod)["Residuals", "Mean Sq"])

  # rounding acc. to GLs

  res[j, 1]   <- round(100 * exp(coef(mod)[["treatmentT"]]), 2)

  res[j, 2:3] <- round(100 * exp(confint(mod, "treatmentT",

                                        level = 1 - 2 * alpha)), 2)

  if (res$lower[j] >= 80 & res$upper[j] <= 125) res$BE[j] <- TRUE

  setTxtProgressBar(pb, j/nsims)

} # bootstrapping in progress...

close(pb)

probs   <- c(0.025, 0.5, 0.975) # nonparametric 95% CI and median

PE.q    <- quantile(res$PE,    probs = probs)

lower.q <- quantile(res$lower, probs = probs)

upper.q <- quantile(res$upper, probs = probs)

CV.q    <- quantile(res$CV,    probs = probs)

tmp     <- sampleN.TOST(CV = CVw, theta0 = PE / 100,

                        details = FALSE, print = FALSE)

txt     <- paste0("Simulation goalposts",

                  "\n  n     : ", sprintf("%3.0f", n),

                  "\n  theta0: ", sprintf("%6.2f%%", 100 * theta0),

                  "\n  CV    : ", sprintf("%6.2f%%", 100 * CV),

                  "\n  Power : ", sprintf("%6.2f%%",

                  100 * power.TOST(CV = CV, theta0 = theta0, n = n)),

                  "\n  n.tgt : ", sprintf("%3.0f",

                  sampleN.TOST(CV = CV, theta0 = theta0,

                               details = FALSE, print = FALSE)[["Sample size"]]),

                  "\nSimulated data set",

                  "\n  PE    : ", sprintf("%6.2f%%", PE),

                  "\n  90% CI: ", sprintf("%6.2f%% \u2013 %6.2f%%",

                  CI[1], CI[2]),

                  "\n  CV    : ", sprintf("%6.2f%%", 100 * CVw),

                  "\n  Power : ", sprintf("%6.2f%%",

                  100 * power.TOST(CV = CVw, theta0 = PE / 100, n = n)),

                  "\nSample size estimation based on simulated data set",

                  "\n  n     : ",

                  sprintf("%3.0f", tmp["Sample size"]),

                  "\n  Power : ",

                  sprintf("%6.2f%%", 100* tmp["Achieved power"]),

                  "\n", formatC(nsims, big.mark = ","),

                  " bootstrapped studies (each n = ", n.new, ")",

                  "\n  PE (median)      : ",

                  sprintf("%6.2f%%", PE.q[2]),

                  "\n    95% CI         : ",

                  sprintf("%6.2f%% \u2013 %6.2f%%",

                  PE.q[1], PE.q[3]),

                  "\n  lower CL (median): ",

                  sprintf("%6.2f%%", lower.q[2]),

                  "\n    95% CI         : ",

                  sprintf("%6.2f%% \u2013 %6.2f%%",

                  lower.q[1], lower.q[3]),

                  "\n  upper CL (median): ",

                  sprintf("%6.2f%%", upper.q[2]),

                  "\n    95% CI         : ",

                  sprintf("%6.2f%% \u2013 %6.2f%%",

                  upper.q[1], upper.q[3]),

                  "\n  CV (median)      : ",

                  sprintf("%6.2f%%", 100 * CV.q[2]),

                  "\n    95% CI         : ",

                  sprintf("%6.2f%% \u2013 %6.2f%%",

                  100 * CV.q[1], 100 * CV.q[3]),

                  "\n  Empiric power    : ",

                  sprintf("%6.2f%%", 100 * sum(res$BE / nsims)), "\n")

h       <- hist(res$PE, breaks = "FD", plot = FALSE)

plot(h, freq = FALSE, axes = FALSE, font.main = 1,

     main = paste("Histogram of",

                  formatC(nsims, big.mark = ","),

                  "bootstrapped studies"), xlab = "Point estimate",

                  col = "bisque", border = "darkgrey")

axis(1, at = pretty(h$breaks),

     labels = sprintf("%.0f%%", pretty(h$breaks)))

axis(2, las = 1)

axis(3, at = c(lower.q[2], PE.q[2], upper.q[2]),

     labels = sprintf("%.2f%%", c(lower.q[2], PE.q[2], upper.q[2])),

     tick = FALSE, line = -0.75)

abline(v = c(lower.q[2], PE.q[2], upper.q[2]),

       col = "blue", lwd = c(1, 2, 1))

abline(v = c(80, 125), lty = 2, col = "red")

legend("topright", lty = c(2, 1, 1), lwd = c(1, 2, 1), bg = "white",

       col = c("red", "blue", "blue"), cex = 0.9,

       legend = c("BE limits", "Median", "Nonparametric 90% CI"))

box()

cat(txt)

Edit: Couldn’t resist. I introduced an ‘outlier’ (divided the PK-value of subject 1 after R by three). Homework: Find out where to insert the following lines.

data$PK[1] <- data$PK[1] / 3

outl       <- data.frame(sim = 1L:nsims, count = 0L)

  outl$count[j] <- length(which(boot$PK == data$PK[1]))

max.ol     <- max(outl$count)

num.ol     <- data.frame(outliers = 0L:max.ol, boostraps = 0L)

for (j in 1:nrow(num.ol)) {

  num.ol$boostraps[j] <- formatC(sum(outl$count == num.ol$outliers[j]),

                                 big.mark = ",")

}

extremes       <- res[outl$sim[outl$count == max.ol], ]

extremes$width <- extremes$upper - extremes$lower

extremes$CV    <- sprintf("%.2f%%", 100 * extremes$CV)

print(num.ol, row.names = FALSE)

print(extremes[with(extremes, order(width, PE)), ], row.names = FALSE)

I got:

Simulated data set PE : 104.94% 90% CI: 80.28% – 137.19% CV : 37.43% Power : 4.38% Sample size estimation based on simulated data set n : 58 Power : 81.29% 10,000 bootstrapped studies (each n = 22) PE (median) : 104.50% 95% CI : 86.72% – 129.29% lower CL (median): 87.58% 95% CI : 75.87% – 106.79% upper CL (median): 124.98% 95% CI : 97.89% – 157.60% CV (median) : 33.99% 95% CI : 20.77% – 43.83% Empiric power : 37.89%

The simple (parametric) approach gave 104.94% (86.93% – 126.69%) which is wider than the bootstrapped 104.50% (87.58% – 124.98%). Reasonable at a first look cause ANOVA is sensitive to outliers. Is this what you are interested in?
However, since we want to bootstrap larger studies, we have to sample with replacement (line 66). Some studies contain the outlier as well but others none or even more than one. In my example:

outliers bootstraps 0 1,452 1 2,959 2 2,870 3 1,678 4 704 5 240 6 77 7 16 8 2 9 2

These are the most extreme bootstraps with 9 ‘outliers’ (note the PE, the CV, and the width of the CI):

PE lower upper BE CV width 150.07 120.72 186.55 FALSE 40.51% 65.83 143.55 109.21 188.68 FALSE 49.51% 79.47

The bootstrapped distribution should resemble the original one. Whereas in the original we have 1/12 (8.33%) outliers, due to sampling with replacement in these extreme cases we have 9/22 (40.9%).

[image]

If we desire a substantially larger sample size, I’m not convinced whether bootstrapping is useful at all.

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

Complete thread:

Results Analysis unique_one 2014-07-08 08:58 [Study Assessment]
- Results Analysis ElMaestro 2014-07-08 10:08
  - Results Analysis unique_one 2014-07-08 10:42
    - Don’t hurry; explore consequences Helmut 2014-07-08 12:00
      - Don’t hurry; explore consequences unique_one 2014-07-08 14:01
        
        The smaller a study, the more uncertain its results Helmut 2014-07-08 17:39
        
        The smaller a study, the more uncertain its results unique_one 2014-07-09 06:59
        
        The smaller a study, the more uncertain its results unique_one 2014-07-10 07:09
    - 156 ? d_labes 2014-07-08 12:03
      - 156 ? unique_one 2014-07-08 14:03
    - Results Analysis Shuanghe 2014-07-08 12:57
      - Results Analysis unique_one 2014-07-08 14:08
        
        Results Analysis Samaya B 2014-07-09 07:58
        
        bootstrap Shuanghe 2014-07-09 18:26
        
        bootstrap Weidson 2021-07-03 01:36
        
        Bootstrapping BE: An attempt in 🇷Helmut 2021-07-08 13:50
        
        Bootstrapping BE: An attempt in 🇷 ElMaestro 2021-07-09 14:15
        
        Bootstrapping BE: Desultory thoughts Helmut 2021-07-09 14:50
        
        Bootstrapping BE: Desultory thoughts ElMaestro 2021-07-09 16:29
        
        Bootstrapping BE: Desultory thoughts Helmut 2021-07-09 21:04
- Results Analysis Achievwin 2021-08-17 19:26

Bootstrapping BE: An attempt in 🇷 [Study As­sess­ment]

Complete thread:

Bootstrapping BE: An attempt in 🇷 [Study Assessment]