Bioequivalence and Bioavailability Forum

Main page Policy/Terms of Use Abbreviations Latest Posts

 Log in |  Register |  Search

Back to the forum  2018-07-21 02:13 CEST (UTC+2h)

Validation of PhEq_bootstrap [Software]

posted by Helmut Homepage - Vienna, Austria, 2018-07-13 12:54  - Posting: # 19045
Views: 256

Hi ElMaestro,

» » Dokumentation, darling! […]
» Yes, that is a lot. Perhaps a lot of time is spent writing the datasets to file streams and flushing file buffers.

This was my first guess as well. I assumed that the author used a HD and the 25,000 on my machine were so fast because I have SSDs. Then I inspected the source and observed what happened with the 1 mio bootstraps. The file is allocated with the computed size, then bootrapping is performed, the array sorted, and at the end written to the file. No disk IO, everything in RAM.

» […] ~144 Mb, plus the array of f2*. Allocating that sort of memory should not be an issue ever on newer hardware.


» What is the use of having all that info cached??

Cause at the end it has to be written to the file. Not my code but I would include options to specify what – if anything at all – has to be documented.

» Under bootstrapping the i'th dataset is as valid as the j'th dataset where (i,j) is in ([0;B], [0;B]). When I write resampling code I usually just extract a few dataset and print e.g. the 111'th sample or whatever and check that it has the right set of elements (i.e. that the 111'th dataset (or whatver) for Test only has Test values at the 'right' time points and so forth).


» What I find little more tricky is to convince myself that the derivation of CI's from the vector of f2* values is valid. A key element here is the sorting.

Not only that. Getting the CI from the array index needs interpolation. More further down.

» What I did once was to output the sorted first 65536 rows of the sorted array and import them in a spreadsheet (if I recall correctly 65536 is the maximum number of rows in my spreadsheet). Then copy those values to another column, sort this column, and check if row difference exist. A row difference would indicate a sorting mishap in one of the applications.

Checking sorting? Kudos!
Tricky is the CI. Try it with a very small data set and use the spreadsheet’s PERCENTILE() function. I was suspicious when I looked at the source of PhEq_bootstrap whether it will be the same. Example in R:

x      <- c(51.6730571183459, 52.3852052174518, 51.4902456409739,
            52.1787527612649, 50.9739796238878, 52.1273407615101,
            52.6282854486968, 51.5029428204135, 51.5056701956140,
            52.5994758210502, 52.3035556756045, 51.6613007946535)
n      <- length(x)
mean.x <- mean(x)
var.x  <- var(x)
sd.x   <- sqrt(var.x)
alpha  <- 0.05 # for two-sided 90% CI
CI.col <- c(paste0(100*alpha, "%"), paste0(100*(1-alpha), "%"))
err.z  <- qnorm(1-alpha)*sd.x/sqrt(n)      # based on normal distribution
err.t  <- qt(1-alpha, df=n-1)*sd.x/sqrt(n) # based on t-distribution
CI.z   <- c(mean.x-err.z, mean.x+err.z)
CI.t   <- c(mean.x-err.t, mean.x+err.t)
names(CI.t) <- names(CI.z) <- CI.col
CI.rep <- c(lower=50.97, upper=52.60) # reported 90% CI
result <- matrix(c(CI.z, CI.t, rep(FALSE, 2), rep(FALSE, 2)), nrow=2, ncol=4,
                 dimnames=list(c("CI.z", "CI.t"), c(CI.col, "lo", "hi")))
for (type in 1:9) {
  CI.boot.type. <- c(quantile(x, probs=c(alpha, 1-alpha), type=type),
                     rep(FALSE, 2))
  result <- rbind(result, CI.boot.type.)
row.names(result)[3:11] <- paste0(row.names(result)[3:11], 1:9)
for (j in 1:nrow(result)) {
  if (round(as.numeric(result[j,  "5%"]), 2) == CI.rep[1])
    result[j, "lo"] <- TRUE
  if (round(as.numeric(result[j, "95%"]), 2) == CI.rep[2])
    result[j, "hi"] <- TRUE
print( # 1=match, 0=no match

                     5%      95% lo hi
CI.z           51.67159 51.64886  0  0
CI.t           52.16671 52.18944  0  0
CI.boot.type.1 50.97398 52.62829  1  0
CI.boot.type.2 50.97398 52.62829  1  0

CI.boot.type.3 50.97398 52.59948  1  1
CI.boot.type.4 50.97398 52.61100  1  0
CI.boot.type.5 51.02561 52.62540  0  0
CI.boot.type.6 50.97398 52.62829  1  0

CI.boot.type.7 51.25793 52.61244  0  0
CI.boot.type.8 50.97398 52.62829  1  0
CI.boot.type.9 50.97398 52.62829  1  0

R and SAS provide various types of interpolations. PhEq_bootstrap uses “Type 3” (in R lingo) which, AFAIK, is also the default in SAS (PCTLDEF=5). All spreadsheets I have tested (Excel, OO Calc, Gnumeric) use “Type 7”. Of course, with already moderate samples differences will practically disappear. But checking sumfink like this is the fun of validation, right?

Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. ☼
Science Quotes

Complete thread:

Back to the forum Activity
 Mix view
Bioequivalence and Bioavailability Forum |  Admin contact
18,547 posts in 3,941 threads, 1,190 registered users;
online 14 (0 registered, 14 guests [including 12 identified bots]).

In the field of observation,
chance favors only the mind that is prepared.    Louis Pasteur

BEBAC Ing. Helmut Schütz