## Validation of PhEq_bootstrap [Software]

» » Dokumentation, darling! […]

» Yes, that is a lot. Perhaps a lot of time is spent writing the datasets to file streams and flushing file buffers.

This was my first guess as well. I assumed that the author used a HD and the 25,000 on my machine were so fast because I have SSDs. Then I inspected the source and observed what happened with the 1 mio bootstraps. The file is allocated with the computed size, then bootrapping is performed, the array sorted, and at the end written to the file. No disk IO, everything in RAM.

» […] ~144 Mb, plus the array of f

_{2}*. Allocating that sort of memory should not be an issue ever on newer hardware.

Sure.

» What is the use of having all that info cached??

Cause at the end it has to be written to the file. Not my code but I would include options to specify what – if anything at all – has to be documented.

» Under bootstrapping the i'th dataset is as valid as the j'th dataset where (i,j) is in ([0;B], [0;B]). When I write resampling code I usually just extract a few dataset and print e.g. the 111'th sample or whatever and check that it has the right set of elements (i.e. that the 111'th dataset (or whatver) for Test only has Test values at the 'right' time points and so forth).

Yep.

» What I find little more tricky is to convince myself that the derivation of CI's from the vector of f

_{2}* values is valid. A key element here is the sorting.

Not only that. Getting the CI from the array index needs interpolation. More further down.

» What I did once was to output the sorted first 65536 rows of the sorted array and import them in a spreadsheet (if I recall correctly 65536 is the maximum number of rows in my spreadsheet). Then copy those values to another column, sort this column, and check if row difference exist. A row difference would indicate a sorting mishap in one of the applications.

Checking sorting? Kudos!

Tricky is the CI. Try it with a

*very small*data set and use the spreadsheet’s

`PERCENTILE()`

function. I was suspicious when I looked at the source of PhEq_bootstrap whether it will be the same. Example in R:`x <- c(51.6730571183459, 52.3852052174518, 51.4902456409739,`

52.1787527612649, 50.9739796238878, 52.1273407615101,

52.6282854486968, 51.5029428204135, 51.5056701956140,

52.5994758210502, 52.3035556756045, 51.6613007946535)

n <- length(x)

mean.x <- mean(x)

var.x <- var(x)

sd.x <- sqrt(var.x)

alpha <- 0.05 # for two-sided 90% CI

CI.col <- c(paste0(100*alpha, "%"), paste0(100*(1-alpha), "%"))

err.z <- qnorm(1-alpha)*sd.x/sqrt(n) # based on normal distribution

err.t <- qt(1-alpha, df=n-1)*sd.x/sqrt(n) # based on t-distribution

CI.z <- c(mean.x-err.z, mean.x+err.z)

CI.t <- c(mean.x-err.t, mean.x+err.t)

names(CI.t) <- names(CI.z) <- CI.col

CI.rep <- c(lower=50.97, upper=52.60) # reported 90% CI

result <- matrix(c(CI.z, CI.t, rep(FALSE, 2), rep(FALSE, 2)), nrow=2, ncol=4,

dimnames=list(c("CI.z", "CI.t"), c(CI.col, "lo", "hi")))

for (type in 1:9) {

CI.boot.type. <- c(quantile(x, probs=c(alpha, 1-alpha), type=type),

rep(FALSE, 2))

result <- rbind(result, CI.boot.type.)

}

row.names(result)[3:11] <- paste0(row.names(result)[3:11], 1:9)

for (j in 1:nrow(result)) {

if (round(as.numeric(result[j, "5%"]), 2) == CI.rep[1])

result[j, "lo"] <- TRUE

if (round(as.numeric(result[j, "95%"]), 2) == CI.rep[2])

result[j, "hi"] <- TRUE

}

print(as.data.frame(result)) # 1=match, 0=no match

5% 95% lo hi

CI.z 51.67159 51.64886 0 0

CI.t 52.16671 52.18944 0 0

CI.boot.type.1 50.97398 52.62829 1 0

CI.boot.type.2 50.97398 52.62829 1 0

CI.boot.type.3 50.97398 52.59948 1 1

CI.boot.type.4 50.97398 52.61100 1 0

CI.boot.type.5 51.02561 52.62540 0 0

CI.boot.type.6 50.97398 52.62829 1 0

CI.boot.type.7 51.25793 52.61244 0 0

CI.boot.type.8 50.97398 52.62829 1 0

CI.boot.type.9 50.97398 52.62829 1 0

`PCTLDEF=5`

). All spreadsheets I have tested (Excel, OO Calc, Gnumeric) use “Type 7”. Of course, with already moderate samples differences will practically disappear. But checking sumfink like this is the fun of validation, right?Cheers,

Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. ☼

Science Quotes

### Complete thread:

- Validation of PhEq_bootstrap Helmut 2018-07-12 17:47 [Software]
- Validation of PhEq_bootstrap ElMaestro 2018-07-12 22:21
- Validation of PhEq_bootstrap Helmut 2018-07-13 01:29
- Validation of PhEq_bootstrap ElMaestro 2018-07-13 10:14
- Validation of PhEq_bootstrap Helmut 2018-07-13 12:54
- Validation of PhEq_bootstrap ElMaestro 2018-07-13 14:28

- Validation of PhEq_bootstrap Helmut 2018-07-13 12:54

- Validation of PhEq_bootstrap ElMaestro 2018-07-13 10:14

- Validation of PhEq_bootstrap Helmut 2018-07-13 01:29
- Validation of PhEq_bootstrap Shuanghe 2018-07-17 15:54
- PhEq_bootstrap in R d_labes 2018-08-15 09:45

- Validation of PhEq_bootstrap ElMaestro 2018-07-12 22:21