Validation of PhEq_bootstrap [Software]
❝ ❝ Documentation, darling! […]
❝ Yes, that is a lot. Perhaps a lot of time is spent writing the datasets to file streams and flushing file buffers.
This was my first guess as well. I assumed that the author used an HDD and that the 25,000 bootstraps ran so fast on my machine because I have SSDs. Then I inspected the source and observed what happens with the one million bootstraps: the file is allocated with the computed size, then the bootstrapping is performed, the array is sorted, and only at the end is it written to the file. No disk I/O in between; everything happens in RAM.
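The original isn't R, of course; this minimal sketch only mirrors the flow I saw (allocate → bootstrap → sort → one write at the very end). The f2 function here is the usual similarity factor; all names and the file name are mine, not the program's:

f2.fun <- function(t, r) {              # f2 similarity factor of two profile sets
  d2 <- mean((colMeans(t) - colMeans(r))^2)
  50 * log10(100 / sqrt(1 + d2))
}
boot.f2 <- function(test, ref, B = 1e6, file = "f2boot.txt") {
  f2 <- numeric(B)                      # allocate the array with the computed size
  n  <- nrow(test)
  for (i in seq_len(B)) {               # resample units with replacement
    f2[i] <- f2.fun(test[sample(n, replace = TRUE), , drop = FALSE],
                    ref[sample(n, replace = TRUE), , drop = FALSE])
  }
  f2 <- sort(f2)                        # sort the entire array in RAM
  write(f2, file = file, ncolumns = 1)  # one single write at the very end
  invisible(f2)
}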
❝ […] ~144 MB, plus the array of f2*. Allocating that sort of memory should never be an issue on newer hardware.
Sure.
❝ What is the use of having all that info cached??
Because at the end it has to be written to the file. It's not my code, but I would include options to specify what – if anything at all – should be documented.
❝ Under bootstrapping the i'th dataset is as valid as the j'th dataset, where (i, j) is in ([0;B], [0;B]). When I write resampling code I usually just extract a few datasets and print, e.g., the 111th sample or whatever, and check that it has the right set of elements (i.e., that the 111th dataset for Test only contains Test values at the ‘right’ time points, and so forth).
Yep.
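In R such a spot-check takes only a few lines; the data, B, and the index 111 below are arbitrary:

set.seed(123456)
test <- matrix(rnorm(12 * 4, mean = c(20, 45, 70, 90), sd = 3),
               nrow = 12, ncol = 4, byrow = TRUE,
               dimnames = list(NULL, paste0("t", 1:4))) # 12 units, 4 time points
B    <- 1000
idx  <- replicate(B, sample(nrow(test), replace = TRUE)) # bootstrap row indices
ds   <- test[idx[, 111], ]                 # the 111th resampled dataset
all(sapply(colnames(test), function(tp)    # only Test values at the right
       all(ds[, tp] %in% test[, tp])))     # time points? should be TRUE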
❝ What I find a little more tricky is to convince myself that the derivation of CIs from the vector of f2* values is valid. A key element here is the sorting.
Not only that: getting the CI from the array index needs interpolation. More on that further down.
❝ What I did once was to output the first 65,536 rows of the sorted array and import them into a spreadsheet (if I recall correctly, 65,536 is the maximum number of rows in my spreadsheet). Then copy those values to another column, sort this column, and check whether row differences exist. A row difference would indicate a sorting mishap in one of the applications.
Checking sorting? Kudos!
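Though in R the spreadsheet's 65,536-row limit is no obstacle. Assuming the sorted f2* vector was written to a text file (the file name is hypothetical):

f2.star <- scan("f2boot.txt", quiet = TRUE) # the vector written by the program
!is.unsorted(f2.star)                       # TRUE if sorted nondecreasingly
all(diff(f2.star) >= 0)                     # the same check, done by hand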
Tricky is the CI. Try it with a very small data set and use the spreadsheet's PERCENTILE() function. When I looked at the source of PhEq_bootstrap I was suspicious whether it would give the same result. Example in R:
x      <- c(51.6730571183459, 52.3852052174518, 51.4902456409739,
            52.1787527612649, 50.9739796238878, 52.1273407615101,
            52.6282854486968, 51.5029428204135, 51.5056701956140,
            52.5994758210502, 52.3035556756045, 51.6613007946535)
n      <- length(x)
mean.x <- mean(x)
sd.x   <- sd(x)
alpha  <- 0.05                              # for a two-sided 90% CI
CI.col <- c(paste0(100*alpha, "%"), paste0(100*(1-alpha), "%"))
err.z  <- qnorm(1-alpha)*sd.x/sqrt(n)       # based on the normal distribution
err.t  <- qt(1-alpha, df=n-1)*sd.x/sqrt(n)  # based on the t-distribution
CI.z   <- mean.x + c(-1, +1)*err.z
CI.t   <- mean.x + c(-1, +1)*err.t
names(CI.t) <- names(CI.z) <- CI.col
CI.rep <- c(lower=50.97, upper=52.60)       # reported 90% CI
result <- matrix(c(CI.z, rep(FALSE, 2),     # byrow: one interval per row
                   CI.t, rep(FALSE, 2)), nrow=2, ncol=4, byrow=TRUE,
                 dimnames=list(c("CI.z", "CI.t"), c(CI.col, "lo", "hi")))
for (type in 1:9) {                         # all nine quantile algorithms of R
  CI.boot.type. <- c(quantile(x, probs=c(alpha, 1-alpha), type=type),
                     rep(FALSE, 2))
  result <- rbind(result, CI.boot.type.)
}
row.names(result)[3:11] <- paste0(row.names(result)[3:11], 1:9)
for (j in 1:nrow(result)) {                 # flag agreement with the reported CI
  if (round(as.numeric(result[j, "5%"]), 2) == CI.rep[1])
    result[j, "lo"] <- TRUE
  if (round(as.numeric(result[j, "95%"]), 2) == CI.rep[2])
    result[j, "hi"] <- TRUE
}
print(as.data.frame(result))                # 1 = match, 0 = no match
                     5%      95% lo hi
CI.z           51.67159 52.16671  0  0
CI.t           51.64886 52.18944  0  0
CI.boot.type.1 50.97398 52.62829  1  0
CI.boot.type.2 50.97398 52.62829  1  0
CI.boot.type.3 50.97398 52.59948  1  1
CI.boot.type.4 50.97398 52.61100  1  0
CI.boot.type.5 51.02561 52.62540  0  0
CI.boot.type.6 50.97398 52.62829  1  0
CI.boot.type.7 51.25793 52.61244  0  0
CI.boot.type.8 50.97398 52.62829  1  0
CI.boot.type.9 50.97398 52.62829  1  0
Only “Type 3” recovers both limits of the reported CI. For comparison: SAS selects its algorithm with PCTLDEF=1…5 (the default is PCTLDEF=5). All spreadsheets I have tested (Excel, OO Calc, Gnumeric) use “Type 7”. Of course, with already moderately large samples the differences will practically disappear. But checking sumfink like this is the fun of software validation, right?
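To make the interpolation explicit: “Type 7” places the 100p-percentile at index h = (n − 1)p + 1 of the sorted vector and interpolates linearly between the two neighboring order statistics. A quick check against quantile(), reusing x from the example above:

p7 <- function(x, p) {        # manual “Type 7” percentile (spreadsheets’ PERCENTILE())
  x  <- sort(x)
  h  <- (length(x) - 1) * p + 1
  lo <- floor(h)
  hi <- pmin(lo + 1, length(x))
  x[lo] + (h - lo) * (x[hi] - x[lo])
}
p7(x, c(0.05, 0.95))                          # 51.25793 52.61244
quantile(x, probs = c(0.05, 0.95), type = 7)  # same values (plus names)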
Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz
The quality of responses received is directly proportional to the quality of the question asked. 🚮