Bioequivalence and Bioavailability Forum • Adaptive Design for the FDA’s RSABE?

Helmut
★★★

Vienna, Austria,
2023-12-18 12:20
(570 d 19:29 ago)

Posting: # 23794
Views: 6,771

Adaptive Design for the FDA’s RSABE? [Two-Stage / GS Designs]

Dear all,

I stumbled across this (goody ?):

An adaptive trial design for testing the
bioequivalence of generics of highly
variable drugs

Seemingly¹ a well-known framework² is followed with Pocock’s $\small{\alpha_\text{adj}=0.0294}$ (for superiority – the correct one for equivalence is $\small{0.0304}$…).
I don’t have the paper yet (behind a paywall) but I have some doubts that the method is well thought out:

$\small{\alpha_\text{adj}=0.0294}$ in the two-stage approach for 2×2×2 designs (Method B) was a mere lucky punch and does not inflate the Type I Error only for an assumed T/R-ratio of 0.95 and target power of 80%. Any other combination requires other adjustments. Lots of papers…
Did the authors believe (‼) that Pocock’s $\small{{\color{Red}{0.0294}}}$ is a ‘natural constant’ which is applicable in any adaptive design?
Assuming a T/R-ratio of 0.95 for HVD(P)s is courageous. A more conservative 0.90 is recommended³ and hence, also the default in the reference-scaling functions of the -package PowerTOST.
The authors’ -package adaptIVPT employs only 1,000 simulations by default (for T/R 0.95 and target power 80%). It is pretty fast because some parts are using C++ code and it supports multiple cores. Furthermore, $\small{s_\text{wT}\neq s_\text{wR}}$ is supported.
RSABE itself is a framework (if $\small{s_\text{wR}<0.294}$ → ABE; if $\small{s_\text{wR}\geq0.294}$ → scaling and $\small{PE\in\left\{0.8000-1.2500\right\}}$). At least 100,00 simulations are required to obtain a stable result in the function sampleN.RSABE().

To summarize:

Simulation-based adaptive designs require exploration of a reasonable narrow grid of stage 1 sample sizes and CV for an assumed T/R-ratio and target power. Simulations have to be performed under the null hypothesis. Since the convergence under the null is poor, 10⁶ simulations have to be performed. Together with the 100,000 needed for the sample size, one would need 10¹¹ simulations for every grid point.
Under these conditions an $\small{\alpha_\text{adj}}$ has to be found which maintains the empirical Type I Error.
Since that was obviously not done, with $\small{\alpha_\text{adj}=0.0294}$ the Type I Error possibly will be inflated. Furthermore, RSABE itself inflates the Type I Error if $\small{s_\text{wR}<0.294}$, a fact the FDA prefers to ignore.⁴
BTW, inspecting the source code of adaptIVPT I didn’t find 0.0294 anywhere. Strange.
IMHO, the method is questionable at least.

Lim D, Rantou E, Kim J, Choi S, Choi NH, Grosser S. Adaptive designs for IVPT data with mixed scaled average bioequivalence. Pharm Stat. 2023; 22(6): 1116–34. doi:10.1002/pst.2333.
Potvin D, DiLiberti CE, Hauck WW, Parr AF, Schuirmann DJ, Smith RA. Sequential design approaches for bioequivalence studies with crossover designs. Pharm Stat. 2008; 7(4): 245–62. doi:10.1002/pst.294. Open access.
Tóthfalusi L, Endrényi L. Sample Sizes for Designing Bioequivalence Studies for Highly Variable Drugs. J Pharm Pharmaceut Sci. 2012; 15(1): 73–84. doi:10.18433/J3Z88F. Open access.
Schütz H, Labes D, Wolfsegger MJ. Critical Remarks on Reference-Scaled Average Bioequivalence. J Pharm Pharmaceut Sci. 2022; 25: 285–96. doi:10.18433/jpps32892. Open access.

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

Helmut
★★★

Vienna, Austria,
2023-12-19 12:10
(569 d 19:40 ago)

@ Helmut
Posting: # 23796
Views: 5,777

Likely it does not work (potentially inflated Type I Error)

Post reply

Dear all,

I could not resist and had a closer look (a lengthy [image]

-script at the end).

Say, we have a 2-sequence 4-period (full) replicate design and start the study in 16 subjects (n₁).
We observe a CV_wR of 0.30. Since s_wR < 0.294, we have to go with ABE (no scaling). Power based on a fixed GMR 0.95 is below the target of 0.80. Hence, we initiate a second stage. With Pocock’s adjusted α 0.0294 we recruit 6 subjects (n₂). We observe a CV_wR of 0.30 again and GMR 0.92 in the final analysis of pooled data.
Power will be only 0.7093 since the GMR is worse than assumed. However, the Type I Error will be significantly inflated (0.0861 > α). We would have needed at least an adjusted α of 0.0149 (which is substantially lower than the one we used) in order to control the Type I Error.

Call the script with the example’s data:

RSABE.TSD(adj = 0.0294, design = "2x2x4", n1 = 16, CVwR = 0.3, GMR = 0.95, target = 0.8, CVwR.2 = 0.3, GMR.2 = 0.92) adjusted alpha : 0.0294 (Pocock’s for superiority) design : 2x2x4 n1 : 16 futility on N : none CVwR : 0.3000 (observed) theta1 : 0.8000 (lower implied limit) theta2 : 1.2500 (upper implied limit) power : 0.6812 (estimated) Stage 2 initiated (insufficent power in stage 1) GMR : 0.9500 (fixed) target power : 0.8000 (fixed) n2 : 6 N : 22; less than the FDA’s minimum of 24 subjects! CVwR : 0.3000 (observed) GMR : 0.9200 (observed) theta1 : 0.8000 (lower implied limit) theta2 : 1.2500 (upper implied limit) power : 0.7093 (estimated, may pass RSABE) empirical TIE : 0.0861 (all publications; significantly inflated)) An adjusted alpha of 0.0149 (or less) would be needed to control the Type I Error.

Note that the Type I Error in RSABE depends strongly on the sample size. Hence, even if we re-estimate the sample size with an adjusted α of 0.0149 we would see an inflated Type I Error of 0.0546 due to the larger stage 2 sample size of 10 subjects. For a total sample size (N) of 26 subjects we would need an even smaller adjusted α of 0.01333…
Try a fixed GMR of 0.90 – which is more realistic for HVD(P)s – and you will be surprised.
Note also that in the RSABE-branch (s_wR ≥ 0.294) the empirical Type I Error drops to the adjusted α for CV_wR infinitesimal greater than 30%. Example above changed to:

CVwR.2 = 0.30 + 1e-9

Part of the output:

theta1 : 0.7695 (lower implied limit) theta2 : 1.2996 (upper implied limit) empirical TIE : 0.0294 (all publications)

For any larger CV_wR the empirical Type I Error will be lower than the adjusted α.

Of course, increasing the sample size in stage 1 does not help in the ABE-branch (s_wR < 0.294).

RSABE.TSD(adj = 0.0304, design = "2x2x4", n1 = 38, ...) adjusted alpha : 0.0304 (Pocock’s for equivalence) design : 2x2x4 n1 : 38 futility on N : none CVwR : 0.3000 (observed) theta1 : 0.8000 (lower implied limit) theta2 : 1.2500 (upper implied limit) power : 0.9708 (estimated) Study stopped in stage 1 (sufficient power) empirical TIE : 0.1122 (all publications; significantly inflated) An adjusted alpha of 0.0100 (or less) would be needed to control the Type I Error.

Let’s go fully adaptive, i.e., use the observed stage 1 GMR rather than a fixed one. In the final analysis the GMR is worse than in the first stage and the CV_wR lower. We use some of the defaults. We have to increase our target power. That’s a guessing game because in the interim we don’t know what will happen in the second stage.

RSABE.TSD(adj = 0.0304, design = "2x2x4", n1 = 24, CVwR = 0.35, GMR = 0.9, usePE = TRUE, target = 0.9, CVwR.2 = 0.32, GMR.2 = 0.88) adjusted alpha : 0.0304 (Pocock’s for equivalence) design : 2x2x4 n1 : 24 futility on N : none CVwR : 0.3500 (observed) theta1 : 0.7383 (lower implied limit) theta2 : 1.3545 (upper implied limit) power : 0.6906 (estimated) Stage 2 initiated (insufficent power in stage 1) GMR : 0.9000 (observed) target power : 0.9000 (fixed) n2 : 20 N : 44 CVwR : 0.3200 (observed) GMR : 0.8800 (observed) theta1 : 0.7568 (lower implied limit) theta2 : 1.3214 (upper implied limit) power : 0.7762 (estimated, may pass RSABE) empirical TIE : 0.0271 (all publications

Only if you are a devout follower of the FDA church and believe in the ‘desired consumer risk model’,¹ run the first example with

RSABE.TSD(adj = 0.0294, design = "2x2x4", n1 = 16, CVwR = 0.3, GMR = 0.95, target = 0.8, CVwR.2 = 0.3, GMR.2 = 0.92, risk = TRUE) adjusted alpha : 0.0294 (Pocock’s for superiority) design : 2x2x4 n1 : 16 futility on N : none CVwR : 0.3000 (observed) theta1 : 0.8000 (lower implied limit) 0.7695 (lower limit of the ‘desired consumer risk model’) theta2 : 1.2500 (upper implied limit) 1.2996 (upper limit of the ‘desired consumer risk model’) power : 0.6812 (estimated) Stage 2 initiated (insufficent power in stage 1) GMR : 0.9500 (fixed) target power : 0.8000 (fixed) n2 : 6 N : 22; less than the FDA’s minimum of 24 subjects! CVwR : 0.3000 (observed) GMR : 0.9200 (observed) theta1 : 0.8000 (lower implied limit) 0.7695 (lower limit of the ‘desired consumer risk model’) theta2 : 1.2500 (upper implied limit) 1.2996 (upper limit of the ‘desired consumer risk model’) power : 0.8354 (estimated, may pass RSABE) empirical TIE : 0.0861 (all publications; significantly inflated) 0.0294 (‘desired consumer risk model’) An adjusted alpha of 0.0149 (or less) would be needed to control the Type I Error.

By means of Harry Potter’s magic wand, the inflation of the Type I Error apparently disappears because the null hypothesis is assessed at wider limits.² If you don’t want to chew reference #4 in the post above, maybe the first four slides of a contribution to the discussion (5^th GBHI International Workshop. Amsterdam, 28 September 2022) will help.

Davit BM, Chen ML, Conner DP, Haidar SH, Kim S, Lee CH, Lionberger RA, Makhlouf FT, Nwakama PE, Patel DT, Schuirmann DJ, Yu LX. Implementation of a Reference-Scaled Average Bioequivalence Approach for Highly Variable Generic Drug Products by the US Food and Drug Administration. AAPS J. 2012; 14(4): 915–24. doi:10.1208/s12248-012-9406-x. Free Full text.
The FDA’s regulatory constant $\small{\theta_\text{s}=\log_{e}(1.25)/0.25\cong0.8925742\ldots}$
The null hypothesis on inequivalence $\small{\theta_0}$ is assessed in the ‘desired consumer risk model’:
- If $\small{s_\text{wR}\leq0.25}$ with $\small{\theta_0=\left\{0.8000,1.2500\right\}}$
- If $\small{s_\text{wR}>0.25}$ with $\small{\theta_0=\exp(\mp\theta_\text{s}\times s_\text{wR})}$
That’s different to the ‘implied limits’ assessed in all (‼) other publications:
- If $\small{CV_\text{wR}\leq30\%}$ with $\small{\theta_0=\left\{0.8000,1.2500\right\}}$
- If $\small{CV_\text{wR}>30\%}$ with $\small{\theta_0=\exp(\mp\theta_\text{s}\times s_\text{wR})}$
A simple script to calculate the null hypotheses:
- nulls <- function(CVwR, risk = FALSE) { # null hypotheses in RSABE theta.s <- log(1.25) / 0.25 # regulatory constant swR <- sqrt(log(CVwR^2 + 1)) # within-subject standard deviation of R if (risk) { # ‘desired consumer risk model’ if (swR <= 0.25) { thetas <- c(0.8, 1.25) } else { thetas <- exp(c(-1, +1) * theta.s * swR) } } else { # ‘implied limits’ if (CVwR <= 0.3) { thetas <- c(0.8, 1.25) } else { thetas <- exp(c(-1, +1) * theta.s * swR) } } names(thetas) <- c("H0.1", "H0.2") return(thetas) }
  Examples (CV_wR = 0.27, s_wR ≈ 0.2652645…):
  nulls(CVwR = 0.27, risk = TRUE) H0.1 H0.2 0.7891741 1.2671474 nulls(CVwR = 0.27, risk = FALSE) H0.1 H0.2 0.80 1.25

RSABE.TSD <- function(adj = 0.0294, design = "2x2x4", n1, CVwR, GMR = 0.9, target = 0.8, usePE = FALSE, nmax = Inf, final = TRUE, CVwR.2, GMR.2 = 0.9, risk = FALSE, details = TRUE) { # adj : adjusted alpha (stage 1 and final analysis) like Potvin ‘Method B’ # design : "2x2x2": 2-sequence 4-period full replicate # "2x2x3": 2-sequence 4-period full replicate # "2x3x3": 3-sequence 2-period partial replicate # n1 : stage 1 sample size # CvwR : within-subject CV of R in stage 1 # GMR : T/R-ratio in stage 1 # usePE : FALSE: use the fixed GMR, TRUE: use the observed GMR # nmax : futility on total sample size # final : TRUE : final analysis (requires CVwR.2 and GMR.2) # FALSE: interim analyis only # risk : FALSE: TIE acc. to all publications based on the ‘implied limits’ # TRUE : additionallly TIE acc. to the ‘desired consumer risk model’ # details: TRUE : output to the console # FALSE: data.frame of results if (!design %in% c("2x2x4", "2x2x3", "2x3x3")) stop ("design ", design, " not supported.") if (missing(n1)) stop ("n1 must be given.") if (missing(CVwR)) stop ("CVwR must be given.") if (GMR <= 0.8 | GMR >= 1.25) stop ("GMR must be within 0.8 – 1.25.") if (nmax <= n1) stop ("nmax <= n1 does not make sense.") if (target <= 0.5 | target >= 1) stop ("target ", target, " does not make sense.") if (final) { if (missing(CVwR.2)) stop ("CVwR.2 must be given.") if (missing(GMR.2)) stop ("GMR.2 must be given.") } suppressMessages(require(PowerTOST)) # ≥1.5-4 (2022-02-21) limits <- function(CVwR, risk = FALSE) { # limits thetas <- scABEL(CV = CVwR, regulator = "FDA") # implied if (risk) { # ‘desired consumer risk model’ swR <- CV2se(CVwR) if (swR > 0.25) { thetas <- setNames(exp(c(-1, +1) * log(1.25) / 0.25 * swR), c("lower", "upper")) } else { thetas <- setNames(c(0.8, 1.25), c("lower", "upper")) } } return(thetas) } power <- function(alpha = 0.05, CVwR, GMR, n, design) { return(power.RSABE(alpha = alpha, CV = CVwR, theta0 = GMR, n = n, design = design)) } TIE <- function(alpha = 0.05, CVwR, n, design, risk) { return(power.RSABE(alpha = alpha, CV = CVwR, n = n, theta0 = limits(CVwR, risk)[["upper"]], design = design, nsims = 1e6)) } TIE.1.1 <- TIE.1.2 <- TIE.2.1 <- TIE.2.2 <- NA pwr.1 <- power(adj, CVwR, GMR, n = n1, design) sig <- binom.test(0.05 * 1e6, 1e6, alternative = "less", conf.level = 0.95)$conf.int[2] txt <- paste("adjusted alpha :", sprintf("%.4f", adj)) if (adj == 0.0294) { txt <- paste(txt, "(Pocock’s for superiority)") } else { if (adj == 0.0304) { txt <- paste(txt, "(Pocock’s for equivalence)") } else { if (adj == 0.0250) { txt <- paste(txt, "(Bonferroni’s for two tests)") } else { txt <- paste(txt, "(custom)") } } } txt <- paste(txt, "\ndesign :", design, "\nn1 :", sprintf("%3.0f", n1)) if (nmax < Inf) { txt <- paste(txt, "\nfutility on N :", sprintf("%3.0f", nmax)) } else { txt <- paste(txt, "\nfutility on N : none") } txt <- paste(txt, "\nCVwR :", sprintf("%.4f (observed)", CVwR), "\ntheta1 :", sprintf("%.4f (lower implied limit)", limits(CVwR)[["lower"]])) if (risk) { txt <- paste(txt, "\n ", sprintf("%.4f (lower limit of the ‘desired consumer risk model’)", limits(CVwR, risk)[["lower"]])) } txt <- paste(txt, "\ntheta2 :", sprintf("%.4f (upper implied limit)", limits(CVwR)[["upper"]])) if (risk) { txt <- paste(txt, "\n ", sprintf("%.4f (upper limit of the ‘desired consumer risk model’)", limits(CVwR, risk)[["upper"]])) } txt <- paste(txt, "\npower :", sprintf("%.4f (estimated)", pwr.1)) if (pwr.1 >= target) { # stop in the interim TIE.1.1 <- TIE(adj, CVwR, n1, design, risk = FALSE) txt <- paste0(txt, "\nStudy stopped in stage 1 (sufficient power)", "\nempirical TIE :", sprintf(" %.4f", TIE.1.1), " (all publications") if (TIE.1.1 > sig) { txt <- paste0(txt, "; significantly inflated)") } else { txt <- paste0(txt, ")") } if (risk) { TIE.1.2 <- TIE(adj, CVwR, n1, design, risk = TRUE) txt <- paste(txt, "\n ", sprintf("%.4f", TIE.1.2), "(‘desired consumer risk model’") if (TIE.1.2 > sig) { txt <- paste0(txt, "; significantly inflated)") } else { txt <- paste0(txt, ")") } } if (TIE.1.1 > sig) { req <- scABEL.ad(alpha.pre = adj, theta0 = GMR, CV = CVwR, design = design, regulator = "FDA", n = n1, print = FALSE, details = FALSE)[["alpha.adj"]] txt <- paste(txt, "\nAn adjusted alpha of", sprintf("%.4f", req), "(or less)\nwould be needed to control the Type I Error.") } } else { # initiate stage 2 N <- sampleN.RSABE(alpha = adj, CV = CVwR, theta0 = GMR, targetpower = target, design = design, print = FALSE, details = FALSE)[["Sample size"]] if (N > nmax) { txt <- paste(txt, "\nStage 2 not initiated", "(insufficent power in stage 1\nbut total sample size", N, "above futility limit)") } else { if (final) { pwr.2 <- power(adj, CVwR.2, GMR.2, n = N, design) if (GMR.2 <= 0.8 | GMR.2 >= 1.25) { final.est <- FALSE } else { final.est <- TRUE TIE.2.1 <- TIE(adj, CVwR.2, N, design, risk = FALSE) if (risk) TIE.2.2 <- TIE(adj, CVwR.2, N, design, risk = TRUE) } } else { CVwR.2 <- GMR.2 <- pwr.2 <- TIE.2.1 <- TIE.2.2 <- NA theta1.1 <- theta1.2 <- req <- NA } txt <- paste(txt, "\nStage 2 initiated (insufficent power in stage 1)", "\nGMR :", sprintf("%.4f", GMR)) ifelse (usePE, txt <- paste(txt, "(observed)"), txt <- paste(txt, "(fixed)")) txt <- paste(txt, "\ntarget power :", sprintf("%.4f (fixed)", target), "\nn2 :", sprintf("%3.0f", N - n1), "\nN :", sprintf("%3.0f", N)) if (N < 24) txt <- paste0(txt, "; less than the FDA’s minimum of 24 subjects!") if (final) { txt <- paste(txt, "\nCVwR :", sprintf("%.4f (observed)", CVwR.2), "\nGMR :", sprintf("%.4f (observed)", GMR.2), "\ntheta1 :", sprintf("%.4f (lower implied limit)", limits(CVwR.2)[["lower"]])) if (risk) { txt <- paste(txt, "\n ", sprintf("%.4f (lower limit of the ‘desired consumer risk model’)", limits(CVwR.2, risk)[["lower"]])) } txt <- paste(txt, "\ntheta2 :", sprintf("%.4f (upper implied limit)", limits(CVwR.2)[["upper"]])) if (risk) { txt <- paste(txt, "\n ", sprintf("%.4f (upper limit of the ‘desired consumer risk model’)", limits(CVwR.2, risk)[["upper"]])) } txt <- paste(txt, "\npower :", sprintf("%.4f (estimated,", pwr.2)) ifelse (pwr.2 < 0.5, txt <- paste(txt, "fails RSABE)"), txt <- paste(txt, "may pass RSABE)")) if (final.est) { txt <- paste0(txt, "\nempirical TIE :", sprintf(" %.4f", TIE.2.1), " (all publications") if (TIE.2.1 > sig) { txt <- paste0(txt, "; significantly inflated)") } else { txt <- paste0(txt, ")") } if (risk) { txt <- paste(txt, "\n ", sprintf("%.4f", TIE.2.2), "(‘desired consumer risk model’") if (TIE.2.2 > sig) { txt <- paste0(txt, "; significantly inflated)") } else { txt <- paste0(txt, ")") } } if (TIE.2.1 > sig) { req <- scABEL.ad(alpha.pre = adj, theta0 = GMR.2, CV = CVwR.2, design = design, regulator = "FDA", n = N, print = FALSE, details = FALSE)[["alpha.adj"]] txt <- paste(txt, "\nAn adjusted alpha of", sprintf("%.4f", req), "(or less)\nwould be needed to control the Type I Error.") } } } } } if (details) { # output to the console cat(txt, "\n") } else { # data.frame of results # limits in stage 1 L.1.1 <- limits(CVwR, FALSE)[["lower"]] U.1.1 <- limits(CVwR, FALSE)[["upper"]] L.1.2 <- U.1.2 <- NA if (risk) { L.1.2 <- limits(CVwR, TRUE)[["lower"]] U.1.2 <- limits(CVwR, TRUE)[["upper"]] } if (final) { # limits in the final analysis L.2.1 <- limits(CVwR.2, FALSE)[["lower"]] U.2.1 <- limits(CVwR.2, FALSE)[["upper"]] L.2.2 <- U.2.2 <- NA if (risk) { L.2.2 <- limits(CVwR.2, TRUE)[["lower"]] U.2.2 <- limits(CVwR.2, TRUE)[["upper"]] } } else { L.2.1 <- U.2.1 <- L.2.2 <- U.2.2 <- NA } result <- data.frame(alpha.adj = adj, design = design, n1 = n1, CVwR = CVwR, GMR = GMR, usePE = usePE, nmax = nmax, risk.model = risk, L.1.1 = L.1.1, U.1.1 = U.1.1, L.1.2 = L.1.2, U.1.2 = U.1.2, power.1 = pwr.1, TIE.1.1 = TIE.1.1, TIE.1.2 = TIE.1.2, n2 = N - n1, N = N, CVwR.2 = CVwR.2, GMR.2 = GMR.2, L.2.1 = L.2.1, U.2.1 = U.2.1, L.2.2 = L.2.2, U.2.2 = U.2.2, power.2 = pwr.2, TIE.2.1 = TIE.2.1, TIE.2.2 = TIE.2.2, alpha.req = req) result <- result[, colSums(is.na(result)) < nrow(result)] return(result) } }

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

Helmut
★★★

Vienna, Austria,
2023-12-20 14:27
(568 d 17:23 ago)

@ Helmut
Posting: # 23798
Views: 5,699

Exploring package adaptIVPT, function rss()

Post reply

Dear all,

I was wrong in this post; the adjusted α can be specified indeed.
By trial and error (the documentation is not particularly helpful) I discovered that only a 2×2×4 full replicate design is implemented. Contrary to the functions of PowerTOST unequal variances of T and R are not supported. The total sample size (N) can be an odd number and has to be rounded up in order to obtain balanced sequences.
I could reproduce the examples of my yesterday’s post. [image]

-script at the end.

The first example:

RSS(n1 = 16, CVwR = 0.3, GMR = 0.95) Sample size re-estimation by package ‘adaptIVPT’; only the 2x2x4 full replicate design is supported! adjusted alpha : 0.0294 n1 : 16 futility on N : 100 CVwR : 0.3000 (observed) swR : 0.2936 (observed) GMR : 0.9500 (fixed) target power : 0.8000 (fixed) Stage 2 initiated (insufficent power in stage 1) n2 : 6 N : 22; less than the FDA’s minimum of 24 subjects!

However, it is not possible to estimate power and the empirical Type I Error.
By ‘hand’ with yesterday’s GMR 0.92 in the final analysis:

library(PowerTOST) x <- setNames(c(power.RSABE(alpha = 0.0294, CV = 0.3, n = 22, design = "2x2x4", theta0 = 0.92), power.RSABE(alpha = 0.0294, CV = 0.3, n = 22, design = "2x2x4", theta0 = scABEL(CV = 0.3, regulator = "FDA")[["upper"]], nsims = 1e6)), c("power", "TIE")) print(round(x, 4)) power TIE 0.7093 0.0861

Verdict:

Only the 2-sequence 4-period full replicate design is implemented. That’s a substantial limitation.
Equal variances of T and R have to be assumed.
Faster than sampleN.RSABE(). However, its measured runtime is about two magnitudes longer than the one reported in the list of results… Not all that glitters is gold.
The method has a futility criterion nmax (default 100) on the total sample size, i.e., a re-estimated N > nmax is considered a failure in stage 1. If you want no futility like in the original Potvin methods, you can’t set nmax = Inf as in the package Power2Stage. Set it to a large value instead.
It’s impossible to estimate power (in stage 1 and the final analysis) and the empirical Type I Error directly. Therefore, power.RSABE() is still required.
Results are not reproducible (see the comment below).

Comment:

I discovered that some tweaks are needed. It’s bad coding practice in simulations that the seed of the PRNG is not fixed (obviously it’s random). The user should have the option to use a random seed to assess reproducibility (hence, in the functions of PowerTOST and Power2Stage the argument setseed). Working with seeds in parallel computing requires special attention. However, I saw no improvement with only one core. Still some irreproducibility with 10⁵ simulations (the function’s default are only 1,000). With 10⁶ results are stable but that’s brute force.
In the script I opted for multiple calls of rss() and use the maximum of the re-estimated stage 2 sample sizes.

Why the heck do we have to debug code – for free! – which was developed in a project sponsored by the FDA?

[image] If debugging is the process of removing bugs,
then programming must be the process of putting them in. Edsger W. Dijkstra

RSS <- function(adj = 0.0294, n1, CVwR, GMR = 0.95, nmax = 100, target = 0.8, calls = 10, details = TRUE, inter = FALSE) { suppressMessages(require(adaptIVPT)) up2even <- function(x) { 2 * (x %/% 2 + as.logical(x %% 2)) } if (missing(n1)) stop ("n1 must be given.") if (missing(CVwR)) stop ("CVwR mst be given.") if (GMR <= 0.8 | GMR >= 1.25) stop ("GMR must be within 0.8-1.25.") if (nmax <= n1) stop ("nmax <= n1 does not make sense.") if (target <= 0.5 | target >= 1) stop ("target ", target, " does not make sense.") if (calls <= 1) stop ("calls must be at > 1 (at least 5 recommended).") swR <- sqrt(log(CVwR^2 + 1)) n2 <- integer(calls) params <- list(GMR = GMR, sig_level = adj, nmax = nmax, target_power = target) ncores <- NULL if (!nchar(system.file(package = "parallel")) == 0) { cores <- parallel::detectCores() ncores <- cores - 1 } txt <- paste("Sample size re-estimation by package ‘adaptIVPT’;", "\nonly the 2x2x4 full replicate design is supported!", sprintf("\nadjusted alpha : %.4f", adj), sprintf("\nn1 : %2.0f", n1), "\nfutility on N : nmax), sprintf("\nCVwR : %.4f (observed)", CVwR), sprintf("\nswR : %.4f (observed)", swR), sprintf("\nGMR : %.4f (fixed)", GMR), sprintf("\ntarget power : %.4f (fixed)", target)) # workaround because of random seeds (irreproducible results) for (j in seq_along(n2)) { res <- unlist(rss(n = n1, r = 2, S_WR = swR, params = params, ncores = ncores)) # r must be 2 N <- up2even(res[["rss"]]) # might be odd n2[j] <- N - n1 } if (inter) { # show intermediate n2 estimates ns <- data.frame(n2 = sort(unique(n2)), times = NA_integer_) for (j in 1:nrow(ns)) { ns$times[j] <- length(which(n2 == ns$n2[j])) } print(ns, row.names = FALSE); cat("\n") } n2 <- max(n2) N <- n1 + n2 if (N == nmax) { txt <- paste(txt, sprintf("\nN : %2.0f >= nmax (%.0f)", N, nmax), "\nStudy stopped in stage 1 for futility.\n") } else { if (N > n1) { txt <- paste(txt, "\nStage 2 initiated (insufficent power in stage 1)", sprintf("\nn2 : %2.0f", n2), sprintf("\nN : %2.0f", N)) if (N < 24) { txt <- paste0(txt, "; less than the FDA’s minimum of 24 subjects!\n") } else { txt <- paste0(txt, "\n") } } else { txt <- paste(txt, "\nStudy stopped in stage 1 (sufficient power)\n") n2 <- N <- NA } } if (details) { # output to the console cat(txt) } else { # data.frame of results result <- data.frame(alpha.adj = adj, n1 = n1, CVwR = CVwR, GMR = GMR, target = target, n2 = n2, N = N, futile = FALSE) if (N == nmax) result$futile <- TRUE return(result) } }

Test of reproducibility

suppressMessages(library(PowerTOST)) suppressMessages(library(adaptIVPT)) swR <- CV2se(0.3) adj <- 0.0294 calls <- 2500L # time consuming params <- list(sig_level = adj) ncores <- NULL if (!nchar(system.file(package = "parallel")) == 0) { cores <- parallel::detectCores() ncores <- cores - 1 } comp <- data.frame(n2.rss = rep(NA_integer_, calls), rt.rss.1 = NA_real_, rt.rss.2 = NA_real_, n2.PT = NA_integer_, rt.PT = NA_real_) # target_power = 0.8 is the default in both functions for (j in 1L:calls) { pt <- proc.time()[[3]] res <- unlist(rss(n = 16, r = 2, S_WR = swR, params = params, ncores = ncores)) comp$rt.rss.2[j] <- proc.time()[[3]] - pt # measured runtime comp$rt.rss.1[j] <- res[8] # reported runtime N <- res[["rss"]] # as reported, no rounding comp$n2.rss[j] <- N - 16 # We use random seeds in the function as well (not the default); # however, it will always return balanced sequences (even numbers) pt <- proc.time()[[3]] comp$n2.PT[j] <- sampleN.RSABE(alpha = adj, CV = 0.3, theta0 = 0.95, design = "2x2x4", print = FALSE, details = FALSE, setseed = FALSE)[["Sample size"]] - 16 comp$rt.PT[j] <- proc.time()[[3]] - pt } names(comp)[2:3] <- c("rt.rss.reported", "rt.rss.measured") ns <- sort(unique(c(as.integer(unlist(comp[1])), as.integer(unlist(comp[4]))))) vals <- data.frame(n2 = ns, rss = NA_character_, PT = NA_character_) for (j in seq_along(ns)) { vals$rss[j] <- sprintf("%5.2f%%", 100 * sum(comp$n2.rss == ns[j]) / calls) vals$PT[j] <- sprintf("%5.2f%%", 100 * sum(comp$n2.PT == ns[j]) / calls) } txt <- paste("\nMean runtimes", sprintf("%-34s %6.2f", "\n rss() measured / reported", mean(comp[[3]]) / mean(comp[[2]])), sprintf("%-34s %6.2f", "\n sampleN.RSABE() / rss() reported", mean(comp[[5]]) / mean(comp[[2]])), sprintf("%-34s %6.2f", "\n sampleN.RSABE() / rss() measured", mean(comp[[5]]) / mean(comp[[3]])), "\n") summary(comp, digits = 5); print(vals, row.names = FALSE); cat(txt)

Result (if you call the script, yours will be different):
n2.rss rt.rss.reported rt.rss.measured n2.PT rt.PT Min. :3.000 Min. :0.00000 Min. :0.030000 Min. :4.0000 Min. :0.14000 1st Qu.:4.000 1st Qu.:0.00000 1st Qu.:0.040000 1st Qu.:6.0000 1st Qu.:0.19000 Median :5.000 Median :0.00000 Median :0.050000 Median :6.0000 Median :0.19000 Mean :4.552 Mean :0.00204 Mean :0.044624 Mean :5.9968 Mean :0.19408 3rd Qu.:5.000 3rd Qu.:0.00000 3rd Qu.:0.050000 3rd Qu.:6.0000 3rd Qu.:0.20000 Max. :6.000 Max. :0.02000 Max. :0.070000 Max. :6.0000 Max. :0.24000 n2 rss PT 3 3.32% 0.00% 4 43.68% 0.16% 5 47.48% 0.00% 6 5.52% 99.84% Mean runtimes rss() measured / reported 21.87 sampleN.RSABE() / rss() reported 95.14 sampleN.RSABE() / rss() measured 4.35

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

Helmut
★★★

Vienna, Austria,
2023-12-24 14:01
(564 d 17:49 ago)

@ Helmut
Posting: # 23804
Views: 5,246

Extreme test case

Post reply

Dear all,

something is rotten in the state of Denmark…
For the test case you need the script of the previous post and the library PowerTOST.

adj <- 0.0294 n1 <- 24 CVwR <- 0.25 # mostly no scaling GMR <- 0.88 # a bad one

First the functions of PowerTOST:

power.RSABE(alpha = adj, CV = CVwR, theta0 = GMR, design = "2x2x4", n = n1) [1] 0.49905 sampleN.RSABE(alpha = adj, CV = CVwR, theta0 = GMR, design = "2x2x4", details = FALSE) ++++++++ Reference scaled ABE crit. +++++++++ Sample size estimation --------------------------------------------- Study design: 2x2x4 (4 period full replicate) log-transformed data (multiplicative model) 1e+05 studies for each step simulated. alpha = 0.0294, target power = 0.8 CVw(T) = 0.25; CVw(R) = 0.25 True ratio = 0.88 ABE limits / PE constraints = 0.8 ... 1.25 Regulatory settings: FDA Sample size n power 52 0.80636

Since power in the first stage <0.8, we would initiate a second stage with 52 – 24 = 28 subjects.

Now the function with its defaults. Additionally I set details = FALSE to return the data.frame of results, inter = TRUE to show the intermediate stage 2 sample sizes, and increased the number of calls for the workaround to a crazy number:

RSS(n1 = n1, CVwR = CVwR, GMR = GMR, calls = 2500, details = FALSE, inter = TRUE) n2 times 22 1 24 24 26 699 76 1776 alpha.adj n1 CVwR GMR target n2 N futile 1 0.0294 24 0.25 0.88 0.8 76 100 TRUE

What’s going on here – where does the discontinuity in estimated stage 2 sample sizes come from? The study would stop for futility in the first stage!
Another try with a higher nmax:

RSS(n1 = n1, CVwR = CVwR, GMR = GMR, calls = 50, nmax = 250, details = FALSE, inter = TRUE) n2 times 24 1 26 16 28 26 30 7 alpha.adj n1 CVwR GMR target n2 N futile 1 0.0294 24 0.25 0.88 0.8 30 54 FALSE

Now the study would proceed to the second stage but with 30 subjects instead of the 28 estimated by sampleN.RSABE().

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

Naksh
☆

India,
2023-12-25 05:16
(564 d 02:34 ago)

@ Helmut
Posting: # 23805
Views: 5,155

Extreme GMR

Post reply

Hi Helmut,

Should't we calculate stage-2 sample size with GMR of 0.95??

I am trying to understand if this will be helpful in case of drugs which are very highly variable e.g. Mesalamine.

Hypothetically, if study conducted with N=24, fully replicate design but T/R ratio observed worse like 0.72.

And if calculate power with reduced alfa in study using actual result of T/R=0.72, ISCV=0.6978 and N=24 using powerTOST package.

power.RSABE(alpha = 0.0294, theta0 = 0.72, CV = 0.6978, n = 24, design = "2x2x4")

[1] 0.18378

(which is less than 80%).

To recalculate the sample size with actual variability and assumed T/R of 95% and reduced alfa.

rss(n = 24, r = 2, S_WR = 0.630, params = list(sig_level=0.0294))

$rss

21

this N=21 is total sample size right?? not the additional sample size (plz correct me if i am wrong)

Since, we have already started study with N=24, we cant go ahead with stage-2.

Am i missing something?

Edit: Subject line changed; see also this post #2. [Helmut]

Helmut
★★★

Vienna, Austria,
2023-12-25 11:54
(563 d 19:56 ago)

@ Naksh
Posting: # 23806
Views: 5,287

PE outside {0.80, 1.25} not possible

Post reply

Namaste Naksh,

❝ Should't we calculate stage-2 sample size with GMR of 0.95??

❝ I am trying to understand if this will be helpful in case of drugs which are very highly variable e.g. Mesalamine.

According to the paper, yes. However, for HVD(P)s 0.95 is extremely risky.
First of all let’s attach the libraries and define some variables.

suppressMessages(library(PowerTOST)) suppressMessages(library(adaptIVPT)) CVwR <- 0.6978 swR <- CV2se(CVwR) GMR <- 0.72 adj <- 0.0294 n1 <- 24

❝ Hypothetically, if study conducted with N=24, fully replicate design but T/R ratio observed worse like 0.72.

Such a T/R-ratio is the end of the story in RSABE with any (‼) sample size (see below).

❝ And if calculate power with reduced alfa in study using actual result of T/R=0.72, ISCV=0.6978 and N=24 using powerTOST package.

❝ power.RSABE(alpha = 0.0294, theta0 = 0.72, CV = 0.6978, n = 24, design = "2x2x4")

❝ [1] 0.18378 (which is less than 80%).

RSABE allows unlimited scaling (check the ‘implied limits’):

round(scABEL(CV = CVwR, regulator = "FDA"), 4) lower upper 0.5700 1.7545

The FDA’s decision tree:

[image]

Even if you would pass in the first step of RSABE (bound ≤0), you would fail with the PE of 0.72 because it has to lie within 0.8000 – 1.2500 (details in this article). Confirmed (with any alpha and targetpower you like):

sampleN.RSABE(theta0 = GMR, CV = CVwR, design = "2x2x4") Error: True ratio 0.72 not within margins 0.8 ... 1.25!

❝ To recalculate the sample size with actual variability and assumed T/R of 95% and reduced alfa.

❝ rss(n = 24, r = 2, S_WR = 0.630, params = list(sig_level=0.0294))

❝ $rss

❝ 21

❝ this N=21 is total sample size right??

Yes. Apart from the bug in the function (see the Comment in the previous post) …

N <- integer(20) # multiple calls to assess reproducibility for (j in seq_along(N)) { N[j] <- unlist(rss(n = n1, r = 2, S_WR = swR, params = list(sig_level = adj)))[["rss"]] } cat(paste(N, collapse = ", "), "\n") 21, 21, 21, 22, 21, 21, 21, 22, 22, 21, 21, 21, 21, 21, 20, 20, 22, 21, 21, 21
(will be different if you call it)

… that’s nonsense. Even if we ignore the irreproducible results due to the random seed, N < n1 is yet another bug.

❝ not the additional sample size (plz correct me if i am wrong)

See above. Since you didn’t specify the GMR, the function’s default 0.95 is used – which is much better than the 0.72 you expect.

❝ Since, we have already started study with N=24, we cant go ahead with stage-2.

❝ Am i missing something?

Yes, you do.
rss() is buggy. Though the PE-constraint is implemented (m = 1.25) in the list of parameters, the function should throw an error like sampleN.RSABE() if you specify a T/R-ratio outside 1/m … m. Instead it returns nmax, which defaults to 100.

unlist(rss(n = n1, r = 2, S_WR = swR, params = list(sig_level = adj, GMR = GMR)))[["rss"]] 100

If you would perform the second stage with 76 subjects …

power.RSABE(alpha = adj, theta0 = GMR, CV = CVwR, n = 24 + 76, design = "2x2x4") [1] 0.04751

… the study would fail because any power <0.5 is a failure by definition (see this article).

LALA mesalamine is a nasty drug. It’s not the high CV_wR which is problematic but the T/R-ratio. If you know already (say, from a pilot study or a failed one) that it’s outside 0.8000 – 1.2500, forget it. Even with a ‘better’ one:

sampleN.RSABE(theta0 = 0.84, CV = CVwR, design = "2x2x4", details = FALSE) ++++++++ Reference scaled ABE crit. +++++++++ Sample size estimation --------------------------------------------- Study design: 2x2x4 (4 period full replicate) log-transformed data (multiplicative model) 1e+05 studies for each step simulated. alpha = 0.05, target power = 0.8 CVw(T) = 0.6978; CVw(R) = 0.6978 True ratio = 0.84 ABE limits / PE constraints = 0.8 ... 1.25 Regulatory settings: FDA Sample size n power 120 0.80100

With any ’worse’ T/R-ratio the sample size will go through the roof.

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

Naksh
☆

India,
2023-12-25 12:42
(563 d 19:07 ago)

@ Helmut
Posting: # 23807
Views: 5,143

PE outside {0.80, 1.25} not possible

Post reply

Namaste Helmut !

Thanks for the reply and I agree that PE outside is not possible but it has been observed in pilot studies of mesalamine where many BLQs are observed in a subject which skewed the T/R to the extreme.

95% is a very bold to assume in HVD. but should actual T/R ratio for re-estimation of sample size? somehow it should be according to potvin method B?

This wonders me with T/R ratio of 85% as well.

power.RSABE(alpha = 0.0294, theta0 = 0.85, CV = 0.6978, n = 24, design = "2x2x4")

[1] 0.6369

rss(n = 24, r = 2, S_WR = 0.630, params = list(sig_level=0.0294))

$rss

[1] 21



rss(n = 24, r = 2, S_WR = 0.630, params = list(sig_level=0.0294, GMR=0.85))

$rss

[1] 100

Helmut
★★★

Vienna, Austria,
2023-12-25 14:15
(563 d 17:35 ago)

@ Naksh
Posting: # 23808
Views: 5,131

Forget rss()

Post reply

Namaste Naksh,

❝ […] I agree that PE outside is not possible but it has been observed in pilot studies of mesalamine where many BLQs are observed in a subject which skewed the T/R to the extreme.

❝ 95% is a very bold to assume in HVD. but should actual T/R ratio for re-estimation of sample size? somehow it should be according to potvin method B?

Although the authors of the paper follow Potvin ‘Method B’ in principle, they introduced a futility criterion on the total sample size (default nmax = 100). That means any re-estimated total sample size >nmax will be considered a failure in stage 1 and reported as nmax. For 2×2×2 and parallel designs that’s implemented in the functions of the package Power2Stage (argument Nmax).
As in the Potvin methods a fixed GMR (not the observed one in stage 1) is used. The authors discussed that in the paper and argued against it (i.e., not going fully adaptive). Implemented in the functions of Power2Stage (argument usePE = TRUE), though its use requires a careful selection of futility criteria in order not to compromise power.

❝ This wonders me with T/R ratio of 85% as well.

❝ power.RSABE(alpha = 0.0294, theta0 = 0.85, CV = 0.6978, n = 24, design = "2x2x4")

❝ [1] 0.6369

That would mean the second stage should be initiated because power <0.8.

❝ rss(n = 24, r = 2, S_WR = 0.630, params = list(sig_level=0.0294))

❝ $rss

❝ [1] 21

❝ rss(n = 24, r = 2, S_WR = 0.630, params = list(sig_level=0.0294, GMR=0.85))

❝ $rss

❝ [1] 100

❝ !!

Sure, because in the first case GMR = 0.95 is used (the function’s default). Of course a total sample size less the one in stage 1 is nonsense (read my previous post again).
In the second case nmax is reported. By chance? Increase nmax and – surprise, surprise:

suppressMessages(library(adaptIVPT)) N <- integer(100) for (j in seq_along(N)) { N[j] <- unlist(rss(n = 24, r = 2, S_WR = 0.630, params = list(sig_level = 0.0294, GMR = 0.85, nmax = 500)))[["rss"]] } vals <- data.frame(N = sort(unique(N)), times = NA_character_) for (j in 1:nrow(vals)) { vals$times[j] <- sprintf("%5.0f%%", 100 * sum(N == vals$N[j]) / 100) } hist(Ns, xlab = "Total sample size", main = "", las = 1) print(vals, row.names = FALSE) N times 62 2% 67 1% 68 5% 71 9% 72 3% 73 2% 74 6% 75 10% 76 13% 77 3% 78 4% 79 12% 80 1% 81 3% 82 7% 83 5% 85 4% 86 2% 88 5% 89 2% 94 1%

Note the poor reproducibility (your results will be different).

Try the script of this post.

RSABE.TSD(n1 = 24, CVwR = 0.6978, GMR = 0.85, nmax = 100, final = FALSE) adjusted alpha : 0.0294 (Pocock’s for superiority) design : 2x2x4 n1 : 24 futility on N : 100 CVwR : 0.6978 (observed) theta1 : 0.5700 (lower implied limit) theta2 : 1.7545 (upper implied limit) power : 0.6369 (estimated) Stage 2 initiated (insufficent power in stage 1) GMR : 0.8500 (fixed) target power : 0.8000 (fixed) n2 : 54 N : 78

However, if you really expect a T/R-ratio of 0.72, it would be both economically and ethically more than questionable to perform a study designed for 0.85. That’s a recipe for disaster!

power.RSABE(alpha=0.0294, CV = 0.6978, theta0 = 0.72, design = "2x2x4", n = 78) [1] 0.07005

Any power ≤0.5 is a failure by definition.

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes

Naksh ☆ India, 2023-12-26 05:49 (563 d 02:01 ago) @ Helmut Posting: # 23809 Views: 5,040	Forget rss() Post reply
	Hi Helumt, Thank you for the nmax info, didnt know. BTW your RSABE.TSD script seems very useful !! Thanks again for sharing with us.

Helmut
★★★

Vienna, Austria,
2023-12-26 13:50
(562 d 18:00 ago)

@ Naksh
Posting: # 23811
Views: 5,079

TSD useful at all?

Post reply

❝ Hi Helumt,

Please learn my name…

❝ Thank you for the nmax info, didnt know.

Welcome.

❝ BTW your RSABE.TSD script seems very useful !!

Use the updated [image]

-script at the end.

I don’t think that a TSD is useful at all, unless you use the observed GMR of stage 1 (argument usePE = TRUE) and a reasonable Nmax. But (‼) if your final GMR is really that bad, you will fail – even if the CV_wR is larger (which means more scaling).

RSABE.TSD(n1 = 24, CVwR = 0.6978, GMR = 0.85, Nmax = 100, usePE = TRUE, CVwR.2 = 0.8, GMR.2 = 0.72) adjusted alpha : 0.0294 (Pocock’s for superiority) design : 2x2x4 (2-sequence 4-period full replicate) n1 : 24 futility on N : 100 futility on PE : outside 0.8000 – 1.2500 (guidances) CVwR : 0.6978 (observed) theta1 : 0.5700 (lower implied limit) theta2 : 1.7545 (upper implied limit) GMR : 0.8500 (observed) power : 0.6369 (estimated) Stage 2 initiated (insufficent power in stage 1) target GMR : 0.8500 (observed) target power : 0.8000 (fixed) n2 : 54 N : 78 CVwR : 0.8000 (observed) GMR : 0.7200 (observed) theta1 : 0.5338 (lower implied limit) theta2 : 1.8735 (upper implied limit) power : 0.0932 (estimated, fails RSABE)

You can massage the numbers as you like, but – irrespective of the CV_wR and sample size – with any GMR in the final analysis outside the PE-constraints {0.80, 1.25} you ~~will~~ must fail.

We could observe the PE-constraints according to the guidances:

Stop for futility in the interim if $\small{GMR\notin\left\{>0.80\;\wedge<1.25\right\}}$.

However, for such an approach the GMR has to be reasonably accurate, i.e., the stage 1 sample size sufficiently large. Don’t ask me what would be sufficient… Larger deviations between GMRs are a direct consequence of higher variability. The FDA requires at least 24 enrolled subjects in replicate designs intended for reference-scaling. If the CV_wR is expected to be large, I would opt for substantially more in order to get a reliable estimate.
With the results of your first stage with 24 subjects:

RSABE.TSD(n1 = 24, CVwR = 0.6978, GMR = 0.72, final = FALSE) adjusted alpha : 0.0294 (Pocock’s for superiority) design : 2x2x4 (2-sequence 4-period full replicate) n1 : 24 futility on PE : outside 0.8000 – 1.2500 (guidances) CVwR : 0.6978 (observed) theta1 : 0.5700 (lower implied limit) theta2 : 1.7545 (upper implied limit) GMR : 0.7200 (observed) power : 0.1838 (estimated) Study stopped for futility (lower PE constraint)

RSABE.TSD <- function(adj = 0.0294, design = "2x2x4", n1, CVwR, GMR, target = 0.8, Nmax = Inf, usePE = FALSE, final = TRUE, CVwR.2, GMR.2, risk = FALSE, details = TRUE) { # adj : adjusted alpha (stage 1 and final analysis) like Potvin ‘Method B’ # defaults to 0.0294 (Pocock’s for superiority) # design : "2x2x4": 2-sequence 4-period full replicate (default) # "2x2x3": 2-sequence 3-period full replicate # "2x3x3": 3-sequence 3-period partial replicate # n1 : stage 1 sample size # CvwR : observed within-subject CV of R in stage 1 # GMR : observed T/R-ratio in stage 1 # target : target (desired) power for sample size re-estimation (default 0.8) # Nmax : futility on total sample size (default: unrestricted) # usePE : FALSE: use a fixed (assumed) GMR # TRUE: use the GMR observed in stage 1 # final : TRUE : final analysis (requires CVwR.2 and GMR.2) # FALSE: interim analyis only # risk : FALSE: TIE acc. to all publications based on the ‘implied limits’ # TRUE : additionallly TIE acc. to the ‘desired consumer risk model’ # details: TRUE : output to the console # FALSE: data.frame of results if (adj >= 0.05) warning ("Unadjusted alpha ", adj, " is not recommended.") designs <- c("2x2x4", "2x2x3", "2x3x3") if (!design %in% designs) stop ("design ", design, " is not supported.") if (missing(n1)) stop ("Number of subjects in stage 1 (n1) must be given!") if (missing(CVwR)) stop ("CVwR in stage 1 must be given!") if (missing(GMR)) stop ("GMR in stage 1 must be given!") if (target <= 0.5 | target >= 1) stop ("target ", target, " doesn’t make sense.") nseq <- as.integer(substr(design, 3, 3)) if (Nmax <= n1 + nseq) stop (paste("Nmax <= n1 +", nseq, "doesn’t make sense.")) fClower <- 0.80 # lower boundary of the PE-constraints acc. to the guidance fCupper <- 1.25 # upper boundary of the PE-constraints acc. to the guidance des.verb <- c(" (2-sequence 4-period full replicate)", " (2-sequence 3-period full replicate)", " (3-sequence 3-period partial replicate)") if (final) { if (missing(CVwR.2)) stop ("CVwR in the final analysis (CVwR.2) must be given!") if (missing(GMR.2)) stop ("GMR in the final analysis (GMR.2) must be given!") } suppressMessages(require(PowerTOST)) # ≥1.5-4 (2022-02-21) limits <- function(CVwR, risk = FALSE) { # limits thetas <- scABEL(CV = CVwR, regulator = "FDA") # implied if (risk) { # ‘desired consumer risk model’ swR <- CV2se(CVwR) if (swR > 0.25) { thetas <- setNames(exp(c(-1, +1) * log(1.25) / 0.25 * swR), c("lower", "upper")) } else { thetas <- setNames(c(0.8, 1.25), c("lower", "upper")) } } return(thetas) } power <- function(alpha = 0.05, CVwR, GMR, n, design) { return(suppressMessages(power.RSABE(alpha = alpha, CV = CVwR, theta0 = GMR, n = n, design = design))) } TIE <- function(alpha = 0.05, CVwR, n, design, risk) { return(suppressMessages(power.RSABE(alpha = alpha, CV = CVwR, theta0 = limits(CVwR, risk)[["upper"]], n = n, design = design, nsims = 1e6))) } TIE.1.1 <- TIE.1.2 <- TIE.2.1 <- TIE.2.2 <- N <- req <- NA pwr.1 <- power(adj, CVwR, GMR, n = n1, design) futile <- FALSE sig <- binom.test(0.05 * 1e6, 1e6, alternative = "less", conf.level = 0.95)$conf.int[2] txt <- paste("adjusted alpha :", sprintf("%.4f", adj)) # Note: Pockock’s adjusted alphas are for group-sequential designs # with one interim analysis at exactly N/2 and known variances! if (adj == 0.0294) { txt <- paste(txt, "(Pocock’s for superiority)") } else { if (adj == 0.0304) { txt <- paste(txt, "(Pocock’s for equivalence)") } else { if (adj == 0.0250) { txt <- paste(txt, "(Bonferroni’s for two independent tests)") } else { txt <- paste(txt, "(custom)") } } } txt <- paste(txt, "\ndesign :", design, des.verb[design == designs], "\nn1 :", sprintf("%3.0f", n1)) if (Nmax < Inf) { txt <- paste(txt, "\nfutility on N :", sprintf("%3.0f", Nmax)) } txt <- paste(txt, "\nfutility on PE : outside", sprintf("%.4f – %.4f (guidances)", fClower, fCupper)) txt <- paste(txt, "\nCVwR :", sprintf("%.4f (observed)", CVwR), "\ntheta1 :", sprintf("%.4f (lower implied limit)", limits(CVwR)[["lower"]])) if (risk) { txt <- paste(txt, "\n ", sprintf("%.4f", limits(CVwR.2, risk)[["lower"]]), "(lower limit of the ‘desired consumer risk model’)") } txt <- paste(txt, "\ntheta2 :", sprintf("%.4f (upper implied limit)", limits(CVwR)[["upper"]])) if (risk) { txt <- paste(txt, "\n ", sprintf("%.4f", limits(CVwR.2, risk)[["upper"]]), "(upper limit of the ‘desired consumer risk model’)") } txt <- paste(txt, "\nGMR :", sprintf("%.4f", GMR), "(observed)") txt <- paste(txt, "\npower :", sprintf("%.4f (estimated)", pwr.1)) if (pwr.1 >= target) { # stop in stage 1 TIE.1.1 <- TIE(adj, CVwR, n1, design, risk = FALSE) txt <- paste0(txt, "\nStudy stopped in stage 1 (sufficient power)", "\nempirical TIE :", sprintf(" %.4f", TIE.1.1), " (all publications") if (TIE.1.1 > sig) { txt <- paste0(txt, "; significantly inflated)") } else { txt <- paste0(txt, ")") } if (risk) { TIE.1.2 <- TIE(adj, CVwR, n1, design, risk = TRUE) txt <- paste(txt, "\n ", sprintf("%.4f", TIE.1.2), "(‘desired consumer risk model’") if (TIE.1.2 > sig) { txt <- paste0(txt, "; significantly inflated)") } else { txt <- paste0(txt, ")") } } if (TIE.1.1 > sig) { req <- scABEL.ad(alpha.pre = adj, theta0 = GMR, CV = CVwR, design = design, regulator = "FDA", n = n1, print = FALSE, details = FALSE)[["alpha.adj"]] txt <- paste(txt, "\nAn adjusted alpha of", sprintf("%.4f", req), "(or less)\nwould be needed to control the Type I Error.") } } else { # possibly initiate stage 2 if (GMR > fClower & GMR < fCupper) { N <- sampleN.RSABE(alpha = adj, CV = CVwR, theta0 = GMR, targetpower = target, design = design, print = FALSE, details = FALSE)[["Sample size"]] } if (is.na(N) | N > Nmax | GMR <= fClower | GMR >= fCupper) { txt <- paste(txt, "\nStudy stopped for futility") if (GMR <= fClower | GMR >= fCupper) { # stop for GMR futility if (GMR <= fClower) txt <- paste(txt, "(lower PE constraint)") if (GMR >= fCupper) txt <- paste(txt, "(upper PE constraint)") } else { # stop for Nmax futility txt <- paste(txt, "(insufficent power in stage 1\nbut total sample size") if (!is.na(N)) txt <- paste(txt, N) txt <- paste(txt, "above futility limit)") } } else { if (final) { pwr.2 <- power(adj, CVwR.2, GMR.2, n = N, design) if (GMR.2 <= 0.8 | GMR.2 >= 1.25) { final.est <- FALSE } else { final.est <- TRUE TIE.2.1 <- TIE(adj, CVwR.2, N, design, risk = FALSE) if (risk) TIE.2.2 <- TIE(adj, CVwR.2, N, design, risk = TRUE) } } else { CVwR.2 <- GMR.2 <- pwr.2 <- TIE.2.1 <- TIE.2.2 <- NA theta1.1 <- theta1.2 <- req <- NA } txt <- paste(txt, "\nStage 2 initiated (insufficent power in stage 1)", "\ntarget GMR :", sprintf("%.4f", GMR)) ifelse (usePE, txt <- paste(txt, "(observed)"), txt <- paste(txt, "(fixed)")) txt <- paste(txt, "\ntarget power :", sprintf("%.4f (fixed)", target), "\nn2 :", sprintf("%3.0f", N - n1), "\nN :", sprintf("%3.0f", N)) if (N < 24) txt <- paste0(txt, "; less than the FDA’s minimum of 24 subjects!") if (final) { txt <- paste(txt, "\nCVwR :", sprintf("%.4f (observed)", CVwR.2), "\nGMR :", sprintf("%.4f (observed)", GMR.2), "\ntheta1 :", sprintf("%.4f (lower implied limit)", limits(CVwR.2)[["lower"]])) if (risk) { txt <- paste(txt, "\n ", sprintf("%.4f", limits(CVwR.2, risk)[["lower"]]), "(lower limit of the ‘desired consumer risk model’)") } txt <- paste(txt, "\ntheta2 :", sprintf("%.4f (upper implied limit)", limits(CVwR.2)[["upper"]])) if (risk) { txt <- paste(txt, "\n ", sprintf("%.4f", limits(CVwR.2, risk)[["upper"]]), "(upper limit of the ‘desired consumer risk model’)) } txt <- paste(txt, "\npower :", sprintf("%.4f (estimated,", pwr.2)) ifelse (pwr.2 < 0.5, txt <- paste(txt, "fails RSABE)"), txt <- paste(txt, "may pass RSABE)")) if (final.est) { txt <- paste0(txt, "\nempirical TIE :", sprintf(" %.4f", TIE.2.1), " (all publications") if (TIE.2.1 > sig) { txt <- paste0(txt, "; significantly inflated)") } else { txt <- paste0(txt, ")") } if (risk) { txt <- paste(txt, "\n ", sprintf("%.4f", TIE.2.2), "(‘desired consumer risk model’") if (TIE.2.2 > sig) { txt <- paste0(txt, "; significantly inflated)") } else { txt <- paste0(txt, ")") } } if (TIE.2.1 > sig) { req <- scABEL.ad(alpha.pre = adj, theta0 = GMR.2, CV = CVwR.2, design = design, regulator = "FDA", n = N, print = FALSE, details = FALSE)[["alpha.adj"]] txt <- paste(txt, "\nAn adjusted alpha of", sprintf("%.4f", req), "(or less)\nwould be needed to control the Type I Error.") } else { req <- NA } } } } } if (details) { # output to the console cat(txt, "\n") } else { # data.frame of results # limits in stage 1 L.1.1 <- limits(CVwR, FALSE)[["lower"]] U.1.1 <- limits(CVwR, FALSE)[["upper"]] L.1.2 <- U.1.2 <- NA if (risk) { L.1.2 <- limits(CVwR, TRUE)[["lower"]] U.1.2 <- limits(CVwR, TRUE)[["upper"]] } if (final) { # limits in the final analysis L.2.1 <- limits(CVwR.2, FALSE)[["lower"]] U.2.1 <- limits(CVwR.2, FALSE)[["upper"]] L.2.2 <- U.2.2 <- NA if (risk) { L.2.2 <- limits(CVwR.2, TRUE)[["lower"]] U.2.2 <- limits(CVwR.2, TRUE)[["upper"]] } } else { L.2.1 <- U.2.1 <- L.2.2 <- U.2.2 <- NA } result <- data.frame(alpha.adj = adj, design = design, n1 = n1, CVwR = CVwR, GMR = GMR, usePE = usePE, Nmax = Nmax, risk.model = risk, L.1.1 = L.1.1, U.1.1 = U.1.1, L.1.2 = L.1.2, U.1.2 = U.1.2, power.1 = pwr.1, TIE.1.1 = TIE.1.1, TIE.1.2 = TIE.1.2, futile = futile, n2 = N - n1, N = N) if (final) { res2 <- data.frame(CVwR.2 = CVwR.2, GMR.2 = GMR.2, L.2.1 = L.2.1, U.2.1 = U.2.1, L.2.2 = L.2.2, U.2.2 = U.2.2, power.2 = pwr.2, TIE.2.1 = TIE.2.1, TIE.2.2 = TIE.2.2, alpha.req = req) result <- cbind(result, res2) } result <- result[, colSums(is.na(result)) < nrow(result)] return(result) } }

—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes