Helmut
★★★
avatar
Homepage
Vienna, Austria,
2021-08-27 15:51
(20 d 05:38 ago)

Posting: # 22550
Views: 643
 

 Desperate reader [General Sta­tis­tics]

Dear all (and esp. ElMaestro),

with great interest I read two recent papers.1,2

On [image] the author posted:
    A short time back I had a paper published showing that a major issue with current statistical software is that the model is mis-specified when one uses a mixed model to evaluate a semi-replicate BE trials. […]

    One of the conclusions drawn is that if we want to evaluate the data with a mixed model without running into model/convergence issues then we need to specify the covariance matrix \(\small{V }\) without using \(\small{V = ZGZ^t + R}\).

    Even so there is still no recipe for deriving the confidence interval. Hence, the options with the mixed model are:
    1. Work with the wrong model and the software may not converge.
    2. Work with the right model but major stats packages do not support it, and
      they do not derive a CI for T/R.
    That is an agonizing choice. This has huge practical implications.

I absolutely agree with this statement.
However, what can we do? We need the bloody confidence interval. At the end of the day we are [image] left out in the rain if we are dealing with a partial replicate design and the study has to be evaluated for average bioequivalence (according to the [image]’s wishful thinking if \(\small{s_\textrm{wR}<0.294}\)).
To obtain the variance components which are estimable and the point estimate is just not enough. As long as we can’t get the standard error of the treatment difference and its associated degrees of freedom, we are at a loss. Hence, I repeat my mantra:

 Avoid the partial replicate design 
 if intended for RSABE and if 
 there is even the slightest chance that you have to assess it for ABE. 


Of note: I used the supplementary [image] script and got for

res <- Run.REML()
PE  <- 100 * exp(res$Fixed.eff[["TrtT", 1]] - res$Fixed.eff[["TrtR", 1]])
print(res); cat("PE", PE, "\n")

$Outcome
[1] "The model converged."

$LogLikelihood
[1] 15.33728

$Fixed.eff
              [,1]
TrtT    8.02569184
TrtR    8.00330042
Per1   -0.04811877
Per2   -0.04682231
SeqRTR  0.00385480
SeqRRT -0.05092437

$Random.eff
  Var.Component  Ini.guess REML.Estimate
1          varT 0.06841615    0.07049570
2         varBR 0.03576077    0.03621596
3         varWR 0.01240137    0.01324664
4        varBRT 0.03576077    0.04563785

PE 102.2644


Pretty close to the modified script we discussed last year (given at the end). Much faster than mine. :-D

Nelder-Mead optimization (31.6 seconds)
  LogLikelihood 75.98722
  Evaluations   239
  Successful completion.

   Component    Initial   Estimate
   sigma²(T) 0.06841615 0.07049451
  sigma²(bR) 0.03576077 0.03621602
  sigma²(wR) 0.01240137 0.01324648
 covar(bTbR) 0.05208846 0.04563728

 Parameter  Estimate
  sigma(T) 0.2655080
     CV(T)   27.026%
 sigma(bR) 0.1903051
    CV(bR)   19.204%
 sigma(wR) 0.1150934
    CV(wR)   11.548%
        PE 102.2644%


[image]




  1. Fuglsang A. Professor Endrenyi’s Legacy: An Evaluation of the Regulatory Requirement “Fixed Effects, Rather Than Random Effects, Should Be Used for All Terms”. J Pharm Pharm Sci. 2021; 24: 413–20. doi:10.18433/jpps31872. [image] Open access.
  2. Fuglsang A. Mitigation of the convergence issues associated with semi-replicated bioequivalence data. Pharm Stat. 2021. Early View. doi:10.1002/pst.2142.


###########################
# Section 1: Housekeeping #
###########################
library(PowerTOST)
library(TeachingDemos)
GetData <- function(path, sep, dec) {
  # read data from a CSV-file
  if (missing(sep)) sep <- ","
  if (!sep %in% c(",", ";", "t"))
    stop("Column separator must be any of ',', ';', 't'.")
  if (sep == "t") sep <- "\t"
  if (missing(dec)) dec <- "."
  if (!dec %in% c(".", ","))
    stop("Decimal separator must be '.', or ','.")
  # strip quotes and skip eventual commentary lines
  D        <- read.csv(path, sep = sep, dec = dec,
                       quote = "", comment.char = "#")
  names(D) <- c("Subj", "Per", "Seq", "Trt", "Y")
  cols     <- c("Subj", "Per", "Seq", "Trt")
  D[cols]  <- lapply(D[cols], factor)
  return(invisible(D))
}

CreateX <- function(D) {
  # first the two treatments
  TrtT <- as.numeric(as.character(D$Trt) == "T")
  TrtR <- as.numeric(as.character(D$Trt) == "R")
  X <- cbind(TrtT, TrtR)
  R <- qr(X)$rank
  # now the periods
  for (p in unique(D$Per)) {
    v  <- as.numeric(D$Per == p)
    XX <- data.frame(X, v)
    names(XX)[ncol(XX)] <- paste0("P", p)
    rnk <- qr(XX)$rank
    if (rnk > R) {
      X <- XX
      R <- rnk
    }
  }
  for (q in unique(D$Seq)) {
    v  <- as.numeric(D$Seq == q)
    XX <- data.frame(X, v)
    names(XX)[ncol(XX)] <- paste0("Q", q)
    rnk <- qr(XX)$rank
    if (rnk > R) {
      X <- XX
      R <- rnk
    }
  }
  return(as.matrix(X))
}
Create.CovM <- function(Params) {
  # block diagonal covariance matrix
  varT  <- Params[1]
  varBR <- Params[2]
  varWR <- Params[3]
  varRT <- Params[4]
  Nobs  <- length(D$Y)
  V     <- matrix(0, ncol = Nobs, nrow = Nobs)
  for (iRow in 1:Nobs) {
    for (iCol in 1:Nobs) {
      if (iCol == iRow) { # diagonal elements
        if (D$Trt[iRow] == "T") {
          V[iRow, iCol] <- V[iRow, iCol] + varT
        } else {
          V[iRow, iCol] <- V[iRow, iCol] + varWR + varBR
        }
      } else {            # off diagonal elements
        if (D$Subj[iRow] == D$Subj[iCol]) {
          if (D$Trt[iCol] == D$Trt[iRow]) {
            V[iRow, iCol] <- V[iRow, iCol] + varBR
          } else {
            V[iRow, iCol] <- V[iRow, iCol] + varRT
          }
        }
      }
    }
  }
  return(V)
}

############################
# Section 2: Matrix things #
############################
Obj.F12 <- function(Pars) {
  j <<- j + 1
  int[[j]] <<- Pars
  # this is line 3 of page 10 of:
  # http://people.csail.mit.edu/xiuming/docs/tutorials/reml.pdf
  CovM  <- Create.CovM(Pars)
  A     <- -0.5*log(det(CovM))
  # Note: NaNs can be proceded in the next line.
  B     <- suppressWarnings(-0.5*log(det(t(X) %*% solve(CovM) %*% X)))
  est.b <- solve(t(X) %*% solve(CovM) %*% X) %*% t(X) %*% solve(CovM) %*% y
  tmp   <- y - X %*% est.b
  C     <- -0.5 *(t(tmp) %*% solve(CovM) %*% tmp)
  C     <- as.numeric(C) # make a scalar for other solvers
  return(A + B + C)
}
Final <- function(Pars) {
  CovM <- Create.CovM(Pars)
  tmp  <- solve(t(X) %*% solve(CovM) %*% X) %*% t(X) %*% solve(CovM) %*% y
  PE   <- exp(as.numeric(tmp["TrtT", ] - tmp["TrtR", ]))
  return(PE)
}

########################
# Section 3: Execution #
########################
Some.Initial.Guesses <- function(check = FALSE) {
  # design (all possible ones, error handling, full or partial)
  subjs <- unique(as.numeric(D$Subj))
  trts  <- sort(unique(as.character(D$Trt)), decreasing = TRUE)
  if (!length(trts) == 2) stop("Only two treatments supported.")
  if (sum(!trts %in% c("T", "R")) !=0)
    stop("treatments must be coded as 'T' and 'R'.")
  seqs  <- sort(unique(as.character(D$Seq)), decreasing = TRUE) # T first
  type  <- paste(seqs, collapse = "|") # identifier
  nseq  <- length(unique(as.character(D$Seq)))
  nper  <- length(unique(as.numeric(D$Per)))
  nsubj <- length(subjs)
  if (nchar(type) == 19) {  # 4-period 4-sequence full replicate designs
    if (nper != 4) stop("4 periods required in this full replicate design.")
    if (nseq != 4) stop("4 sequences required in this full replicate design.")
    des <- "full"
  }
  if (nchar(type) == 9) {  # 4-period full replicate designs
    if (nper != 4) stop("4 periods required in this full replicate design.")
    if (nseq != 2) stop("2 sequences required in this full replicate design.")
    des <- "full"
  }
  if (nchar(type) == 7) {  # 3-period replicates
    if (type %in% c("TRT|RTR", "TRR|RTT")) {
      if (nper != 3) stop("3 periods required in this full replicate design.")
      if (nseq != 2) stop("2 sequences required in this full replicate design.")
      des <- "full"
    }
    if (type == "TRR|RTR") {
      if (nper != 3) stop("3 periods required in the extra-reference design.")
      if (nseq != 2) stop("2 sequences required in the extra-reference design.")
      des <- "partial"
    }
  }
  if (nchar(type) == 11) { # Balaam's design or 3-sequence partial
    if (!type == "TR|RT|TT|RR") { # Balamm's
      if (nper != 3) stop("3 periods required in this partial replicate design.")
      if (nseq != 3) stop("3 sequences required in this partial replicate design.")
      des <- "partial"
    } else {                      # TRT|RTR|RRT
      if (nper != 2) stop("2 periods required in Balaam's design.")
      if (nseq != 4) stop("4 sequences required in Balaam's design.")
      des <- "full"
    }
  }
  if (type == "TR|RT") stop("TR|RT design not supported yet.")
  gm.T <- gm.R <- numeric(nsubj) # vector of geometric means
  for (subj in seq_along(subjs)) {
    gm.T[subj] <- mean(log(D$Y[which(D$Subj == subj & D$Trt == "T")]))
    gm.R[subj] <- mean(log(D$Y[which(D$Subj == subj & D$Trt == "R")]))
  }
  # remove NAs
  gm.T <- gm.T[!is.na(gm.T)]  
  gm.R <- gm.R[!is.na(gm.R)]  
  if (des == "partial") { # guess varT (between + within)
    m    <- lm(log(Y) ~ Seq + Per, data = D[D$Trt == "T", ])
    varT <- anova(m)["Residuals", "Mean Sq"]
  } else {                # guess VarWT
    m     <- lm(log(Y) ~ Seq + Per + Subj, data = D[D$Trt == "T", ])
    varWT <- anova(m)["Residuals", "Mean Sq"]
  }
  # guess VarWR
  m     <- lm(log(Y) ~ Seq + Per + Subj, data = D[D$Trt == "R", ])
  varWR <- anova(m)["Residuals", "Mean Sq"]   # guess varBR
  m     <- lm(log(Y) ~ Seq + Per, data = D[D$Trt == "R", ])
  varBR <- anova(m)["Residuals", "Mean Sq"] - varWR
  # TODO: full replicates
  varRT <- 0.5 * (varT + varBR)
  rslt  <- c(varT, varBR, varWR, varRT)
  return(rslt)
}
MyREML <- function(maxit = 1000, method = "Nelder-Mead") {
  Ini <- Some.Initial.Guesses()
  ptm <- proc.time()[[3]]
  F   <- optim(par = Ini, fn = Obj.F12, method = method,
               control = list(reltol = 1E-12, maxit = maxit,
               fnscale = -1))
  et  <- proc.time()[[3]] - ptm
  Nms <- c("var_T", "var_bR", "var_wR", "covar_bTbR")
  X   <- data.frame(Nms, Ini, F$par)
  names(X) <- c("Component", "Initial", "Estimate")
  if (F$convergence == 0)  msg <- "Successful completion."
  if (F$convergence == 1)  msg <- paste("Iteration limit", maxit, "reached.")
  if (F$convergence == 10) msg <- "Degeneracy of the Nelder–Mead simplex."
  cat(paste0("\nNelder-Mead optimization (", signif(et, 3), " seconds)"),
             "\n  LogLikelihood", F$value,
             "\n  Evaluations  ", F$counts[["function"]], "\n ", msg, "\n")
  return(X)
}

# Data has to have five columns in this order:
#   Subject (numeric), Period (numeric), Sequence (character),
#   Treatment (character), Response (numeric)
# Column separator 'sep' comma (default), semicolon, or t (tab)
# Decimal separator 'dec' period (default) or comma
path <- "https://bebac.at/downloads/ds02.csv" # EMA partial replicate
D    <- res <- PE <- int <- NULL # nullify eventual previous data and results
D    <- GetData(path)
X    <- CreateX(D)
y    <- log(D$Y)
# capture intermediate results:
# https://stackoverflow.com/questions/23975101/getting-more-details-from-optim-function-from-r
j    <- 0
int  <- list()
# the workhorse
#  optional arguments:
#  maxit = X to reduce the number of iterations from the default 1000
#  plot = TRUE for a plot of the progress of optimization
#  print = TRUE to print the progress of optimization
res <- MyREML()
res[, 1]     <- c("sigma\u00B2(T)", "sigma\u00B2(bR)",
                  "sigma\u00B2(wR)", "covar(bTbR)")
fin          <- data.frame(Parameter = c("sigma(T)", "CV(T)",
                                         "sigma(bR)", "CV(bR)",
                                         "sigma(wR)", "CV(wR)", "PE"),
                           Estimate = NA_real_)
for (j in 1:3) {
  fin$Estimate[j * 2 - 1] <- sprintf("%.7f", sqrt(res[j, 3]))
  fin$Estimate[j * 2]     <- sprintf("%3.3f%%", 100 * mse2CV(res[j, 3]))
}
fin$Estimate[7] <- sprintf("%3.4f%%", 100 * Final(res$Estimate))
int          <- data.frame(matrix(unlist(int), nrow = length(int),
                                  byrow = TRUE))
if (is.null(attr(dev.list(), "names"))) windows(width = 6.5, height = 4)
# variance components
lbls <- c(expression(sigma[T]^2),
          expression(sigma[bR]^2),
          expression(sigma[wR]^2),
          expression(italic(cov)(bTbR)))
cols <- c("darkgreen", "blue", "magenta", "red")
op   <- par(no.readonly = TRUE)
par(mar = c(3.3, 3.1, 1.6, 4.1),  mgp = c(2.2, 0.5, 0), lend = 2,
    ljoin = 1, tcl = 0.25, cex.axis = 0.8)
plot(1:nrow(int), int$X1, type = "n", ylim = range(int), log = "x",
     xlab = "Iteration", ylab = "Estimate", main = "Progress of optimization",
     cex.lab = 0.95, cex.main = 1, font.main = 1, axes = FALSE)
grid()
axis(1)
axis(2, las = 1)
box()
ax <- axTicks(1, log = TRUE)
for (j in 1:4) {
  lines(1:nrow(int), int[, j], type="s", lwd = 2, col = cols[j])
  shadowtext(sqrt(2), int[1, j], cex = 0.65, pos = 3,
             labels = signif(int[1, j], 4), col = cols[j],
             bg = "white", r = 0.2)
  shadowtext(nrow(int)*0.9, tail(int[, j], 1), cex = 0.65, pos = 3,
             labels = signif(tail(int[, j], 1), 4), col = cols[j],
             bg = "white", r = 0.2)
  mtext(lbls[j], 4, at = tail(int[, j], 1), cex = 0.8, line = 0.5, las = 1)
}
par(op)
print(res, row.names = FALSE); cat("\n"); print(fin, row.names = FALSE)

Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2021-08-27 22:18
(19 d 23:12 ago)

@ Helmut
Posting: # 22552
Views: 541
 

 Desperate reader

Hi Hötzi,

» We need the bloody confidence interval.
» As long as we can’t get the standard error of the treatment difference and its associated degrees of freedom, we are at a loss.

If we look at SAS documentation e.g. here and here, then
  1. I do not understand the underlying math (just try and look into the spectral decomposition things mentioned as part of DDFM derivation; it sends my head spinning).
  2. Whatever SAS does, it seems to require that we work in V through ZGZt+R in order to estimate CI's for the fixed effects.
It would be interesting to see someone who has solid understanding of matrix theory and stats to work out the equations when the between and within-variability for T cannot be separated.
However, since FDA now threw the mixed model out for semi-replicate designs, interesting may in reality just mean academically attractive but possible not too useful at this moment. But who knows - until someone does it and explores it, it is not known if there is an advantageous property hidden somewhere.

Pass or fail!
ElMaestro
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2021-08-27 23:54
(19 d 21:36 ago)

@ ElMaestro
Posting: # 22553
Views: 542
 

 Misunderstanding?

Hi ElMaestro,

» If we look at SAS documentation e.g. here and here, then
» 1. I do not understand the underlying math (just try and look into the spectral decomposition things mentioned as part of DDFM derivation; it sends my head spinning).

Wooaah!!

» 2. Whatever SAS does, it seems to require that we work in V through ZGZt+R in order to estimate CI's for the fixed effects.

Seems so.

» It would be interesting to see someone who has solid understanding of matrix theory and stats to work out the equations when the between and within-variability for T cannot be separated.

Definitely not me…

» However, since FDA now threw the mixed model out for semi-replicate designs, …

What on earth gives you this impression? I’m talking about ABE (Stats guidance Appendix E, ANDA guidance page 29):

The following codes are an example of the determination of unscaled average BE for LAUCT with a partially replicate 3-way BE design:

PROC MIXED
  data=pk;
  CLASSES SEQ SUBJ PER TRT;
  MODEL LAUCT = SEQ PER TRT/ DDFM=SATTERTH;
  RANDOM TRT/TYPE=FA0(2) SUB=SUBJ G;
  REPEATED/GRP=TRT SUB=SUBJ;
  ESTIMATE 'T vs. R' TRT 1 -1/CL ALPHA=0.1;
  ods output Estimates=unsc1;
  title1 'unscaled BE 90% CI - guidance version';
  title2 'AUCt';
run;

data unsc1;
  set unsc1;
  unscabe_lower=exp(lower);
  unscabe_upper=exp(upper);
run;

[image]

If reference-scaling is acceptable (right branch), even a spreadsheet (‼) would do.

» … interesting may in reality just mean academically attractive but possible not too useful at this moment.

Nope. Talk to John about his data sets which didn’t converge. Given, FA0(1) did – with a warning – but I think there is a data set posted at Certara’s forum where nothing worked. Study done, no result. THX, [image].

» But who knows - until someone does it and explores it, it is not known if there is an advantageous property hidden somewhere.

Might be. ;-)

Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2021-08-28 10:00
(19 d 11:30 ago)

@ Helmut
Posting: # 22554
Views: 505
 

 Misunderstanding?

Hi Hötzi,

» » However, since FDA now threw the mixed model out for semi-replicate designs, …
»
» What on earth gives you this impression? I’m talking about ABE (Stats guidance Appendix E, ANDA guidance page 29):

Wow, that is actually right.
Page 27: "PROC GLM should be used for partially replicate (3-way) BE studies"
Page 29: "For PK parameters with a sWR < 0.294, use the unscaled average BE approach." (which implies PROC MIXED in their code)

So, we must start to work out sWR using the equations. Once we got it, we know whether we need a mixed model or not for the evaluation of BE. We may need the mixed model for one metric, like AUCt and AUCinf, and the equations (ref scaled ABE) for another, like Cmax.
I guess this is also the gist of the decision tree you gave. I wonder if this is truly their intention. The mixed model therefore seems to have been saved by the bell. This really emphasizes the need for a CI solution that allows V estimated without the existence of Z and R separately.

I am curious now - we should make a little study on the type I error in case of missing ref values.

Pass or fail!
ElMaestro
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2021-08-28 10:38
(19 d 10:52 ago)

@ ElMaestro
Posting: # 22555
Views: 491
 

 Here we are

Hi ElMaestro,

» » What on earth gives you this impression? I’m talking about ABE (Stats guidance Appendix E, ANDA guidance page 29):
»
» Wow, that is actually right.
» Page 27: "PROC GLM should be used for partially replicate (3-way) BE studies"

And Page 32:
  • PROC MIXED should be used for fully replicate (4-way) BE studies
Only the [image] knows why.

» Page 29: "For PK parameters with a sWR < 0.294, use the unscaled average BE approach." (which implies PROC MIXED in their code)

Now you got it. ;-) Some stuff there.

» So, we must start to work out sWR using the equations.

Yep. At least that’s easy.

» Once we got it, we know whether we need a mixed model or not for the evaluation of BE. We may need the mixed model for one metric, like AUCt and AUCinf, and the equations (ref scaled ABE) for another, like Cmax.

Correct.

» I guess this is also the gist of the decision tree you gave. I wonder if this is truly their intention.

That’s the interpretation of all authors who assessed the Type I Error. Sent you Mehl.

» The mixed model therefore seems to have been saved by the bell.

🥊

» This really emphasizes the need for a CI solution that allows V estimated without the existence of Z and R separately.

Happy that finally I conveyed the message.

» I am curious now - we should make a little study on the type I error in case of missing ref values.

Oh dear, reaching for the stars! So far we have no clue how to come up with a solution for complete data. Appetizer.
Since there were comments on [image] about study costs, see there.

Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
Activity
 Admin contact
21,678 posts in 4,533 threads, 1,541 registered users;
online 4 (0 registered, 4 guests [including 4 identified bots]).
Forum time: Thursday 21:30 CEST (Europe/Vienna)

Ignorance more frequently begets confidence
than does knowledge.    Charles Darwin

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz
HTML5