d_labes
★★★

Berlin, Germany,
2009-11-23 14:32
(4231 d 08:59 ago)

Posting: # 4374
Views: 11,424
 

 Exact power and sample size, Part I [R for BE/BA]

Dear All!

In the era of 'Appelssstrudel' and Owen's Q-function as implemented in R (see here) R-users don't have the necessity to use approximations but should calculate always exact power and sample size!

Since other designs (parallel group, cross-over with more than 2 formulations, replicate designs) used occasionally in BE studies also rely on the two-one-sided t-tests procedure (TOST) here comes my R-super-code:

# -- The functions of normal-, t-distributions and integrate() --------------
require(stats)
# ---------------------------------------------------------------------------
# Owen's Q-function

OwensQ <- function (nu, t, delta, a, b)
{
   # Observation: for 'really' large df (nu>2000) and large delta/b the 
   # density function is zero over nearly all its range. Q is than 
   # returned sometimes as =0! See documentation (?integrate) for that.
   # example: OwensQ(3000,1.64,-10,0,300) = 1
   #          OwensQ(3000,1.64,-10,0,350) = 5.614476e-12 !
   # Idea: adapt upper and/or lower to account for that

   low <- a; up<- b
   if (nu>=1000){
     # try to shorten the range over that is integrated
     x <- seq(a, b, by=(b-a)/400)
     dens <- .Q.integrand(x, nu, t, delta)
     r <- data.frame(x=x, dens=dens)
     r <- r[r$dens>0,]      if (length(r)>0) {# if any >0
       up <- max(r$x)+(b-a)/600  # only the upper x+step is used
       rm(r,x,dens)
     }
   }
   Qintegral <- integrate(.Q.integrand, lower = low, upper = up, nu=nu, t=t,
                delta = delta, subdivisions = 10000,
                rel.tol = 1.e-10, abs.tol=1.e-10,
                stop.on.error = TRUE, keep.xy = FALSE, aux = NULL)
   Q <- Qintegral[[1]]

   return(Q) 
}
# --------------------------------------------------------------------------
# Integrand of the definit integral in Owen's Q. Used in the call of integrate()
# Not useful alone, I think ? Leading . hides this function
   
.Q.integrand <- function(x, nu, t, delta)
{ #version without for - loop
   lnQconst <- -((nu/2.0)-1.0)*log(2.0) - lgamma(nu/2.)
   dens <- pnorm( t*x/sqrt(nu) - delta, mean = 0, sd = 1, log = FALSE) *
                 exp( (nu-1)*log(x) - 0.5*x^2 + lnQconst )

   return(dens)
}
# --------------------------------------------------------------------------
# 'raw' power function to be used in sampleN.TOST avoiding overhead
# bk is the so called design constant (see later)

.power.TOST <- function(alpha=0.05, theta1, theta2, diffm, se, n, df, bk=2)
{
    tval   <- qt(1 - alpha, df, lower.tail = TRUE, log.p = FALSE)
    delta1 <- (diffm-theta1)/(se*sqrt(bk/n))
    delta2 <- (diffm-theta2)/(se*sqrt(bk/n))
    R      <- (delta1-delta2)*sqrt(df)/(2.*tval)
   
    # to avoid numerical errors in OwensQ implementation
    if (df>10000) {
      # Joulious formula (57) or (67), normal approximation
      p1 <- pnorm( (abs(delta1)-tval), lower.tail = TRUE)
      p2 <- pnorm( (abs(delta2)-tval), lower.tail = TRUE)
      return (p1 + p2 -1)
    }
   
    p1 <- OwensQ(df,  tval, delta1, 0, R)
    p2 <- OwensQ(df, -tval, delta2, 0, R)

    return( p2-p1 )
}

Continuation follows.

Use it on your own risk! The author does not take any responsibility for damage of your computer, your career or of the world :-D .

Regards,

Detlew
d_labes
★★★

Berlin, Germany,
2009-11-23 14:38
(4231 d 08:52 ago)

(edited by d_labes on 2009-11-23 15:26)
@ d_labes
Posting: # 4375
Views: 9,843
 

 Exact power and sample size, Part II

Dear All!

Here comes part II of the story:

# a bunch of functions to get the design characteristics
known.designs<-function()
{
# Note: the df for the replicate designs are those without carry-over
# n is the total number in case of cross-over design,
# n is number per group in case of parallel group design
# df2 = degrees of freedom for robust analysis (aka Senn's basic estimator)
# bk is the so-called design constant

  des <- ("no  design    df      df2    steps  bk
           0  parallel 2*(n-1) 2*(n-1)  1      2
           1  2x2      n-2     n-2      2      2
           2  3x3      2*n-3   n-3      3      2
           3  4x4      3*n-5   n-4      4      2
           4  2x2x3    2*n-3   n-2      2      1.5
           5  2x2x4    3*n-4   n-2      2      1
           6  2x4x4    3*n-4   n-4      4      1")
# eventually it would be better to have steps=6 in case of 3x3 (6 seq. design)
# the df2 have to be checked! Not used so far. TODO !

   
  des2 <- textConnection(des)
  designs <- read.table(des2, header=TRUE, sep="", strip.white=TRUE, as.is=TRUE)           
  close(des2)   # without this close() warnings are generated
 
  # names for nicer output of design
  designs$name[designs$no==0] <- "2 parallel groups"
  designs$name[designs$no %in% c(1,2,3)] <-
      paste(designs$design[designs$no %in% c(1,2,3)],"crossover")
  designs$name[designs$no %in% c(4,5,6)] <-
      paste(designs$design[designs$no %in% c(4,5,6)],"replicate crossover")

  return(designs)

}
#--- return no of design ---
# design: a character string describing the design

.design.no<-function(design)
{
  #take the first word if more then one
  desi <- unlist(strsplit(tolower(design)," "))[1]
  des <- known.designs()
  i   <- match(desi, des$design)
  if (!is.na(i)) return(des$no[i]) else return(NA)
}
#--- return design constant ---
.design.bk<-function(design.no)
{
  des <- known.designs()
  return(des$bk[des$no==design.no])
}
#--- return design degrees of freedom ---
# as expression

.design.df<-function(design.no, nvar="n")
{
  des <- known.designs()
  df  <- (des$df[des$no==design.no])
  #replace with calling variable name
  df  <- gsub('n',nvar,df)
  e   <- parse(text=df, srcfile=NULL)
  return(as.expression(e))
}
#--- return step for sample size search ---
.design.step<-function(design.no)
{
  des <- known.designs()
  return (des$steps[des$no==design.no])
}
#--- return step for sample size search ---
.design.name<-function(design.no)
{
  des <- known.designs()
  return (des$name[des$no==design.no])
}

Continuation follows.

Again: Use it on your own risk! The author does not take any responsibility for
damage of your computer, your career or of the world :-D .

Regards,

Detlew
d_labes
★★★

Berlin, Germany,
2009-11-23 14:50
(4231 d 08:40 ago)

(edited by d_labes on 2009-11-23 15:24)
@ d_labes
Posting: # 4376
Views: 9,613
 

 Exact power and sample size, Part III

Dear All!

Here comes part III of the story. Puzzling all the helpers together we obtain:

# ----- helper functions for sampleN.TOST ---------------------------
# --------------------------------------------------------------------
# Sample size for a desired power, large sample approx.
# bk = design constant, see known.designs()

.sampleN0 <- function(alpha=0.05, targetpower=0.8, theta1, theta2, diffm, se,
                     steps=2, bk=2)
{
    z1 <- qnorm(1-alpha)
    if (abs(diffm)>0.0001) z2 <- qnorm(targetpower)
      else z2 <- qnorm(1-(1-targetpower)/2)
    n01<-(bk/2)*((z1+z2)*(se*sqrt(2)/(diffm-theta1)))^2;
    n02<-(bk/2)*((z1+z2)*(se*sqrt(2)/(diffm-theta2)))^2;
    n0 <- max(n01,n02)
    n0 <- ceiling(n0)
    #make an even multiple of step (=2 in case of 2x2 cross-over)
    n0 <- steps*trunc(n0/steps)
    if (n0<4) n0 <- 4   # minimum sample size
   
    return(n0)
}

# --------------------------------------------------------------------------
# Power of two-one-sided-t-tests using OwensQ or approx. using non-central t
# (this is a wrapper to .power.TOST(...) and .approx.power.TOST(...))
# In case of logscale=TRUE give ldiff, ltheata1 and ltheta2 as ratios
# f.i. ldiff=0.95, ltheta1=0.8, ltheta2=1.25
# In case of logscale=FALSE give ldiff, ltheata1 and ltheta2 as
# difference to 1, f.i. ldiff=0.05 (5% difference) ltheata1=-0.2,
# ltheta2=0.2 (20% equiv. margins)
# CV is always the coefficient of variation but as ratio, not %


power.TOST <- function(alpha=0.05, logscale=TRUE, ltheta1=0.8, ltheta2,
                       ldiff=0.95, CV, n, design="2x2", exact=TRUE)
{
    # design characteristics
    d.no <- .design.no(design)
    if (is.na(d.no)) stop("Err: unknown design!")
   
    dfe <- .design.df(d.no) # degrees of freedom as expression
    bk  <- .design.bk(d.no) # design const.
   
    if (logscale) {
        if (missing(ltheta2)) ltheta2 <- 1/ltheta1
        theta1 <- log(ltheta1)
        theta2 <- log(ltheta2)
        diffm  <- log(ldiff)
        se     <- sqrt(log(1.+CV^2))
    } else {
        if (missing(ltheta2)) ltheta2 <- -ltheta1
        theta1 <- ltheta1
        theta2 <- ltheta2
        diffm  <- ldiff
        se     <- CV
    }
   
    df <- eval(dfe)
    if ( !exact )
      pow <- .approx.power.TOST(alpha, theta1, theta2, diffm, se, n, df, bk)
    else
      pow <- .power.TOST(alpha, theta1, theta2, diffm, se, n, df, bk)
   
    return( pow )

}

Again: Use it on your own risk! The author does not take any responsibility for damage of your computer, your career or of the world :-D .

Regards,

Detlew
d_labes
★★★

Berlin, Germany,
2009-11-23 14:57
(4231 d 08:33 ago)

(edited by d_labes on 2009-11-23 15:48)
@ d_labes
Posting: # 4378
Views: 9,615
 

 Exact power and sample size, Part IV

Dear All!

Ok, ok, I will quit with this "Eierlegende Wollmilchsau".
# --------------------------------------------------------------------------
# Sample size for a desired power: see known.designs()
# for covered experimental designs
sampleN.TOST <- function(alpha=0.05, targetpower=0.8, logscale=TRUE,
                         ltheta1=0.8, ltheta2, ldiff=0.95, CV,
                         design="2x2", exact=TRUE, print=TRUE, details=FALSE)
{
  #number of the design
  d.no <- .design.no(design)
  if (is.na(d.no)) stop("Err: design not known")
  # design charcteristics
  d.name <- .design.name(d.no)
  # get the df for the design as an unevaluated expression
  dfe    <- .design.df(d.no,"Ns")
  # get stepsize for search
  steps  <- .design.step(d.no)
  # get design constant
  bk     <- .design.bk(d.no)

  if (missing(CV)) stop("Err: CV must be given!")

  # print the configuration:
  if (print) {
    cat("\n++++++++ Equivalence test - TOST ++++++++\n")
    cat("         Sample size estimatio\n\n")
    cat("-----------------------------------------\n")
    cat("Study design: ",d.name,"\n")
    if (details) {
      cat("Design characteristics:\n")
      cat("df = ",gsub("Ns","n",gsub(" ","",as.character(dfe))),
          ", design const = ",bk,", step = ",steps,"\n\n", sep="")
    }     
  }
  # handle the log transformation
  if (logscale) {
    if (missing(ltheta2)) ltheta2=1/ltheta1
    if ( (ldiff<=ltheta1) | (ldiff>=ltheta2) ) {
      cat("Err: ratio not between margins!\n")
      return(NaN)
    }
    theta1 <- log(ltheta1)
    theta2 <- log(ltheta2)
    diffm  <- log(ldiff)
    se     <- sqrt(log(1.+CV^2))
    if (print) cat("log-transformed data (multiplicative model)\n\n")
  } else {
    if (missing(ltheta2)) ltheta2=-ltheta1
    if ( (ldiff<=ltheta1) | (ldiff>=ltheta2) ) {
      cat("Err: diff not between margins!\n")
      return(NaN)
    }
    theta1 <- ltheta1
    theta2 <- ltheta2
    diffm  <- ldiff
    se     <- CV
    if (print) cat("untransformed data (additive model)\n\n")
  }


  if (print) {
    cat("alpha = ",alpha,", target power = ", targetpower,"\n", sep="")
    cat("BE margins        =",ltheta1,"...", ltheta2,"\n")
    if (logscale)
      cat("Null (true) ratio = ",ldiff,",  CV = ",CV,"\n", sep="")
    else
      cat("Null (true) diff. = ",ldiff,",  CV = ",CV,"\n", sep="")
  }

  #start value from large sample approx. (hidden func.)
  Ns  <- .sampleN0(alpha, targetpower, theta1, theta2, diffm, se, steps, bk)
  df  <- eval(dfe)
  if (exact) pow <- .power.TOST(alpha, theta1, theta2, diffm, se, n=Ns, df, bk)
    else  pow <- .approx.power.TOST(alpha, theta1, theta2, diffm, se, n=Ns, df, bk)
  if (details) {
    cat("\nSample size search\n")
    if (d.no == 0) cat("(n is sample size per group)\n")
    cat(" n     power\n")
    cat( Ns," ", formatC(pow, digits=6, format="f"),"\n")
  }
  # --- loop until power >= target power
  iter <- 0
  # iter>50 is emergency brake
  # this is eventually not necessary, depends on quality of sampleN0
  # in experimentation I have seen max of six steps

  if (pow<targetpower) {
    while (pow<targetpower) {
      Ns   <- Ns+steps
      iter <- iter+1
      df   <- eval(dfe)
      if (exact)
        pow  <- .power.TOST(alpha, theta1, theta2, diffm, se, n=Ns, df, bk)
      else
        pow  <- .approx.power.TOST(alpha, theta1, theta2, diffm, se, n=Ns, df, bk)
      if (details) cat( Ns," ", formatC(pow, digits=6, format="f"),"\n")
      if (iter>50) break
    }
  } else {
    while (pow>targetpower) {
      if (Ns<=4) break     # min. sample size
      Ns   <- Ns-steps     # step down if start power is to high
      iter <- iter+1
      df   <- eval(dfe)
      if (exact)
        pow  <- .power.TOST(alpha, theta1, theta2, diffm, se, n=Ns, df, bk)
      else
        pow  <- .approx.power.TOST(alpha, theta1, theta2, diffm, se, n=Ns, df, bk)
      if (details) cat( Ns," ", formatC(pow, digits=6),"\n")
      if (iter>50) break 
    }
    if ( pow<targetpower ) {
      Ns   <- Ns+steps     #step up once to have n with pow>=target
      df   <- eval(dfe)
      if (exact)
        pow  <- .power.TOST(alpha, theta1, theta2, diffm, se, n=Ns, df, bk)
      else
        pow  <- .approx.power.TOST(alpha, theta1, theta2, diffm, se, n=Ns, df, bk)
    }
  }
  if (print && !details) {
    cat("\nSample size\n")
    if (d.no == 0) cat("(n is sample size per group)\n")
    cat(" n     power\n")
    cat( Ns," ", formatC(pow, digits=6, format="f"),"\n")
  }
  if (details && print) {
    if (exact) cat("\nExact power calculation with\nOwen's Q functions.\n")
      else cat("\nApproximate power calculation with\nnon-central t-distribution.\n")
  }
  # always print if approx.
  if (print && !exact)
    cat("\nApproximate power calculation with\nnon-central t-distribution.\n")
  if (print) cat("\n")

  res <- data.frame(design=design, alpha=alpha, CV=CV, theta0=ldiff,
                    theta1=ltheta1, theta2=ltheta2, n=Ns, power=pow,
                    targetpower=targetpower)
  names(res) <-c("Design","alpha","CV","theta0","theta1","theta2",
                 "Sample size", "Achieved power", "Target power")

  if (print)
    return(invisible(res))
  else
    return(res)

}

The function .approx.power.TOST() not shown here is the code using non-central t-approximation for the power.

The whole code is released under the copy left (aka GPL).
Eventually there is someone out there to tying together all this snippets into a package, for christmas :cool:.

Regards,

Detlew
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2009-11-23 15:13
(4231 d 08:18 ago)

@ d_labes
Posting: # 4379
Views: 10,020
 

 Backslashes in R-code

Dear D. Labes,

wow, what a monster! In the meantime I wouldn't call you an [image] novice any more.
One note: The forum's scripts remove the backslash '\\' (both in the preview and at posting) due to security reasons [PHP-function stripslashes()]. You used them in a couple of lines (mainly in the cat()-statements to force a line-feed in the output). Can you please edit your post and insert double backslashes at the right places - only the first one will be removed, and users may copy-and-paste the code afterwards. THX.

Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
d_labes
★★★

Berlin, Germany,
2009-11-23 15:19
(4231 d 08:12 ago)

@ Helmut
Posting: # 4380
Views: 9,567
 

 Backslashes eater

Dear Helmut,

THX for the hint. Meanwhile I had noticed this. But for each click on Preview one backslash is eaten :crying:

Regards,

Detlew
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2009-11-23 15:28
(4231 d 08:02 ago)

@ d_labes
Posting: # 4381
Views: 9,712
 

 Backslashes eater

Dear D. Labes!

» […] But for each click on Preview one backslash is eaten :crying:

Yes, I know, sorry. It’s unavoidable, unless a code-wrapper like GeShi is used. Although it’s implemented in the current version 2.1.1 of ‘my little forum’, I opted to stay within the 1.7.6-branch due to performance reasons. We are working on a solution in version 1.7.7.

Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
Activity
 Admin contact
21,540 posts in 4,502 threads, 1,525 registered users;
online 8 (0 registered, 8 guests [including 4 identified bots]).
Forum time: Friday 00:31 CEST (Europe/Vienna)

Statistics is the grammar of science.    Karl Pearson

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz
HTML5