ElMaestro
★★★

Belgium?,
2020-07-10 13:19
(102 d 08:41 ago)

Posting: # 21669
Views: 11,403
 

 Semireplicated + REML in R [R for BE/BA]

Hi all,

I am writing a little optimizer based on REML in R for use with semi-replicated designs. I am starting from the bottom and working upwards, since the various packages do not really help when R is replicated and T is not.

Does anyone have a dataset with well-established REML reference values for the variance components for the TRR/RRT/RTR design? A little more than the CVref of EMA dataset II would be appreciated :-)

Many thanks.

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2020-07-10 19:03
(102 d 02:57 ago)

@ ElMaestro
Posting: # 21670
Views: 10,661
 

 Semireplicated + REML in R

Hi ElMaestro,

» I am writing a little optimizer based on REML in R for use with semi-replicated designs. I am starting from the bottom and working upwards, since the various packages do not really help when R is replicated and T is not.

Wow! Nobody mastered that in R so far. I don’t want to dampen your enthusiasm but I think that is not possible at all.

» Does anyone have a dataset with well-established REML reference values for the variance components for the TRR/RRT/RTR design? A little more than the CVref of EMA dataset II would be appreciated :-)

Here you are (for DS II, of course). Results in Phoenix for the FDA’s covariance structure „Banded No-Diagonal Factor Analytic (f=2)” which is in SAS-lingo FA0(2). After 7 iterations (-2REML -48.2, AIC -26.2):
lambda(1,1) 0.19030501
lambda(1,2) 0.23981134
lambda(2,2) 0.074350441
s2wR        0.013246493
s2wT        0.0074570998

G Matrix*
0.03621600 0.04563730
0.04563730 0.06303747

As usual:
WARNING: Newton's algorithm converged with modified Hessian. Output is suspect.
Model may be over-specified. A simpler model could be tried.

s2wT is nonsense, since T is not replicated and the model is over-specified.
The lambdas are called in SAS-lingo FA(x,y). After 7 iterations (-2REML -30.7, AIC -20.7):
Cov Parm     Subject    Group    Estimate
FA(1,1)      Subj                  0.1903
FA(2,1)      Subj                  0.2398
FA(2,2)      Subj                  0.1072
Residual     Subj       trt R      0.01325
Residual     Subj       trt T      0.001497

    Estimated G Matrix
Row  trt    Col1      Col2
 1    R   0.03622   0.04564
 2    T   0.04564   0.06900

Also:
NOTE: Convergence criteria met but final hessian is not positive definite.
NOTE: Asymptotic variance matrix of covariance parameter estimates has been found to be
      singular and a generalized inverse was used. Covariance parameters with zero variance do
      not contribute to degrees of freedom computed by DDFM=SATTERTH.


No problems with convergence with f=1 (or FA0(1) in SAS). FA0(1) helps in general but not always.
lambda(1,1) 0.19030501
lambda(1,2) 0.23981134
s2wR        0.013246493
s2wT        0.012985088

Different s2wT but crap as well.

[image]You can get anything you like. The same data set throwing warnings in one software but not the other; same with convergence. In Certara’s forum a user posted a data set which failed to converge both in SAS and Phoenix, even with FA0(1). Then one is in deep shit.

Even if a data set converges without warnings, you might get different estimates (except the first one and s2wR) in SAS and Phoenix. Not surprising cause the optimizer walks around in parameter-space trying to find the maximum of a surface which is essentially flat. Increasing the number of iterations and/or decreasing the tolerance rarely helps.

I still wait for someone explaining why he/she is using such a wacky design. Beyond me.


  • The G (variance-covariance) matrix is obtained by
    \(\small{\begin{matrix}
    s_\textrm{bT}^2 & \textrm{cov(bT,bR)}\\
    \textrm{cov(bT,bR)} & s_\textrm{bR}^2
    \end{matrix}}\),
    where \(\small{s_\textrm{bT}^2}\) and \(\small{s_\textrm{bR}^2}\) are the between subject variances of T and R, respectively and \(\small{\textrm{cov(bT,bR)}}\) is the between subject covariance for T and R given by \(\small{s_\textrm{bT}\times s_\textrm{bR}}\). The subject-by-formulation interaction variance is given by \(\small{s_\textrm{bT}^2+s_\textrm{bR}^2-2\times\textrm{cov(bT,bR)}}.\)
    Whereas one might get a negative lambda(2,2) in PHX, in such a case SAS forces FA(2,2) to zero. With both approaches one might end up with a negative subject-by-formulation interaction variance…
    The same holds for the within subject variance of T (can be negative in PHX but is forced to zero in SAS).
    However, in partial replicate designs \(s_\textrm{bT}^2\) and \(s_\textrm{wT}^2\) cannot be unambiguously estimated, only their sum. Even then, PHX and SAS don’t agree (0.04367 vs. 0.03771).

Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Belgium?,
2020-07-10 19:24
(102 d 02:36 ago)

@ Helmut
Posting: # 21671
Views: 10,631
 

 Semireplicated + REML in R

Hi Hötzi,

» Wow! Nobody mastered that in R so far. I don’t want to dampen your enthusiasm but I think that is not possible at all.

Challenge accepted :-)

Thanks for the results :-):cool:

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2020-07-11 11:57
(101 d 10:03 ago)

@ ElMaestro
Posting: # 21672
Views: 10,143
 

 Avoid partial replicate designs, pleeeze!

Hi ElMaestro,

» » Wow! Nobody mastered that in R so far. I don’t want to dampen your enthusiasm but I think that is not possible at all.
»
» Challenge accepted :-)

Good luck! I added the variance-covariance matrices and the complete warnings above.

From this SASians’ document:
  • If you experience convergence problems, consider these points:
    • Try different TYPE= covariance structures that are supported by the data or expert knowledge.
    • Verify that the model is not misspecified or overspecified. A misspecified or overspecified model is one of the common causes of convergence issues. Always check your model specifications.
Further down about

NOTE: Convergence criteria met but final hessian is not positive definite.
The reasons for why this note occurs can vary. The most common reasons for this note relates to scaling issues or misspecified or overspecified models.
A nonpositive definite Hessian matrix can indicate a surface saddlepoint […]. If the MIXED procedure has converged to a saddlepoint, then the final solution given by PROC MIXED is not the optimal solution. To get around this problem, run the model again using different starting values. Try adding a PARMS statement to your PROC MIXED code. Either use the OLS option to specify ordinary least squares starting values (versus the default MIVQUE0 values), or specify your own starting values in the PARMS statement. The syntax for the PARMS statement with the OLS option is simple, as shown below:
   parms / ols;


In my understanding:
  • Trying to estimate within subject variances separate for both treatments in the common semi­repli­cate design (TRR|RTR|RRT) or – even worse – the extra-reference design (TRR|RTR) is futile due to the overspecified model. Hence, TYPE=FA0(2) and its relatives in other software is always wrong and should not be used.
  • The default starting values starting values PROC MIXED are obtained by MIVQUE0, which are “similar to ANOVA estimates” (whatever that means). Phoenix uses the Method of Moments though you can specify starting values as well (like in SAS).
  • Since there are data sets in the wild where nothing helped (neither in SAS nor in PHX) to achieve convergence, I can only recommend to avoid replicate designs at all.
    Otherwise, one may have performed a study with no means to evaluate it.

You can’t fix by analysis
what you bungled by design.
   Richard J. Light, Judith D. Singer, John B. Willett

100% of all disasters are failures of design, not analysis.   Ronald G. Marks

To propose that poor design
can be corrected by subtle analysis techniques
is contrary to good scientific thinking.
   Stuart J. Pocock

[…] an inappropriate study design is incapable of answering
a research question, no matter how careful the subsequent
methodology, conduct, analysis, and interpretation:
Flawless execution of a flawed design achieves nothing worthwhile.
   J. Rick Turner


Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Belgium?,
2020-07-11 14:43
(101 d 07:17 ago)

@ Helmut
Posting: # 21673
Views: 10,103
 

 Avoid partial replicate designs, pleeeze!

Hi again,

it is rather experimental.
I do not have a level of understanding that allows me to use V=ZGZt+R,
so I will be constructing V directly from the data listing without the intermediary Z and G. I do not wish to get bogged down by the various aspects of covariance matrices; if CSH does not pan out well in SAS then it is an implenetation or convergence issue only, it is not because CSH is irrelevant. CSH is for all practical purposes the covariance matrix that makes sense, which means we have:
A variance for T (combined within and between)
A variance for within-R
A variance for between-R


Hessian, positive definite, etc.. all this comes down to the details of SAS's or WinNonlin's optimiser. I am playing around with optim in R (BFGS, L-BFGS-B bounded, and Nelder-Mead) and with one of my own making.

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2020-07-11 15:38
(101 d 06:22 ago)

@ ElMaestro
Posting: # 21674
Views: 10,095
 

 Braveheart!

Hi ElMaestro,

» it is rather experimental.

:-D

» […] which means we have:
» A variance for T (combined within and between)
» A variance for within-R
» A variance for between-R

Correct. Will be interesting what you get because \(\sigma_{wT}^2+\sigma_{bT}^2\):
0.04367 (Phoenix) and 0.03771 (SAS).

» Hessian, positive definite, etc.. [image]all this comes down to the details of SAS's or WinNonlin's optimiser. I am playing around with optim in R (BFGS, L-BFGS-B bounded, and Nelder-Mead) and with one of my own making.

If you succeed, the community will love you. Might win you the Fields medal.

As long as a genius (you?) is not able to come up with a solution working in all cases, I can only repeat:

Avoid partial replicate designs!

It might not be possible to evaluate the study for ABE (though RSABE works in any case).

Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Belgium?,
2020-07-12 12:28
(100 d 09:32 ago)

@ Helmut
Posting: # 21675
Views: 10,024
 

 Who can help?

Hi all,

some help needed.:-D

Specifically, I am asking for help to make code work and not to make it smarter (achieving the same with less key strokes, using built-in alternatives etc) although there is generally a preferred focus on the latter here :-)

Hopefully we will discuss later the need for additional or other variance components once we have made the optimiser work somehow (or while we are making it work, if such vc's are totally necessary). There are probably a ton of errors to iron out. There is at least one big one whose effects I a suffering :-D

The idea is to construct the model from the bottoms up.
  1. First I do a full-rank design matrix X for the fixed effects, without subjects
  2. Next I do the covariance matrix with just three variance components as discussed above.
  3. I generate some cheap starter guesses for the variance components
  4. I define an objective function for "a constant + log likelihood"
  5. I optimise the variance components, in REML the effect estimates will come along with them.
I am so far facing a singularity with the apparent log likelihood, so obviously something is terribly wrong. I include a ref to the function where I got inspiration for the log likelihood objective function.
Another site for the objective function is: https://www.stt.msu.edu/users/pszhong/Lecture_16_Spring_2017.pdf (page 13+14), but this also did not quite get me to convergence.


So, this is what I have so far. I don't understand matrix algebra totally well, and I am at a loss of you say Hermitian, expectation-minimisation, Jacobian, etc.

What bothers me is that some years ago I wrote a working optimiser in C on basis of slides some some Danish guy called Waagepetersen, who walked the reader slowly through the process of optimising REML. I can't find my code now, it was on another computer, and I can't find his slides any more (there are some slides with his name out there, which I can find with google etc but they do not have that old recipe).






rm(list=ls(all=TRUE))

###### par vector is c(varT, varBR, varWR)

######
######   Section 1: Household things
######



CreateX=function(D)
{
 ##first the two treatments
 TrtT=as.numeric(as.character(D$Trt)=="T")
 TrtR=as.numeric(as.character(D$Trt)=="R")
 X=cbind(TrtT, TrtR)
 R= qr(X)$rank

 ## subjects: in a mixed model with subj as random
 ## we do not do subjects also as fixed, therefore they are #'ed away here
 ## for (s in unique(D$Subj))
 ## {
 ## v=as.numeric(D$Subj==s)
 ## #print(v)
 ## XX=data.frame(X, v)
 ## names(XX)[ncol(XX)] = paste("S", s, sep="")
 ## rnk=qr(XX)$rank
 ## if (rnk>R)
  ##  {
  ##     X=XX
  ##     R=rnk
  ##  }
  ## }
 ##now the Pers
 for (p in unique(D$Per))
 {
  v=as.numeric(D$Per==p)
  #print(v)
  XX=data.frame(X, v)
  names(XX)[ncol(XX)] = paste("P", p, sep="")
  rnk=qr(XX)$rank
  if (rnk>R)
    {
       X=XX
       R=rnk
    }
 }

 for (q in unique(D$Seq))
 {
  v=as.numeric(D$Seq==q)
  #print(v)
  XX=data.frame(X, v)
  names(XX)[ncol(XX)] = paste("Q", q, sep="")
  rnk=qr(XX)$rank
  if (rnk>R)
    {
       X=XX
       R=rnk
    }
 }
 return(as.matrix(X))
}




Create.CovM=function(Params)
##block diagonal covariance matrix
{
  varT=Params[1]
  varBR=Params[2]
  varWR=Params[3]
  #varRT=Params[4]

  #cat("Vars:", varT, varBR, varWR,"\n")

  Nobs=length(D$Y)
  V=matrix(0,ncol=Nobs, nrow=Nobs)
  for (iRow in 1:Nobs)
  for (iCol in 1:Nobs)
  {
   
   ## the diagonal
   if (iCol==iRow)
    {
      if (D$Trt[iRow]=="T") V[iRow,iCol]=V[iRow,iCol]+varT
      if (D$Trt[iRow]=="R") V[iRow,iCol]=V[iRow,iCol]+varWR
    }

   ## off diagonal
   if (iCol!=iRow)
   if (D$Subj[iRow]==D$Subj[iCol])
    {
     if (D$Trt[iCol]==D$Trt[iRow]) V[iRow,iCol]= V[iRow,iCol]+varBR
     #if (D$Trt[iCol]!=D$Trt[iRow]) V[iRow,iCol]= V[iRow,iCol]+varRT
    }
  }
  return(as.matrix(V))
}

######
######   Section 2: Matrix things
######


Obj.F12=function(Pars)
##this is line 3 of page 10 of:
##http://people.csail.mit.edu/xiuming/docs/tutorials/reml.pdf
{
  CovM=Create.CovM(Pars)

  A= -0.5*log(det(CovM))
  B= -0.5*log(det(t(X) %*% solve(CovM) %*% X))
  est.b = solve(t(X) %*% solve(CovM) %*% X) %*% t(X) %*% solve(CovM) %*% y
  tmp= y - X %*% est.b
  C=-0.5 *(t(tmp) %*% solve(CovM) %*% tmp)
  return(A+B+C)
}



Some.Initial.Guesses=function(foo)
{
  #guess variance within R
  D1=subset(D, D$Trt=="R")
  m=lm(log(Y)~factor(Subj)+factor(Seq)+factor(Per) , data=D1)
  #print(anova(m))
  varWR= anova(m)["Residuals", "Mean Sq"]
  m=lm(log(Y)~factor(Subj)+factor(Seq)+factor(Per)+factor(Trt) , data=D)
  varBR= anova(m)["factor(Subj)", "Mean Sq"]
  ##total var T, cheaply
  ##actually, this may be much better: varT=varBR+varWR
  D2=subset(D, D$Trt=="T")
  varT=var(log(D2$Y))
  varT=varBR+varWR
  L=list(varT=varT, varBR=varBR, varWR=varWR)
  return(L)
}



####
#### section 3: execution
####



D=read.csv("EMAII.a.csv" , header=T, sep="\t") ##public = easier
X=CreateX(D) ##public = easier
y=log(D$Y) ##public = easier


Check.Surf=function(i, x1,x2, N)
## Checks the log likelihood surface of the i'th variance component
## (given the other variance components) for the interval var = x1 to x2
## with a granularity of N points.
{
  L=Some.Initial.Guesses(0)
  parIni=c(L$varT, L$varBR, L$varWR )
  x=x1;
  vX=NULL
  vY=NULL
  dx=(x2-x1)/N
  for (j in 1:N)
  {
    vX=c(vX, x)
    parIni[i]=x
    vY=c(vY, Obj.F12(parIni))
    x=x+dx
  }
  plot(vX, vY)
}

MyREML=function(foo.bar)
## ideally this would work, but it doesn't right now :-(
## also, I have a feeling the surface may be quite rugged,
## so I may switch to a homecooked optimiser
## once I have ironed out the singularity issue, see below.
{
  L=Some.Initial.Guesses(0)
  print(L)
  parIni=c(L$varT, L$varBR, L$varWR)
  F=optim(par=parIni, fn=Obj.F12, #method="L-BFGS-B",lower=c(0,0,0),
      control=list(reltol=1e-10, trace=T))
  print(F)
}
# if ti worked it'd be called through
# MyREML(0)

# however:

Check.Surf(1, 0.01, 0.2, 50)
  ## ouch, that's a singularity.
  ## so something is wrong in the covariance matrix
  ## or in the "const + likelihood function"                     
  ## or elsewhere. Any ideas?

#
#
#

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
ElMaestro
★★★

Belgium?,
2020-07-12 13:26
(100 d 08:34 ago)

(edited by ElMaestro on 2020-07-12 13:39)
@ ElMaestro
Posting: # 21676
Views: 9,996
 

 Who can help?

Important update, I forgot vBR on the diagonal. Now the singuality is less of a heacahe:

rm(list=ls(all=TRUE))

###### par vector is c(varT, varBR, varWR)

######
######   Section 1: Household things
######



CreateX=function(D)
{
 ##first the two treatments
 TrtT=as.numeric(as.character(D$Trt)=="T")
 TrtR=as.numeric(as.character(D$Trt)=="R")
 X=cbind(TrtT, TrtR)
 R= qr(X)$rank

 ## subjects: in a mixed model with subj as random
 ## we do not do subjects also as fixed, therefore they are #'ed away here
 ## for (s in unique(D$Subj))
 ## {
 ## v=as.numeric(D$Subj==s)
 ## #print(v)
 ## XX=data.frame(X, v)
 ## names(XX)[ncol(XX)] = paste("S", s, sep="")
 ## rnk=qr(XX)$rank
 ## if (rnk>R)
  ##  {
  ##     X=XX
  ##     R=rnk
  ##  }
  ## }
 ##now the Pers
 for (p in unique(D$Per))
 {
  v=as.numeric(D$Per==p)
  #print(v)
  XX=data.frame(X, v)
  names(XX)[ncol(XX)] = paste("P", p, sep="")
  rnk=qr(XX)$rank
  if (rnk>R)
    {
       X=XX
       R=rnk
    }
 }

 for (q in unique(D$Seq))
 {
  v=as.numeric(D$Seq==q)
  #print(v)
  XX=data.frame(X, v)
  names(XX)[ncol(XX)] = paste("Q", q, sep="")
  rnk=qr(XX)$rank
  if (rnk>R)
    {
       X=XX
       R=rnk
    }
 }
 return(as.matrix(X))
}




Create.CovM=function(Params)
##block diagonal covariance matrix
{
  varT=Params[1]
  varBR=Params[2]
  varWR=Params[3]
  #varRT=Params[4]

  #cat("Vars:", varT, varBR, varWR,"\n")

  Nobs=length(D$Y)
  V=matrix(0,ncol=Nobs, nrow=Nobs)
  for (iRow in 1:Nobs)
  for (iCol in 1:Nobs)
  {
   
   ## the diagonal
   if (iCol==iRow)
    {
      if (D$Trt[iRow]=="T") V[iRow,iCol]=V[iRow,iCol]+varT
      if (D$Trt[iRow]=="R") V[iRow,iCol]=V[iRow,iCol]+varWR +varBR
    }

   ## off diagonal
   if (iCol!=iRow)
   if (D$Subj[iRow]==D$Subj[iCol])
    {
     if (D$Trt[iCol]==D$Trt[iRow]) V[iRow,iCol]= V[iRow,iCol]+varBR
     #if (D$Trt[iCol]!=D$Trt[iRow]) V[iRow,iCol]= V[iRow,iCol]+varRT
    }
  }
  return(as.matrix(V))
}

######
######   Section 2: Matrix things
######

Obj.F12=function(Pars)
##this is line 3 of page 10 of:
##http://people.csail.mit.edu/xiuming/docs/tutorials/reml.pdf
{
  CovM=Create.CovM(Pars)

  A= -0.5*log(det(CovM))
  B= -0.5*log(det(t(X) %*% solve(CovM) %*% X))
  est.b = solve(t(X) %*% solve(CovM) %*% X) %*% t(X) %*% solve(CovM) %*% y
  tmp= y - X %*% est.b
  C=-0.5 *(t(tmp) %*% solve(CovM) %*% tmp)
  return(A+B+C)
}



Some.Initial.Guesses=function(foo)
{
  #guess variance within R
  D1=subset(D, D$Trt=="R")
  m=lm(log(Y)~factor(Subj)+factor(Seq)+factor(Per) , data=D1)
  #print(anova(m))
  varWR= anova(m)["Residuals", "Mean Sq"]
  m=lm(log(Y)~factor(Subj)+factor(Seq)+factor(Per)+factor(Trt) , data=D)
  varBR= anova(m)["factor(Subj)", "Mean Sq"]
  ##total var T, cheaply
  ##actually, this may be much better: varT=varBR+varWR
  D2=subset(D, D$Trt=="T")
  varT=var(log(D2$Y))
  varT=varBR+varWR
  L=list(varT=varT, varBR=varBR, varWR=varWR)
  return(L)
}



####
#### section 3: execution
####



D=read.csv("EMAII.a.csv" , header=T, sep="\t") ##public = easier
X=CreateX(D) ##public = easier
y=log(D$Y) ##public = easier


Check.Surf=function(i, x1,x2, N)
## Checks the log likelihood surface of the i'th variance component
## (given the other variance components) for the interval var = x1 to x2
## with a granularity of N points.
{
  L=Some.Initial.Guesses(0)
  parIni=c(L$varT, L$varBR, L$varWR )
  x=x1;
  vX=NULL
  vY=NULL
  dx=(x2-x1)/N
  for (j in 1:N)
  {
    vX=c(vX, x)
    parIni[i]=x
    vY=c(vY, Obj.F12(parIni))
    x=x+dx
  }
  plot(vX, vY)
}

MyREML=function(foo.bar)
## ideally this would work, but it doesn't right now :-(
## also, I have a feeling the surface may be quite rugged,
## so I may switch to a homecooked optimiser
## once I have ironed out the singularity issue, see below.
{
  L=Some.Initial.Guesses(0)
  print(L)
  parIni=c(L$varT, L$varBR, L$varWR)
  F=optim(par=parIni, fn=Obj.F12, #method="L-BFGS-B",lower=c(0,0,0),
      control=list(reltol=1e-10, trace=T))
  print(F)
}
# if ti worked it'd be called through
 MyREML(0)

# however:

Check.Surf(3, 0.01, 0.2, 50) ##checks the surface in one dimension


#
#
#



I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
ElMaestro
★★★

Belgium?,
2020-07-12 21:36
(100 d 00:24 ago)

@ ElMaestro
Posting: # 21677
Views: 9,847
 

 Update II

After re-re-re-re-reading the documentation for optim, I think it can be forced to look for either a minimum (default) or for a maximum if need be through the control argument or, more simply , the objective function should be handled for a minimum by putting a minus sign on the return of Obj.F12...
... but even so the estimates are still somewhat off. :-D

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
ElMaestro
★★★

Belgium?,
2020-07-12 21:46
(100 d 00:14 ago)

@ ElMaestro
Posting: # 21678
Views: 9,824
 

 Update III

Look at this baby:

rm(list=ls(all=TRUE))

###### par vector is c(varT, varBR, varWR)

######
######   Section 1: Household things
######



CreateX=function(D)
{
 ##first the two treatments
 TrtT=as.numeric(as.character(D$Trt)=="T")
 TrtR=as.numeric(as.character(D$Trt)=="R")
 X=cbind(TrtT, TrtR)
 R= qr(X)$rank

 ## subjects: in a mixed model with subj as random
 ## we do not do subjects also as fixed, therefore they are #'ed away here
 ## for (s in unique(D$Subj))
 ## {
 ## v=as.numeric(D$Subj==s)
 ## #print(v)
 ## XX=data.frame(X, v)
 ## names(XX)[ncol(XX)] = paste("S", s, sep="")
 ## rnk=qr(XX)$rank
 ## if (rnk>R)
  ##  {
  ##     X=XX
  ##     R=rnk
  ##  }
  ## }
 ##now the Pers
 for (p in unique(D$Per))
 {
  v=as.numeric(D$Per==p)
  #print(v)
  XX=data.frame(X, v)
  names(XX)[ncol(XX)] = paste("P", p, sep="")
  rnk=qr(XX)$rank
  if (rnk>R)
    {
       X=XX
       R=rnk
    }
 }

 for (q in unique(D$Seq))
 {
  v=as.numeric(D$Seq==q)
  #print(v)
  XX=data.frame(X, v)
  names(XX)[ncol(XX)] = paste("Q", q, sep="")
  rnk=qr(XX)$rank
  if (rnk>R)
    {
       X=XX
       R=rnk
    }
 }
 return(as.matrix(X))
}




Create.CovM=function(Params)
##block diagonal covariance matrix
{
  varT=Params[1]
  varBR=Params[2]
  varWR=Params[3]
  #varRT=Params[4]
  #cat("Vars:", varT, varBR, varWR,"\n")

  Nobs=length(D$Y)
  V=matrix(0,ncol=Nobs, nrow=Nobs)
  for (iRow in 1:Nobs)
  for (iCol in 1:Nobs)
  {
   
   ## the diagonal
   if (iCol==iRow)
    {
      if (D$Trt[iRow]=="T") V[iRow,iCol]=V[iRow,iCol]+varT
      if (D$Trt[iRow]=="R") V[iRow,iCol]=V[iRow,iCol]+varWR +varBR
    }

   ## off diagonal
   if (iCol!=iRow)
   if (D$Subj[iRow]==D$Subj[iCol])
    {
     if (D$Trt[iCol]==D$Trt[iRow]) V[iRow,iCol]= V[iRow,iCol]+varBR
     #if (D$Trt[iCol]!=D$Trt[iRow]) V[iRow,iCol]= V[iRow,iCol]+varRT
    }
  }
  return(as.matrix(V))
}

######
######   Section 2: Matrix things
######


Obj.F12=function(Pars)
##this is line 3 of page 10 of:
##http://people.csail.mit.edu/xiuming/docs/tutorials/reml.pdf
{
  CovM=Create.CovM(Pars)

  A= -0.5*log(det(CovM))
  B= -0.5*log(det(t(X) %*% solve(CovM) %*% X))
  est.b = solve(t(X) %*% solve(CovM) %*% X) %*% t(X) %*% solve(CovM) %*% y
  tmp= y - X %*% est.b
  C=-0.5 *(t(tmp) %*% solve(CovM) %*% tmp)
  return(A+B+C)
}



Some.Initial.Guesses=function(foo)
{
  #guess variance within R
  D1=subset(D, D$Trt=="R")
  m=lm(log(Y)~factor(Subj)+factor(Seq)+factor(Per) , data=D1)
  #print(anova(m))
  varWR= anova(m)["Residuals", "Mean Sq"]
  m=lm(log(Y)~factor(Subj)+factor(Seq)+factor(Per)+factor(Trt) , data=D)
  varBR= anova(m)["factor(Subj)", "Mean Sq"]
  ##total var T, cheaply
  ##actually, this may be much better: varT=varBR+varWR
  D2=subset(D, D$Trt=="T")
  varT=var(log(D2$Y))
  varT=varBR+varWR
  L=list(varT=varT, varBR=varBR, varWR=varWR)
  return(L)
}



####
#### section 3: execution
####



D=read.csv("EMAII.a.csv" , header=T, sep="\t") ##public = easier
X=CreateX(D) ##public = easier
y=log(D$Y) ##public = easier


MyREML=function(foo.bar)
{
  L=Some.Initial.Guesses(0)
  print(L)
  parIni=c(0.06, 0.032, 0.012)
  F=optim(par=parIni, fn=Obj.F12,
      #method="BFGS",
      #lower=c(0.003,0.003,0.003), upper=c(1,1,1),
      control=list(reltol=1e-11, trace=T, fnscale=-1))
  print(F)
}
MyREML(0)



On my machine it gives:
$par
[1] 0.06776119 0.03636979 0.01237020


:-)

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
ElMaestro
★★★

Belgium?,
2020-07-12 22:27
(99 d 23:33 ago)

(edited by ElMaestro on 2020-07-12 23:51)
@ ElMaestro
Posting: # 21679
Views: 9,709
 

 Final update today


rm(list=ls(all=TRUE))
 

######
######   Section 1: Household things
######



CreateX=function(D)
{
 ##first the two treatments
 TrtT=as.numeric(as.character(D$Trt)=="T")
 TrtR=as.numeric(as.character(D$Trt)=="R")
 X=cbind(TrtT, TrtR)
 R= qr(X)$rank

 ## subjects: in a mixed model with subj as random
 ## we do not do subjects also as fixed, therefore they are #'ed away here
 ## for (s in unique(D$Subj))
 ## {
 ## v=as.numeric(D$Subj==s)
 ## #print(v)
 ## XX=data.frame(X, v)
 ## names(XX)[ncol(XX)] = paste("S", s, sep="")
 ## rnk=qr(XX)$rank
 ## if (rnk>R)
  ##  {
  ##     X=XX
  ##     R=rnk
  ##  }
  ## }
 ##now the Pers
 for (p in unique(D$Per))
 {
  v=as.numeric(D$Per==p)
  #print(v)
  XX=data.frame(X, v)
  names(XX)[ncol(XX)] = paste("P", p, sep="")
  rnk=qr(XX)$rank
  if (rnk>R)
    {
       X=XX
       R=rnk
    }
 }

 for (q in unique(D$Seq))
 {
  v=as.numeric(D$Seq==q)
  #print(v)
  XX=data.frame(X, v)
  names(XX)[ncol(XX)] = paste("Q", q, sep="")
  rnk=qr(XX)$rank
  if (rnk>R)
    {
       X=XX
       R=rnk
    }
 }
 return(as.matrix(X))
}




Create.CovM=function(Params)
##block diagonal covariance matrix
{
  varT=Params[1]
  varBR=Params[2]
  varWR=Params[3]
  varRT=Params[4]
  #cat("Vars:", varT, varBR, varWR,"\n")

  Nobs=length(D$Y)
  V=matrix(0,ncol=Nobs, nrow=Nobs)
  for (iRow in 1:Nobs)
  for (iCol in 1:Nobs)
  {
   
   ## the diagonal
   if (iCol==iRow)
    {
      if (D$Trt[iRow]=="T") V[iRow,iCol]=V[iRow,iCol]+varT
      if (D$Trt[iRow]=="R") V[iRow,iCol]=V[iRow,iCol]+varWR +varBR
    }

   ## off diagonal
   if (iCol!=iRow)
   if (D$Subj[iRow]==D$Subj[iCol])
    {
     if (D$Trt[iCol]==D$Trt[iRow]) V[iRow,iCol]= V[iRow,iCol]+varBR
     if (D$Trt[iCol]!=D$Trt[iRow]) V[iRow,iCol]= V[iRow,iCol]+varRT
    }
  }
  return(as.matrix(V))
}

######
######   Section 2: Matrix things
######

Obj.F12=function(Pars)
##this is line 3 of page 10 of:
##http://people.csail.mit.edu/xiuming/docs/tutorials/reml.pdf
{
  CovM=Create.CovM(Pars)

  A= -0.5*log(det(CovM))
  B= -0.5*log(det(t(X) %*% solve(CovM) %*% X))
  est.b = solve(t(X) %*% solve(CovM) %*% X) %*% t(X) %*% solve(CovM) %*% y
  tmp= y - X %*% est.b
  C=-0.5 *(t(tmp) %*% solve(CovM) %*% tmp)
  return(A+B+C)
}






####
#### section 3: execution
####

Some.Initial.Guesses=function(foo)
{
 #guess varT
 D1=subset(D, (D$Trt=="T"))
 m=lm(log(Y)~factor(Seq)+factor(Per)   , data=D1)
 varT=anova(m)["Residuals", "Mean Sq"]  #guess VarWR
 D1=subset(D, (D$Trt=="R"))
 m=lm(log(Y)~factor(Seq)+factor(Per)+factor(Subj)  , data=D1)
 varWR=anova(m)["Residuals", "Mean Sq"]  #guess varBR
 S=unique(D$Subj)
 v=NULL
 for (s in S)
 {
  Dx=subset(D, (D$Subj==s) & (D$Trt=="R"))
  v=c(v, mean(log(Dx$Y)))
 }
 varBR=var(v)-varWR
 varRT=0.5*(varT+varBR)

 rslt=c(varT, varBR, varWR, varRT)
 return(rslt)
}




D=read.csv("EMAII.a.csv" , header=T, sep="\t") ##public = easier
X=CreateX(D) ##public = easier
y=log(D$Y) ##public = easier


MyREML=function(foo.bar)
 
{
  L=Some.Initial.Guesses(0)
  print(L)
  parIni=L
  F=optim(par=parIni, fn=Obj.F12,
      control=list(reltol=1e-9, trace=T, fnscale=-1))
  Nms=c("varT","varBR", "varWR", "varRT")
  X=data.frame(Nms, F$par, parIni)
  names(X)=c("Var.Component", "Value", "Ini.value")
  return(X)
}
 
MyREML(0)



which gives
  Var.Component      Value  Ini.value
1          varT 0.07050210 0.06841615
2         varBR 0.03621403 0.02776728
3         varWR 0.01324510 0.01240137
4         varRT 0.04563784 0.04809171


By the way: You can adjust how many decimals you want on e.g. varWR (~s2WR) through the reltol setting. Use e.g. 1e-12 to get 0.0.01324652 with just a few more iterations.



Job done?

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2020-07-13 13:52
(99 d 08:08 ago)

@ ElMaestro
Posting: # 21684
Views: 9,549
 

 Medium rare.

Hi ElMaestro,

» Job done?

I prefer medium rare over well done (aka quick and dirty or clean and never)!
Below my more R-ish code. Changes:
  • Reads CSV files in any format (from the web or local system, arbitrary column separator [default comma] and decimal separator [default period]).
  • Vectorizes the data set.
  • Checks the design and assigns a variable 'des' ("partial" or "full") for further use. In principle all full and partial replicates are supported.
  • Get rid of missing geometric means.
  • In Obj.F12() returns a double instead of a 1:1 matrix. Useful if you decide to switch the optimizer (some require a scalar).
  • Direct access of D instead of subset(). Subset is convenient in development but leads to endless troubles when building a package.
  • optim(control = list(reltol = 1e-12, trace = 0,...)
  • Information about optimization.
  • G-matrix.
TODO:
  • I wonder why this code works at all. ;-)
    In Obj.F12() the line
    B <- -0.5*log(det(t(X) %*% solve(CovM) %*% X))
    is the reason for the warnings because t(X) is not a square matrix. Therefore B is NaN.
  • Adapt for full replicates (varWT already estimated).
  • Adapt for the conventional TR|RT – mixed model also required for the FDA? Currently stops execution for such a data set.
  • Most important: Get the PE (from package lmerTest together with Satterthwaite’s DFs) and calculate the CI.
  • Implement the FDA’s RSABE (should be easy).


###############################
# Section 1: Household things #
###############################

GetData <- function(path, sep, dec) {
  # read data from a CSV-file
  if (missing(sep)) sep <- ","
  if (!sep %in% c(",", ";", "t"))
    stop("Column separator must be any of ',', ';', 't'.")
  if (sep == "t") sep <- "\t"
  if (missing(dec)) dec <- "."
  if (!dec %in% c(".", ","))
    stop("Decimal separator must be '.', or ','.")
  # strip quotes and skip eventual commentary lines
  D        <- read.csv(path, sep = sep, dec = dec,
                       quote = "", comment.char = "#")
  names(D) <- c("Subj", "Per", "Seq", "Trt", "Y")
  cols     <- c("Subj", "Per", "Seq", "Trt")
  D[cols]  <- lapply(D[cols], factor)
  return(invisible(D))
}

CreateX <- function(D) {
  # first the two treatments
  TrtT <- as.numeric(as.character(D$Trt) == "T")
  TrtR <- as.numeric(as.character(D$Trt) == "R")
  X <- cbind(TrtT, TrtR)
  R <- qr(X)$rank
  # now the periods
  for (p in unique(D$Per)) {
    v  <- as.numeric(D$Per == p)
    XX <- data.frame(X, v)
    names(XX)[ncol(XX)] <- paste0("P", p)
    rnk <- qr(XX)$rank
    if (rnk > R) {
      X <- XX
      R <- rnk
    }
  }
  for (q in unique(D$Seq)) {
    v  <- as.numeric(D$Seq == q)
    XX <- data.frame(X, v)
    names(XX)[ncol(XX)] <- paste0("Q", q)
    rnk <- qr(XX)$rank
    if (rnk > R) {
      X <- XX
      R <- rnk
    }
  }
  return(as.matrix(X))
}
Create.CovM <- function(Params) {
  # block diagonal covariance matrix
  varT  <- Params[1]
  varBR <- Params[2]
  varWR <- Params[3]
  varRT <- Params[4]
  Nobs  <- length(D$Y)
  V     <- matrix(0, ncol = Nobs, nrow = Nobs)
  for (iRow in 1:Nobs) {
    for (iCol in 1:Nobs) {
      if (iCol == iRow) { # diagonal elements
        if (D$Trt[iRow] == "T") {
          V[iRow, iCol] <- V[iRow, iCol] + varT
        } else {
          V[iRow, iCol] <- V[iRow, iCol] + varWR + varBR
        }
      } else {            # off diagonal elements
        if (D$Subj[iRow] == D$Subj[iCol]) {
          if (D$Trt[iCol] == D$Trt[iRow]) {
            V[iRow, iCol] <- V[iRow, iCol] + varBR
          } else {
            V[iRow, iCol] <- V[iRow, iCol] + varRT
          }
        }
      }
    }
  }
  return(V)
}

############################
# Section 2: Matrix things #
############################
Obj.F12 <- function(Pars) {
  # this is line 3 of page 10 of:
  # http://people.csail.mit.edu/xiuming/docs/tutorials/reml.pdf
  CovM  <- Create.CovM(Pars)
  A     <- -0.5*log(det(CovM))
  # TODO: NaNs produced in the next line cause t(X) is not a square matrix
  B     <- -0.5*log(det(t(X) %*% solve(CovM) %*% X))
  est.b <- solve(t(X) %*% solve(CovM) %*% X) %*% t(X) %*% solve(CovM) %*% y
  tmp   <- y - X %*% est.b
  C     <- -0.5 *(t(tmp) %*% solve(CovM) %*% tmp)
  C     <- as.numeric(C) # make a scalar for other solvers
  return(A+B+C)
}

########################
# Section 3: Execution #
########################
Some.Initial.Guesses <- function() {
  # design (all possible ones, error handling, full or partial)
  subjs <- unique(as.numeric(D$Subj))
  trts  <- sort(unique(as.character(D$Trt)), decreasing = TRUE)
  if (!length(trts) == 2) stop("Only two treatments supported.")
  if (sum(!trts %in% c("T", "R")) !=0)
    stop("treatments must be coded as 'T' and 'R'.")
  seqs  <- sort(unique(as.character(D$Seq)), decreasing = TRUE) # T first
  type  <- paste(seqs, collapse = "|") # identifier
  nseq  <- length(unique(as.character(D$Seq)))
  nper  <- length(unique(as.numeric(D$Per)))
  nsubj <- length(subjs)
  if (nchar(type) == 19) {  # 4-period 4-sequence full replicate designs
    if (nper != 4) stop("4 periods required in this full replicate design.")
    if (nseq != 4) stop("4 sequences required in this full replicate design.")
    des <- "full"
  }
  if (nchar(type) == 9) {  # 4-period full replicate designs
    if (nper != 4) stop("4 periods required in this full replicate design.")
    if (nseq != 2) stop("2 sequences required in this full replicate design.")
    des <- "full"
  }
  if (nchar(type) == 7) {  # 3-period replicates
    if (type %in% c("TRT|RTR", "TRR|RTT")) {
      if (nper != 3) stop("3 periods required in this full replicate design.")
      if (nseq != 2) stop("2 sequences required in this full replicate design.")
      des <- "full"
    }
    if (type == "TRR|RTR") {
      if (nper != 3) stop("3 periods required in the extra-reference design.")
      if (nseq != 2) stop("2 sequences required in the extra-reference design.")
      des <- "partial"
    }
  }
  if (nchar(type) == 11) { # Balaam's design or 3-sequence partial
    if (!type == "TR|RT|TT|RR") { # Balamm's
      if (nper != 3) stop("3 periods required in this partial replicate design.")
      if (nseq != 3) stop("3 sequences required in this partial replicate design.")
      des <- "partial"
    } else {                      # TRT|RTR|RRT
      if (nper != 2) stop("2 periods required in Balaam's design.")
      if (nseq != 4) stop("4 sequences required in Balaam's design.")
      des <- "full"
    }
  }
  if (type == "TR|RT") stop("TR|RT design not supported yet.")
  gm.T <- gm.R <- numeric(nsubj) # vector of geometric means
  for (subj in seq_along(subjs)) {
    gm.T[subj] <- mean(log(D$Y[which(D$Subj == subj & D$Trt == "T")]))
    gm.R[subj] <- mean(log(D$Y[which(D$Subj == subj & D$Trt == "R")]))
  }
  # get rid of missings
  gm.T <- gm.T[!is.na(gm.T)]  
  gm.R <- gm.R[!is.na(gm.R)]  
  if (des == "partial") { # guess varT (between + within)
    m    <- lm(log(Y) ~ Seq+Per, data = D[D$Trt == "T", ])
    varT <- anova(m)["Residuals", "Mean Sq"]
  } else {                # guess VarWT
    m     <- lm(log(Y) ~ Seq+Per+Subj, data = D[D$Trt == "T", ])
    varWT <- anova(m)["Residuals", "Mean Sq"]
  }
  # guess VarWR
  m     <- lm(log(Y) ~ Seq+Per+Subj, data = D[D$Trt == "R", ])
  varWR <- anova(m)["Residuals", "Mean Sq"]   # guess varBR
  varBR <- var(gm.R)-varWR
  # TODO: full replicates
  varRT <- 0.5*(varT+varBR)
  rslt  <- c(varT, varBR, varWR, varRT)
  return(rslt)
}
MyREML <- function() {
  Ini <- Some.Initial.Guesses()
  F   <- optim(par = Ini, fn = Obj.F12,
               control = list(reltol = 1e-12, trace = 0, fnscale = -1))
  Nms <- c("var_T", "var_bR", "var_wR", "covar_bTbR")
  X   <- data.frame(Nms, F$par, Ini)
  names(X) <- c("Component", "Estimate", "Initial")
  cat("Maximum         :", F$value,
    "\nEvaluations     :", F$counts[["function"]],
    "\nConvergence code:", F$convergence, "\n")
  return(X)
}

# Data has to have five columns in this order:
#   Subject (numeric), Period (numeric), Sequence (character),
#   Treatment (character), Response (numeric)
# Column separator 'sep' comma (default), semicolon, or t (tab)
# Decimal separator 'dec' period (default) or comma
# path <- "https://bebac.at/downloads/ds01.csv" # EMA full replicate
path <- "https://bebac.at/downloads/ds02.csv" # EMA partial replicate
D   <- NULL # nullify eventual previous one
D   <- GetData(path)
X   <- CreateX(D)
y   <- log(D$Y)
# the workhorse
res <- MyREML() # patience!
G.matrix <- data.frame(c(NA, res$Estimate[4]),
                       c(res$Estimate[4], res$Estimate[2]))
names(G.matrix) <- 1:2
print(res, row.names = FALSE); cat("G matrix:\n"); print(G.matrix)


Gives:
Maximum         : 75.98722
Evaluations     : 197
Convergence code: 0
Warning message:
In log(det(t(X) %*% solve(CovM) %*% X)) : NaNs produced

  Component   Estimate    Initial
      var_T 0.07049482 0.06841615
     var_bR 0.03621594 0.02776728
     var_wR 0.01324652 0.01240137
 covar_bTbR 0.04563735 0.04809171
G matrix:
           1          2
1         NA 0.04563735
2 0.04563735 0.03621594

Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Belgium?,
2020-07-13 14:18
(99 d 07:42 ago)

(edited by ElMaestro on 2020-07-13 14:31)
@ Helmut
Posting: # 21686
Views: 9,513
 

 took just 52 hrs to do it :-)

Hi Hötzi,


I think expressing the G matrix after this fit is a little backward;
The whole point is not to decompose V into ZGZt+R, because this design makes that decomposition pointless even when the design is otherwise relevant.
That SAS and WinNonlin and more do not have a way to make provisions for a solution outside those involving this decomposition, it merely a testament to their development capabilities than to the design itself. If you start defining the the stats model on basis of G and R rather than V then obviously you must create two variance components which are not uniquely estimable, but whose sum is. Therefore, just screw R and G, go directly for V, and present the components without in any way calling them members of G or R - they really aren't when the model is constructed my way.

The 2-trt, 3-seq, 3-per design is good, healthy and appropriate, until now it was just the statistical work on it that has been lagging. The post is an attempt at providing a solution.

Note also: The PE will be accessible via the est.b vector. You shuld be able to extract it (difference between first two effects, one good reason to do X without intercept) after the model has converged.

If you have a dataset that does not readily converge with SAS then I'd be eager to learn if it does in some flavour of my code. :-)

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2020-07-13 15:34
(99 d 06:26 ago)

@ ElMaestro
Posting: # 21688
Views: 9,482
 

 Will take much more hours still…

Hi ElMaestro,

» I think expressing the G matrix after this fit is a little backward;…

OK, true.

» That SAS and WinNonlin and more do not have a way to make provisions for a solution outside those involving this decomposition, it merely a testament to their development capabilities than to the design itself. If you start defining the the stats model on basis of G and R rather than V then obviously you must create two variance components which are not uniquely estimable, but whose sum is. Therefore, just screw R and G, go directly for V, and present the components without in any way calling them members of G or R - they really aren't when the model is constructed my way.

Yep.

» The 2-trt, 3-seq, 3-per design is good, healthy and appropriate, until now it was just the statistical work on it that has been lagging. The post is an attempt at providing a solution.

Kudos to your work!
I still see no reason to perform a study in one of the partial replicates. Sorry.
What is given as ‘justifications’:
  • “We want to limit the blood volume and decrease the chance of dropouts. Hence, we prefer 3 over 4 periods.”
    Understandable. However, 3-period full replicates (TRT|RTR or TRR|RTT) require similar sample sizes than the partial replicates (TRR|RTR|RRT or TRR|RTR) but provide additional information about CVwT. If this is a pilot study one gets an incentive for the sample size of the pivotal if CVwT < CVwR (which is often the case). In a partial replicates one has to assume CVwT = CVwR.
    Furthermore, one may consider ABEL for AUC (WHO’s rules) where the vari­ability associated with each product has to be compared.
  • “This design is recommended by the FDA and the EMA.”
    False. The fact that the FDA gives only SAS-code for the partial replicate does not imply that it is the only acceptable one. There are a lot of different designs in the public domain which were accepted by the FDA. The FDA stated already in 2001: “… the two-sequence, three-period design TRR|RTT is thought to be optimal among three-period replicated crossover designs.”
    In the EMA’s Q&A the partial replicate is given only as an example. Rev.12 (June 2015) made clear that TRT|RTR is acceptable as well. The only condition is that at least 12 eligible subjects are in sequence RTR. That’s not relevant in practice unless one faced dropout rates of >40% (see there).
  • If one has problems with the evaluation and/or want to perform simulations, heterogenicity is a pain in the back. Much easier in full replicates. That’s the reason why the two Lászlós in their early papers simulated TRT|RTR.
Nobody published reference datasets for the mixed-model so far. ;-)
Hence, everybody has to believe in the software. As a starter I would expect the same level of error handling, documentation, :blahblah: like the package replicateBE. Otherwise, assessors would be skeptical.

» The PE will be accessible via the est.b vector. You shuld be able to extract it (difference between first two effects, one good reason to do X without intercept) after the model has converged.

In Section 2:
  Final <- function(Pars) {
    CovM <- Create.CovM(Pars)
    tmp  <- solve(t(X) %*% solve(CovM) %*% X) %*% t(X) %*% solve(CovM) %*% y
    PE   <- exp(as.numeric(tmp["TrtT", ] - tmp["TrtR", ]))
    return(PE)
  }

At the end:
  PE <- Final(res$Estimate)
  cat("PE:", PE, "\n")

Gives:
  PE: 1.372138
In PHX 1.372138 (Patterson/Jones got in SAS 1.37). Great!
TODO: Get the Satterthwaite’s DFs for the CI.

» If you have a dataset that does not readily converge with SAS then I'd be eager to learn if it does in some flavour of my code. :-)

See this post with links to John’s datasets. Maybe there are others.

Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Belgium?,
2020-07-14 03:22
(98 d 18:38 ago)

(edited by ElMaestro on 2020-07-14 12:43)
@ Helmut
Posting: # 21691
Views: 9,396
 

 Negative determinant

Hi Hötzi,

» In Obj.F12() the line
» B <- -0.5*log(det(t(X) %*% solve(CovM) %*% X))
» is the reason for the warnings because t(X) is not a square matrix. Therefore B is NaN.

I looked at this one by testing is.nan(B) and printing the components if/when it is. It happens right at the start for this data.

The problem is not that X changes dimensions, X is the design matrix and this is generally never a square matrix, just full column rank with as many rows as observations. X does not change at all during the optimisation.

All is in fact hunky dory when it happens, the variance estimates being:
[1] 0.06841615 0.03576077 0.01240137 0.05893007 (on my machine)
- that's nothing special, just a wee change from initial.

The issue here is that the determinant suddenly becomes apparently negative. This is mysterious to me at this point, I am not much into matrices. I take it as a case of NM taking some unfortunate decision, which is then corrected afterwards.

However this does give food for thought: if the parameter vector that gives a negative determinant is chosen as a starting value then NM will stop immediately and throw an error - you can try it with the vector above.
Therefore, it may be desirable to ascertain positiveness of that determinant (and the others?) when starting the algo, and to slightly modify individual starter values in case of such suboptimally chosen starter values until Obj.F12 actually returns something meaningful.

I believe the issue can not be remedied by fiddling with X.

I will be happy to hear a hardcore matrix algebrist's opinion here.

Update 8 hours later:
I think CovM becomes noninvertible at select levels of a variance component, given the others, when/if a certain value of the variance component means that one or more columns (rows) add up to one or more other columns. At exactly that point CovM will drop in its rank, and it becomes non-invertible; beyond the value that creates this phenomenon CovM is invertiable but the determinant may switch sign. It is a phenomenon with a singularity in the determinant easily explained in the numbers, but I cannot interpret the higher perspective. I am grateful that NM in the implementation of optim apparently safeguards against this phenomenon as long as the first call to the objective is evaluable.:-) Whodathunkit?

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2020-07-14 13:58
(98 d 08:02 ago)

@ ElMaestro
Posting: # 21692
Views: 9,330
 

 Are we loosers?

Hi ElMaestro,

» The problem is not that X changes dimensions, X is the design matrix and this is generally never a square matrix, just full column rank with as many rows as observations. X does not change at all during the optimisation.

Ah, yes!

» The issue here is that the determinant suddenly becomes apparently negative.

Might point towards a similar issue in PHX complaining about negative variance components and SAS forcing them to zero.

» This is mysterious to me at this point, I am not much into matrices.

So am I. Sorry.

» I take it as a case of NM taking some unfortunate decision, which is then corrected afterwards.

Luckily.

» However this does give food for thought: if the parameter vector that gives a negative determinant is chosen as a starting value then NM will stop immediately and throw an error - you can try it with the vector above.
» Therefore, it may be desirable to ascertain positiveness of that determinant (and the others?) when starting the algo, and to slightly modify individual starter values in case of such suboptimally chosen starter values until Obj.F12 actually returns something meaningful.

But how?

» I believe the issue can not be remedied by fiddling with X.

Agree.

» Update 8 hours later: …

Sounds reasonable.
But I think that we are going in circles. This story is all about RSABE where we have to go for ABE if swR <0.294 (or the mixed-model for ABE in general). The scaling part is easily doable in R. Since you found a solution for estimating swR and the point estimate, fine. However, if we have to go for ABE, we are still at a loss. Remember:

PROC MIXED
  CLASSES SEQ SUBJ PER TRT;
  MODEL Y = SEQ PER TRT/ DDFM=SATTERTH;
  RANDOM TRT/TYPE=FA0(2) SUB=SUBJ G;
  REPEATED/GRP=TRT SUB=SUBJ;
  ESTIMATE 'T vs. R' TRT 1 -1/CL ALPHA=0.1;
run;

:ponder:
For the imbalanced dataset of this post PHX gives me:
Δ 0.03084859
SEΔ 0.03256626
df 18.24154
and therefore, \(\small{CI=\exp (\Delta\mp t_{0.05,df}\times SE)=\left \{ 0.9747, 1.0912 \right \}}\).

Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Belgium?,
2020-07-14 15:07
(98 d 06:53 ago)

@ Helmut
Posting: # 21693
Views: 9,332
 

 "we"? Loosers?

Hi again Hötzi,

still at a loss, how?
From my own perspective, this was not about creating a package (roof, walls) but just to find a way to use an optimiser to find the solid estimate of swr for the TRR/RTR/RRT design (base), and this is what was deemed rather difficult at the outset. It succeeded when I did not express V directly and not via ZGZt+R. Did not take so much time as I had feared.

Did you mean to say that now it was shown possible, there is a need for other components (and that these are the tricky ones?)? I am not sure I can do it, not sure I can't, either. Did not look into it at all, for me this part is a non-issue as is generalising the idea here to make it work for all other designs.
Will be happy to take a look at the other stuff, too, if I have something to contribute with.
For me, all this was just about the s2wr. For you ("we") maybe it is seen differently? Eager to learn of it, will see without being certain if I can produce something. :-)

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2020-07-14 15:32
(98 d 06:28 ago)

@ ElMaestro
Posting: # 21695
Views: 9,314
 

 Misunderstanding?

Hi ElMaestro,

» still at a loss, how?

Amazingly you solved an old riddle – which is only part of the game in RSABE.

» […] Did not take so much time as I had feared.

Well, 60+ hours is a lot for a spare time project. Welcome to the club.

» Did you mean to say that now it was shown possible, there is a need for other components (and that these are the tricky ones?)?

Exactly. I still don’t see how we could mimic the FDA’s mixed model for ABE in R.

» I am not sure I can do it, not sure I can't, either.

Heads up. You could be the first succeeding in R. PharmCat produced something in Julia (see there). Duno the state of affairs.

» Will be happy to take a look at the other stuff, too, if I have something to contribute with.
» For me, all this was just about the s2wr. For you ("we") maybe it is seen differently? Eager to learn of it, will see without being certain if I can produce something. :-)

By “we” I’ve meant the community hoping for an open source solution.
Sorry for the huge amount of time you’ve spent on this. swR is only needed as a decision point whether to go for RSABE or ABE. Great that we can get it now in R but the evaluation for ABE remains a mystery.

Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Belgium?,
2020-07-13 10:13
(99 d 11:47 ago)

@ Helmut
Posting: # 21681
Views: 9,634
 

 Braveheart!

Hi all,

The solution I posted is a total winner. :-D:-D:-D Where do I collect my medal? :-D:-D:-D

Will be happy to learn of performance on other datasets, in particular of differences between results or non-convergence (which is often something related to instability caused by the initial guess being too far from the optimum). Obviously, with this implementation the input format has to meet the expectation of the algo. For example, T and R coded as T and R, not A and B, in this implementation. All this is unrelated to the task of this thread and can be rather easily worked out as well.

A slight change this morning, for the initial guesses, less clumsy and more likely (in the nonrestricted fashion ;-) ) to reflect the optimal solution:

Some.Initial.Guesses=function(foo)
{
 #guess varT
 D1=subset(D, (D$Trt=="T"))
 m=lm(log(Y)~factor(Seq)+factor(Per)   , data=D1)
 varT=anova(m)["Residuals", "Mean Sq"]
 
 #guess VarWR
 D1=subset(D, (D$Trt=="R"))
 m=lm(log(Y)~factor(Seq)+factor(Per)+factor(Subj)  , data=D1)
 varWR=anova(m)["Residuals", "Mean Sq"] 

 #guess varBR
 D1=subset(D, (D$Trt=="R"))
 m=lm(log(Y)~factor(Seq)+factor(Per)   , data=D1)
 varBR=anova(m)["Residuals", "Mean Sq"] - varWR

 #guess varRT (between), must tbe closer to the other two betweens.
 ##perhaps mean is a decent guess.
 varRT=0.5*(varT+varBR)

 rslt=c(varT, varBR, varWR, varRT)
 return(rslt)
}

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2020-07-13 14:16
(99 d 07:44 ago)

@ ElMaestro
Posting: # 21685
Views: 9,540
 

 Braveheart!

Hi ElMaestro,

» The solution I posted is a total winner. :-D:-D:-D Where do I collect my medal? :-D:-D:-D

Hey, it looks great indeed! However, that’s only the foundation of the house. Walls, a door, windows, and a roof would be nice.

» Will be happy to learn of performance on other datasets, …

Try my code with
path <- "https://raw.githubusercontent.com/Helmut01/replicateBE/master/inst/extdata/DS04.csv"
That’s an  incomplete  balanced data set (Cmax of Table II).1

PHX (convergence after 8 iterations without warnings, -2REML 292.14, AIC 314.14):
lambda(1,1) 0.43651691
lambda(1,2) 0.63844444
lambda(2,2) 0.41685382
s2wR        0.31426995
s2wT        0.00718607

Manual:
lambda(1,1)^2=var_bR=0.1905470

My master’s code:
  Component  Estimate     Initial
      var_T 0.5885621 0.591211271
     var_bR 0.1905466 0.008845665
     var_wR 0.3142704 0.318270251
 covar_bTbR 0.2786906 0.300028468

Not that bad. ;-)

» … in particular of differences between results or non-convergence (which is often something related to instability caused by the initial guess being too far from the optimum).

See this post with links to John’s nasty datasets. Will see whether I still have the one which failed completely both in SAS and PHX.

» A slight change this morning, for the initial guesses, less clumsy and more likely (in the nonrestricted fashion ;-) ) to reflect the optimal solution:

Will check that later.2


  1. Patterson SD, Jones B. Viewpoint: observations on scaled average bioequivalence. Pharm Stat. 2012;11(1):1–7. doi:10.1002/pst.498.
  2. Edit: This time a really imbalanced one (1 subject in RRT and 1 in RTR removed):
    https://bebac.at/downloads/ds02a.csv
    Gives:
    Maximum         : 70.14324
    Evaluations     : 269
    Convergence code: 0
      Component   Estimate    Initial
          var_T 0.06800845 0.06571035
         var_bR 0.03821576 0.03199571
         var_wR 0.01065516 0.01009356
     covar_bTbR 0.04412955 0.04885303
    PE: 1.031329

    In PHX (Warning: Newton's algorithm converged with modified Hessian. Output is suspect.):
    s2wR 0.010655215
    PE: 1.0313293


    With your modified code and cosmetic tweaks:
    Nelder-Mead optimization (39 seconds)
      Final       70.14324
      Evaluations 283
      Successful completion.

      Component   Estimate    Initial
          var_T 0.06800851 0.06571035
         var_bR 0.03821582 0.03750818
         var_wR 0.01065521 0.01009356
     covar_bTbR 0.04412958 0.05160927

          Parameter     Value
                swR 0.1032241
               CVwR    10.35%
     Point estimate  1.031329


    [image]

Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
PharmCat
★    

Russia,
2020-07-15 14:19
(97 d 07:41 ago)

@ ElMaestro
Posting: # 21699
Views: 9,131
 

 Braveheart!

Hello all!

As I understand some code for mixed model solution was done? Great! :-D

» Will be happy to learn of performance on other datasets, in particular of differences between results or non-convergence (which is often something related to instability caused by the initial guess being too far from the optimum).

Here you can find some self generated (and some public) datasets for package validation and results from SPSS here.

Also you can make any random dataset with any design with Julia, docs here.

Is is possible to make easy performance comparison on my machine?


Edit: Full quote removed. Please delete everything from the text of the original poster which is not necessary in understanding your answer; see also this post #5[Helmut]
ElMaestro
★★★

Belgium?,
2020-08-02 17:39
(79 d 04:21 ago)

@ PharmCat
Posting: # 21817
Views: 6,721
 

 Braveheart!

Hi PharmCat,


» Is is possible to make easy performance comparison on my machine?

the code on this page would suffice for the comparison, however I am not going in the direction of estimating denominator DF's and calculating CI's.

I am solely interested in redefining our specification of the covariance matrix so that the model that is being fit contains exactly the variance components we can realistically ask about for a 3-period semireplicated study.

I have reached that goal.
Also, I am writing an optimiser and it works bloody great! Not terribly fast but seems to get there in a very orderly fashion :-)

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2020-07-15 14:27
(97 d 07:33 ago)

@ ElMaestro
Posting: # 21700
Views: 9,137
 

 We were all blind (except Detlew)

Hi ElMaestro,

we were all blind!
Just to make clear what I mean by “we”: Myself, you, the EMA’s PKWP (and the BSWP fabricating the example datasets and the evaluation by ‘Method C’), and Patterson & Jones.
Detlew read the guidance’s Step 1 thoroughly enough to realize that \(\small{s_\textrm{wR}^2}\) is not an REML-estimate (see this post). Fortunately the templates for Phoenix are correct. Quick & dirty R-code at the end. Results:

      file             descr       s2wR       swR CVwR (%)      PHX match
  ds02.csv            EMA II 0.01298984 0.1139730 11.43441 11.43441   yes
 ds02a.csv     EMA II imbal. 0.01038611 0.1019123 10.21774 10.21774   yes
  DS04.csv Patterson & Jones 0.32489813 0.5699984 61.95883 61.95883   yes
  ds01.csv             EMA I 0.2188968  0.4678640 46.96431 46.96431   yes


In the Q&A we find for DSII the REML-based CVwR 11.5% (PHX 11.43%). Patterson & Jones reported for their dataset 61% (SAS); I got in PHX 61.96%. For DSI the Q&A gives 47.3% (PHX 46.96%).

Therefore, congratulations to all of us answering a question which was not asked by the FDA. :-(


GetData <- function(path) {
  D        <- NULL # paranoia
  D        <- read.csv(path, comment.char = "#")
  D        <- D[, 1:5]
  D[, 5]   <- log(D[, 5])
  names(D) <- c("Subj", "Per", "Seq", "Trt", "PK")
  return(invisible(D))
}
CalcSwR <- function(path) {
  D    <- GetData(path)
  R    <- D[D$Trt == "R", ]
  R    <- cbind(R, Diff = NA)
  seqs <- sort(unique(D$Seq), decreasing = TRUE)
  des  <- NA
  if (sum(seqs %in% c("TRR", "RTR", "RRT")) == 3) des <- "partial"
  if (sum(seqs %in% c("TRTR", "RTRT")) == 2) des <- "full"
  if (is.na(des)) stop("Design not supported.")
  if (des == "partial") {
    TRR <- R[R$Seq == "TRR", c(1:3, 5:6)]
    RTR <- R[R$Seq == "RTR", c(1:3, 5:6)]
    RRT <- R[R$Seq == "RRT", c(1:3, 5:6)]
    # keep only subjects with two observations
    TRR <- TRR[duplicated(TRR$Subj, fromLast = TRUE) |
               duplicated(TRR$Subj, fromLast = FALSE), ]
    RTR <- RTR[duplicated(RTR$Subj, fromLast = TRUE) |
               duplicated(RTR$Subj, fromLast = FALSE), ]
    RRT <- RRT[duplicated(RRT$Subj, fromLast = TRUE) |
               duplicated(RRT$Subj, fromLast = FALSE), ]
    # first minus second administration
    # too lazy to vectorize
    for (j in 1:(nrow(TRR)-1)) {
      if (TRR$Per[j] == 2) {
        TRR$Diff[j] <- TRR$PK[j] - TRR$PK[j+1]
      }
    }
    for (j in 1:(nrow(RTR)-1)) {
      if (RTR$Per[j] == 1) {
        RTR$Diff[j] <- RTR$PK[j] - RTR$PK[j+1]
      }
    }
    for (j in 1:(nrow(RRT)-1)) {
      if (RRT$Per[j] == 1) {
        RRT$Diff[j] <- RRT$PK[j] - RRT$PK[j+1]
      }
    }
    R <- rbind(TRR[, c(3, 5)],
               RTR[, c(3, 5)],
               RRT[, c(3, 5)])
  } else {
    TRTR <- R[R$Seq == "TRTR", c(1:3, 5:6)]
    RTRT <- R[R$Seq == "RTRT", c(1:3, 5:6)]
    TRTR <- TRTR[duplicated(TRTR$Subj, fromLast = TRUE) |
                 duplicated(TRTR$Subj, fromLast = FALSE), ]
    RTRT <- RTRT[duplicated(RTRT$Subj, fromLast = TRUE) |
                 duplicated(RTRT$Subj, fromLast = FALSE), ]
    for (j in 1:(nrow(TRTR)-1)) {
      if (TRTR$Per[j] == 2) {
        TRTR$Diff[j] <- TRTR$PK[j] - TRTR$PK[j+1]
      }
    }
    for (j in 1:(nrow(RTRT)-1)) {
      if (RTRT$Per[j] == 1) {
        RTRT$Diff[j] <- RTRT$PK[j] - RTRT$PK[j+1]
      }
    }
    R <- rbind(TRTR[, c(3, 5)],
               RTRT[, c(3, 5)])
  }
  R    <- R[!is.na(R$Diff), ]
  m    <- lm(Diff ~ as.factor(Seq), data = R)
  s2wR <- anova(m)["Residuals", "Mean Sq"]/2
  res  <- c(s2wR = s2wR, swR = sqrt(s2wR), CVwR = 100*sqrt(exp(s2wR)-1))
  names(res)[3] <- "CVwR (%)"
  return(res)
}
p <- paste0("https://",
       c("bebac.at/downloads/ds02.csv",
         "bebac.at/downloads/ds02a.csv",
         "raw.githubusercontent.com/Helmut01/replicateBE/master/inst/extdata/DS04.csv",
         "bebac.at/downloads/ds01.csv"))
d <- c("EMA II", "EMA II imbal.", "Patterson & Jones", "EMA I")
res <- data.frame(file = basename(p), descr = d, X1 = NA, X2 = NA, X3 = NA,
                  PHX = c(11.43441, 10.21774, 61.95883, 46.96431), match = "no")
names(res)[3:5] <- c("s2wR", "swR", "CVwR (%)")
for (j in 1:4) {
  res[j, 3:5] <- CalcSwR(path = p[j])
  if (round(res[j, 5], 5) == res$PHX[j]) res$match[j] <- "yes"
}
print(res, row.names = FALSE)


Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Belgium?,
2020-07-15 16:25
(97 d 05:36 ago)

@ Helmut
Posting: # 21701
Views: 9,079
 

 It is the opposite way around for me

Hi again,

to me it is exactly the opposite way around:
A model with more than one variance component is a mixed model. I am of the impression that FDA and EMA have both been in a slight predicament with their guidances because they have both been "somewhat" aware of the specification/convergence issues with REML; hence both went in a direction with all effect s fixed as done explicitly by EMA and implicitly by FDA (when they use GLM for swr) in the guidance.
My contribution here is solely to open someone's eyes to the fact that we may not need to rely on a normal linear model for s2wr in a TRR/RTR/RRT design.

I would much more like to think of this thread as a contribution to regulators defining the next iteration of giudelines/guidances. Whether they will see it as an advantage to use REML to derive s2wr now it is shown to be doable (not causing convergence issues, because it is not involving the silly and unestimable s2wt component), is of course their decision solely. I have done my part.

For consideration further down the line, for those who enter discussions about estimator bias being catastrophic: Which estimator is more biased when handling RRT/RTR/TRR: the s2wr you get from the linear model or the one you get via REML? What would your answer imply for any future recommendations for guidelines/guidances?
(I am not a hardliner myself: I am usually thinking biased estimators are still useful estimators)

This thread already caught the attention of a number cruncher at an agency across the Atlantic unofficially, so I am OK with it all:-)

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2020-07-15 17:33
(97 d 04:27 ago)

@ ElMaestro
Posting: # 21702
Views: 9,072
 

 Desultory thoughts

Hi ElMaestro,

» to me it is exactly the opposite way around:
» A model with more than one variance component is a mixed model.

Not necessarily. It depends on what you believe [sic] is a random effect. Treatment, sequence, and period are fixed effects, right? IMHO, subject is random. When we think about interaction(s) we enter the gray zone, of course.

» I am of the impression that FDA and EMA have both been in a slight predicament with their guidances because they have both been "somewhat" aware of the specification/convergence issues with REML; hence both went in a direction with all effect s fixed as done explicitly by EMA and implicitly by FDA (when they use GLM for swr) in the guidance.

Well, the EMA’s models are wacky because they assume equal intra-subject variances of T and R. An assumption which was shown to be false in many cases. Essentially most of the information obtainable in a replicate design is thrown away.
I don’t know why the FDA does not directly estimate \(s_{wR}^2\) from the mixed model but work with the intra-subject contrasts of R instead. More below.

» My contribution here is solely to open someone's eyes to the fact that we may not need to rely on a normal linear model for s2wr in a TRR/RTR/RRT design.

OK.

» I would much more like to think of this thread as a contribution to regulators defining the next iteration of giudelines/guidances. Whether they will see it as an advantage to use REML to derive s2wr now it is shown to be doable (not causing convergence issues, because it is not involving the silly and unestimable s2wt component), is of course their decision solely. I have done my part.

With the current guidance one gets always an estimate of \(s_{wR}^2\). No problems in the RSABE-part if ≥0.0294. Only in the mixed-model for ABE and the partial replicate designs one may run into the convergence issues. That’s not resolved yet, unless one uses a simpler1 variance-covariance structure, which is against the guidance. But – again and again – there are cases were nothing helped, neither in SAS nor in Phoenix. Duno whether it is possible to specify Nelder-Mead in SAS or Phoenix.2 That’s why I still hold that the partial replicates are lousy designs because one might end up with no result for ABE unless one opts for the EMA’s models (which are wrong, IMHO).
OK, one step back. Since the mixed model for ABE is recommended by the FDA for ages, I wonder what people have done when they faced convergence problems…

» For consideration further down the line, for those who enter discussions about estimator bias being catastrophic: Which estimator is more biased when handling RRT/RTR/TRR: the s2wr you get from the linear model or the one you get via REML?

Good question, next question. Maybe REML…

» What would your answer imply for any future recommendations for guidelines/guidances?

Oh dear! I participated in all GBHI workshops. The discussions about reference-scaling (Rockville 2016, Amsterdam 2018) were disastrous. What do we have?
  • US and China
    • RSABE (any PK metric)
    • No clinical justification required.3
    • No upper cap on scaling.
    • Mixed model for ABE, ISC for RSABE.
  • EMA, WHO, and many more
    • ABEL (Cmax; for modified release additionally Cmin/Cτ, partial AUCs).
    • Upper cap of expansion at CVwR 50%.
    • All effects fixed.
    • Clinical justification (no safety issues wider wider limits).
    • Demonstration that CVwR is not caused by outliers.
  • Health Canada
    • Like above but AUC only and upper cap 57.4%.
    • Mixed effects model.
    • Outliers can be excluded if procedure specified in the protocol (irrespective of the treatment).
And yes, problems with inflation of the Type I Error, which all regulators prefer to ignore. Kudos to some companies taking the issue more seriously and adjusting α downwards.

» (I am not a hardliner myself: I am usually thinking biased estimators are still useful estimators)

Agree.


  1. Actually it is the other way ’round. The model of the guidance is overspecified (read: wrong) for partial replicates.
  2. I played around with a lot of optimizers / settings in R the last three days. Only the grid search was stable (though slow). All others ran into singularities even with tweaked parameter constraints.
  3. Yeah, highly variable drugs are safe drugs. Instead of coming up with guidances for the “highly variable narrow therapeutic index drugs” dagibatran and rivaroxaban, IMHO, they should be taken off the market. Regulators should be concerned about protecting the public health and not the profits of the industry.

Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
d_labes
★★★

Berlin, Germany,
2020-07-15 18:13
(97 d 03:47 ago)

@ Helmut
Posting: # 21703
Views: 9,055
 

 FDA RSABE is ISC

Hi Helmut, Hi ElMaestro,

» ...
» Well, the EMA’s models are wacky because they assume equal intra-subject variances of T and R. An assumption which was shown to be false in many cases. Essentially most of the information obtainable in a replicate design is thrown away.

Full ACK!

» I don’t know why the FDA does not directly estimate \(\small{s_\textrm{wR}^2}\) from the mixed model but work with the intra-subject contrasts of R instead. ...

I was always wondering about the different methodologies in the code of the progesteron guidance:
  • ISC analysis for estimating CVwR and decision for using RSABE or ABE based on this estimate
  • Mixed model analysis of the 90% CI of T-R (ABE analysis) in case of swR < 0.029356 (CVwR < 30%)
  • ISC analysis for estimating the point estimate of T-R including 90% CI and its use in the RSABE criterion in case of swR >= 0.029356 (CVwR >= 30%)
Why not use the ISC estimate of T-R in the ABE decision in case of swR < 0.029356 (CVwR < 30%)?
Politics? Nostalgia (Since years recommended the Proc MIXED code)?

Or the other way round: Why not use the components of the RSABE criterion (pe and 90% CI, s2wR) from the mixed model approach?

I would opt for the full ISC approach because it may be unambiguously implemented in SAS, R, Phoenix and so on for every replicate design, full or partial :cool:.

Regards,

Detlew
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2020-07-16 11:11
(96 d 10:49 ago)

@ d_labes
Posting: # 21709
Views: 8,863
 

 FDA RSABE is ISC

Hi Detlew,

» Why not use the ISC estimate of T-R in the ABE decision in case of swR < 0.029356 (CVwR < 30%)?
» Politics? Nostalgia (Since years recommended the Proc MIXED code)?

The guidance “Statistical Approaches to Establishing Bioequivalence” of January 2001 (where the same SAS code is given in APPENDIX E) is still in force. Hence, it is only consistent to give it in the progesterone guidance.

» I would opt for the full ISC approach because it may be unambiguously implemented in SAS, R, Phoenix and so on for every replicate design, full or partial :cool:.

Cannot agree more. All problems would vanish.

Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Belgium?,
2020-07-15 23:06
(96 d 22:54 ago)

@ Helmut
Posting: # 21705
Views: 9,003
 

 Desultory thoughts

Hi Hötzi,

» » to me it is exactly the opposite way around:
» » A model with more than one variance component is a mixed model.
»
» Not necessarily. It depends on what you believe [sic] is a random effect. Treatment, sequence, and period are fixed effects, right? IMHO, subject is random. When we think about interaction(s) we enter the gray zone, of course.

May I offer the completely opposite view? Search the forum for "bogus statement" :-D, check the SAS documentation for what the statement actually actually does to Proc GLM:
When ProcGLM is used for 222BE with or without the bogus statement it still means a normal linear model is fit (even when the theory in e.g. C&L calls it "random"). There is a single variance component in such fits. If you wish to verify it: There is a model matrix with columns for subjects; check the df's for such models, it would be (much) higher if subjects were fit as random. Conversely, if subject were random subject would not appear as a factor in the anova.

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2020-07-16 10:59
(96 d 11:01 ago)

@ ElMaestro
Posting: # 21708
Views: 8,866
 

 Desultory thoughts

Hi ElMaestro,

» May I offer the completely opposite view? Search the forum for "bogus statement" :-D, check the SAS documentation for what the statement actually actually does to Proc GLM: […]

I know, I know. ;-)
I was talking about a mixed model and not the way how SAS uses the RANDOM statement in PROC GLM in order to get the DFs right.
A true all fixed effects model – with subjects instead of the crazy subjects(sequence) – requires unique coding of subjcts. I practice always the case (OK, maybe not in multicenter studies but then you have center as another fixed effect). I recently reviewed a manuscript where subjects in each sequence had the same codes. The authors obviously had no clue about how things work.

BTW, same agencies require that in crossovers CVintra and CVinter are reported. The latter is not possible when subject is a fixed effect.

Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
mittyri
★★  

Russia,
2020-07-19 00:42
(93 d 21:18 ago)

@ Helmut
Posting: # 21745
Views: 8,327
 

 Phoenix - which template?

Hi Helmut,

» Detlew read the guidance’s Step 1 thoroughly enough to realize that \(\small{s_\textrm{wR}^2}\) is not an REML-estimate (see this post). Fortunately the templates for Phoenix are correct.

are you referring to that template for RSABE?

Kind regards,
Mittyri
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2020-07-19 01:45
(93 d 20:15 ago)

@ mittyri
Posting: # 21747
Views: 8,321
 

 FDA RSABE Project template_ v1.4.phxproj

Hi mittyri,

» are you referring to that template for RSABE?

Nope. I was using “FDA RSABE Project template_ v1.4.phxproj” of 2014-11-11 (should be accessible at https://www.certarauniversity.com/store/637337-106-fl-free-phoenix-templates-for-bioequivalence-regulatory-guidances but my credentials were not accepted (why the heck?) and when I tried to reset the password I didn’t receive the confirmation e-mail…).
Relevant part:

FDA RSABE Project template_ v1.4
 ↳ Workflow
  ↳ RSABE
    ↳ Prepare Data for RSABE analysis
      ↳ Intermediate ilat (Dependent: ilat, Fixed Effect: Sequence)
          Result: Final Variance Parameters
            Estimate
            ↳ Transformation Type = Arithmetic
              Transformation = x / n
              New Column Name = Var_wr
              n = 2
              Var_wr
              ↳ Transformation Type = Functions
                Transformation = Square root(x)
                New Column Name = sWR


The main difference to v1.3 is that v1.4 uses the function chiinv(0.95,df) further down which was not available in PHX6.3 and a table of values had to be used instead.
However, with v1.3 you get the same result like in this post.


PS: Ref.35 of doi:10.1208/s12248-020-0427-6 leads to limbo. Not very professional. I asked Certara’s support to establish a [image] permanent redirect on the server.

Dif-tor heh smusma 🖖
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Belgium?,
2020-07-24 10:07
(88 d 11:53 ago)

@ ElMaestro
Posting: # 21783
Views: 7,610
 

 "By popular demand": likelihood

Hi all,

to make the objective function return the likelihood, use e.g.:

Obj.F12=function(Pars)
##returns the restricted log likelihood.
##apart from k, this is line 3 of page 10 of:
##http://people.csail.mit.edu/xiuming/docs/tutorials/reml.pdf
##k is invariant with parameter estimates and variance components.
{
  CovM=Create.CovM(Pars)
  k=  -((length(y)-ncol(X))/2)*log(2*pi)
  A= -0.5*log( det(CovM))
  B= -0.5*log( det(t(X) %*% solve(CovM) %*% X))
  est.b = solve(t(X) %*% solve(CovM) %*% X) %*% t(X) %*% solve(CovM) %*% y
  tmp= y - X %*% est.b
  C=-0.5 *(t(tmp) %*% solve(CovM) %*% tmp)
  return(k+A+B+C)
}


The optimised object (F in my original code) will hold the optimised likelihood as F$value.
Opportunity: Will run a tiny bit faster if k is public, and if solve(CovM) is done a single time.


For example at reltol=1e-8 I get with EMA dataset II a value of 15.33728.

This matches exactly what SAS reports, see post # 21670 above.

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
ElMaestro
★★★

Belgium?,
2020-07-24 12:52
(88 d 09:08 ago)

@ ElMaestro
Posting: # 21785
Views: 7,570
 

 And by the way....

Obj.F12=function(Pars)
##returns the restricted log likelihood.
##apart from k, this is line 3 of page 10 of:
##http://people.csail.mit.edu/xiuming/docs/tutorials/reml.pdf
##k is defined from the number of observations, and .
{
  CovM=Create.CovM(Pars)
  k=  -((length(y)-ncol(X))/2)*log(2*pi)
  A= -0.5*log( det(CovM))
  B= -0.5*log( det(t(X) %*% solve(CovM) %*% X))
  est.b = solve(t(X) %*% solve(CovM) %*% X) %*% t(X) %*% solve(CovM) %*% y
  tmp= y - X %*% est.b
  C=-0.5 *(t(tmp) %*% solve(CovM) %*% tmp)
  return(k+A+B+C)
}


....this is exactly also the REML likelihood function that SAS defines.

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
PharmCat
★    

Russia,
2020-08-03 14:24
(78 d 07:36 ago)

@ ElMaestro
Posting: # 21820
Views: 6,672
 

 And by the way....

» ....this is exactly also the REML likelihood function that SAS defines.

Hello!

Do you calculate REML for all data at once or calculate REML subject by subject?

Is it possible to use function caching in R?
ElMaestro
★★★

Belgium?,
2020-08-03 22:55
(77 d 23:05 ago)

@ PharmCat
Posting: # 21821
Views: 6,624
 

 Not understood

Hi Pharmcat,


» Do you calculate REML for all data at once or calculate REML subject by subject?

REML is not generally done on a per-subject basis. Each model is assumed to have an optimal fit, given the data and the variance/covariance structure. The optimal fit, when putatively found, corresponds to the fixed effects and the variance components, and the log likelihood serves as objective function.
I am sure I am missing somethign clever in your question. Can you describe what you meant?

» Is it possible to use function caching in R?

You can always write some public members to that effect where necessary and appropriate. I don't think R has a built-in optimizer which takes care of it as the language is interpreted, but other users of this forum should confirm; I am not expert on the inner workings of R.
I do not see how this would affect the REML process - can you describe with more words what your perspective on caching was, then we'll possibly figure out something?

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
PharmCat
★    

Russia,
2020-08-05 01:41
(76 d 20:19 ago)

@ ElMaestro
Posting: # 21823
Views: 6,482
 

 Not understood

» Can you describe what you meant?


Hello ElMaestro!

logREML of all data is just sum of individual subject logREML. See Mary J. Lindstrom and Douglas M. Bates, Newton-Raphson and EM Algorithms for Linear Mixed-Effects Models for Repeated-Measures Data.

You can make big sparce V matrix for all data and then inverce it (solve(CovM) in your code above) or you can take indiviual Xi and Vi and calculate logREML for each subject and sum.

Problem1: when you have big matrix invercing is slow as hell, but if you inverce by blocks (you can do it because it block-diagonal) it can be done faster.

Problem2: In BE case you have many equal Zi matrices, so in logREML calculation (if you calc by subject) you inverce one matrix many times (because V = ZGZ'+R, Z can be different, G and R not changing subject by subject). If you cache result - you can seriously increase performance.

So, I didn't read all R code (sorry) and I didn't understand what approach is used.
ElMaestro
★★★

Belgium?,
2020-08-05 08:13
(76 d 13:47 ago)

@ PharmCat
Posting: # 21824
Views: 6,468
 

 Not understood

Thanks PharmCat,

funny, I read the ref. you mention (Lindstrom & Bates) yesterday and I did not understand much of it. Will try again.

» Problem1: when you have big matrix invercing is slow as hell, but if you inverce by blocks (you can do it because it block-diagonal) it can be done faster.

Sounds neat, will check if I can somehow do such a thing.

» Problem2: In BE case you have many equal Zi matrices, so in logREML calculation (if you calc by subject) you inverce one matrix many times (because V = ZGZ'+R, Z can be different, G and R not changing subject by subject). If you cache result - you can seriously increase performance.

My approach is very simple: I make a case for saying that when T is not replicated while R is replicated, then it is not possible to make a meaningful model when V=ZGZt+R (as long as we count on R having the withins and G having the betweens). If we insest on ZGZt+R then we get irrelevant variance components associated with T whichever way we parameterise the covariance matrix. Therefore, I am working in V directly, meaning I am writing the blocks into it directly.
This may still be compatible with your proposal, though.

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
PharmCat
★    

Russia,
2020-08-05 16:37
(76 d 05:23 ago)

(edited by PharmCat on 2020-08-06 00:34)
@ ElMaestro
Posting: # 21825
Views: 6,421
 

 Not understood

Hello!

» Sounds neat, will check if I can somehow do such a thing.

I found Lindstrom&Bates, equation 3.4 or J. GURKA, 2006, equation 5.

Some experiment:

m = matrix(c(5, 1, 0, 0, 1, 2, 0, 0, 0, 0, 5, 3, 0, 0, 3, 4), nrow = 4, ncol = 4)
m
     [,1] [,2] [,3] [,4]
[1,]    5    1    0    0
[2,]    1    2    0    0
[3,]    0    0    5    3
[4,]    0    0    3    4
log(det(m))
[1] 4.59512
log(det(m[1:2, 1:2])) + log(det(m[3:4, 3:4]))
[1] 4.59512
solve(m)
           [,1]       [,2]       [,3]       [,4]
[1,]  0.2222222 -0.1111111  0.0000000  0.0000000
[2,] -0.1111111  0.5555556  0.0000000  0.0000000
[3,]  0.0000000  0.0000000  0.3636364 -0.2727273
[4,]  0.0000000  0.0000000 -0.2727273  0.4545455
solve(m[1:2, 1:2])
           [,1]       [,2]
[1,]  0.2222222 -0.1111111
[2,] -0.1111111  0.5555556
solve(m[3:4, 3:4])
           [,1]       [,2]
[1,]  0.3636364 -0.2727273
[2,] -0.2727273  0.4545455


So Vi can be result of function ƒ(Zi, θ); Zi is known for all subjects, θ - vector of optimizables parameters.

» My approach is very simple: I make a case for saying that when T is not replicated while R is
» replicated, then it is not possible to make a meaningful model when V=ZGZt+R (as long as we count on R having the withins and G having the betweens).

Mmm... I can't say that ZGZ'+R is meaningful. If you construct V manualy it can still be represented as ZGZ'+R if ρ not equal 0. If used random model with covariation (ρ≠0) nondiagonal elements of G depends on diagonal and when you construct V some nondiagonal elements depends on both variance parameters. Then we add diagonal R matrix and have combination (parts of V): σ2BT2WT, σ2BR2WR, σ2BR, σBTBR*ρ (we have no σ2BT) so we can estimate σ2WT from σ2BT2WT and σ2BT is stable because it is part of σBTBR*ρ (Yes estimate of σ2WT is indirect). If ρ = 0 we have σBTBR*ρ = 0, and we can estimate only σ2BT2WT.

And I think that V can be decomposed to ZGZ'+R if you know Z, ρ (ρ≠0) and σ2BR in model with covariance and diagonal R.
ElMaestro
★★★

Belgium?,
2020-08-06 21:11
(75 d 00:49 ago)

(edited by ElMaestro on 2020-08-06 21:23)
@ PharmCat
Posting: # 21827
Views: 6,240
 

 Open issues

Hi PharmCat,

I really like your idea. I don't have a good understanding of this as a whole but I know that when I work with a little these things they become easier for me to grasp, so I am giving it a try. :-)

Let us turn to Gurka 2006 equation 3 and 5 and remembering that the rank of our overall design matrix X is 6:
Gurka requires me to derive Xi, the design matrix for fixed effects of subject i.
They mention that "Xi is a (ni × p) known, constant design matrix for the i' th subject
with rank p"

At just three observations per subject we cannot have a rank of p=6 for Xi. The maximum rank is naturally 3. I don't see a way to handle Xi on a subject by subject basis, but perhaps you have some insight. Possibly I could do it by lumping subjects together so that the "smaller" X of sorts still has more rows thank columns and thus a rank of p=6.


Do you see my issue with Xi? Could you indicate your view on Xi here? Is lumping subjects together a way to go to ensure the right rank of the reduced design matrices? Many thanks in advance.

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
PharmCat
★    

Russia,
2020-08-07 00:02
(74 d 21:58 ago)

@ ElMaestro
Posting: # 21828
Views: 6,241
 

 Open issues

Hi ElMaestro!


» Gurka requires me to derive Xi, the design matrix for fixed effects of subject i.
» They mention that "Xi is a (ni × p) known, constant design matrix for the i' th subject
» with rank p"

I think it means rank of whole X, because after: "and β is a (p × 1) vector of unknown, constant population parameters". Xi is a part of X and have same count of columns, rank Xi can be lower. Here is no contradictions I think. If look at Xiuming Zhang, 2015* p is just length of β and rank should be the same. If not, we will have rank-deficient matrix.

And if look generally - it really doesn't matter because -(N-p)*log(2π)/2 is a constant and it can be excluded from optimization. If you want find AIC ets - this will matter.

And as I know some softaware using different N for calculation (some use all observations, some only statistically independent (m)), and some include second constant part: logdet(Σim(X'X))/2

So if you want find only θ, you can minimize REML without constants at all. And add constants when you need to have real value of REML or AIC, BIC, ets.

*A Tutorial on Restricted Maximum Likelihood Estimation in Linear Regression and Linear Mixed-Effects Model

Example N=3, n=9, p=6

X:
β1 β2 β3 β4 β5 β6

1  1  0  1  0  0
1  1  0  1  0  0
1  1  0  1  0  0
1  0  1  0  1  0
1  0  1  0  1  0
1  0  1  0  1  0
1  1  0  0  0  1
1  1  0  0  0  1
1  1  0  0  0  1

X1
β1 β2 β3 β4 β5 β6

1  1  0  1  0  0
1  1  0  1  0  0
1  1  0  1  0  0

X2
β1 β2 β3 β4 β5 β6

1  0  1  0  1  0
1  0  1  0  1  0
1  0  1  0  1  0

X3
β1 β2 β3 β4 β5 β6

1  1  0  0  0  1
1  1  0  0  0  1
1  1  0  0  0  1
ElMaestro
★★★

Belgium?,
2020-08-07 07:49
(74 d 14:11 ago)

@ PharmCat
Posting: # 21829
Views: 6,186
 

 Open issues

Hi PharmCat,

» And if look generally - it really doesn't matter because -(N-p)*log(2π)/2 is a constant and it can be excluded from optimization. If you want find AIC ets - this will matter.

I am afraid you missed my point. The issue is not related to constants. As you say they are not of importance to find the maximum, which will just be vertically displaced.

Xi is the design matrix for fixed effects for the i'th subject.
Thus is has a rank of max 3 and never more. This means we can at most use it to work with three parameter estimates. For example an intercept, a treatment effect, a period effect (within any given subject).

Thus we cannot use Xi -if i is a single subject- during the optimization for the RTR/TRR/RRT design. Applying Gurkas equations will not work and I am looking for a way to work on derived versions of X (like for a few subjects at a time?) so that X retains in rank of 6 (in this design).

» So if you want find only θ, you can minimize REML without constants at all. And add constants when you need to have real value of REML or AIC, BIC, ets.

Don't worry, the constants are of no importance to find the variance components.

» Example N=3, n=9, p=6

These examples of X are possibly not very illustrative; as you can see column 2+3 add up to column 1, the intercept, column 4+5+6 add up to that as well, and so forth. X is column-wise not full rank.

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
PharmCat
★    

Russia,
2020-08-07 11:29
(74 d 10:31 ago)

@ ElMaestro
Posting: # 21830
Views: 6,167
 

 Open issues

Hi ElMaestro!

I thought you asked about p, sorry :)

» Xi is the design matrix for fixed effects for the i'th subject.
» Thus is has a rank of max 3 and never more. This means we can at most use it to work with three parameter estimates. For example an intercept, a treatment effect, a period effect (within any given subject).

I think Xi is not shuld be full-rank design matrix. It is just part of X for subject i. Moreover all Xi should be the same width (num of columns) for Σ(Xi'V-1Xi). Also with (yi - Xi*β) you get residuals ri (col num of Xi should be equal length β).
There is not very accurate description in Gurka's paper, but it provide good concept.
If you check, it should work.
ElMaestro
★★★

Belgium?,
2020-08-07 13:08
(74 d 08:52 ago)

(edited by ElMaestro on 2020-08-07 13:25)
@ PharmCat
Posting: # 21831
Views: 6,161
 

 Still can't make it work

Hi PharmCat,

thanks for trying to assist on this. :-) It is kind of you.

» I think Xi is not shuld be full-rank design matrix. It is just part of X for subject i. Moreover all Xi should be the same width (num of columns) for Σ(Xi'V-1Xi). Also with (yi - Xi*β) you get residuals ri (col num of Xi should be equal length β).
» There is not very accurate description in Gurka's paper, but it provide good concept.
» If you check, it should work.

I can't make it work.
In Gurka's equation 3, where we can write the shorthand beta= M1-1*M2: When using the little 3-by-6 matrices for Xi, M1 becomes non-invertible.
I tried to lump 6 subjects together at a time, and now I am getting an estimate of beta but it appears wrong, in the sense it is not corresponding to what I get for the full model.

I wonder if Gurka's equation only works under assumptions not mentioned in the paper or if I am just a very lousy programmer. My money is clearly on the latter :-D

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
PharmCat
★    

Russia,
2020-08-07 15:42
(74 d 06:18 ago)

@ ElMaestro
Posting: # 21833
Views: 6,143
 

 Still can't make it work

Hi ElMaestro!

» I tried to lump 6 subjects together at a time, and now I am getting an estimate of beta but it appears wrong, in the sense it is not corresponding to what I get for the full model.

If you calculate β with initial V - it will be wrong. It will be estimate for current V, then you calk REML end opimize θ... And then again... recalk β, ets... You can get initial β as β = inv(qr(X).R)*qr(X).Q'*y

But if you have wrong β at final iteration - it is bad case.

You can find same equation in LINDSTROM&BATES - 1.2 (where X not a design matix, but vector of Xi, IMHO)

If you provide me data from 6 subjects I can make calculations in Julia.

Magical linear algebra in julia:
using LinearAlgebra

X1 = [1 1 0 0 1; 1 0 0 0 1]
X2 = [1 1 0 0 1; 1 1 1 0 0]
Xv = Vector{Any}(undef, 2)
Xv[1] = X1
Xv[2] = X2
Xv'*Xv == X1'*X1 + X2'*X2 == vcat(X1, X2)'*vcat(X1, X2)
#true
ElMaestro
★★★

Belgium?,
2020-08-07 16:44
(74 d 05:16 ago)

@ PharmCat
Posting: # 21834
Views: 6,129
 

 Still can't make it work

Hi PharmCat.

» If you calculate β with initial V - it will be wrong.

Correct and that is not what I am doing. For every iteration through the optimizer I am deriving a new beta vector; this is beta given V. The loglikelihood is calculated on basis of V and beta, where beta is derived from V via X. Therefore LL can be said to be a function of V (or of the variance components in it).

There are two alternative ways I can get that beta vector for any given V:
1. I can do it the simple way with the very big X. It works well.
2. I can perhaps (=I should be able to?) do it by partition X into small bits, Xi, like in Gurka equation 3, or Lindstrom&Bates. But I cannot get the same beta vector for a given set of variance components in V.

The -or a- reason is that if I reduce X to Xi (individual subjects) and do all the summation then the M1 and written above is not invertible. This, in turn, appears to be because the individual Xi's are not of the same rank as X. If I lump Xi's together like the first 6 subjects, then the next 6 subjects etc, and so forth and use Gurka's equation 3, then I can generate a beta estimate (from 4 chunks of 6 subjects), but it is still not the same as in pt 1, for any given set of components in V.

I am not even at a final iteration, I am just at any iteration where I am comparing beta via the two methods for a set of starting values, or for that matter, if I plug in the values that come out as optimal values from pt 1 above after everything is coverged.


» It will be estimate for current V, then you calk REML end opimize θ... And then again... recalk β, ets... You can get initial β as β = inv(qr(X).R)*qr(X).Q'*y

You do not need this since you do not need starting values for beta, cf the LL above. You only need starter values for theta (variance components in V). Every pass through the REML LL equation generates a beta estimate from the variance compnents, given X. Once the algo has converge d the estimate of beta is final.
All this imaterial to my challenge at hand.

» But if you have wrong β at final iteration - it is bad case.

This is not about the "final iterattion". It is about any iteration. For any iteraiton, the two ways to derive beta results in different values. Therefore something s wrong.

» You can find same equation in LINDSTROM&BATES - 1.2 (where X not a design matix, but vector of Xi, IMHO)


» If you provide me data from 6 subjects I can make calculations in Julia.

Here all subjects are (y is logarithmised):

Subj Seq Trt Per y
1 RTR R 1 8.307361
1 RTR T 2 8.286622
1 RTR R 3 8.229191
2 RTR R 1 8.001757
2 RTR T 2 7.774351
2 RTR R 3 7.939016
3 RRT R 1 8.150295
3 RRT R 2 8.113786
3 RRT T 3 8.301224
4 TRR T 1 8.319961
4 TRR R 2 8.068152
4 TRR R 3 8.243703
5 TRR T 1 8.469640
5 TRR R 2 8.421255
5 TRR R 3 8.278936
6 RRT R 1 8.023159
6 RRT R 2 8.015393
6 RRT T 3 7.791358
7 RTR R 1 7.836054
7 RTR T 2 8.030084
7 RTR R 3 7.993823
8 TRR T 1 7.698483
8 TRR R 2 7.621391
8 TRR R 3 7.609862
9 RTR R 1 8.444106
9 RTR T 2 8.333174
9 RTR R 3 8.131531
10 TRR T 1 7.708949
10 TRR R 2 7.766586
10 TRR R 3 7.705803
11 RRT R 1 7.530373
11 RRT R 2 7.701833
11 RRT T 3 7.780888
12 RRT R 1 7.731229
12 RRT R 2 8.061613
12 RRT T 3 8.275682
13 RRT R 1 7.878686
13 RRT R 2 7.795811
13 RRT T 3 7.961789
14 RTR R 1 8.016582
14 RTR T 2 7.807835
14 RTR R 3 7.996452
15 RTR R 1 7.720639
15 RTR T 2 7.598299
15 RTR R 3 7.910003
16 TRR T 1 7.992809
16 TRR R 2 8.143808
16 TRR R 3 8.114504
17 TRR T 1 7.781890
17 TRR R 2 7.885856
17 TRR R 3 7.683404
18 RRT R 1 7.910224
18 RRT R 2 7.939373
18 RRT T 3 8.054078
19 RTR R 1 7.791027
19 RTR T 2 7.919283
19 RTR R 3 7.825645
20 RRT R 1 7.886983
20 RRT R 2 7.982689
20 RRT T 3 8.018691
21 RRT R 1 7.961928
21 RRT R 2 7.888485
21 RRT T 3 8.029107
22 TRR T 1 7.989221
22 TRR R 2 7.981665
22 TRR R 3 7.956967
23 TRR T 1 8.056680
23 TRR R 2 8.066396
23 TRR R 3 8.174308
24 RTR R 1 7.536257
24 RTR T 2 7.500419
24 RTR R 3 7.912350

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
PharmCat
★    

Russia,
2020-08-07 18:14
(74 d 03:46 ago)

(edited by PharmCat on 2020-08-08 01:12)
@ ElMaestro
Posting: # 21835
Views: 6,126
 

 Still can't make it work

» Here all subjects are (y is logarithmised)


# Theta estimate θ was taken by REML

using CSV, DataFrames, StatsModels, StatsBase

path = dirname(@__FILE__)
cd(path)

data   = CSV.File("dat.csv", delim=' ')|> DataFrame
#Sort data
sort!(data, [:Subj, :Trt, :Per])
#Get X matrix, Z, y
X = ModelMatrix(ModelFrame(@formula(0 ~ Seq + Per + Trt), data)).m
Z = ModelMatrix(ModelFrame(@formula(0 ~ 0 + Trt), data, contrasts = Dict(:Trt => StatsModels.FullDummyCoding()))).m
y   = data[!, :y]
#Xv vector of Xi, Zv vector of Zi, yv vector of yi
u = unique(data[!, :Subj])
Xv = Vector{Matrix}(undef, length(u))
Zv = Vector{Matrix}(undef, length(u))
yv = Vector{Vector}(undef, length(u))
for i = 1:length(u)
    v = findall(x -> x == u[i], data[!, :Subj])
    Xv[i] = view(X, v, :)
    Zv[i] = view(Z, v, :)
    yv[i] = view(y, v)
end

# Theta estimate θ[1:2] for R, θ[1:3] for G
# Very hard to take good θ estmate for this design
# If you provide your estimate, β can be recalculated

θ = [0.013246492714940418,
0.008891008058562478,
0.03621599611178057,
0.06160355780666301,
0.9661995154179528]
#G matrix
G = [θ[3] sqrt(θ[3]*θ[4])*θ[5]; sqrt(θ[3]*θ[4])*θ[5] θ[4]]

#Vector of R matrices
Rv = Diagonal.(map(x -> x * θ[1:2], Zv))

#Construct vector of Vi
Vv = Vector{Matrix}(undef, length(u))

for i = 1:length(u)

    global Vv[i] = Zv[i]*G*Zv[i]' + Rv[i]
end

#Vector of inverted Vi
iVv = inv.(Vv)

M1 = zeros(6, 6)
M2 = zeros(6)

#Calc M1 & M2
for i = 1:length(u)

    global M1 .+= Xv[i]'*iVv[i]*Xv[i]
    global M2 .+= Xv[i]'*iVv[i]*yv[i]
end

β = inv(M1) * M2

#=
julia> β
6-element Array{Float64,1}:
 7.904258681084915
 0.0547761264037151
 0.05092362547466389
 0.0012959346740553102
 0.048118895192829976
 0.02239133333333365
=#


I get beta:

7.904258681084915
0.0547761264037151
0.05092362547466389
0.0012959346740553102
0.048118895192829976
0.02239133333333365
ElMaestro
★★★

Belgium?,
2020-08-07 18:23
(74 d 03:37 ago)

@ PharmCat
Posting: # 21836
Views: 6,114
 

 Still can't make it work

Hi PharmCat,

it is kind of you, but this looks like apples compared to pears.

The whole point is that G and R are meaningless so I cannot compare to an output generated with a model involving G and R. I am doing V directly, because we do not work with within and betweens for T.
That is why you have five elements in theta (T is not replicated, but you are trying to fit both a within and a between variance for it). I am saying we do not work on them both, in stead we replace their sum by a single variance estimate. This is the whole point of the thread here.

Note also, I am not after correlations per se. Therefore I am writing the variances directly into the covariance matrix (perhaps like you are used to if you work with compound symmetry of the heterogenous type).

If you are looking at a final log likehood around 15.337 then I think we are at least talking about the same model.

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
ElMaestro
★★★

Belgium?,
2020-08-07 21:31
(74 d 00:29 ago)

(edited by ElMaestro on 2020-08-07 22:48)
@ ElMaestro
Posting: # 21837
Views: 6,063
 

 And now it works

I think I made beta work now, thanks a lot PharmCat :-D:-D:-D:-D:-D

It appears that beta can be constructed even if the local Xi is not of rank p; I guess it must be a little typo in Gurka's paper. Anyways, I reprogrammed the lines, and now I get conformable matrices, and beta is the same whichever way I calculate it.
May try and see if I can do some comparisons about the performance, I still need a lot of code to put it all into function.

Thanks PharmCat, pointing me towards Gurka and insisting on Xi was a great help:ok:.

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
PharmCat
★    

Russia,
2020-08-08 01:08
(73 d 20:52 ago)

@ ElMaestro
Posting: # 21838
Views: 6,046
 

 Still can't make it work

Hi ElMaestro!

» That is why you have five elements in theta (T is not replicated, but you are trying to fit both a within and a between variance for it). I am saying we do not work on them both, in stead we replace their sum by a single variance estimate. This is the whole point of the thread here.

I understand your point. And generaly your variance component function is better because is not redundant. From other side if we have some different functions V = ƒ11) and V = ƒ22) it could be any and from any θ if we get equal V (so it could be CSH, FA0(2), manual construct).

» If you are looking at a final log likehood around 15.337 then I think we are at least talking about the same model.

I have 15.337299690629834, model is the same ;-) it is good proof: different V construction methods lead to same results.

» I guess it must be a little typo in Gurka's paper.

From time to time I find inaccuracies even in good articles and sometime it give strong headake when you try directly reproduce some methods. log(Σ|X|) instead log(|ΣX|) can give hours of frustration.:crying:

» Thanks PharmCat, pointing me towards Gurka and insisting on Xi was a great help:ok:.

I am very glad if my thoughts were useful :clap:
ElMaestro
★★★

Belgium?,
2020-08-08 12:25
(73 d 09:36 ago)

@ PharmCat
Posting: # 21839
Views: 6,023
 

 Speed improvement

I am seeing about a 30% speed improvement on my machine when using arrays of matrices. Am using lists for that purpose, that's the only way I can think of.
Further improvement may be possible, because I may only need to store principally three different matrices in these arrays (per array) an keep track of which subjects use which matrices (that would be defined by their sequences), but this, I think, for now is not a priority for me to do.

A further option is to only use one kind of matrix and then reorder all periods in the global data list, so that all subjects have e.g. R-T-R, then we'd need to work with only a single matrix type in the profiles; this may improve speed further. Will think about it...

30%...It ain't much but it's honest work. :-)

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
PharmCat
★    

Russia,
2020-08-08 17:27
(73 d 04:33 ago)

(edited by PharmCat on 2020-08-08 17:43)
@ ElMaestro
Posting: # 21840
Views: 5,903
 

 Speed improvement

» 30%...It ain't much but it's honest work. :-)

It is very good boost. I donn't know how matrix multipication works in R, but some improovments can be done if make function A = B'*C*B or A += B'*C*B. At each step of this function there are many intermediate products that can be avoided with direct computation. (Or make it in C)

Example incremantal function:

"""
A' * B * A -> +θ (cache)
A' * B * C -> +β
"""
function mulθβinc!(θ, β, A::AbstractMatrix, B::AbstractMatrix, C::Vector)
    q = size(B, 1)
    p = size(A, 2)
    c = zeros(eltype(B), q)
    for i = 1:p
        c .= 0
        for n = 1:q
            for m = 1:q
                @inbounds c[n] += B[m, n] * A[m, i]
            end
            @inbounds β[i] += c[n] * C[n]
        end
        for n = 1:p
            for m = 1:q
                @inbounds θ[i, n] += A[m, n] * c[m]
            end
        end
    end
    θ, β
end

function mulsimple!(θ, β, A::AbstractMatrix, B::AbstractMatrix, C::Vector)
    AB = A'*B
    θ .+ AB*A, β .+ AB*C
end

θ = zeros(4, 4)
β = zeros(4)

A = [1 3 4 3; 2 3 2 3; 3 4 5 6; 3 4 5 6] B = [1 3 4 3; 2 3 2 3; 3 4 5 6; 3 4 5 6] C = [1, 3, 4 ,3]


julia> @time mulsimple!(θ, β, A, B, C);
0.000007 seconds (10 allocations: 1.172 KiB)

julia> @time mulθβinc!(θ, β, A, B, C);
0.000004 seconds (2 allocations: 144 bytes)

eightfold memory savings
ElMaestro
★★★

Belgium?,
2020-08-08 18:10
(73 d 03:50 ago)

@ ElMaestro
Posting: # 21841
Views: 5,894
 

 Speed improvement

Hi again,

now it gets really interesting.
I made two observations:

1. There is this 30% improvement if I switch to profile log likelihood in stead of "the big one" for lac kof betyter wording
2. I can achieve about an improvement of several thousand percent if stay with the big X and V, however, I must then use a smart tactic:
I am only generating V once, but though each pass on the optimiser I am updating V rather than re-writing it.
Instead of iterating across all rows and column and checking in the data list where to put the individual components, I am initially generating a list of coordinates for each variance component. Something like this:

V.Coordinates=function(Data)
{
 Nobs=nrow(Data)
 xy.varT=NULL; xy.varBR=NULL; xy.varWR=NULL; xy.varBRT=NULL;

 for (iRow in 1:Nobs)
 for (iCol in 1:Nobs)
  {
   ## the diagonal
   if (iCol==iRow)
    {
      if (Data$Trt[iRow]=="T") xy.varT=rbind(xy.varT, c(iRow, iCol))
      if (Data$Trt[iRow]=="R") xy.varWR=rbind(xy.varWR, c(iRow, iCol))
    }
   ## off diagonal
   if (iCol!=iRow)
   if (Data$Subj[iRow]==Data$Subj[iCol])
    {
     if (Data$Trt[iCol]==Data$Trt[iRow]) xy.varBR=rbind(xy.varBR, c(iRow, iCol))
     if (Data$Trt[iCol]!=Data$Trt[iRow]) xy.varBRT=rbind(xy.varBRT, c(iRow, iCol))
    }
  }
 return(list(xy.varT=xy.varT, xy.varBR=xy.varBR, xy.varWR=xy.varWR, xy.varBRT=xy.varBRT))
}


This returns a list with four components. The first component is a set of coordinates in V for varT, the second is a set of coordinates for varBR. And so forth.


Then, in the REML function, I am having a block like this:
  for (i in 1:nrow(VC$xy.varT))
   V[VC$xy.varT[i,1], VC$xy.varT[i,2]] = Pars[1]
  for (i in 1:nrow(VC$xy.varBR))
   if (VC$xy.varBR[i,1] != VC$xy.varBR[i,2])
    V[VC$xy.varBR[i,1], VC$xy.varBR[i,2]] = Pars[2]
  for (i in 1:nrow(VC$xy.varWR))
   V[VC$xy.varWR[i,1], VC$xy.varWR[i,2]] = Pars[3]+Pars[2]
  for (i in 1:nrow(VC$xy.varBRT))
   V[VC$xy.varBRT[i,1], VC$xy.varBRT[i,2]] = Pars[4]


And now are simply talking REAL BUSINESS :-D:-D:-D:-D:-D :lol:
The optimiser is returning blazingly fast with an REML estimate and corresponding variance components. Depending on the tolerance required it is almost instantaneous (<0.5 secs) on my laptop and that machine isn't a high tech equipment at all. The speed with "big V" far outruns the profile approach.
I believe this implies that matrix inversion is done in a clever fashion in R's solve function; as you pointed out PharmCat inversion can be slow (and unstable) as matrices get large, but perhaps the LAPACK routine used by R reduces the task intelligently, possibly through some of the same tricks described for e.g. the Alglib routines..


However, I am also seeing a different little thing:
If I optimise with the profile approach I can make BFGS converge, but this will not work if I try it it in the "big V" approach. Lindstrom and Bates noticed something similar in the sense that they wrote that Quasi-Newton methods perform better when the profile approach is used:
"We recommend optimizing the profile log- likelihood, since it will usually require fewer iterations, the derivatives are somewhat simpler, and the convergence is more consistent. We have also encountered examples where the NR algorithm failed to converge when opti- mizing the likelihood including a but was able to optimize the profile likelihood with ease."

So, lots of options and I am not going to be late for supper :-D

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.
PharmCat
★    

Russia,
2020-08-09 18:22
(72 d 03:38 ago)

@ ElMaestro
Posting: # 21844
Views: 5,698
 

 Speed improvement

» I believe this implies that matrix inversion is done in a clever fashion in R's solve function; as you pointed out PharmCat inversion can be slow (and unstable) as matrices get large, but perhaps the LAPACK routine used by R reduces the task intelligently, possibly through some of the same tricks described for e.g. the Alglib routines..

It is very good question to explore. If we suppose that some routine knows something about structure it should make big work to recognize block-diagonal or/and symmetric matrix structure.

I think, main cause is a overhead for each function call. Another problem - many intermediate products for %*%
PharmCat
★    

Russia,
2020-08-10 11:48
(71 d 10:12 ago)

(edited by PharmCat on 2020-08-10 12:11)
@ ElMaestro
Posting: # 21849
Views: 5,627
 

 Some tests...

Hi ElMaestro,

» now it gets really interesting.

I made some tests. And results is surprising.

First code:
using BlockDiagonals
using LinearAlgebra
using BenchmarkTools
Am = rand(150, 150)
Bm = rand(150, 150)
@rput Am
@rput Bm
blocks  = [rand(3,3) for i in 1:50]
BDm     = BlockDiagonal(blocks)
V       = Matrix(BDm)
@rput V
b1 = @benchmark R"solve(V)"
b2 = @benchmark R"solve(Bm)"
b3 = @benchmark inv(Bm)


b1 1.109ms VS b2 3.751ms , so R solve block-diagonal matrix more than twice faster.
In Julia b3 2.707 ms: random matrix solving in Julia faster than in R, but more slower then b1 when matrix is block-diagonal.

Second code:

using RCall
using BlockDiagonals
using LinearAlgebra
using BenchmarkTools
blocks  = [rand(3,3) for i in 1:50]
BDm     = BlockDiagonal(blocks)
V       = Matrix(BDm)
@rput V
@rput blocks
b1 = @benchmark inv(V)
b2 = @benchmark inv.(blocks)
b3 = @benchmark inv(BDm)
b4 = @benchmark R"solve(V)"
b5 = @benchmark R"for (i in 1:50) {solve(blocks[[i]])}"


Solving by blocks in R (b5 3.275 ms) really slower (a little more faster than solving random matrix).

But solving block-diagonal matrix in Julia when structure is described takes 56.484 μs (b3 in second code), this is 20 times faster than in R.

I'm trying to discuss this in Julia forum here.

I'am not so expirienced in R, but I think that Solve.block can increase performance for block-diagonal matrices, docs here.
Activity
 Admin contact
21,170 posts in 4,411 threads, 1,474 registered users;
online 11 (0 registered, 11 guests [including 1 identified bots]).
Forum time: Tuesday 22:01 CEST (Europe/Vienna)

In theory, there is no difference between theory and practice.
But, in practice, there is.    Jan L.A. van de Snepscheut

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz
HTML5