Who can help? [R for BE/BA]

posted by ElMaestro  – Belgium?, 2020-07-12 12:28 (78 d 01:35 ago) – Posting: # 21675
Views: 9,606

Hi all,

some help needed.:-D

Specifically, I am asking for help to make code work and not to make it smarter (achieving the same with less key strokes, using built-in alternatives etc) although there is generally a preferred focus on the latter here :-)

Hopefully we will discuss later the need for additional or other variance components once we have made the optimiser work somehow (or while we are making it work, if such vc's are totally necessary). There are probably a ton of errors to iron out. There is at least one big one whose effects I a suffering :-D

The idea is to construct the model from the bottoms up.
  1. First I do a full-rank design matrix X for the fixed effects, without subjects
  2. Next I do the covariance matrix with just three variance components as discussed above.
  3. I generate some cheap starter guesses for the variance components
  4. I define an objective function for "a constant + log likelihood"
  5. I optimise the variance components, in REML the effect estimates will come along with them.
I am so far facing a singularity with the apparent log likelihood, so obviously something is terribly wrong. I include a ref to the function where I got inspiration for the log likelihood objective function.
Another site for the objective function is: https://www.stt.msu.edu/users/pszhong/Lecture_16_Spring_2017.pdf (page 13+14), but this also did not quite get me to convergence.


So, this is what I have so far. I don't understand matrix algebra totally well, and I am at a loss of you say Hermitian, expectation-minimisation, Jacobian, etc.

What bothers me is that some years ago I wrote a working optimiser in C on basis of slides some some Danish guy called Waagepetersen, who walked the reader slowly through the process of optimising REML. I can't find my code now, it was on another computer, and I can't find his slides any more (there are some slides with his name out there, which I can find with google etc but they do not have that old recipe).






rm(list=ls(all=TRUE))

###### par vector is c(varT, varBR, varWR)

######
######   Section 1: Household things
######



CreateX=function(D)
{
 ##first the two treatments
 TrtT=as.numeric(as.character(D$Trt)=="T")
 TrtR=as.numeric(as.character(D$Trt)=="R")
 X=cbind(TrtT, TrtR)
 R= qr(X)$rank

 ## subjects: in a mixed model with subj as random
 ## we do not do subjects also as fixed, therefore they are #'ed away here
 ## for (s in unique(D$Subj))
 ## {
 ## v=as.numeric(D$Subj==s)
 ## #print(v)
 ## XX=data.frame(X, v)
 ## names(XX)[ncol(XX)] = paste("S", s, sep="")
 ## rnk=qr(XX)$rank
 ## if (rnk>R)
  ##  {
  ##     X=XX
  ##     R=rnk
  ##  }
  ## }
 ##now the Pers
 for (p in unique(D$Per))
 {
  v=as.numeric(D$Per==p)
  #print(v)
  XX=data.frame(X, v)
  names(XX)[ncol(XX)] = paste("P", p, sep="")
  rnk=qr(XX)$rank
  if (rnk>R)
    {
       X=XX
       R=rnk
    }
 }

 for (q in unique(D$Seq))
 {
  v=as.numeric(D$Seq==q)
  #print(v)
  XX=data.frame(X, v)
  names(XX)[ncol(XX)] = paste("Q", q, sep="")
  rnk=qr(XX)$rank
  if (rnk>R)
    {
       X=XX
       R=rnk
    }
 }
 return(as.matrix(X))
}




Create.CovM=function(Params)
##block diagonal covariance matrix
{
  varT=Params[1]
  varBR=Params[2]
  varWR=Params[3]
  #varRT=Params[4]

  #cat("Vars:", varT, varBR, varWR,"\n")

  Nobs=length(D$Y)
  V=matrix(0,ncol=Nobs, nrow=Nobs)
  for (iRow in 1:Nobs)
  for (iCol in 1:Nobs)
  {
   
   ## the diagonal
   if (iCol==iRow)
    {
      if (D$Trt[iRow]=="T") V[iRow,iCol]=V[iRow,iCol]+varT
      if (D$Trt[iRow]=="R") V[iRow,iCol]=V[iRow,iCol]+varWR
    }

   ## off diagonal
   if (iCol!=iRow)
   if (D$Subj[iRow]==D$Subj[iCol])
    {
     if (D$Trt[iCol]==D$Trt[iRow]) V[iRow,iCol]= V[iRow,iCol]+varBR
     #if (D$Trt[iCol]!=D$Trt[iRow]) V[iRow,iCol]= V[iRow,iCol]+varRT
    }
  }
  return(as.matrix(V))
}

######
######   Section 2: Matrix things
######


Obj.F12=function(Pars)
##this is line 3 of page 10 of:
##http://people.csail.mit.edu/xiuming/docs/tutorials/reml.pdf
{
  CovM=Create.CovM(Pars)

  A= -0.5*log(det(CovM))
  B= -0.5*log(det(t(X) %*% solve(CovM) %*% X))
  est.b = solve(t(X) %*% solve(CovM) %*% X) %*% t(X) %*% solve(CovM) %*% y
  tmp= y - X %*% est.b
  C=-0.5 *(t(tmp) %*% solve(CovM) %*% tmp)
  return(A+B+C)
}



Some.Initial.Guesses=function(foo)
{
  #guess variance within R
  D1=subset(D, D$Trt=="R")
  m=lm(log(Y)~factor(Subj)+factor(Seq)+factor(Per) , data=D1)
  #print(anova(m))
  varWR= anova(m)["Residuals", "Mean Sq"]
  m=lm(log(Y)~factor(Subj)+factor(Seq)+factor(Per)+factor(Trt) , data=D)
  varBR= anova(m)["factor(Subj)", "Mean Sq"]
  ##total var T, cheaply
  ##actually, this may be much better: varT=varBR+varWR
  D2=subset(D, D$Trt=="T")
  varT=var(log(D2$Y))
  varT=varBR+varWR
  L=list(varT=varT, varBR=varBR, varWR=varWR)
  return(L)
}



####
#### section 3: execution
####



D=read.csv("EMAII.a.csv" , header=T, sep="\t") ##public = easier
X=CreateX(D) ##public = easier
y=log(D$Y) ##public = easier


Check.Surf=function(i, x1,x2, N)
## Checks the log likelihood surface of the i'th variance component
## (given the other variance components) for the interval var = x1 to x2
## with a granularity of N points.
{
  L=Some.Initial.Guesses(0)
  parIni=c(L$varT, L$varBR, L$varWR )
  x=x1;
  vX=NULL
  vY=NULL
  dx=(x2-x1)/N
  for (j in 1:N)
  {
    vX=c(vX, x)
    parIni[i]=x
    vY=c(vY, Obj.F12(parIni))
    x=x+dx
  }
  plot(vX, vY)
}

MyREML=function(foo.bar)
## ideally this would work, but it doesn't right now :-(
## also, I have a feeling the surface may be quite rugged,
## so I may switch to a homecooked optimiser
## once I have ironed out the singularity issue, see below.
{
  L=Some.Initial.Guesses(0)
  print(L)
  parIni=c(L$varT, L$varBR, L$varWR)
  F=optim(par=parIni, fn=Obj.F12, #method="L-BFGS-B",lower=c(0,0,0),
      control=list(reltol=1e-10, trace=T))
  print(F)
}
# if ti worked it'd be called through
# MyREML(0)

# however:

Check.Surf(1, 0.01, 0.2, 50)
  ## ouch, that's a singularity.
  ## so something is wrong in the covariance matrix
  ## or in the "const + likelihood function"                     
  ## or elsewhere. Any ideas?

#
#
#

I could be wrong, but...

Best regards,
ElMaestro

R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming.

Complete thread:

Activity
 Admin contact
21,082 posts in 4,397 threads, 1,468 registered users;
online 9 (0 registered, 9 guests [including 1 identified bots]).
Forum time: Monday 14:03 CEST (Europe/Vienna)

A central lesson of science is that to understand complex issues
(or even simple ones), we must try to free our minds of dogma and
to guarantee the freedom to publish, to contradict, and to experiment.
Arguments from authority are unacceptable.    Carl Sagan

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz
HTML5