ElMaestro ★★★ Belgium?, 20200710 13:19 (102 d 08:41 ago) Posting: # 21669 Views: 11,403 

Hi all, I am writing a little optimizer based on REML in R for use with semireplicated designs. I am starting from the bottom and working upwards, since the various packages do not really help when R is replicated and T is not. Does anyone have a dataset with wellestablished REML reference values for the variance components for the TRR/RRT/RTR design? A little more than the CVref of EMA dataset II would be appreciated Many thanks. — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
Helmut ★★★ Vienna, Austria, 20200710 19:03 (102 d 02:57 ago) @ ElMaestro Posting: # 21670 Views: 10,661 

Hi ElMaestro, » I am writing a little optimizer based on REML in R for use with semireplicated designs. I am starting from the bottom and working upwards, since the various packages do not really help when R is replicated and T is not. Wow! Nobody mastered that in R so far. I don’t want to dampen your enthusiasm but I think that is not possible at all. » Does anyone have a dataset with wellestablished REML reference values for the variance components for the TRR/RRT/RTR design? A little more than the CVref of EMA dataset II would be appreciated Here you are (for DS II, of course). Results in Phoenix for the FDA’s covariance structure „Banded NoDiagonal Factor Analytic (f=2)” which is in SASlingo FA0(2) . After 7 iterations (2REML 48.2, AIC 26.2):lambda(1,1) 0.19030501 G Matrix* 0.03621600 0.04563730 As usual: WARNING: Newton's algorithm converged with modified Hessian. Output is suspect. s2wT is nonsense, since T is not replicated and the model is overspecified. The lambdas are called in SASlingo FA(x,y) . After 7 iterations (2REML 30.7, AIC 20.7):Cov Parm Subject Group Estimate Also: NOTE: Convergence criteria met but final hessian is not positive definite. No problems with convergence with f=1 (or FA0(1) in SAS). FA0(1) helps in general but not always.lambda(1,1) 0.19030501 Different s2wT but crap as well. You can get anything you like. The same data set throwing warnings in one software but not the other; same with convergence. In Certara’s forum a user posted a data set which failed to converge both in SAS and Phoenix, even with FA0(1) . Then one is in deep shit.Even if a data set converges without warnings, you might get different estimates (except the first one and s2wR) in SAS and Phoenix. Not surprising cause the optimizer walks around in parameterspace trying to find the maximum of a surface which is essentially flat. Increasing the number of iterations and/or decreasing the tolerance rarely helps. I still wait for someone explaining why he/she is using such a wacky design. Beyond me.
— Diftor heh smusma 🖖 Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
ElMaestro ★★★ Belgium?, 20200710 19:24 (102 d 02:36 ago) @ Helmut Posting: # 21671 Views: 10,631 

Hi Hötzi, » Wow! Nobody mastered that in R so far. I don’t want to dampen your enthusiasm but I think that is not possible at all. Challenge accepted Thanks for the results — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
Helmut ★★★ Vienna, Austria, 20200711 11:57 (101 d 10:03 ago) @ ElMaestro Posting: # 21672 Views: 10,143 

Hi ElMaestro, » » Wow! Nobody mastered that in R so far. I don’t want to dampen your enthusiasm but I think that is not possible at all. » » Challenge accepted Good luck! I added the variancecovariance matrices and the complete warnings above. From this SASians’ document:
NOTE: Convergence criteria met but final hessian is not positive definite. In my understanding:
You can’t fix by analysis — Diftor heh smusma 🖖 Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
ElMaestro ★★★ Belgium?, 20200711 14:43 (101 d 07:17 ago) @ Helmut Posting: # 21673 Views: 10,103 

Hi again, it is rather experimental. I do not have a level of understanding that allows me to use V=ZGZ^{t}+R ,so I will be constructing V directly from the data listing without the intermediary Z and G. I do not wish to get bogged down by the various aspects of covariance matrices; if CSH does not pan out well in SAS then it is an implenetation or convergence issue only, it is not because CSH is irrelevant. CSH is for all practical purposes the covariance matrix that makes sense, which means we have: A variance for T (combined within and between) A variance for withinR A variance for betweenR Hessian, positive definite, etc.. all this comes down to the details of SAS's or WinNonlin's optimiser. I am playing around with optim in R (BFGS, LBFGSB bounded, and NelderMead) and with one of my own making.— I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
Helmut ★★★ Vienna, Austria, 20200711 15:38 (101 d 06:22 ago) @ ElMaestro Posting: # 21674 Views: 10,095 

Hi ElMaestro, » it is rather experimental. » […] which means we have: » A variance for T (combined within and between) » A variance for withinR » A variance for betweenR Correct. Will be interesting what you get because \(\sigma_{wT}^2+\sigma_{bT}^2\): 0.04367 (Phoenix) and 0.03771 (SAS). » Hessian, positive definite, etc.. all this comes down to the details of SAS's or WinNonlin's optimiser. I am playing around with optim in R (BFGS, LBFGSB bounded, and NelderMead) and with one of my own making.If you succeed, the community will love you. Might win you the Fields medal. As long as a genius (you?) is not able to come up with a solution working in all cases, I can only repeat: Avoid partial replicate designs! It might not be possible to evaluate the study for ABE (though RSABE works in any case).— Diftor heh smusma 🖖 Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
ElMaestro ★★★ Belgium?, 20200712 12:28 (100 d 09:32 ago) @ Helmut Posting: # 21675 Views: 10,024 

Hi all, some help needed. Specifically, I am asking for help to make code work and not to make it smarter (achieving the same with less key strokes, using builtin alternatives etc) although there is generally a preferred focus on the latter here Hopefully we will discuss later the need for additional or other variance components once we have made the optimiser work somehow (or while we are making it work, if such vc's are totally necessary). There are probably a ton of errors to iron out. There is at least one big one whose effects I a suffering The idea is to construct the model from the bottoms up.
Another site for the objective function is: https://www.stt.msu.edu/users/pszhong/Lecture_16_Spring_2017.pdf (page 13+14), but this also did not quite get me to convergence. So, this is what I have so far. I don't understand matrix algebra totally well, and I am at a loss of you say Hermitian, expectationminimisation, Jacobian, etc. What bothers me is that some years ago I wrote a working optimiser in C on basis of slides some some Danish guy called Waagepetersen, who walked the reader slowly through the process of optimising REML. I can't find my code now, it was on another computer, and I can't find his slides any more (there are some slides with his name out there, which I can find with google etc but they do not have that old recipe). rm(list=ls(all=TRUE)) — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
ElMaestro ★★★ Belgium?, 20200712 13:26 (100 d 08:34 ago) (edited by ElMaestro on 20200712 13:39) @ ElMaestro Posting: # 21676 Views: 9,996 

Important update, I forgot vBR on the diagonal. Now the singuality is less of a heacahe:
rm(list=ls(all=TRUE)) — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
ElMaestro ★★★ Belgium?, 20200712 21:36 (100 d 00:24 ago) @ ElMaestro Posting: # 21677 Views: 9,847 

After rerererereading the documentation for optim , I think it can be forced to look for either a minimum (default) or for a maximum if need be through the control argument or, more simply , the objective function should be handled for a minimum by putting a minus sign on the return of Obj.F12 ...... but even so the estimates are still somewhat off. — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
ElMaestro ★★★ Belgium?, 20200712 21:46 (100 d 00:14 ago) @ ElMaestro Posting: # 21678 Views: 9,824 

Look at this baby:
rm(list=ls(all=TRUE)) On my machine it gives: $par — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
ElMaestro ★★★ Belgium?, 20200712 22:27 (99 d 23:33 ago) (edited by ElMaestro on 20200712 23:51) @ ElMaestro Posting: # 21679 Views: 9,709 

which gives
Var.Component Value Ini.value By the way: You can adjust how many decimals you want on e.g. varWR (~s2WR) through the reltol setting. Use e.g. 1e12 to get 0.0.01324652 with just a few more iterations.Job done? — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
Helmut ★★★ Vienna, Austria, 20200713 13:52 (99 d 08:08 ago) @ ElMaestro Posting: # 21684 Views: 9,549 

Hi ElMaestro, » Job done? I prefer medium rare over well done (aka quick and dirty or clean and never)! Below my more Rish code. Changes:
############################### Gives:
Maximum : 75.98722 — Diftor heh smusma 🖖 Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
ElMaestro ★★★ Belgium?, 20200713 14:18 (99 d 07:42 ago) (edited by ElMaestro on 20200713 14:31) @ Helmut Posting: # 21686 Views: 9,513 

Hi Hötzi, I think expressing the G matrix after this fit is a little backward; The whole point is not to decompose V into ZGZ^{t}+R , because this design makes that decomposition pointless even when the design is otherwise relevant.That SAS and WinNonlin and more do not have a way to make provisions for a solution outside those involving this decomposition, it merely a testament to their development capabilities than to the design itself. If you start defining the the stats model on basis of G and R rather than V then obviously you must create two variance components which are not uniquely estimable, but whose sum is. Therefore, just screw R and G, go directly for V, and present the components without in any way calling them members of G or R  they really aren't when the model is constructed my way. The 2trt, 3seq, 3per design is good, healthy and appropriate, until now it was just the statistical work on it that has been lagging. The post is an attempt at providing a solution. Note also: The PE will be accessible via the est.b vector. You shuld be able to extract it (difference between first two effects, one good reason to do X without intercept) after the model has converged. If you have a dataset that does not readily converge with SAS then I'd be eager to learn if it does in some flavour of my code. — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
Helmut ★★★ Vienna, Austria, 20200713 15:34 (99 d 06:26 ago) @ ElMaestro Posting: # 21688 Views: 9,482 

Hi ElMaestro, » I think expressing the G matrix after this fit is a little backward;… OK, true. » That SAS and WinNonlin and more do not have a way to make provisions for a solution outside those involving this decomposition, it merely a testament to their development capabilities than to the design itself. If you start defining the the stats model on basis of G and R rather than V then obviously you must create two variance components which are not uniquely estimable, but whose sum is. Therefore, just screw R and G, go directly for V, and present the components without in any way calling them members of G or R  they really aren't when the model is constructed my way. Yep. » The 2trt, 3seq, 3per design is good, healthy and appropriate, until now it was just the statistical work on it that has been lagging. The post is an attempt at providing a solution. Kudos to your work! I still see no reason to perform a study in one of the partial replicates. Sorry. What is given as ‘justifications’:
Hence, everybody has to believe in the software. As a starter I would expect the same level of error handling, documentation, like the package replicateBE . Otherwise, assessors would be skeptical.» The PE will be accessible via the est.b vector. You shuld be able to extract it (difference between first two effects, one good reason to do X without intercept) after the model has converged. In Section 2: Final < function(Pars) { At the end: PE < Final(res$Estimate) Gives: PE: 1.372138 In PHX 1.372138 (Patterson/Jones got in SAS 1.37 ). Great!TODO: Get the Satterthwaite’s DFs for the CI. » If you have a dataset that does not readily converge with SAS then I'd be eager to learn if it does in some flavour of my code. See this post with links to John’s datasets. Maybe there are others. — Diftor heh smusma 🖖 Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
ElMaestro ★★★ Belgium?, 20200714 03:22 (98 d 18:38 ago) (edited by ElMaestro on 20200714 12:43) @ Helmut Posting: # 21691 Views: 9,396 

Hi Hötzi, » In Obj.F12() the line» B < 0.5*log(det(t(X) %*% solve(CovM) %*% X)) » is the reason for the warnings because t(X) is not a square matrix. Therefore B is NaN.I looked at this one by testing is.nan(B) and printing the components if/when it is. It happens right at the start for this data. The problem is not that X changes dimensions, X is the design matrix and this is generally never a square matrix, just full column rank with as many rows as observations. X does not change at all during the optimisation. All is in fact hunky dory when it happens, the variance estimates being: [1] 0.06841615 0.03576077 0.01240137 0.05893007 (on my machine) that's nothing special, just a wee change from initial. The issue here is that the determinant suddenly becomes apparently negative. This is mysterious to me at this point, I am not much into matrices. I take it as a case of NM taking some unfortunate decision, which is then corrected afterwards. However this does give food for thought: if the parameter vector that gives a negative determinant is chosen as a starting value then NM will stop immediately and throw an error  you can try it with the vector above. Therefore, it may be desirable to ascertain positiveness of that determinant (and the others?) when starting the algo, and to slightly modify individual starter values in case of such suboptimally chosen starter values until Obj.F12 actually returns something meaningful. I believe the issue can not be remedied by fiddling with X. I will be happy to hear a hardcore matrix algebrist's opinion here. Update 8 hours later: I think CovM becomes noninvertible at select levels of a variance component, given the others, when/if a certain value of the variance component means that one or more columns (rows) add up to one or more other columns. At exactly that point CovM will drop in its rank, and it becomes noninvertible; beyond the value that creates this phenomenon CovM is invertiable but the determinant may switch sign. It is a phenomenon with a singularity in the determinant easily explained in the numbers, but I cannot interpret the higher perspective. I am grateful that NM in the implementation of optim apparently safeguards against this phenomenon as long as the first call to the objective is evaluable. Whodathunkit? — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
Helmut ★★★ Vienna, Austria, 20200714 13:58 (98 d 08:02 ago) @ ElMaestro Posting: # 21692 Views: 9,330 

Hi ElMaestro, » The problem is not that X changes dimensions, X is the design matrix and this is generally never a square matrix, just full column rank with as many rows as observations. X does not change at all during the optimisation. Ah, yes! » The issue here is that the determinant suddenly becomes apparently negative. Might point towards a similar issue in PHX complaining about negative variance components and SAS forcing them to zero. » This is mysterious to me at this point, I am not much into matrices. So am I. Sorry. » I take it as a case of NM taking some unfortunate decision, which is then corrected afterwards. Luckily. » However this does give food for thought: if the parameter vector that gives a negative determinant is chosen as a starting value then NM will stop immediately and throw an error  you can try it with the vector above. » Therefore, it may be desirable to ascertain positiveness of that determinant (and the others?) when starting the algo, and to slightly modify individual starter values in case of such suboptimally chosen starter values until Obj.F12 actually returns something meaningful. But how? » I believe the issue can not be remedied by fiddling with X. Agree. » Update 8 hours later: … Sounds reasonable. But I think that we are going in circles. This story is all about RSABE where we have to go for ABE if s_{wR} <0.294 (or the mixedmodel for ABE in general). The scaling part is easily doable in R. Since you found a solution for estimating s_{wR} and the p
For the imbalanced dataset of this post PHX gives me: Δ 0.03084859 SE_{Δ} 0.03256626 df 18.24154 and therefore, \(\small{CI=\exp (\Delta\mp t_{0.05,df}\times SE)=\left \{ 0.9747, 1.0912 \right \}}\). — Diftor heh smusma 🖖 Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
ElMaestro ★★★ Belgium?, 20200714 15:07 (98 d 06:53 ago) @ Helmut Posting: # 21693 Views: 9,332 

Hi again Hötzi, still at a loss, how? From my own perspective, this was not about creating a package (roof, walls) but just to find a way to use an optimiser to find the solid estimate of swr for the TRR/RTR/RRT design (base), and this is what was deemed rather difficult at the outset. It succeeded when I did not express V directly and not via ZGZ^{t}+R. Did not take so much time as I had feared. Did you mean to say that now it was shown possible, there is a need for other components (and that these are the tricky ones?)? I am not sure I can do it, not sure I can't, either. Did not look into it at all, for me this part is a nonissue as is generalising the idea here to make it work for all other designs. Will be happy to take a look at the other stuff, too, if I have something to contribute with. For me, all this was just about the s2wr. For you ("we") maybe it is seen differently? Eager to learn of it, will see without being certain if I can produce something. — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
Helmut ★★★ Vienna, Austria, 20200714 15:32 (98 d 06:28 ago) @ ElMaestro Posting: # 21695 Views: 9,314 

Hi ElMaestro, » still at a loss, how? Amazingly you solved an old riddle – which is only part of the game in RSABE. » […] Did not take so much time as I had feared. Well, 60+ hours is a lot for a spare time project. Welcome to the club. » Did you mean to say that now it was shown possible, there is a need for other components (and that these are the tricky ones?)? Exactly. I still don’t see how we could mimic the FDA’s mixed model for ABE in R. » I am not sure I can do it, not sure I can't, either. Heads up. You could be the first succeeding in R. PharmCat produced something in Julia (see there). Duno the state of affairs. » Will be happy to take a look at the other stuff, too, if I have something to contribute with. » For me, all this was just about the s2wr. For you ("we") maybe it is seen differently? Eager to learn of it, will see without being certain if I can produce something. By “we” I’ve meant the community hoping for an open source solution. Sorry for the huge amount of time you’ve spent on this. s_{wR} is only needed as a decision point whether to go for RSABE or ABE. Great that we can get it now in R but the evaluation for ABE remains a mystery. — Diftor heh smusma 🖖 Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
ElMaestro ★★★ Belgium?, 20200713 10:13 (99 d 11:47 ago) @ Helmut Posting: # 21681 Views: 9,634 

Hi all, The solution I posted is a total winner. Where do I collect my medal? Will be happy to learn of performance on other datasets, in particular of differences between results or nonconvergence (which is often something related to instability caused by the initial guess being too far from the optimum). Obviously, with this implementation the input format has to meet the expectation of the algo. For example, T and R coded as T and R, not A and B, in this implementation. All this is unrelated to the task of this thread and can be rather easily worked out as well. A slight change this morning, for the initial guesses, less clumsy and more likely (in the nonrestricted fashion ) to reflect the optimal solution:
Some.Initial.Guesses=function(foo) — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
Helmut ★★★ Vienna, Austria, 20200713 14:16 (99 d 07:44 ago) @ ElMaestro Posting: # 21685 Views: 9,540 

Hi ElMaestro, » The solution I posted is a total winner. Where do I collect my medal? Hey, it looks great indeed! However, that’s only the foundation of the house. Walls, a door, windows, and a roof would be nice. » Will be happy to learn of performance on other datasets, … Try my code with path < "https://raw.githubusercontent.com/Helmut01/replicateBE/master/inst/extdata/DS04.csv" That’s an PHX (convergence after 8 iterations without warnings, 2REML 292.14, AIC 314.14): lambda(1,1) 0.43651691 Manual: lambda(1,1)^2=var_bR=0.1905470 My master’s code: Component Estimate Initial Not that bad. » … in particular of differences between results or nonconvergence (which is often something related to instability caused by the initial guess being too far from the optimum). See this post with links to John’s nasty datasets. Will see whether I still have the one which failed completely both in SAS and PHX. » A slight change this morning, for the initial guesses, less clumsy and more likely (in the nonrestricted fashion ) to reflect the optimal solution: Will check that later.^{2}
— Diftor heh smusma 🖖 Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
PharmCat ★ Russia, 20200715 14:19 (97 d 07:41 ago) @ ElMaestro Posting: # 21699 Views: 9,131 

Hello all! As I understand some code for mixed model solution was done? Great! » Will be happy to learn of performance on other datasets, in particular of differences between results or nonconvergence (which is often something related to instability caused by the initial guess being too far from the optimum). Here you can find some self generated (and some public) datasets for package validation and results from SPSS here. Also you can make any random dataset with any design with Julia, docs here. Is is possible to make easy performance comparison on my machine? Edit: Full quote removed. Please delete everything from the text of the original poster which is not necessary in understanding your answer; see also this post #5! [Helmut] 
ElMaestro ★★★ Belgium?, 20200802 17:39 (79 d 04:21 ago) @ PharmCat Posting: # 21817 Views: 6,721 

Hi PharmCat, » Is is possible to make easy performance comparison on my machine? the code on this page would suffice for the comparison, however I am not going in the direction of estimating denominator DF's and calculating CI's. I am solely interested in redefining our specification of the covariance matrix so that the model that is being fit contains exactly the variance components we can realistically ask about for a 3period semireplicated study. I have reached that goal. Also, I am writing an optimiser and it works bloody great! Not terribly fast but seems to get there in a very orderly fashion — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
Helmut ★★★ Vienna, Austria, 20200715 14:27 (97 d 07:33 ago) @ ElMaestro Posting: # 21700 Views: 9,137 

Hi ElMaestro, we were all blind! Just to make clear what I mean by “we”: Myself, you, the EMA’s PKWP (and the BSWP fabricating the example datasets and the evaluation by ‘Method C’), and Patterson & Jones. Detlew read the guidance’s Step 1 thoroughly enough to realize that \(\small{s_\textrm{wR}^2}\) is not an REMLestimate (see this post). Fortunately the templates for Phoenix are correct. Quick & dirty Rcode at the end. Results: file descr s2wR swR CVwR (%) PHX match In the Q&A we find for DSII the REMLbased CV_{wR} 11.5% (PHX 11.43%). Patterson & Jones reported for their dataset 61% (SAS); I got in PHX 61.96%. For DSI the Q&A gives 47.3% (PHX 46.96%). Therefore, congratulations to all of us answering a question which was not asked by the FDA.
— Diftor heh smusma 🖖 Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
ElMaestro ★★★ Belgium?, 20200715 16:25 (97 d 05:36 ago) @ Helmut Posting: # 21701 Views: 9,079 

Hi again, to me it is exactly the opposite way around: A model with more than one variance component is a mixed model. I am of the impression that FDA and EMA have both been in a slight predicament with their guidances because they have both been "somewhat" aware of the specification/convergence issues with REML; hence both went in a direction with all effect s fixed as done explicitly by EMA and implicitly by FDA (when they use GLM for swr) in the guidance. My contribution here is solely to open someone's eyes to the fact that we may not need to rely on a normal linear model for s2wr in a TRR/RTR/RRT design. I would much more like to think of this thread as a contribution to regulators defining the next iteration of giudelines/guidances. Whether they will see it as an advantage to use REML to derive s2wr now it is shown to be doable (not causing convergence issues, because it is not involving the silly and unestimable s2wt component), is of course their decision solely. I have done my part. For consideration further down the line, for those who enter discussions about estimator bias being catastrophic: Which estimator is more biased when handling RRT/RTR/TRR: the s2wr you get from the linear model or the one you get via REML? What would your answer imply for any future recommendations for guidelines/guidances? (I am not a hardliner myself: I am usually thinking biased estimators are still useful estimators) This thread already caught the attention of a number cruncher at an agency across the Atlantic unofficially, so I am OK with it all — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
Helmut ★★★ Vienna, Austria, 20200715 17:33 (97 d 04:27 ago) @ ElMaestro Posting: # 21702 Views: 9,072 

Hi ElMaestro, » to me it is exactly the opposite way around: » A model with more than one variance component is a mixed model. Not necessarily. It depends on what you believe [sic] is a random effect. Treatment, sequence, and period are fixed effects, right? IMHO, subject is random. When we think about interaction(s) we enter the gray zone, of course. » I am of the impression that FDA and EMA have both been in a slight predicament with their guidances because they have both been "somewhat" aware of the specification/convergence issues with REML; hence both went in a direction with all effect s fixed as done explicitly by EMA and implicitly by FDA (when they use GLM for swr) in the guidance. Well, the EMA’s models are wacky because they assume equal intrasubject variances of T and R. An assumption which was shown to be false in many cases. Essentially most of the information obtainable in a replicate design is thrown away. I don’t know why the FDA does not directly estimate \(s_{wR}^2\) from the mixed model but work with the intrasubject contrasts of R instead. More below. » My contribution here is solely to open someone's eyes to the fact that we may not need to rely on a normal linear model for s2wr in a TRR/RTR/RRT design. OK. » I would much more like to think of this thread as a contribution to regulators defining the next iteration of giudelines/guidances. Whether they will see it as an advantage to use REML to derive s2wr now it is shown to be doable (not causing convergence issues, because it is not involving the silly and unestimable s2wt component), is of course their decision solely. I have done my part. With the current guidance one gets always an estimate of \(s_{wR}^2\). No problems in the RSABEpart if ≥0.0294. Only in the mixedmodel for ABE and the partial replicate designs one may run into the convergence issues. That’s not resolved yet, unless one uses a simpler^{1} variancecovariance structure, which is against the guidance. But – again and again – there are cases were nothing helped, neither in SAS nor in Phoenix. Duno whether it is possible to specify NelderMead in SAS or Phoenix.^{2} That’s why I still hold that the partial replicates are lousy designs because one might end up with no result for ABE unless one opts for the EMA’s models (which are wrong, IMHO). OK, one step back. Since the mixed model for ABE is recommended by the FDA for ages, I wonder what people have done when they faced convergence problems… » For consideration further down the line, for those who enter discussions about estimator bias being catastrophic: Which estimator is more biased when handling RRT/RTR/TRR: the s2wr you get from the linear model or the one you get via REML? Good question, next question. _{Maybe REML…} » What would your answer imply for any future recommendations for guidelines/guidances? Oh dear! I participated in all GBHI workshops. The discussions about referencescaling (Rockville 2016, Amsterdam 2018) were disastrous. What do we have?
» (I am not a hardliner myself: I am usually thinking biased estimators are still useful estimators) Agree.
— Diftor heh smusma 🖖 Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
d_labes ★★★ Berlin, Germany, 20200715 18:13 (97 d 03:47 ago) @ Helmut Posting: # 21703 Views: 9,055 

Hi Helmut, Hi ElMaestro, » ... » Well, the EMA’s models are wacky because they assume equal intrasubject variances of T and R. An assumption which was shown to be false in many cases. Essentially most of the information obtainable in a replicate design is thrown away. Full ACK! » I don’t know why the FDA does not directly estimate \(\small{s_\textrm{wR}^2}\) from the mixed model but work with the intrasubject contrasts of R instead. ... I was always wondering about the different methodologies in the code of the progesteron guidance:
Politics? Nostalgia (Since years recommended the Proc MIXED code)? Or the other way round: Why not use the components of the RSABE criterion (pe and 90% CI, s2wR) from the mixed model approach? I would opt for the full ISC approach because it may be unambiguously implemented in SAS, R, Phoenix and so on for every replicate design, full or partial . — Regards, Detlew 
Helmut ★★★ Vienna, Austria, 20200716 11:11 (96 d 10:49 ago) @ d_labes Posting: # 21709 Views: 8,863 

Hi Detlew, » Why not use the ISC estimate of TR in the ABE decision in case of swR < 0.029356 (CVwR < 30%)? » Politics? Nostalgia (Since years recommended the Proc MIXED code)? The guidance “Statistical Approaches to Establishing Bioequivalence” of January 2001 (where the same SAS code is given in APPENDIX E) is still in force. Hence, it is only consistent to give it in the progesterone guidance. » I would opt for the full ISC approach because it may be unambiguously implemented in SAS, R, Phoenix and so on for every replicate design, full or partial . Cannot agree more. All problems would vanish. — Diftor heh smusma 🖖 Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
ElMaestro ★★★ Belgium?, 20200715 23:06 (96 d 22:54 ago) @ Helmut Posting: # 21705 Views: 9,003 

Hi Hötzi, » » to me it is exactly the opposite way around: » » A model with more than one variance component is a mixed model. » » Not necessarily. It depends on what you believe [sic] is a random effect. Treatment, sequence, and period are fixed effects, right? IMHO, subject is random. When we think about interaction(s) we enter the gray zone, of course. May I offer the completely opposite view? Search the forum for "bogus statement" , check the SAS documentation for what the statement actually actually does to Proc GLM: When ProcGLM is used for 222BE with or without the bogus statement it still means a normal linear model is fit (even when the theory in e.g. C&L calls it "random"). There is a single variance component in such fits. If you wish to verify it: There is a model matrix with columns for subjects; check the df's for such models, it would be (much) higher if subjects were fit as random. Conversely, if subject were random subject would not appear as a factor in the anova. — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
Helmut ★★★ Vienna, Austria, 20200716 10:59 (96 d 11:01 ago) @ ElMaestro Posting: # 21708 Views: 8,866 

Hi ElMaestro, » May I offer the completely opposite view? Search the forum for "bogus statement" , check the SAS documentation for what the statement actually actually does to Proc GLM: […] I know, I know. I was talking about a mixed model and not the way how SAS uses the RANDOM statement in PROC GLM in order to get the DFs right. A true all fixed effects model – with subjects instead of the crazy subjects(sequence) – requires unique coding of subjcts. I practice always the case (OK, maybe not in multicenter studies but then you have center as another fixed effect). I recently reviewed a manuscript where subjects in each sequence had the same codes. The authors obviously had no clue about how things work. BTW, same agencies require that in crossovers CV_{intra} and CV_{inter} are reported. The latter is not possible when subject is a fixed effect. — Diftor heh smusma 🖖 Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
mittyri ★★ Russia, 20200719 00:42 (93 d 21:18 ago) @ Helmut Posting: # 21745 Views: 8,327 

Hi Helmut, » Detlew read the guidance’s Step 1 thoroughly enough to realize that \(\small{s_\textrm{wR}^2}\) is not an REMLestimate (see this post). Fortunately the templates for Phoenix are correct. are you referring to that template for RSABE? — Kind regards, Mittyri 
Helmut ★★★ Vienna, Austria, 20200719 01:45 (93 d 20:15 ago) @ mittyri Posting: # 21747 Views: 8,321 

Hi mittyri, » are you referring to that template for RSABE? Nope. I was using “FDA RSABE Project template_ v1.4.phxproj” of 20141111 (should be accessible at https://www.certarauniversity.com/store/637337106flfreephoenixtemplatesforbioequivalenceregulatoryguidances but my credentials were not accepted (why the heck?) and when I tried to reset the password I didn’t receive the confirmation email…). Relevant part:
The main difference to v1.3 is that v1.4 uses the function chiinv(0.95,df) further down which was not available in PHX6.3 and a table of values had to be used instead.However, with v1.3 you get the same result like in this post. PS: Ref.35 of doi:10.1208/s1224802004276 leads to limbo. Not very professional. I asked Certara’s support to establish a permanent redirect on the server. — Diftor heh smusma 🖖 Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes 
ElMaestro ★★★ Belgium?, 20200724 10:07 (88 d 11:53 ago) @ ElMaestro Posting: # 21783 Views: 7,610 

Hi all, to make the objective function return the likelihood, use e.g.:
Obj.F12=function(Pars) The optimised object (F in my original code) will hold the optimised likelihood as F$value. Opportunity: Will run a tiny bit faster if k is public, and if solve(CovM) is done a single time. For example at reltol=1e8 I get with EMA dataset II a value of 15.33728. This matches exactly what SAS reports, see post # 21670 above. — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
ElMaestro ★★★ Belgium?, 20200724 12:52 (88 d 09:08 ago) @ ElMaestro Posting: # 21785 Views: 7,570 

Obj.F12=function(Pars) ....this is exactly also the REML likelihood function that SAS defines. — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
PharmCat ★ Russia, 20200803 14:24 (78 d 07:36 ago) @ ElMaestro Posting: # 21820 Views: 6,672 

» ....this is exactly also the REML likelihood function that SAS defines. Hello! Do you calculate REML for all data at once or calculate REML subject by subject? Is it possible to use function caching in R? 
ElMaestro ★★★ Belgium?, 20200803 22:55 (77 d 23:05 ago) @ PharmCat Posting: # 21821 Views: 6,624 

Hi Pharmcat, » Do you calculate REML for all data at once or calculate REML subject by subject? REML is not generally done on a persubject basis. Each model is assumed to have an optimal fit, given the data and the variance/covariance structure. The optimal fit, when putatively found, corresponds to the fixed effects and the variance components, and the log likelihood serves as objective function. I am sure I am missing somethign clever in your question. Can you describe what you meant? » Is it possible to use function caching in R? You can always write some public members to that effect where necessary and appropriate. I don't think R has a builtin optimizer which takes care of it as the language is interpreted, but other users of this forum should confirm; I am not expert on the inner workings of R. I do not see how this would affect the REML process  can you describe with more words what your perspective on caching was, then we'll possibly figure out something? — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
PharmCat ★ Russia, 20200805 01:41 (76 d 20:19 ago) @ ElMaestro Posting: # 21823 Views: 6,482 

» Can you describe what you meant? Hello ElMaestro! logREML of all data is just sum of individual subject logREML. See Mary J. Lindstrom and Douglas M. Bates, NewtonRaphson and EM Algorithms for Linear MixedEffects Models for RepeatedMeasures Data. You can make big sparce V matrix for all data and then inverce it ( solve(CovM) in your code above) or you can take indiviual X_{i} and V_{i} and calculate logREML for each subject and sum. Problem1: when you have big matrix invercing is slow as hell, but if you inverce by blocks (you can do it because it blockdiagonal) it can be done faster. Problem2: In BE case you have many equal Z_{i} matrices, so in logREML calculation (if you calc by subject) you inverce one matrix many times (because V = ZGZ'+R, Z can be different, G and R not changing subject by subject). If you cache result  you can seriously increase performance. So, I didn't read all R code (sorry) and I didn't understand what approach is used. 
ElMaestro ★★★ Belgium?, 20200805 08:13 (76 d 13:47 ago) @ PharmCat Posting: # 21824 Views: 6,468 

Thanks PharmCat, funny, I read the ref. you mention (Lindstrom & Bates) yesterday and I did not understand much of it. Will try again. » Problem1: when you have big matrix invercing is slow as hell, but if you inverce by blocks (you can do it because it blockdiagonal) it can be done faster. Sounds neat, will check if I can somehow do such a thing. » Problem2: In BE case you have many equal Z_{i} matrices, so in logREML calculation (if you calc by subject) you inverce one matrix many times (because V = ZGZ'+R, Z can be different, G and R not changing subject by subject). If you cache result  you can seriously increase performance. My approach is very simple: I make a case for saying that when T is not replicated while R is replicated, then it is not possible to make a meaningful model when V=ZGZt+R (as long as we count on R having the withins and G having the betweens). If we insest on ZGZt+R then we get irrelevant variance components associated with T whichever way we parameterise the covariance matrix. Therefore, I am working in V directly, meaning I am writing the blocks into it directly. This may still be compatible with your proposal, though. — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
PharmCat ★ Russia, 20200805 16:37 (76 d 05:23 ago) (edited by PharmCat on 20200806 00:34) @ ElMaestro Posting: # 21825 Views: 6,421 

Hello! » Sounds neat, will check if I can somehow do such a thing. I found Lindstrom&Bates, equation 3.4 or J. GURKA, 2006, equation 5. Some experiment: m = matrix(c(5, 1, 0, 0, 1, 2, 0, 0, 0, 0, 5, 3, 0, 0, 3, 4), nrow = 4, ncol = 4) So V_{i} can be result of function ƒ(Z_{i}, θ); Z_{i} is known for all subjects, θ  vector of optimizables parameters. » My approach is very simple: I make a case for saying that when T is not replicated while R is » replicated, then it is not possible to make a meaningful model when V=ZGZt+R (as long as we count on R having the withins and G having the betweens). Mmm... I can't say that ZGZ'+R is meaningful. If you construct V manualy it can still be represented as ZGZ'+R if ρ not equal 0. If used random model with covariation (ρ≠0) nondiagonal elements of G depends on diagonal and when you construct V some nondiagonal elements depends on both variance parameters. Then we add diagonal R matrix and have combination (parts of V): σ^{2}_{BT}+σ^{2}_{WT}, σ^{2}_{BR}+σ^{2}_{WR}, σ^{2}_{BR}, σ_{BT}*σ_{BR}*ρ (we have no σ^{2}_{BT}) so we can estimate σ^{2}_{WT} from σ^{2}_{BT}+σ^{2}_{WT} and σ^{2}_{BT} is stable because it is part of σ_{BT}*σ_{BR}*ρ (Yes estimate of σ^{2}_{WT} is indirect). If ρ = 0 we have σ_{BT}*σ_{BR}*ρ = 0, and we can estimate only σ^{2}_{BT}+σ^{2}_{WT}. And I think that V can be decomposed to ZGZ'+R if you know Z, ρ (ρ≠0) and σ^{2}_{BR} in model with covariance and diagonal R. 
ElMaestro ★★★ Belgium?, 20200806 21:11 (75 d 00:49 ago) (edited by ElMaestro on 20200806 21:23) @ PharmCat Posting: # 21827 Views: 6,240 

Hi PharmCat, I really like your idea. I don't have a good understanding of this as a whole but I know that when I work with a little these things they become easier for me to grasp, so I am giving it a try. Let us turn to Gurka 2006 equation 3 and 5 and remembering that the rank of our overall design matrix X is 6: Gurka requires me to derive Xi, the design matrix for fixed effects of subject i. They mention that "Xi is a (ni × p) known, constant design matrix for the i' th subject with rank p" At just three observations per subject we cannot have a rank of p=6 for Xi. The maximum rank is naturally 3. I don't see a way to handle Xi on a subject by subject basis, but perhaps you have some insight. Possibly I could do it by lumping subjects together so that the "smaller" X of sorts still has more rows thank columns and thus a rank of p=6. Do you see my issue with Xi? Could you indicate your view on Xi here? Is lumping subjects together a way to go to ensure the right rank of the reduced design matrices? Many thanks in advance. — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
PharmCat ★ Russia, 20200807 00:02 (74 d 21:58 ago) @ ElMaestro Posting: # 21828 Views: 6,241 

Hi ElMaestro! » Gurka requires me to derive Xi, the design matrix for fixed effects of subject i. » They mention that "Xi is a (ni × p) known, constant design matrix for the i' th subject » with rank p" I think it means rank of whole X, because after: "and β is a (p × 1) vector of unknown, constant population parameters". X_{i} is a part of X and have same count of columns, rank X_{i} can be lower. Here is no contradictions I think. If look at Xiuming Zhang, 2015* p is just length of β and rank should be the same. If not, we will have rankdeficient matrix. And if look generally  it really doesn't matter because (Np)*log(2π)/2 is a constant and it can be excluded from optimization. If you want find AIC ets  this will matter. And as I know some softaware using different N for calculation (some use all observations, some only statistically independent (m)), and some include second constant part: logdet(Σ_{i}^{m}(X'X))/2 So if you want find only θ, you can minimize REML without constants at all. And add constants when you need to have real value of REML or AIC, BIC, ets. *A Tutorial on Restricted Maximum Likelihood Estimation in Linear Regression and Linear MixedEffects Model Example N=3, n=9, p=6 X: β1 β2 β3 β4 β5 β6 X_{1} β1 β2 β3 β4 β5 β6 X_{2} β1 β2 β3 β4 β5 β6 X_{3} β1 β2 β3 β4 β5 β6 
ElMaestro ★★★ Belgium?, 20200807 07:49 (74 d 14:11 ago) @ PharmCat Posting: # 21829 Views: 6,186 

Hi PharmCat, » And if look generally  it really doesn't matter because (Np)*log(2π)/2 is a constant and it can be excluded from optimization. If you want find AIC ets  this will matter. I am afraid you missed my point. The issue is not related to constants. As you say they are not of importance to find the maximum, which will just be vertically displaced. Xi is the design matrix for fixed effects for the i'th subject. Thus is has a rank of max 3 and never more. This means we can at most use it to work with three parameter estimates. For example an intercept, a treatment effect, a period effect (within any given subject). Thus we cannot use Xi if i is a single subject during the optimization for the RTR/TRR/RRT design. Applying Gurkas equations will not work and I am looking for a way to work on derived versions of X (like for a few subjects at a time?) so that X retains in rank of 6 (in this design). » So if you want find only θ, you can minimize REML without constants at all. And add constants when you need to have real value of REML or AIC, BIC, ets. Don't worry, the constants are of no importance to find the variance components. » Example N=3, n=9, p=6 These examples of X are possibly not very illustrative; as you can see column 2+3 add up to column 1, the intercept, column 4+5+6 add up to that as well, and so forth. X is columnwise not full rank. — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
PharmCat ★ Russia, 20200807 11:29 (74 d 10:31 ago) @ ElMaestro Posting: # 21830 Views: 6,167 

Hi ElMaestro! I thought you asked about p, sorry :) » Xi is the design matrix for fixed effects for the i'th subject. » Thus is has a rank of max 3 and never more. This means we can at most use it to work with three parameter estimates. For example an intercept, a treatment effect, a period effect (within any given subject). I think X_{i} is not shuld be fullrank design matrix. It is just part of X for subject i. Moreover all X_{i} should be the same width (num of columns) for Σ(X_{i}'V^{1}X_{i}). Also with (y_{i}  X_{i}*β) you get residuals r_{i} (col num of X_{i} should be equal length β). There is not very accurate description in Gurka's paper, but it provide good concept. If you check, it should work. 
ElMaestro ★★★ Belgium?, 20200807 13:08 (74 d 08:52 ago) (edited by ElMaestro on 20200807 13:25) @ PharmCat Posting: # 21831 Views: 6,161 

Hi PharmCat, thanks for trying to assist on this. It is kind of you. » I think X_{i} is not shuld be fullrank design matrix. It is just part of X for subject i. Moreover all X_{i} should be the same width (num of columns) for Σ(X_{i}'V^{1}X_{i}). Also with (y_{i}  X_{i}*β) you get residuals r_{i} (col num of X_{i} should be equal length β). » There is not very accurate description in Gurka's paper, but it provide good concept. » If you check, it should work. I can't make it work. In Gurka's equation 3, where we can write the shorthand beta= M_{1}^{1}*M_{2}: When using the little 3by6 matrices for Xi, M_{1} becomes noninvertible. I tried to lump 6 subjects together at a time, and now I am getting an estimate of beta but it appears wrong, in the sense it is not corresponding to what I get for the full model. I wonder if Gurka's equation only works under assumptions not mentioned in the paper or if I am just a very lousy programmer. My money is clearly on the latter — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
PharmCat ★ Russia, 20200807 15:42 (74 d 06:18 ago) @ ElMaestro Posting: # 21833 Views: 6,143 

Hi ElMaestro! » I tried to lump 6 subjects together at a time, and now I am getting an estimate of beta but it appears wrong, in the sense it is not corresponding to what I get for the full model. If you calculate β with initial V  it will be wrong. It will be estimate for current V, then you calk REML end opimize θ... And then again... recalk β, ets... You can get initial β as β = inv(qr(X).R)*qr(X).Q'*y But if you have wrong β at final iteration  it is bad case. You can find same equation in LINDSTROM&BATES  1.2 (where X not a design matix, but vector of X_{i}, IMHO) If you provide me data from 6 subjects I can make calculations in Julia. Magical linear algebra in julia:
using LinearAlgebra 
ElMaestro ★★★ Belgium?, 20200807 16:44 (74 d 05:16 ago) @ PharmCat Posting: # 21834 Views: 6,129 

Hi PharmCat. » If you calculate β with initial V  it will be wrong. Correct and that is not what I am doing. For every iteration through the optimizer I am deriving a new beta vector; this is beta given V. The loglikelihood is calculated on basis of V and beta, where beta is derived from V via X. Therefore LL can be said to be a function of V (or of the variance components in it). There are two alternative ways I can get that beta vector for any given V: 1. I can do it the simple way with the very big X. It works well. 2. I can perhaps (=I should be able to?) do it by partition X into small bits, Xi, like in Gurka equation 3, or Lindstrom&Bates. But I cannot get the same beta vector for a given set of variance components in V. The or a reason is that if I reduce X to Xi (individual subjects) and do all the summation then the M1 and written above is not invertible. This, in turn, appears to be because the individual Xi's are not of the same rank as X. If I lump Xi's together like the first 6 subjects, then the next 6 subjects etc, and so forth and use Gurka's equation 3, then I can generate a beta estimate (from 4 chunks of 6 subjects), but it is still not the same as in pt 1, for any given set of components in V. I am not even at a final iteration, I am just at any iteration where I am comparing beta via the two methods for a set of starting values, or for that matter, if I plug in the values that come out as optimal values from pt 1 above after everything is coverged. » It will be estimate for current V, then you calk REML end opimize θ... And then again... recalk β, ets... You can get initial β as β = inv(qr(X).R)*qr(X).Q'*y You do not need this since you do not need starting values for beta, cf the LL above. You only need starter values for theta (variance components in V). Every pass through the REML LL equation generates a beta estimate from the variance compnents, given X. Once the algo has converge d the estimate of beta is final. All this imaterial to my challenge at hand. » But if you have wrong β at final iteration  it is bad case. This is not about the "final iterattion". It is about any iteration. For any iteraiton, the two ways to derive beta results in different values. Therefore something s wrong. » You can find same equation in LINDSTROM&BATES  1.2 (where X not a design matix, but vector of X_{i}, IMHO) » If you provide me data from 6 subjects I can make calculations in Julia. Here all subjects are (y is logarithmised): Subj Seq Trt Per y 1 RTR R 1 8.307361 1 RTR T 2 8.286622 1 RTR R 3 8.229191 2 RTR R 1 8.001757 2 RTR T 2 7.774351 2 RTR R 3 7.939016 3 RRT R 1 8.150295 3 RRT R 2 8.113786 3 RRT T 3 8.301224 4 TRR T 1 8.319961 4 TRR R 2 8.068152 4 TRR R 3 8.243703 5 TRR T 1 8.469640 5 TRR R 2 8.421255 5 TRR R 3 8.278936 6 RRT R 1 8.023159 6 RRT R 2 8.015393 6 RRT T 3 7.791358 7 RTR R 1 7.836054 7 RTR T 2 8.030084 7 RTR R 3 7.993823 8 TRR T 1 7.698483 8 TRR R 2 7.621391 8 TRR R 3 7.609862 9 RTR R 1 8.444106 9 RTR T 2 8.333174 9 RTR R 3 8.131531 10 TRR T 1 7.708949 10 TRR R 2 7.766586 10 TRR R 3 7.705803 11 RRT R 1 7.530373 11 RRT R 2 7.701833 11 RRT T 3 7.780888 12 RRT R 1 7.731229 12 RRT R 2 8.061613 12 RRT T 3 8.275682 13 RRT R 1 7.878686 13 RRT R 2 7.795811 13 RRT T 3 7.961789 14 RTR R 1 8.016582 14 RTR T 2 7.807835 14 RTR R 3 7.996452 15 RTR R 1 7.720639 15 RTR T 2 7.598299 15 RTR R 3 7.910003 16 TRR T 1 7.992809 16 TRR R 2 8.143808 16 TRR R 3 8.114504 17 TRR T 1 7.781890 17 TRR R 2 7.885856 17 TRR R 3 7.683404 18 RRT R 1 7.910224 18 RRT R 2 7.939373 18 RRT T 3 8.054078 19 RTR R 1 7.791027 19 RTR T 2 7.919283 19 RTR R 3 7.825645 20 RRT R 1 7.886983 20 RRT R 2 7.982689 20 RRT T 3 8.018691 21 RRT R 1 7.961928 21 RRT R 2 7.888485 21 RRT T 3 8.029107 22 TRR T 1 7.989221 22 TRR R 2 7.981665 22 TRR R 3 7.956967 23 TRR T 1 8.056680 23 TRR R 2 8.066396 23 TRR R 3 8.174308 24 RTR R 1 7.536257 24 RTR T 2 7.500419 24 RTR R 3 7.912350 — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
PharmCat ★ Russia, 20200807 18:14 (74 d 03:46 ago) (edited by PharmCat on 20200808 01:12) @ ElMaestro Posting: # 21835 Views: 6,126 

» Here all subjects are (y is logarithmised) # Theta estimate θ was taken by REML using CSV, DataFrames, StatsModels, StatsBase I get beta: 7.904258681084915 0.0547761264037151 0.05092362547466389 0.0012959346740553102 0.048118895192829976 0.02239133333333365 
ElMaestro ★★★ Belgium?, 20200807 18:23 (74 d 03:37 ago) @ PharmCat Posting: # 21836 Views: 6,114 

Hi PharmCat, it is kind of you, but this looks like apples compared to pears. The whole point is that G and R are meaningless so I cannot compare to an output generated with a model involving G and R. I am doing V directly, because we do not work with within and betweens for T. That is why you have five elements in theta (T is not replicated, but you are trying to fit both a within and a between variance for it). I am saying we do not work on them both, in stead we replace their sum by a single variance estimate. This is the whole point of the thread here. Note also, I am not after correlations per se. Therefore I am writing the variances directly into the covariance matrix (perhaps like you are used to if you work with compound symmetry of the heterogenous type). If you are looking at a final log likehood around 15.337 then I think we are at least talking about the same model. — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
ElMaestro ★★★ Belgium?, 20200807 21:31 (74 d 00:29 ago) (edited by ElMaestro on 20200807 22:48) @ ElMaestro Posting: # 21837 Views: 6,063 

I think I made beta work now, thanks a lot PharmCat It appears that beta can be constructed even if the local Xi is not of rank p; I guess it must be a little typo in Gurka's paper. Anyways, I reprogrammed the lines, and now I get conformable matrices, and beta is the same whichever way I calculate it. May try and see if I can do some comparisons about the performance, I still need a lot of code to put it all into function. Thanks PharmCat, pointing me towards Gurka and insisting on Xi was a great help. — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
PharmCat ★ Russia, 20200808 01:08 (73 d 20:52 ago) @ ElMaestro Posting: # 21838 Views: 6,046 

Hi ElMaestro! » That is why you have five elements in theta (T is not replicated, but you are trying to fit both a within and a between variance for it). I am saying we do not work on them both, in stead we replace their sum by a single variance estimate. This is the whole point of the thread here. I understand your point. And generaly your variance component function is better because is not redundant. From other side if we have some different functions V = ƒ_{1}(θ_{1}) and V = ƒ_{2}(θ_{2}) it could be any and from any θ if we get equal V (so it could be CSH, FA0(2), manual construct). » If you are looking at a final log likehood around 15.337 then I think we are at least talking about the same model. I have 15.337299690629834, model is the same it is good proof: different V construction methods lead to same results. » I guess it must be a little typo in Gurka's paper. From time to time I find inaccuracies even in good articles and sometime it give strong headake when you try directly reproduce some methods. log(ΣX) instead log(ΣX) can give hours of frustration. » Thanks PharmCat, pointing me towards Gurka and insisting on Xi was a great help. I am very glad if my thoughts were useful 
ElMaestro ★★★ Belgium?, 20200808 12:25 (73 d 09:36 ago) @ PharmCat Posting: # 21839 Views: 6,023 

I am seeing about a 30% speed improvement on my machine when using arrays of matrices. Am using lists for that purpose, that's the only way I can think of. Further improvement may be possible, because I may only need to store principally three different matrices in these arrays (per array) an keep track of which subjects use which matrices (that would be defined by their sequences), but this, I think, for now is not a priority for me to do. A further option is to only use one kind of matrix and then reorder all periods in the global data list, so that all subjects have e.g. RTR, then we'd need to work with only a single matrix type in the profiles; this may improve speed further. Will think about it... 30%...It ain't much but it's honest work. — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
PharmCat ★ Russia, 20200808 17:27 (73 d 04:33 ago) (edited by PharmCat on 20200808 17:43) @ ElMaestro Posting: # 21840 Views: 5,903 

» 30%...It ain't much but it's honest work. It is very good boost. I donn't know how matrix multipication works in R, but some improovments can be done if make function A = B'*C*B or A += B'*C*B. At each step of this function there are many intermediate products that can be avoided with direct computation. (Or make it in C) Example incremantal function:
""" julia> @time mulsimple!(θ, β, A, B, C); 0.000007 seconds (10 allocations: 1.172 KiB) julia> @time mulθβinc!(θ, β, A, B, C); 0.000004 seconds (2 allocations: 144 bytes) eightfold memory savings 
ElMaestro ★★★ Belgium?, 20200808 18:10 (73 d 03:50 ago) @ ElMaestro Posting: # 21841 Views: 5,894 

Hi again, now it gets really interesting. I made two observations: 1. There is this 30% improvement if I switch to profile log likelihood in stead of "the big one" for lac kof betyter wording 2. I can achieve about an improvement of several thousand percent if stay with the big X and V, however, I must then use a smart tactic: I am only generating V once, but though each pass on the optimiser I am updating V rather than rewriting it. Instead of iterating across all rows and column and checking in the data list where to put the individual components, I am initially generating a list of coordinates for each variance component. Something like this: V.Coordinates=function(Data) This returns a list with four components. The first component is a set of coordinates in V for varT, the second is a set of coordinates for varBR. And so forth. Then, in the REML function, I am having a block like this:
for (i in 1:nrow(VC$xy.varT)) And now are simply talking REAL BUSINESS The optimiser is returning blazingly fast with an REML estimate and corresponding variance components. Depending on the tolerance required it is almost instantaneous (<0.5 secs) on my laptop and that machine isn't a high tech equipment at all. The speed with "big V" far outruns the profile approach. I believe this implies that matrix inversion is done in a clever fashion in R's solve function; as you pointed out PharmCat inversion can be slow (and unstable) as matrices get large, but perhaps the LAPACK routine used by R reduces the task intelligently, possibly through some of the same tricks described for e.g. the Alglib routines.. However, I am also seeing a different little thing: If I optimise with the profile approach I can make BFGS converge, but this will not work if I try it it in the "big V" approach. Lindstrom and Bates noticed something similar in the sense that they wrote that QuasiNewton methods perform better when the profile approach is used: "We recommend optimizing the profile log likelihood, since it will usually require fewer iterations, the derivatives are somewhat simpler, and the convergence is more consistent. We have also encountered examples where the NR algorithm failed to converge when opti mizing the likelihood including a but was able to optimize the profile likelihood with ease." So, lots of options and I am not going to be late for supper — I could be wrong, but... Best regards, ElMaestro R's base package has 274 reserved words and operators, along with 1761 functions. I can use 18 of them (about 14 of them properly). I believe this makes me the Donald Trump of programming. 
PharmCat ★ Russia, 20200809 18:22 (72 d 03:38 ago) @ ElMaestro Posting: # 21844 Views: 5,698 

» I believe this implies that matrix inversion is done in a clever fashion in R's solve function; as you pointed out PharmCat inversion can be slow (and unstable) as matrices get large, but perhaps the LAPACK routine used by R reduces the task intelligently, possibly through some of the same tricks described for e.g. the Alglib routines.. It is very good question to explore. If we suppose that some routine knows something about structure it should make big work to recognize blockdiagonal or/and symmetric matrix structure. I think, main cause is a overhead for each function call. Another problem  many intermediate products for %*% 
PharmCat ★ Russia, 20200810 11:48 (71 d 10:12 ago) (edited by PharmCat on 20200810 12:11) @ ElMaestro Posting: # 21849 Views: 5,627 

Hi ElMaestro, » now it gets really interesting. I made some tests. And results is surprising. First code:
using BlockDiagonals b1 1.109ms VS b2 3.751ms , so R solve blockdiagonal matrix more than twice faster. In Julia b3 2.707 ms: random matrix solving in Julia faster than in R, but more slower then b1 when matrix is blockdiagonal. Second code: using RCall Solving by blocks in R (b5 3.275 ms) really slower (a little more faster than solving random matrix). But solving blockdiagonal matrix in Julia when structure is described takes 56.484 μs (b3 in second code), this is 20 times faster than in R. I'm trying to discuss this in Julia forum here. I'am not so expirienced in R, but I think that Solve.block can increase performance for blockdiagonal matrices, docs here. 