ElMaestro ★★★ Denmark, 2020-07-10 15:19 (1558 d 15:08 ago) Posting: # 21669 Views: 21,901 |
Hi all, I am writing a little optimizer based on REML in R for use with semi-replicated designs. I am starting from the bottom and working upwards, since the various packages do not really help when R is replicated and T is not. Does anyone have a dataset with well-established REML reference values for the variance components for the TRR/RRT/RTR design? A little more than the CVref of EMA dataset II would be appreciated. Many thanks. — Pass or fail! ElMaestro
Helmut ★★★ Vienna, Austria, 2020-07-10 21:03 (1558 d 09:24 ago) @ ElMaestro Posting: # 21670 Views: 20,504 |
Hi ElMaestro,

❝ I am writing a little optimizer based on REML in R for use with semi-replicated designs. I am starting from the bottom and working upwards, since the various packages do not really help when R is replicated and T is not.

Wow! Nobody has mastered that in R so far. I don't want to dampen your enthusiasm, but I think that is not possible at all.

❝ Does anyone have a dataset with well-established REML reference values for the variance components for the TRR/RRT/RTR design? A little more than the CVref of EMA dataset II would be appreciated

Here you are (for DS II, of course). Results in Phoenix for the FDA's covariance structure „Banded No-Diagonal Factor Analytic (f=2)”, which is FA0(2) in SAS-lingo. After 7 iterations (-2REML -48.2, AIC -26.2):

lambda(1,1)  0.19030501
G Matrix*    0.03621600  0.04563730

As usual:

WARNING: Newton's algorithm converged with modified Hessian. Output is suspect.

s2wT is nonsense, since T is not replicated and the model is over-specified. The lambdas are called FA(x,y) in SAS-lingo. After 7 iterations (-2REML -30.7, AIC -20.7):

Cov Parm   Subject   Group   Estimate

Also:

NOTE: Convergence criteria met but final hessian is not positive definite.

No problems with convergence with f=1 (or FA0(1) in SAS). FA0(1) helps in general but not always.

lambda(1,1)  0.19030501

Different s2wT but crap as well. You can get anything you like: the same data set throwing warnings in one software but not the other; same with convergence. In Certara's forum a user posted a data set which failed to converge both in SAS and Phoenix, even with FA0(1). Then one is in deep shit. Even if a data set converges without warnings, you might get different estimates (except the first one and s2wR) in SAS and Phoenix. Not surprising, because the optimizer walks around in parameter space trying to find the maximum of a surface which is essentially flat. Increasing the number of iterations and/or decreasing the tolerance rarely helps. I still wait for someone explaining why he/she is using such a wacky design. Beyond me.
— Dif-tor heh smusma 🖖🏼 Довге життя Україна! Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes |
ElMaestro ★★★ Denmark, 2020-07-10 21:24 (1558 d 09:03 ago) @ Helmut Posting: # 21671 Views: 20,365 |
Hi Hötzi,

❝ Wow! Nobody has mastered that in R so far. I don't want to dampen your enthusiasm, but I think that is not possible at all.

Challenge accepted. Thanks for the results.
— Pass or fail! ElMaestro
Helmut ★★★ Vienna, Austria, 2020-07-11 13:57 (1557 d 16:30 ago) @ ElMaestro Posting: # 21672 Views: 20,072 |
Hi ElMaestro, ❝ ❝ Wow! Nobody mastered that in R so far. I don’t want to dampen your enthusiasm but I think that is not possible at all. ❝ ❝ Challenge accepted Good luck! I added the variance-covariance matrices and the complete warnings above. From this SASians’ document:
NOTE: Convergence criteria met but final hessian is not positive definite. In my understanding:
You can’t fix by analysis — Dif-tor heh smusma 🖖🏼 Довге життя Україна! Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes |
ElMaestro ★★★ Denmark, 2020-07-11 16:43 (1557 d 13:45 ago) @ Helmut Posting: # 21673 Views: 19,868 |
Hi again, it is rather experimental. I do not have a level of understanding that allows me to use V = ZGZt + R, so I will be constructing V directly from the data listing, without the intermediary Z and G. I do not wish to get bogged down by the various aspects of covariance matrices; if CSH does not pan out well in SAS then it is an implementation or convergence issue only, not because CSH is irrelevant. CSH is for all practical purposes the covariance structure that makes sense, which means we have:
A variance for T (combined within and between)
A variance for within-R
A variance for between-R
Hessian, positive definite, etc.: all this comes down to the details of SAS's or WinNonlin's optimiser. I am playing around with optim in R (BFGS, L-BFGS-B bounded, and Nelder-Mead) and with one of my own making.
— Pass or fail! ElMaestro
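A minimal sketch (not the actual code, which follows later in the thread) of how such a per-subject block can be written down directly for the three components listed above; the function name, arguments and treatment coding are illustrative only, and a between-subject T/R covariance, if wanted, would go into the remaining off-diagonal cells in the same way.

# Sketch: 3x3 covariance block for one subject, built directly from the
# treatment codes with only varT (total variance of T), varWR (within-R)
# and varBR (between-R). The T-R cells are left at zero in this sketch.
subject.block <- function(trts, varT, varWR, varBR) {
  n <- length(trts)                     # 3 periods
  V <- matrix(0, n, n)
  for (i in 1:n) for (j in 1:n) {
    if (i == j) {
      V[i, j] <- if (trts[i] == "T") varT else varWR + varBR
    } else if (trts[i] == "R" && trts[j] == "R") {
      V[i, j] <- varBR                  # the two R administrations share varBR
    }
  }
  V
}
subject.block(c("R", "T", "R"), varT = 0.04, varWR = 0.013, varBR = 0.045)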
Helmut ★★★ Vienna, Austria, 2020-07-11 17:38 (1557 d 12:49 ago) @ ElMaestro Posting: # 21674 Views: 19,852 |
Hi ElMaestro,

❝ it is rather experimental.
❝ […] which means we have:
❝ A variance for T (combined within and between)
❝ A variance for within-R
❝ A variance for between-R

Correct. Will be interesting to see what you get, because \(\sigma_{wT}^2+\sigma_{bT}^2\) was 0.04367 (Phoenix) and 0.03771 (SAS).

❝ Hessian, positive definite, etc.: all this comes down to the details of SAS's or WinNonlin's optimiser. I am playing around with optim in R […]

If you succeed, the community will love you. Might win you the Fields Medal. As long as a genius (you?) is not able to come up with a solution working in all cases, I can only repeat: Avoid partial replicate designs! It might not be possible to evaluate the study for ABE (though RSABE works in any case).
— Dif-tor heh smusma 🖖🏼 Довге життя Україна! Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
ElMaestro ★★★ Denmark, 2020-07-12 14:28 (1556 d 16:00 ago) @ Helmut Posting: # 21675 Views: 19,794 |
Hi all, some help needed. Specifically, I am asking for help to make the code work, not to make it smarter (achieving the same with fewer keystrokes, using built-in alternatives, etc.), although there is generally a preferred focus on the latter here. Hopefully we will discuss later the need for additional or other variance components once we have made the optimiser work somehow (or while we are making it work, if such VCs are totally necessary). There are probably a ton of errors to iron out. There is at least one big one whose effects I am suffering from. The idea is to construct the model from the bottom up.
Another source for the objective function is https://www.stt.msu.edu/users/pszhong/Lecture_16_Spring_2017.pdf (pages 13+14), but this also did not quite get me to convergence. So, this is what I have so far. I don't understand matrix algebra totally well, and I am at a loss if you say Hermitian, expectation-maximisation, Jacobian, etc. What bothers me is that some years ago I wrote a working optimiser in C on the basis of slides by some Danish guy called Waagepetersen, who walked the reader slowly through the process of optimising REML. I can't find my code now, it was on another computer, and I can't find his slides any more (there are some slides with his name out there, which I can find with Google etc., but they do not have that old recipe).
rm(list=ls(all=TRUE))
— Pass or fail! ElMaestro
ElMaestro ★★★ Denmark, 2020-07-12 15:26 (1556 d 15:01 ago) @ ElMaestro Posting: # 21676 Views: 19,735 |
Important update: I forgot vBR on the diagonal. Now the singularity is less of a headache:
rm(list=ls(all=TRUE)) — Pass or fail! ElMaestro |
ElMaestro ★★★ Denmark, 2020-07-12 23:36 (1556 d 06:52 ago) @ ElMaestro Posting: # 21677 Views: 19,646 |
After re-re-re-re-reading the documentation for optim, I think it can be forced to look for either a minimum (default) or a maximum through the control argument; or, more simply, the objective function can be handled for a minimum by putting a minus sign on the return of Obj.F12 … but even so the estimates are still somewhat off.
— Pass or fail! ElMaestro
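Both routes exist in optim; a tiny self-contained illustration with a toy objective (standing in for Obj.F12):

f <- function(x) -sum((x - c(1, 2))^2)            # toy objective, maximum at (1, 2)

optim(par = c(0, 0), fn = f,
      control = list(fnscale = -1))$par           # option 1: maximise via the control argument

optim(par = c(0, 0), fn = function(x) -f(x))$par  # option 2: minimise the negated objective

# tighter tolerance (cf. the reltol remark further down): control = list(reltol = 1e-12)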
ElMaestro ★★★ Denmark, 2020-07-12 23:46 (1556 d 06:42 ago) @ ElMaestro Posting: # 21678 Views: 19,561 |
Look at this baby:
rm(list=ls(all=TRUE))
On my machine it gives:
$par
— Pass or fail! ElMaestro
ElMaestro ★★★ Denmark, 2020-07-13 00:27 (1556 d 06:01 ago) @ ElMaestro Posting: # 21679 Views: 19,503 |
which gives
Var.Component   Value   Ini.value
By the way: you can adjust how many decimals you get on e.g. varWR (~s2WR) through the reltol setting. Use e.g. 1e-12 to get 0.01324652 with just a few more iterations.
Job done?
— Pass or fail! ElMaestro
Helmut ★★★ Vienna, Austria, 2020-07-13 15:52 (1555 d 14:36 ago) @ ElMaestro Posting: # 21684 Views: 19,293 |
Hi ElMaestro, ❝ Job done? I prefer medium rare over well done (aka quick and dirty or clean and never)! Below my more R-ish code. Changes:
###############################
Gives:
Maximum : 75.98722
— Dif-tor heh smusma 🖖🏼 Довге життя Україна! Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
ElMaestro ★★★ Denmark, 2020-07-13 16:18 (1555 d 14:09 ago) @ Helmut Posting: # 21686 Views: 19,267 |
Hi Hötzi, I think expressing the G matrix after this fit is a little backward; the whole point is not to decompose V into ZGZt+R, because this design makes that decomposition pointless even when the design is otherwise relevant. That SAS and WinNonlin and more do not have a way to make provisions for a solution outside those involving this decomposition is merely a testament to their development capabilities, not to the design itself. If you start defining the stats model on the basis of G and R rather than V, then obviously you must create two variance components which are not uniquely estimable, but whose sum is. Therefore, just screw R and G, go directly for V, and present the components without in any way calling them members of G or R - they really aren't when the model is constructed my way. The 2-trt, 3-seq, 3-per design is good, healthy and appropriate; until now it was just the statistical work on it that has been lagging. The post is an attempt at providing a solution. Note also: the PE will be accessible via the est.b vector. You should be able to extract it (difference between the first two effects, one good reason to do X without intercept) after the model has converged. If you have a dataset that does not readily converge with SAS then I'd be eager to learn if it does in some flavour of my code.
— Pass or fail! ElMaestro
Helmut ★★★ Vienna, Austria, 2020-07-13 17:34 (1555 d 12:53 ago) @ ElMaestro Posting: # 21688 Views: 19,361 |
Hi ElMaestro, ❝ I think expressing the G matrix after this fit is a little backward;… OK, true. ❝ That SAS and WinNonlin and more do not have a way to make provisions for a solution outside those involving this decomposition, it merely a testament to their development capabilities than to the design itself. If you start defining the the stats model on basis of G and R rather than V then obviously you must create two variance components which are not uniquely estimable, but whose sum is. Therefore, just screw R and G, go directly for V, and present the components without in any way calling them members of G or R - they really aren't when the model is constructed my way. Yep. ❝ The 2-trt, 3-seq, 3-per design is good, healthy and appropriate, until now it was just the statistical work on it that has been lagging. The post is an attempt at providing a solution. Kudos to your work! I still see no reason to perform a study in one of the partial replicates. Sorry. What is given as ‘justifications’:
Hence, everybody has to believe in the software. As a starter I would expect the same level of error handling and documentation as for the package replicateBE. Otherwise, assessors would be skeptical.

❝ The PE will be accessible via the est.b vector. You should be able to extract it (difference between the first two effects, one good reason to do X without intercept) after the model has converged.

In Section 2:
Final <- function(Pars) {
At the end:
PE <- Final(res$Estimate)
Gives:
PE: 1.372138
In PHX 1.372138 (Patterson/Jones got 1.37 in SAS). Great!
TODO: Get the Satterthwaite DFs for the CI.

❝ If you have a dataset that does not readily converge with SAS then I'd be eager to learn if it does in some flavour of my code.

See this post with links to John's datasets. Maybe there are others.
— Dif-tor heh smusma 🖖🏼 Довге життя Україна! Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
ElMaestro ★★★ Denmark, 2020-07-14 05:22 (1555 d 01:06 ago) (edited on 2020-07-14 12:43) @ Helmut Posting: # 21691 Views: 19,225 |
Hi Hötzi,

❝ […] is the reason for the warnings because […]

I looked at this one by testing is.nan(B) and printing the components if/when it is. It happens right at the start for this data. The problem is not that X changes dimensions; X is the design matrix and this is generally never a square matrix, just full column rank with as many rows as observations. X does not change at all during the optimisation. All is in fact hunky dory when it happens, the variance estimates being:
[1] 0.06841615 0.03576077 0.01240137 0.05893007
(on my machine). That's nothing special, just a wee change from initial. The issue here is that the determinant suddenly becomes apparently negative. This is mysterious to me at this point, I am not much into matrices. I take it as a case of NM taking some unfortunate decision, which is then corrected afterwards. However this does give food for thought: if the parameter vector that gives a negative determinant is chosen as a starting value then NM will stop immediately and throw an error - you can try it with the vector above. Therefore, it may be desirable to ascertain positiveness of that determinant (and the others?) when starting the algo, and to slightly modify individual starter values in case of such suboptimally chosen starter values until Obj.F12 actually returns something meaningful. I believe the issue cannot be remedied by fiddling with X. I will be happy to hear a hardcore matrix algebraist's opinion here.
Update 8 hours later: I think CovM becomes noninvertible at select levels of a variance component, given the others, when/if a certain value of the variance component means that one or more columns (rows) add up to one or more other columns. At exactly that point CovM will drop in its rank and it becomes non-invertible; beyond the value that creates this phenomenon CovM is invertible but the determinant may switch sign. It is a phenomenon with a singularity in the determinant easily explained in the numbers, but I cannot interpret the higher perspective. I am grateful that NM in the implementation of optim apparently safeguards against this phenomenon as long as the first call to the objective is evaluable. Whodathunkit?
— Pass or fail! ElMaestro
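A rough sketch of such a safeguard, assuming an objective Obj.F12 that returns NaN (or throws) when the determinant goes negative; the helper name and the nudging rule are made up:

# Make sure the objective is evaluable at the chosen starting values before
# handing them to optim(); otherwise nudge them a little and retry.
safe.start <- function(ini, obj, max.tries = 50) {
  for (i in 1:max.tries) {
    val <- tryCatch(obj(ini), error = function(e) NaN)
    if (is.finite(val)) return(ini)
    ini <- ini * runif(length(ini), 0.9, 1.1)  # small random nudge
  }
  stop("no evaluable starting values found")
}
# usage (hypothetical): ini <- safe.start(Some.Initial.Guesses(D), Obj.F12)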
Helmut ★★★ Vienna, Austria, 2020-07-14 15:58 (1554 d 14:30 ago) @ ElMaestro Posting: # 21692 Views: 19,167 |
Hi ElMaestro, ❝ The problem is not that X changes dimensions, X is the design matrix and this is generally never a square matrix, just full column rank with as many rows as observations. X does not change at all during the optimisation. Ah, yes! ❝ The issue here is that the determinant suddenly becomes apparently negative. Might point towards a similar issue in PHX complaining about negative variance components and SAS forcing them to zero. ❝ This is mysterious to me at this point, I am not much into matrices. So am I. Sorry. ❝ I take it as a case of NM taking some unfortunate decision, which is then corrected afterwards. Luckily. ❝ However this does give food for thought: if the parameter vector that gives a negative determinant is chosen as a starting value then NM will stop immediately and throw an error - you can try it with the vector above. ❝ Therefore, it may be desirable to ascertain positiveness of that determinant (and the others?) when starting the algo, and to slightly modify individual starter values in case of such suboptimally chosen starter values until Obj.F12 actually returns something meaningful. But how? ❝ I believe the issue can not be remedied by fiddling with X. Agree. ❝ Update 8 hours later: … Sounds reasonable. But I think that we are going in circles. This story is all about RSABE where we have to go for ABE if swR <0.294 (or the mixed-model for ABE in general). The scaling part is easily doable in R. Since you found a solution for estimating swR and the p
For the imbalanced dataset of this post PHX gives me:
Δ   0.03084859
SEΔ 0.03256626
df  18.24154
and therefore, \(\small{CI=\exp (\Delta\mp t_{0.05,df}\times SE)=\left \{ 0.9747, 1.0912 \right \}}\).
— Dif-tor heh smusma 🖖🏼 Довге життя Україна! Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
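With these three numbers the 90% CI is a one-liner in R (values taken from the PHX output above):

Delta <- 0.03084859   # T-R difference on the log scale
SE    <- 0.03256626   # its standard error
df    <- 18.24154     # Satterthwaite degrees of freedom
round(exp(Delta + c(-1, +1) * qt(0.95, df) * SE), 4)
# [1] 0.9747 1.0912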
ElMaestro ★★★ Denmark, 2020-07-14 17:07 (1554 d 13:21 ago) @ Helmut Posting: # 21693 Views: 19,036 |
Hi again Hötzi, still at a loss, how? From my own perspective, this was not about creating a package (roof, walls) but just to find a way to use an optimiser to find the solid estimate of swR for the TRR/RTR/RRT design (base), and this is what was deemed rather difficult at the outset. It succeeded when I expressed V directly and not via ZGZt+R. Did not take so much time as I had feared. Did you mean to say that now it was shown possible, there is a need for other components (and that these are the tricky ones)? I am not sure I can do it, not sure I can't, either. Did not look into it at all; for me this part is a non-issue, as is generalising the idea here to make it work for all other designs. Will be happy to take a look at the other stuff, too, if I have something to contribute with. For me, all this was just about the s2wR. For you ("we") maybe it is seen differently? Eager to learn of it, will see without being certain if I can produce something.
— Pass or fail! ElMaestro
Helmut ★★★ Vienna, Austria, 2020-07-14 17:32 (1554 d 12:56 ago) @ ElMaestro Posting: # 21695 Views: 19,190 |
Hi ElMaestro, ❝ still at a loss, how? Amazingly you solved an old riddle – which is only part of the game in RSABE. ❝ […] Did not take so much time as I had feared. Well, 60+ hours is a lot for a spare time project. Welcome to the club. ❝ Did you mean to say that now it was shown possible, there is a need for other components (and that these are the tricky ones?)? Exactly. I still don’t see how we could mimic the FDA’s mixed model for ABE in R. ❝ I am not sure I can do it, not sure I can't, either. Heads up. You could be the first succeeding in R. PharmCat produced something in Julia (see there). Duno the state of affairs. ❝ Will be happy to take a look at the other stuff, too, if I have something to contribute with. ❝ For me, all this was just about the s2wr. For you ("we") maybe it is seen differently? Eager to learn of it, will see without being certain if I can produce something. By “we” I’ve meant the community hoping for an open source solution. Sorry for the huge amount of time you’ve spent on this. swR is only needed as a decision point whether to go for RSABE or ABE. Great that we can get it now in R but the evaluation for ABE remains a mystery. — Dif-tor heh smusma 🖖🏼 Довге життя Україна! Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes |
ElMaestro ★★★ Denmark, 2020-07-13 12:13 (1555 d 18:14 ago) @ Helmut Posting: # 21681 Views: 19,448 |
Hi all, The solution I posted is a total winner. Where do I collect my medal? Will be happy to learn of performance on other datasets, in particular of differences between results or non-convergence (which is often something related to instability caused by the initial guess being too far from the optimum). Obviously, with this implementation the input format has to meet the expectation of the algo. For example, T and R coded as T and R, not A and B, in this implementation. All this is unrelated to the task of this thread and can be rather easily worked out as well. A slight change this morning, for the initial guesses, less clumsy and more likely (in the nonrestricted fashion ) to reflect the optimal solution:
Some.Initial.Guesses=function(foo) — Pass or fail! ElMaestro |
Helmut ★★★ Vienna, Austria, 2020-07-13 16:16 (1555 d 14:11 ago) @ ElMaestro Posting: # 21685 Views: 19,588 |
|
Hi ElMaestro,

❝ The solution I posted is a total winner. Where do I collect my medal?

Hey, it looks great indeed! However, that's only the foundation of the house. Walls, a door, windows, and a roof would be nice.

❝ Will be happy to learn of performance on other datasets, …

Try my code with
path <- "https://raw.githubusercontent.com/Helmut01/replicateBE/master/inst/extdata/DS04.csv"
That's an […]. PHX (convergence after 8 iterations without warnings, -2REML 292.14, AIC 314.14):
lambda(1,1) 0.43651691
Manual: lambda(1,1)^2 = var_bR = 0.1905470
My master's code:
Component   Estimate   Initial
Not that bad.

❝ … in particular of differences between results or non-convergence (which is often something related to instability caused by the initial guess being too far from the optimum).

See this post with links to John's nasty datasets. Will see whether I still have the one which failed completely both in SAS and PHX.

❝ A slight change this morning, for the initial guesses, less clumsy and more likely (in the nonrestricted fashion) to reflect the optimal solution:

Will check that later.
— Dif-tor heh smusma 🖖🏼 Довге життя Україна! Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes |
PharmCat ★ Russia, 2020-07-15 16:19 (1553 d 14:09 ago) @ ElMaestro Posting: # 21699 Views: 18,884 |
Hello all! As I understand, some code for a mixed-model solution was done? Great!

❝ Will be happy to learn of performance on other datasets, in particular of differences between results or non-convergence (which is often something related to instability caused by the initial guess being too far from the optimum).

Here you can find some self-generated (and some public) datasets for package validation, and results from SPSS here. Also you can make any random dataset with any design with Julia, docs here. Is it possible to make an easy performance comparison on my machine?

Edit: Full quote removed. Please delete everything from the text of the original poster which is not necessary in understanding your answer; see also this post #5! [Helmut]
ElMaestro ★★★ Denmark, 2020-08-02 19:39 (1535 d 10:48 ago) @ PharmCat Posting: # 21817 Views: 16,421 |
Hi PharmCat, ❝ Is is possible to make easy performance comparison on my machine? the code on this page would suffice for the comparison, however I am not going in the direction of estimating denominator DF's and calculating CI's. I am solely interested in redefining our specification of the covariance matrix so that the model that is being fit contains exactly the variance components we can realistically ask about for a 3-period semireplicated study. I have reached that goal. Also, I am writing an optimiser and it works bloody great! Not terribly fast but seems to get there in a very orderly fashion — Pass or fail! ElMaestro |
Helmut ★★★ Vienna, Austria, 2020-07-15 16:27 (1553 d 14:01 ago) @ ElMaestro Posting: # 21700 Views: 18,941 |
Hi ElMaestro, we were all blind! Just to make clear what I mean by "we": Myself, you, the EMA's PKWP (and the BSWP fabricating the example datasets and the evaluation by 'Method C'), and Patterson & Jones. Detlew read the guidance's Step 1 thoroughly enough to realize that \(\small{s_\textrm{wR}^2}\) is not an REML-estimate (see this post). Fortunately the templates for Phoenix are correct. Quick & dirty R-code at the end. Results:
file   descr   s2wR   swR   CVwR (%)   PHX match
In the Q&A we find for DSII the REML-based CVwR 11.5% (PHX 11.43%). Patterson & Jones reported for their dataset 61% (SAS); I got in PHX 61.96%. For DSI the Q&A gives 47.3% (PHX 46.96%). Therefore, congratulations to all of us answering a question which was not asked by the FDA.
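For the record, a sketch of that intra-subject-contrast calculation (not the quick & dirty code mentioned above, which is not reproduced here): per subject the difference of the two ln-transformed R administrations is regressed on sequence and half the residual mean square is taken as \(\small{s_\textrm{wR}^2}\). Column names are assumptions.

# dta: one row per subject with the two ln-transformed R observations
# (lnR1, lnR2) and the sequence; column names are made up for this sketch.
s2wR.isc <- function(dta) {
  dta$d <- dta$lnR1 - dta$lnR2              # intra-subject contrast of R
  fit   <- lm(d ~ sequence, data = dta)     # within-sequence Var(d) = 2 * s2wR
  s2wR  <- summary(fit)$sigma^2 / 2
  c(s2wR = s2wR, swR = sqrt(s2wR), CVwR.pct = 100 * sqrt(exp(s2wR) - 1))
}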
— Dif-tor heh smusma 🖖🏼 Довге життя Україна! Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes |
ElMaestro ★★★ Denmark, 2020-07-15 18:25 (1553 d 12:03 ago) @ Helmut Posting: # 21701 Views: 18,926 |
Hi again, to me it is exactly the opposite way around: a model with more than one variance component is a mixed model. I am of the impression that FDA and EMA have both been in a slight predicament with their guidances because they have both been "somewhat" aware of the specification/convergence issues with REML; hence both went in a direction with all effects fixed, as done explicitly by EMA and implicitly by FDA (when they use GLM for swR) in the guidance. My contribution here is solely to open someone's eyes to the fact that we may not need to rely on a normal linear model for s2wR in a TRR/RTR/RRT design. I would much rather like to think of this thread as a contribution to regulators defining the next iteration of guidelines/guidances. Whether they will see it as an advantage to use REML to derive s2wR now it is shown to be doable (not causing convergence issues, because it is not involving the silly and unestimable s2wT component) is of course their decision solely. I have done my part. For consideration further down the line, for those who enter discussions about estimator bias being catastrophic: which estimator is more biased when handling RRT/RTR/TRR, the s2wR you get from the linear model or the one you get via REML? What would your answer imply for any future recommendations for guidelines/guidances? (I am not a hardliner myself: I am usually thinking biased estimators are still useful estimators.) This thread already caught the attention of a number cruncher at an agency across the Atlantic unofficially, so I am OK with it all.
— Pass or fail! ElMaestro
Helmut ★★★ Vienna, Austria, 2020-07-15 19:33 (1553 d 10:54 ago) @ ElMaestro Posting: # 21702 Views: 18,915 |
Hi ElMaestro,

❝ to me it is exactly the opposite way around:
❝ A model with more than one variance component is a mixed model.

Not necessarily. It depends on what you believe [sic] is a random effect. Treatment, sequence, and period are fixed effects, right? IMHO, subject is random. When we think about interaction(s) we enter the gray zone, of course.

❝ I am of the impression that FDA and EMA have both been in a slight predicament with their guidances because they have both been "somewhat" aware of the specification/convergence issues with REML; hence both went in a direction with all effects fixed, as done explicitly by EMA and implicitly by FDA (when they use GLM for swR) in the guidance.

Well, the EMA's models are wacky because they assume equal intra-subject variances of T and R. An assumption which was shown to be false in many cases. Essentially most of the information obtainable in a replicate design is thrown away. I don't know why the FDA does not directly estimate \(s_{wR}^2\) from the mixed model but work with the intra-subject contrasts of R instead. More below.

❝ My contribution here is solely to open someone's eyes to the fact that we may not need to rely on a normal linear model for s2wR in a TRR/RTR/RRT design.

OK.

❝ I would much rather like to think of this thread as a contribution to regulators defining the next iteration of guidelines/guidances. Whether they will see it as an advantage to use REML to derive s2wR now it is shown to be doable (not causing convergence issues, because it is not involving the silly and unestimable s2wT component) is of course their decision solely. I have done my part.

With the current guidance one gets always an estimate of \(s_{wR}^2\). No problems in the RSABE part if swR ≥ 0.294. Only in the mixed model for ABE and the partial replicate designs one may run into the convergence issues. That's not resolved yet, unless one uses a simpler variance-covariance structure, which is against the guidance. But – again and again – there are cases where nothing helped, neither in SAS nor in Phoenix. Duno whether it is possible to specify Nelder-Mead in SAS or Phoenix. That's why I still hold that the partial replicates are lousy designs, because one might end up with no result for ABE unless one opts for the EMA's models (which are wrong, IMHO). OK, one step back. Since the mixed model for ABE has been recommended by the FDA for ages, I wonder what people have done when they faced convergence problems…

❝ For consideration further down the line, for those who enter discussions about estimator bias being catastrophic: which estimator is more biased when handling RRT/RTR/TRR, the s2wR you get from the linear model or the one you get via REML?

Good question, next question. Maybe REML…

❝ What would your answer imply for any future recommendations for guidelines/guidances?

Oh dear! I participated in all GBHI workshops. The discussions about reference-scaling (Rockville 2016, Amsterdam 2018) were disastrous. What do we have?
❝ (I am not a hardliner myself: I am usually thinking biased estimators are still useful estimators) Agree.
— Dif-tor heh smusma 🖖🏼 Довге життя Україна! Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes |
d_labes ★★★ Berlin, Germany, 2020-07-15 20:13 (1553 d 10:15 ago) @ Helmut Posting: # 21703 Views: 18,923 |
Hi Helmut, Hi ElMaestro,

❝ ...
❝ Well, the EMA's models are wacky because they assume equal intra-subject variances of T and R. An assumption which was shown to be false in many cases. Essentially most of the information obtainable in a replicate design is thrown away.

Full ACK!

❝ I don't know why the FDA does not directly estimate \(\small{s_\textrm{wR}^2}\) from the mixed model but work with the intra-subject contrasts of R instead. ...

I was always wondering about the different methodologies in the code of the progesterone guidance:
Politics? Nostalgia (the Proc MIXED code has been recommended for years)? Or the other way round: why not use the components of the RSABE criterion (PE and 90% CI, s2wR) from the mixed-model approach? I would opt for the full ISC approach because it may be unambiguously implemented in SAS, R, Phoenix and so on for every replicate design, full or partial.
— Regards, Detlew
Helmut ★★★ Vienna, Austria, 2020-07-16 13:11 (1552 d 17:17 ago) @ d_labes Posting: # 21709 Views: 18,628 |
Hi Detlew,

❝ Why not use the ISC estimate of T-R in the ABE decision in case of swR < 0.29356 (CVwR < 30%)?
❝ Politics? Nostalgia (the Proc MIXED code has been recommended for years)?

The guidance “Statistical Approaches to Establishing Bioequivalence” of January 2001 (where the same SAS code is given in APPENDIX E) is still in force. Hence, it is only consistent to give it in the progesterone guidance.

❝ I would opt for the full ISC approach because it may be unambiguously implemented in SAS, R, Phoenix and so on for every replicate design, full or partial.

Cannot agree more. All problems would vanish.
— Dif-tor heh smusma 🖖🏼 Довге життя Україна! Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
ElMaestro ★★★ Denmark, 2020-07-16 01:06 (1553 d 05:22 ago) @ Helmut Posting: # 21705 Views: 18,734 |
Hi Hötzi,

❝ ❝ to me it is exactly the opposite way around:
❝ ❝ A model with more than one variance component is a mixed model.
❝ Not necessarily. It depends on what you believe [sic] is a random effect. Treatment, sequence, and period are fixed effects, right? IMHO, subject is random. When we think about interaction(s) we enter the gray zone, of course.

May I offer the completely opposite view? Search the forum for "bogus statement", check the SAS documentation for what the statement actually does to Proc GLM: when Proc GLM is used for 222BE with or without the bogus statement it still means a normal linear model is fit (even when the theory in e.g. C&L calls it "random"). There is a single variance component in such fits. If you wish to verify it: there is a model matrix with columns for subjects; check the dfs for such models, they would be (much) higher if subjects were fit as random. Conversely, if subject were random, subject would not appear as a factor in the ANOVA.
— Pass or fail! ElMaestro
Helmut ★★★ Vienna, Austria, 2020-07-16 12:59 (1552 d 17:28 ago) @ ElMaestro Posting: # 21708 Views: 18,646 |
Hi ElMaestro,

❝ May I offer the completely opposite view? Search the forum for "bogus statement", check the SAS documentation for what the statement actually does to Proc GLM: […]

I know, I know. I was talking about a mixed model and not the way SAS uses the RANDOM statement in PROC GLM in order to get the DFs right. A true all-fixed-effects model – with subjects instead of the crazy subjects(sequence) – requires unique coding of subjects. In practice always the case (OK, maybe not in multi-center studies, but then you have center as another fixed effect). I recently reviewed a manuscript where subjects in each sequence had the same codes. The authors obviously had no clue about how things work. BTW, some agencies require that in crossovers CVintra and CVinter are reported. The latter is not possible when subject is a fixed effect.
— Dif-tor heh smusma 🖖🏼 Довге життя Україна! Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
mittyri ★★ Russia, 2020-07-19 02:42 (1550 d 03:45 ago) @ Helmut Posting: # 21745 Views: 18,137 |
Hi Helmut, ❝ Detlew read the guidance’s Step 1 thoroughly enough to realize that \(\small{s_\textrm{wR}^2}\) is not an REML-estimate (see this post). Fortunately the templates for Phoenix are correct. are you referring to that template for RSABE? — Kind regards, Mittyri |
Helmut ★★★ Vienna, Austria, 2020-07-19 03:45 (1550 d 02:43 ago) @ mittyri Posting: # 21747 Views: 18,570 |
Hi mittyri, ❝ are you referring to that template for RSABE? Nope. I was using “FDA RSABE Project template_ v1.4.phxproj” of 2014-11-11 (should be accessible at https://www.certarauniversity.com/store/637337-106-fl-free-phoenix-templates-for-bioequivalence-regulatory-guidances but my credentials were not accepted (why the heck?) and when I tried to reset the password I didn’t receive the confirmation e-mail…). Relevant part:
The main difference to v1.3 is that v1.4 uses the function chiinv(0.95,df) further down, which was not available in PHX 6.3 and a table of values had to be used instead. However, with v1.3 you get the same result as in this post. PS: Ref. 35 of doi:10.1208/s12248-020-0427-6 leads to limbo. Not very professional. I asked Certara's support to establish a permanent redirect on the server.
— Dif-tor heh smusma 🖖🏼 Довге життя Україна! Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
ElMaestro ★★★ Denmark, 2020-07-24 12:07 (1544 d 18:21 ago) @ ElMaestro Posting: # 21783 Views: 17,330 |
Hi all, to make the objective function return the likelihood, use e.g.:
Obj.F12=function(Pars)
The optimised object (F in my original code) will hold the optimised likelihood as F$value. Opportunity: will run a tiny bit faster if k is public, and if solve(CovM) is done a single time. For example, at reltol=1e-8 I get with EMA dataset II a value of 15.33728. This matches exactly what SAS reports, see post # 21670 above.
— Pass or fail! ElMaestro
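For reference, the textbook form of such a REML objective, written so that solve(CovM) is indeed called only once per evaluation; a sketch assuming a design matrix X, the response y and a builder Build.CovM(Pars) are in scope, not a drop-in replacement for the code above:

REML.LL <- function(Pars) {
  V    <- Build.CovM(Pars)                       # assumed helper: big covariance matrix
  Vinv <- solve(V)                               # invert once, reuse below
  XtVX <- t(X) %*% Vinv %*% X
  beta <- solve(XtVX, t(X) %*% Vinv %*% y)       # GLS estimate given V
  r    <- y - X %*% beta
  k    <- (nrow(X) - ncol(X)) * log(2 * pi)      # constant; convention-dependent
  as.numeric(-0.5 * (determinant(V, logarithm = TRUE)$modulus +
                     determinant(XtVX, logarithm = TRUE)$modulus +
                     t(r) %*% Vinv %*% r + k))
}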
ElMaestro ★★★ Denmark, 2020-07-24 14:52 (1544 d 15:36 ago) @ ElMaestro Posting: # 21785 Views: 17,319 |
Obj.F12=function(Pars)
… this is exactly also the REML likelihood function that SAS defines.
— Pass or fail! ElMaestro
PharmCat ★ Russia, 2020-08-03 16:24 (1534 d 14:04 ago) @ ElMaestro Posting: # 21820 Views: 16,429 |
❝ ....this is exactly also the REML likelihood function that SAS defines. Hello! Do you calculate REML for all data at once or calculate REML subject by subject? Is it possible to use function caching in R? |
ElMaestro ★★★ Denmark, 2020-08-04 00:55 (1534 d 05:32 ago) @ PharmCat Posting: # 21821 Views: 16,323 |
Hi Pharmcat, ❝ Do you calculate REML for all data at once or calculate REML subject by subject? REML is not generally done on a per-subject basis. Each model is assumed to have an optimal fit, given the data and the variance/covariance structure. The optimal fit, when putatively found, corresponds to the fixed effects and the variance components, and the log likelihood serves as objective function. I am sure I am missing something clever in your question. Can you describe what you meant? ❝ Is it possible to use function caching in R? You can always write some public members to that effect where necessary and appropriate. I don't think R has a built-in optimizer which takes care of it as the language is interpreted, but other users of this forum should confirm; I am not expert on the inner workings of R. I do not see how this would affect the REML process - can you describe with more words what your perspective on caching was, then we'll possibly figure out something? — Pass or fail! ElMaestro |
PharmCat ★ Russia, 2020-08-05 03:41 (1533 d 02:47 ago) @ ElMaestro Posting: # 21823 Views: 16,226 |
❝ Can you describe what you meant?

Hello ElMaestro! The logREML of all data is just the sum of the individual subjects' logREML contributions. See Mary J. Lindstrom and Douglas M. Bates, Newton-Raphson and EM Algorithms for Linear Mixed-Effects Models for Repeated-Measures Data. You can make a big sparse V matrix for all data and then invert it (solve(CovM) in your code above), or you can take the individual Xi and Vi and calculate the logREML for each subject and sum. Problem 1: when you have a big matrix, inverting is slow as hell, but if you invert by blocks (you can do that because it is block-diagonal) it can be done faster. Problem 2: in the BE case you have many equal Zi matrices, so in the logREML calculation (if you calculate by subject) you invert one and the same matrix many times (because V = ZGZ'+R; Z can be different, but G and R do not change from subject to subject). If you cache the result, you can seriously increase performance. So, I didn't read all the R code (sorry) and I didn't understand which approach is used.
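Base R has no automatic memoisation, but the caching described here can be done by hand, e.g. keyed by sequence, so that each distinct block is inverted only once per objective-function call; a sketch with an assumed helper Build.Vi(seq, Pars):

# Invert each distinct 3x3 block once and re-use it for every subject with
# that sequence (recompute this list at the start of every REML evaluation,
# because the blocks change with the parameters).
seqs   <- c("RTR", "RRT", "TRR")
Vi.inv <- lapply(setNames(seqs, seqs), function(s) solve(Build.Vi(s, Pars)))
# inside the per-subject loop the inverse is then only a lookup, e.g.:
# Vinv.i <- Vi.inv[[seq.of.subject.i]]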
ElMaestro ★★★ Denmark, 2020-08-05 10:13 (1532 d 20:14 ago) @ PharmCat Posting: # 21824 Views: 16,257 |
Thanks PharmCat, funny, I read the ref. you mention (Lindstrom & Bates) yesterday and I did not understand much of it. Will try again.

❝ Problem 1: when you have a big matrix, inverting is slow as hell, but if you invert by blocks (you can do that because it is block-diagonal) it can be done faster.

Sounds neat, will check if I can somehow do such a thing.

❝ Problem 2: in the BE case you have many equal Zi matrices, so in the logREML calculation (if you calculate by subject) you invert one and the same matrix many times (because V = ZGZ'+R; Z can be different, but G and R do not change from subject to subject). If you cache the result, you can seriously increase performance.

My approach is very simple: I make a case for saying that when T is not replicated while R is replicated, then it is not possible to make a meaningful model when V = ZGZt+R (as long as we count on R having the withins and G having the betweens). If we insist on ZGZt+R then we get irrelevant variance components associated with T, whichever way we parameterise the covariance matrix. Therefore, I am working in V directly, meaning I am writing the blocks into it directly. This may still be compatible with your proposal, though.
— Pass or fail! ElMaestro
PharmCat ★ Russia, 2020-08-05 18:37 (1532 d 11:51 ago) (edited on 2020-08-06 00:34) @ ElMaestro Posting: # 21825 Views: 16,237 |
Hello!

❝ Sounds neat, will check if I can somehow do such a thing.

I found it: Lindstrom & Bates, equation 3.4, or J. Gurka, 2006, equation 5. Some experiment:
m = matrix(c(5, 1, 0, 0, 1, 2, 0, 0, 0, 0, 5, 3, 0, 0, 3, 4), nrow = 4, ncol = 4)
So Vi can be the result of a function ƒ(Zi, θ); Zi is known for all subjects, θ is the vector of optimisable parameters.

❝ My approach is very simple: I make a case for saying that when T is not replicated while R is replicated, then it is not possible to make a meaningful model when V = ZGZt+R (as long as we count on R having the withins and G having the betweens).

Mmm... I can't say that ZGZ'+R is meaningful. If you construct V manually it can still be represented as ZGZ'+R if ρ is not equal to 0. If a random model with covariation (ρ≠0) is used, the non-diagonal elements of G depend on the diagonal, and when you construct V some non-diagonal elements depend on both variance parameters. Then we add the diagonal R matrix and have the combinations (parts of V): σ²BT+σ²WT, σ²BR+σ²WR, σ²BR, σBT·σBR·ρ (we have no σ²BT on its own), so we can estimate σ²WT from σ²BT+σ²WT, and σ²BT is stable because it is part of σBT·σBR·ρ (yes, the estimate of σ²WT is indirect). If ρ = 0 we have σBT·σBR·ρ = 0, and we can estimate only σ²BT+σ²WT. And I think that V can be decomposed to ZGZ'+R if you know Z, ρ (ρ≠0) and σ²BR in a model with covariance and diagonal R.
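Spelled out for a subject with sequence TRR, the per-subject block under this parametrisation (random T and R subject effects with correlation ρ plus a diagonal within-subject part) would be

\[
V_i=\begin{pmatrix}
\sigma_{BT}^{2}+\sigma_{WT}^{2} & \rho\,\sigma_{BT}\sigma_{BR} & \rho\,\sigma_{BT}\sigma_{BR}\\
\rho\,\sigma_{BT}\sigma_{BR} & \sigma_{BR}^{2}+\sigma_{WR}^{2} & \sigma_{BR}^{2}\\
\rho\,\sigma_{BT}\sigma_{BR} & \sigma_{BR}^{2} & \sigma_{BR}^{2}+\sigma_{WR}^{2}
\end{pmatrix}
\]

so σ²BT never appears on its own: it enters only through the sum on the T diagonal and through the ρ·σBT·σBR terms, which vanish when ρ = 0.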
ElMaestro ★★★ Denmark, 2020-08-06 23:11 (1531 d 07:17 ago) @ PharmCat Posting: # 21827 Views: 15,951 |
Hi PharmCat, I really like your idea. I don't have a good understanding of this as a whole, but I know that when I work a little with these things they become easier for me to grasp, so I am giving it a try. Let us turn to Gurka 2006, equations 3 and 5, remembering that the rank of our overall design matrix X is 6: Gurka requires me to derive Xi, the design matrix for fixed effects of subject i. They mention that "Xi is a (ni × p) known, constant design matrix for the i-th subject with rank p". At just three observations per subject we cannot have a rank of p=6 for Xi. The maximum rank is naturally 3. I don't see a way to handle Xi on a subject-by-subject basis, but perhaps you have some insight. Possibly I could do it by lumping subjects together so that the "smaller" X of sorts still has more rows than columns and thus a rank of p=6. Do you see my issue with Xi? Could you indicate your view on Xi here? Is lumping subjects together a way to go to ensure the right rank of the reduced design matrices? Many thanks in advance.
— Pass or fail! ElMaestro
PharmCat ★ Russia, 2020-08-07 02:02 (1531 d 04:26 ago) @ ElMaestro Posting: # 21828 Views: 15,953 |
Hi ElMaestro! ❝ Gurka requires me to derive Xi, the design matrix for fixed effects of subject i. ❝ They mention that "Xi is a (ni × p) known, constant design matrix for the i' th subject ❝ with rank p" I think it means rank of whole X, because after: "and β is a (p × 1) vector of unknown, constant population parameters". Xi is a part of X and have same count of columns, rank Xi can be lower. Here is no contradictions I think. If look at Xiuming Zhang, 2015* p is just length of β and rank should be the same. If not, we will have rank-deficient matrix. And if look generally - it really doesn't matter because -(N-p)*log(2π)/2 is a constant and it can be excluded from optimization. If you want find AIC ets - this will matter. And as I know some softaware using different N for calculation (some use all observations, some only statistically independent (m)), and some include second constant part: logdet(Σim(X'X))/2 So if you want find only θ, you can minimize REML without constants at all. And add constants when you need to have real value of REML or AIC, BIC, ets. *A Tutorial on Restricted Maximum Likelihood Estimation in Linear Regression and Linear Mixed-Effects Model Example N=3, n=9, p=6 X: β1 β2 β3 β4 β5 β6 X1 β1 β2 β3 β4 β5 β6 X2 β1 β2 β3 β4 β5 β6 X3 β1 β2 β3 β4 β5 β6 |
ElMaestro ★★★ Denmark, 2020-08-07 09:49 (1530 d 20:39 ago) @ PharmCat Posting: # 21829 Views: 15,972 |
Hi PharmCat,

❝ And looking at it generally, it really doesn't matter, because -(N-p)*log(2π)/2 is a constant and it can be excluded from the optimization. If you want to find AIC etc., this will matter.

I am afraid you missed my point. The issue is not related to constants. As you say, they are not of importance for finding the maximum, which will just be vertically displaced. Xi is the design matrix for fixed effects for the i-th subject. Thus it has a rank of at most 3 and never more. This means we can at most use it to work with three parameter estimates. For example an intercept, a treatment effect, a period effect (within any given subject). Thus we cannot use Xi - if i is a single subject - during the optimization for the RTR/TRR/RRT design. Applying Gurka's equations will not work, and I am looking for a way to work on derived versions of X (like for a few subjects at a time?) so that X retains its rank of 6 (in this design).

❝ So if you want to find only θ, you can minimize the REML without constants at all, and add the constants when you need the real value of the REML or AIC, BIC, etc.

Don't worry, the constants are of no importance for finding the variance components.

❝ Example: N=3, n=9, p=6

These examples of X are possibly not very illustrative; as you can see, columns 2+3 add up to column 1, the intercept, columns 4+5+6 add up to that as well, and so forth. X is column-wise not of full rank.
— Pass or fail! ElMaestro
PharmCat ★ Russia, 2020-08-07 13:29 (1530 d 16:59 ago) @ ElMaestro Posting: # 21830 Views: 16,036 |
Hi ElMaestro! I thought you asked about p, sorry :)

❝ Xi is the design matrix for fixed effects for the i-th subject.
❝ Thus it has a rank of at most 3 and never more. This means we can at most use it to work with three parameter estimates. For example an intercept, a treatment effect, a period effect (within any given subject).

I think Xi does not need to be a full-rank design matrix. It is just the part of X for subject i. Moreover, all Xi should have the same width (number of columns) for Σ(Xi'Vi⁻¹Xi). Also, with (yi - Xi*β) you get the residuals ri (the number of columns of Xi should equal the length of β). The description in Gurka's paper is not very accurate, but it provides a good concept. If you check, it should work.
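In R the accumulation could be sketched like this (lists Xi.list, Vi.inv.list and yi.list holding the per-subject pieces are assumed; the names are made up):

# Gurka's equation 3: beta = (sum Xi' Vi^-1 Xi)^-1 (sum Xi' Vi^-1 yi).
# Each Xi may be rank-deficient on its own; only the accumulated M1 has to
# be invertible.
p  <- ncol(Xi.list[[1]])
M1 <- matrix(0, p, p)
M2 <- matrix(0, p, 1)
for (i in seq_along(Xi.list)) {
  XtVinv <- t(Xi.list[[i]]) %*% Vi.inv.list[[i]]
  M1 <- M1 + XtVinv %*% Xi.list[[i]]
  M2 <- M2 + XtVinv %*% yi.list[[i]]
}
beta <- solve(M1, M2)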
ElMaestro ★★★ Denmark, 2020-08-07 15:08 (1530 d 15:20 ago) @ PharmCat Posting: # 21831 Views: 15,883 |
Hi PharmCat, thanks for trying to assist on this. It is kind of you.

❝ I think Xi does not need to be a full-rank design matrix. It is just the part of X for subject i. Moreover, all Xi should have the same width (number of columns) for Σ(Xi'Vi⁻¹Xi). Also, with (yi - Xi*β) you get the residuals ri (the number of columns of Xi should equal the length of β).
❝ The description in Gurka's paper is not very accurate, but it provides a good concept.
❝ If you check, it should work.

I can't make it work. In Gurka's equation 3, where we can write the shorthand beta = M1⁻¹ * M2: when using the little 3-by-6 matrices for Xi, M1 becomes non-invertible. I tried to lump 6 subjects together at a time, and now I am getting an estimate of beta, but it appears wrong, in the sense that it is not corresponding to what I get for the full model. I wonder if Gurka's equation only works under assumptions not mentioned in the paper or if I am just a very lousy programmer. My money is clearly on the latter.
— Pass or fail! ElMaestro
PharmCat ★ Russia, 2020-08-07 17:42 (1530 d 12:46 ago) @ ElMaestro Posting: # 21833 Views: 16,182 |
Hi ElMaestro!

❝ I tried to lump 6 subjects together at a time, and now I am getting an estimate of beta, but it appears wrong, in the sense that it is not corresponding to what I get for the full model.

If you calculate β with the initial V, it will be "wrong": it is only the estimate for the current V; then you calculate the REML and optimize θ... and then again... recalculate β, etc. You can get an initial β as β = inv(qr(X).R)*qr(X).Q'*y. But if you have a wrong β at the final iteration, that is a bad case. You can find the same equation in Lindstrom & Bates, 1.2 (where X is not a design matrix but the vector of the Xi, IMHO). If you provide me data from 6 subjects I can make the calculations in Julia. Magical linear algebra in Julia:
using LinearAlgebra |
ElMaestro ★★★ Denmark, 2020-08-07 18:44 (1530 d 11:43 ago) @ PharmCat Posting: # 21834 Views: 15,845 |
Hi PharmCat.

❝ If you calculate β with the initial V, it will be "wrong".

Correct, and that is not what I am doing. For every iteration through the optimizer I am deriving a new beta vector; this is beta given V. The loglikelihood is calculated on the basis of V and beta, where beta is derived from V via X. Therefore LL can be said to be a function of V (or of the variance components in it). There are two alternative ways I can get that beta vector for any given V:
1. I can do it the simple way with the very big X. It works well.
2. I can perhaps (= I should be able to?) do it by partitioning X into small bits, Xi, like in Gurka equation 3, or Lindstrom & Bates. But I cannot get the same beta vector for a given set of variance components in V. The - or a - reason is that if I reduce X to Xi (individual subjects) and do all the summation, then the M1 written above is not invertible. This, in turn, appears to be because the individual Xi's are not of the same rank as X.
If I lump Xi's together like the first 6 subjects, then the next 6 subjects, and so forth, and use Gurka's equation 3, then I can generate a beta estimate (from 4 chunks of 6 subjects), but it is still not the same as in pt 1, for any given set of components in V. I am not even at a final iteration, I am just at any iteration where I am comparing beta via the two methods for a set of starting values, or for that matter, if I plug in the values that come out as optimal values from pt 1 above after everything has converged.

❝ It is only the estimate for the current V; then you calculate the REML and optimize θ... and then again... recalculate β, etc. You can get an initial β as β = inv(qr(X).R)*qr(X).Q'*y

You do not need this, since you do not need starting values for beta, cf. the LL above. You only need starter values for theta (the variance components in V). Every pass through the REML LL equation generates a beta estimate from the variance components, given X. Once the algo has converged, the estimate of beta is final. All this is immaterial to my challenge at hand.

❝ But if you have a wrong β at the final iteration, that is a bad case.

This is not about the "final iteration". It is about any iteration. For any iteration, the two ways to derive beta result in different values. Therefore something is wrong.

❝ You can find the same equation in Lindstrom & Bates, 1.2 (where X is not a design matrix but the vector of the Xi, IMHO)
❝ If you provide me data from 6 subjects I can make the calculations in Julia.
Here all subjects are (y is logarithmised): Subj Seq Trt Per y 1 RTR R 1 8.307361 1 RTR T 2 8.286622 1 RTR R 3 8.229191 2 RTR R 1 8.001757 2 RTR T 2 7.774351 2 RTR R 3 7.939016 3 RRT R 1 8.150295 3 RRT R 2 8.113786 3 RRT T 3 8.301224 4 TRR T 1 8.319961 4 TRR R 2 8.068152 4 TRR R 3 8.243703 5 TRR T 1 8.469640 5 TRR R 2 8.421255 5 TRR R 3 8.278936 6 RRT R 1 8.023159 6 RRT R 2 8.015393 6 RRT T 3 7.791358 7 RTR R 1 7.836054 7 RTR T 2 8.030084 7 RTR R 3 7.993823 8 TRR T 1 7.698483 8 TRR R 2 7.621391 8 TRR R 3 7.609862 9 RTR R 1 8.444106 9 RTR T 2 8.333174 9 RTR R 3 8.131531 10 TRR T 1 7.708949 10 TRR R 2 7.766586 10 TRR R 3 7.705803 11 RRT R 1 7.530373 11 RRT R 2 7.701833 11 RRT T 3 7.780888 12 RRT R 1 7.731229 12 RRT R 2 8.061613 12 RRT T 3 8.275682 13 RRT R 1 7.878686 13 RRT R 2 7.795811 13 RRT T 3 7.961789 14 RTR R 1 8.016582 14 RTR T 2 7.807835 14 RTR R 3 7.996452 15 RTR R 1 7.720639 15 RTR T 2 7.598299 15 RTR R 3 7.910003 16 TRR T 1 7.992809 16 TRR R 2 8.143808 16 TRR R 3 8.114504 17 TRR T 1 7.781890 17 TRR R 2 7.885856 17 TRR R 3 7.683404 18 RRT R 1 7.910224 18 RRT R 2 7.939373 18 RRT T 3 8.054078 19 RTR R 1 7.791027 19 RTR T 2 7.919283 19 RTR R 3 7.825645 20 RRT R 1 7.886983 20 RRT R 2 7.982689 20 RRT T 3 8.018691 21 RRT R 1 7.961928 21 RRT R 2 7.888485 21 RRT T 3 8.029107 22 TRR T 1 7.989221 22 TRR R 2 7.981665 22 TRR R 3 7.956967 23 TRR T 1 8.056680 23 TRR R 2 8.066396 23 TRR R 3 8.174308 24 RTR R 1 7.536257 24 RTR T 2 7.500419 24 RTR R 3 7.912350 — Pass or fail! ElMaestro |
PharmCat ★ Russia, 2020-08-07 20:14 (1530 d 10:13 ago) (edited on 2020-08-08 01:12) @ ElMaestro Posting: # 21835 Views: 15,882 |
|
❝ Here all subjects are (y is logarithmised)

# Theta estimate θ was taken by REML
using CSV, DataFrames, StatsModels, StatsBase

I get beta:
7.904258681084915
0.0547761264037151
0.05092362547466389
0.0012959346740553102
0.048118895192829976
0.02239133333333365
ElMaestro ★★★ Denmark, 2020-08-07 20:23 (1530 d 10:05 ago) @ PharmCat Posting: # 21836 Views: 15,868 |
Hi PharmCat, it is kind of you, but this looks like apples compared to pears. The whole point is that G and R are meaningless, so I cannot compare to an output generated with a model involving G and R. I am doing V directly, because we do not work with withins and betweens for T. That is why you have five elements in theta (T is not replicated, but you are trying to fit both a within and a between variance for it). I am saying we do not work on them both; instead we replace their sum by a single variance estimate. This is the whole point of the thread here. Note also, I am not after correlations per se. Therefore I am writing the variances directly into the covariance matrix (perhaps like you are used to if you work with compound symmetry of the heterogeneous type). If you are looking at a final log-likelihood around 15.337 then I think we are at least talking about the same model.
— Pass or fail! ElMaestro
ElMaestro ★★★ Denmark, 2020-08-07 23:31 (1530 d 06:56 ago) @ ElMaestro Posting: # 21837 Views: 15,915 |
I think I made beta work now, thanks a lot, PharmCat. It appears that beta can be constructed even if the local Xi is not of rank p; I guess it must be a little typo in Gurka's paper. Anyways, I reprogrammed the lines, and now I get conformable matrices, and beta is the same whichever way I calculate it. May try and see if I can do some comparisons about the performance; I still need a lot of code to put it all into a function. Thanks PharmCat, pointing me towards Gurka and insisting on Xi was a great help.
— Pass or fail! ElMaestro
PharmCat ★ Russia, 2020-08-08 03:08 (1530 d 03:20 ago) @ ElMaestro Posting: # 21838 Views: 15,741 |
Hi ElMaestro!

❝ That is why you have five elements in theta (T is not replicated, but you are trying to fit both a within and a between variance for it). I am saying we do not work on them both; instead we replace their sum by a single variance estimate. This is the whole point of the thread here.

I understand your point. And generally your variance-component function is better because it is not redundant. On the other hand, if we have different functions V = ƒ1(θ1) and V = ƒ2(θ2), it can be any of them, with any θ, as long as we get an equal V (so it could be CSH, FA0(2), or a manual construct).

❝ If you are looking at a final log-likelihood around 15.337 then I think we are at least talking about the same model.

I have 15.337299690629834; the model is the same. It is good proof: different V construction methods lead to the same results.

❝ I guess it must be a little typo in Gurka's paper.

From time to time I find inaccuracies even in good articles, and sometimes it gives a strong headache when you try to reproduce some methods directly. log(Σ|X|) instead of log(|ΣX|) can give hours of frustration.

❝ Thanks PharmCat, pointing me towards Gurka and insisting on Xi was a great help.

I am very glad if my thoughts were useful
ElMaestro ★★★ Denmark, 2020-08-08 14:25 (1529 d 16:03 ago) @ PharmCat Posting: # 21839 Views: 15,751 |
I am seeing about a 30% speed improvement on my machine when using arrays of matrices. I am using lists for that purpose; that's the only way I can think of. Further improvement may be possible, because I may only need to store principally three different matrices in these arrays (per array) and keep track of which subjects use which matrices (that would be defined by their sequences), but this, I think, is for now not a priority for me. A further option is to use only one kind of matrix and then reorder all periods in the global data list, so that all subjects have e.g. R-T-R; then we'd need to work with only a single matrix type in the profiles. This may improve speed further. Will think about it... 30%... It ain't much, but it's honest work.
— Pass or fail! ElMaestro
PharmCat ★ Russia, 2020-08-08 19:27 (1529 d 11:01 ago) @ ElMaestro Posting: # 21840 Views: 15,730 |
❝ 30%... It ain't much, but it's honest work.

It is a very good boost. I don't know how matrix multiplication works in R, but some improvements can be made with a function A = B'*C*B or A += B'*C*B. At each step of this function there are many intermediate products that can be avoided with direct computation. (Or make it in C.) Example incremental function:
""" julia> @time mulsimple!(θ, β, A, B, C); 0.000007 seconds (10 allocations: 1.172 KiB) julia> @time mulθβinc!(θ, β, A, B, C); 0.000004 seconds (2 allocations: 144 bytes) eightfold memory savings |
ElMaestro ★★★ Denmark, 2020-08-08 20:10 (1529 d 10:18 ago) @ ElMaestro Posting: # 21841 Views: 15,671 |
Hi again, now it gets really interesting. I made two observations:
1. There is this 30% improvement if I switch to the profile log-likelihood instead of "the big one", for lack of better wording.
2. I can achieve an improvement of several thousand percent if I stay with the big X and V; however, I must then use a smart tactic: I am only generating V once, but through each pass of the optimiser I am updating V rather than re-writing it. Instead of iterating across all rows and columns and checking in the data list where to put the individual components, I am initially generating a list of coordinates for each variance component. Something like this:
V.Coordinates=function(Data)
This returns a list with four components. The first component is a set of coordinates in V for varT, the second is a set of coordinates for varBR. And so forth. Then, in the REML function, I am having a block like this:
for (i in 1:nrow(VC$xy.varT))
And now we are simply talking REAL BUSINESS. The optimiser is returning blazingly fast with an REML estimate and corresponding variance components. Depending on the tolerance required it is almost instantaneous (<0.5 secs) on my laptop, and that machine isn't high-tech equipment at all. The speed with the "big V" far outruns the profile approach. I believe this implies that matrix inversion is done in a clever fashion in R's solve function; as you pointed out, PharmCat, inversion can be slow (and unstable) as matrices get large, but perhaps the LAPACK routine used by R reduces the task intelligently, possibly through some of the same tricks described for e.g. the Alglib routines. However, I am also seeing a different little thing: if I optimise with the profile approach I can make BFGS converge, but this will not work if I try it in the "big V" approach. Lindstrom and Bates noticed something similar, in the sense that they wrote that quasi-Newton methods perform better when the profile approach is used: "We recommend optimizing the profile log-likelihood, since it will usually require fewer iterations, the derivatives are somewhat simpler, and the convergence is more consistent. We have also encountered examples where the NR algorithm failed to converge when optimizing the likelihood including a but was able to optimize the profile likelihood with ease." So, lots of options, and I am not going to be late for supper.
— Pass or fail! ElMaestro
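A rough sketch of that coordinate trick with assumed names (the actual V.Coordinates and the loop above are not reproduced here): record once where each component lives in the big V, then overwrite only those cells on every pass of the optimiser.

# 'is.varT', 'is.varBR', 'is.varWR' are assumed logical masks of the same
# dimension as V, marking where each component sits.
VC <- list(xy.varT  = which(is.varT,  arr.ind = TRUE),
           xy.varBR = which(is.varBR, arr.ind = TRUE),
           xy.varWR = which(is.varWR, arr.ind = TRUE))
update.V <- function(V, Pars, VC) {
  V[VC$xy.varT]  <- Pars[1]            # total variance of T on the T diagonal
  V[VC$xy.varBR] <- Pars[2]            # between-R variance off the diagonal
  V[VC$xy.varWR] <- Pars[2] + Pars[3]  # R diagonal: between-R plus within-R
  V                                    # further components get their own masks
}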
PharmCat ★ Russia, 2020-08-09 20:22 (1528 d 10:06 ago) @ ElMaestro Posting: # 21844 Views: 15,435 |
❝ I believe this implies that matrix inversion is done in a clever fashion in R's solve function; as you pointed out, PharmCat, inversion can be slow (and unstable) as matrices get large, but perhaps the LAPACK routine used by R reduces the task intelligently, possibly through some of the same tricks described for e.g. the Alglib routines.

It is a very good question to explore. If we suppose that some routine knows something about the structure, it would take a lot of work to recognize a block-diagonal and/or symmetric matrix structure. I think the main cause is the overhead of each function call. Another problem: many intermediate products for %*%.
PharmCat ★ Russia, 2020-08-10 13:48 (1527 d 16:40 ago) @ ElMaestro Posting: # 21849 Views: 15,378 |
Hi ElMaestro,

❝ now it gets really interesting. I made two observations:

I made some tests, and the results are surprising. First code:
using BlockDiagonals
b1 1.109 ms vs b2 3.751 ms, so R solves the block-diagonal matrix more than twice as fast. In Julia b3 takes 2.707 ms: solving a random matrix in Julia is faster than in R, but slower than b1 when the matrix is block-diagonal. Second code:
using RCall
Solving by blocks in R (b5, 3.275 ms) is really slower (only a little faster than solving a random matrix). But solving a block-diagonal matrix in Julia when the structure is described takes 56.484 μs (b3 in the second code); this is 20 times faster than in R. I'm trying to discuss this in the Julia forum here. I'm not so experienced in R, but I think that Solve.block can increase performance for block-diagonal matrices, docs here.