ElMaestro ★★★ Denmark, 2017-10-07 01:01 (2772 d 22:17 ago) Posting: # 17871 Views: 19,339
Hi all,

I have some different datasets that are not large at all, and I am doing some model fits on them, and from each fit I am extracting a correlation measure like r squared etc. For the sake of simplicity let us just assume there is a single model, like a linear regression without weights.

I would like to compare apples and pears, or at least the fits of three different datasets A, B and C. These datasets do not have the same number of data points, so I cannot fairly compare e.g. the r squared of A vs B vs C directly. Akaike and Schwarz are presumably not the way to go, I think, as I am not varying the model but the dataset, so to say. Kolmogorov–Smirnov would potentially be useful if I had a boatload of points, which I don't anyway. I am very poor at explaining what I think I am looking for.

Does anyone here know of a handy statistic that allows a fair comparison of goodness of fit across datasets with unequal sizes, given a single model?

Muchas gracias.

—
Pass or fail!
ElMaestro
nobody nothing 2017-10-07 18:03 (2772 d 05:16 ago) @ ElMaestro Posting: # 17872 Views: 17,942
Hi Danmark!

❝ ... I am very poor at explaining what I think I am looking for

Ehhm, whut? Maybe you are too fixated on statistical testing? How about some old-fashioned "visual inspection of fit" paired with "back-calculated values of the calibration samples from the calibration function, including relative deviation"? Just saying.

No idea what you want to compare in the end, so...

—
Kindest regards, nobody
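A minimal R sketch of that back-calculation idea (the data values are invented for illustration, not from the thread): fit the calibration line, invert it for each calibrator, and inspect the relative deviations.

conc <- c(1, 2, 5, 10, 20)               # nominal concentrations (assumed)
resp <- c(0.11, 0.19, 0.52, 0.98, 2.05)  # measured responses (assumed)
fit  <- lm(resp ~ conc)                  # unweighted calibration function
b0   <- coef(fit)[[1]]                   # intercept
b1   <- coef(fit)[[2]]                   # slope
back <- (resp - b0) / b1                 # back-calculated concentrations
round(100 * (back - conc) / conc, 1)     # relative deviation (%) per calibrator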
ElMaestro ★★★ Denmark, 2017-10-07 20:06 (2772 d 03:13 ago) @ nobody Posting: # 17873 Views: 17,989
Hi nobody,

❝ Ehhm, whut? Maybe you are too fixated on statistical testing? How about some old-fashioned "visual inspection of fit" paired with "back-calculated values of the calibration samples from the calibration function, including relative deviation"?
❝ Just saying

Your idea is absolutely good. But there are complications...

❝ No idea what you want to compare in the end, so...

Let us say I have a cattle prod, a long-haired Austrian, and a microphone that is rigged up to my computer for recording. The cattle prod is a wonderful tool; it makes persuasion such an easy task. My cattle prod has a lot of knobs and buttons and sliders that allow me to tweak combinations of:
- Tesla coil zap modifier strength
- Evil discharge combobulator intensity
- Ion stream barbaric voltage gain
- Apocalyptic wolfram anode ray modulation
...and so forth.

I am zapping the Austrian in his sorry rear and recording the intensity (dB) of his immediate... shall we say... verbal reaction. Let me add, each experiment takes no more than a few seconds, but there is a bit of recovery time before the next experiment can be run, and that isn't related to the latency of the prod. Surprisingly, under the present experimental conditions my sensitive microphone also records a lot of cursing and swearing after each zap. I am not sure why or what the source is, but it is a matter I am looking into, as I am not sure such behavioral excursions are good for my poor sensitive electronics.

Anyways, I know from the past that with the default prod settings there is a decent - though not overwhelmingly good - correlation between the reaction recorded in dB and the prod's output measured in GJ per zap. I have done that experiment with 16 evenly dispersed levels between 0 and 17500 GJ/zap and I got an r squared of 0.9583. The question is whether I can improve the correlation by playing around a little with the settings mentioned above. So I acquired via Amazon an ACME Automated Experimental Prod Analyzer Kit (only $598 w. 2 years of warranty) so that I can automate the experiments, which comprise changing a lot of settings, actuating the prod, and recording the resulting reaction.

So I have generated 9629 datasets, some of which have 16 data points and some have 15, 14 or 13 because of missing values (in practice this means a relative lack of contact between the tip of the prod when actuated and the Austrian's butt). I need to find the settings for which I am most likely to be able to give a zap that triggers a 138 dB (±0.1 dB) response. Hence I need the correlation to be as good as possible, including a correction for sample size. With 9629 datasets I cannot visually inspect every single set and determine the best fit. I must find a solution in R.

This is my grave predicament. "Help me noone kenobi, you are my only hope".

—
Pass or fail!
ElMaestro
ElMaestro ★★★ Denmark, 2017-10-07 21:07 (2772 d 02:11 ago) @ ElMaestro Posting: # 17874 Views: 18,103
Dear all,

for anyone interested in the setup please follow this link. Around 0:42 into the educational journey you will see an early analog version of a cattle prod (actually, back then the concept was only semi-developed, so some may call it only a cat prod), and a few seconds later you will witness the natural consequence of the prod's encounter with its intended biological target. I believe the reaction here amounts to only roughly 115 dB, arguably because of the relative lack of advanced settings and the choice of population.

—
Pass or fail!
ElMaestro
Helmut ★★★ Vienna, Austria, 2017-10-08 19:17 (2771 d 04:02 ago) @ ElMaestro Posting: # 17875 Views: 17,879
Hi ElMaestro,

nice setup. Given

❝ […] a lot of knobs and buttons and sliders that allow me to tweak combinations of:
❝ - Tesla coil zap modifier strength
❝ - Evil discharge combobulator intensity
❝ - Ion stream barbaric voltage gain
❝ - Apocalyptic wolfram anode ray modulation
❝ ...and so forth.

Am I right that you are aiming at multiple regression (i.e., >1 regressor and one regressand)? See there for an arsenal of methods. For the same number of regressors the adjusted R² takes the number of data points into account.
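For reference, the standard formula behind that remark (with n data points and p regressors): R²adj = 1 − (1 − R²) · (n − 1)/(n − p − 1). A quick sanity check against what summary.lm() reports, using made-up data:

set.seed(1)                          # for reproducibility
x <- 1:10
y <- 2 * x + rnorm(10)               # invented example data
s <- summary(lm(y ~ x))
n <- 10; p <- 1
1 - (1 - s$r.squared) * (n - 1) / (n - p - 1)  # manual R²adj
s$adj.r.squared                                # identical value from summary.lm()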
—
Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz
The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
ElMaestro ★★★ Denmark, 2017-10-08 19:30 (2771 d 03:48 ago) @ Helmut Posting: # 17876 Views: 17,893
Hi Hötzi,

I simply have a lot of regression lines, each of the type y = ax + b. I just need to find the best one, and the only complication is that not all lines have the same number of points. No multiple regression and no fancy stuff like weights or funky residual distributions.

—
Pass or fail!
ElMaestro
Helmut ★★★ Vienna, Austria, 2017-10-08 19:35 (2771 d 03:43 ago) @ ElMaestro Posting: # 17877 Views: 17,865
Hi ElMaestro,

❝ I simply have a lot of regression lines each of the type y=ax+b. I just need to find the best one and the only complication is that not all lines have the same number of points.

See my edit above. Do you have the raw data or only the parameters a1, a2, … ai, b1, b2, … bi, and the sizes? In the former case easy (max. R²adj), in the latter no idea.

—
Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz
The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes
ElMaestro ★★★ Denmark, 2017-10-08 19:50 (2771 d 03:29 ago) @ Helmut Posting: # 17878 Views: 17,901
Hi again, and thanks for looking into it. It is as simple as this:
x1=c(1,2,3,4,5)
y1=c(10.5, 11.4, 12.6, 13.3, 14.6)

x2=c(1,2,3,4,5)
y2=c(10.3, 11.4, NA, 13.5, 14.6)

I would like to know how to judge if the correlation is quantitatively appearing to be best for the first line or for the second line.

—
Pass or fail!
ElMaestro
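If R²adj is an acceptable yardstick (see Helmut's post above), a direct way to compare the two lines in R (a sketch, not necessarily the final solution):

m1 <- lm(y1 ~ x1)           # 5 points
m2 <- lm(y2 ~ x2)           # 4 points used; the NA row is dropped by na.omit
summary(m1)$adj.r.squared
summary(m2)$adj.r.squared   # the line with the larger value fits "best"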
nobody nothing 2017-10-08 22:26 (2771 d 00:52 ago) @ ElMaestro Posting: # 17879 Views: 18,018
❝ x1=c(1,2,3,4,5)
❝ y1=c(10.5, 11.4, 12.6, 13.3, 14.6)
❝
❝ x2=c(1,2,3,4,5)
❝ y2=c(10.3, 11.4, NA, 13.5, 14.6)

13.3 vs. 13.5? By intention?

"RMSE: The RMSE is the square root of the variance of the residuals. It indicates the absolute fit of the model to the data – how close the observed data points are to the model's predicted values. Whereas R-squared is a relative measure of fit, RMSE is an absolute measure of fit. As the square root of a variance, RMSE can be interpreted as the standard deviation of the unexplained variance, and has the useful property of being in the same units as the response variable. Lower values of RMSE indicate better fit. RMSE is a good measure of how accurately the model predicts the response, and is the most important criterion for fit if the main purpose of the model is prediction."

...educated guess, no idea if it is sensitive to the number of calibrators.
http://www.theanalysisfactor.com/assessing-the-fit-of-regression-models/

...didn't read it, maybe you want to have a look:
http://orfe.princeton.edu/~jqfan/papers/01/ant.pdf

—
Kindest regards, nobody
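In R both measures are one-liners on an lm object (a sketch; note that the residual standard error, unlike the plain RMSE, divides by the residual degrees of freedom and thus penalises small n):

rmse <- function(m) sqrt(mean(residuals(m)^2))  # RMSE as defined in the quote above
rse  <- function(m) summary(m)$sigma            # residual standard error, sqrt(SSE/df)
rmse(lm(y1 ~ x1))
rse(lm(y1 ~ x1))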
yjlee168 ★★★ Kaohsiung, Taiwan, 2017-10-08 23:28 (2770 d 23:50 ago) (edited on 2017-10-09 08:35) @ ElMaestro Posting: # 17880 Views: 18,013
Dear ElMaestro,

I am not quite sure if ANCOVA is what you need. Here is a 10-min video (title: ANCOVA in R) from YouTube.

—
All the best,
-- Yung-jin Lee
bear v2.9.2: created by Hsin-ya Lee & Yung-jin Lee
Kaohsiung, Taiwan https://www.pkpd168.com/bear
Download link (updated) -> here
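A minimal R sketch of the ANCOVA idea, reusing the example data from earlier in the thread (not from the video): stack the datasets with a grouping factor and test whether the slopes differ.

dat <- data.frame(x = c(x1, x2), y = c(y1, y2),
                  set = factor(rep(1:2, each = 5)))  # grouping factor
m <- lm(y ~ x * set, data = dat)   # interaction allows set-specific slopes
anova(m)                           # the x:set term tests equality of slopes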
DavidManteigas ★ Portugal, 2017-10-09 12:34 (2770 d 10:45 ago) @ ElMaestro Posting: # 17881 Views: 17,852
Dear ElMaestro,

That is changing the logic of statistics. For those models, I think R squared is the way to go, although different sample sizes might have an influence depending on the range of samples that you have. How "small" are the samples? Between 30 and 50 observations per sample? Would methods like cross-validation work on the datasets?

What about calculating confidence intervals for the adjusted R squared of each model and using a forest plot of those CIs, one line for each dataset? This might give you an idea of the impact of the different sample sizes in this particular problem and help you decide whether just looking at R squared would be enough. Depending on the number of datasets, visual inspection and influence statistics might help in deciding which dataset fits the model best.

Regards,
David
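A sketch of that confidence-interval idea via a nonparametric bootstrap (percentile CI; with only five points, as in the example above, this is illustrative at best):

set.seed(42)
r2adj <- function(x, y) summary(lm(y ~ x))$adj.r.squared
bs <- replicate(2000, {                # resample (x, y) pairs with replacement
        i <- sample(5, replace = TRUE)
        r2adj(x1[i], y1[i])
      })
quantile(bs, c(0.025, 0.975))          # one line of the forest plot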
nobody nothing 2017-10-09 12:45 (2770 d 10:33 ago) @ DavidManteigas Posting: # 17882 Views: 17,920
Best guess: in case of unweighted linear regression the highest calibrators always have the highest impact. For your example: take 138 dB as the mean of the regression function, as predictions will be best around the mean. If there is heteroscedastic data and weighted regression, the calibrators around the weighted mean will have the least impact on the calibration function.

I still have not fully understood the scope of the survey. Is it about best practice of study planning for calibration experiments?

—
Kindest regards, nobody
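The first claim can be checked via the leverages (hat values), which in simple linear regression depend only on the distance of x from its mean (calibrator levels invented for illustration):

x <- c(1, 2, 5, 10, 20, 50)       # skewed calibrator levels
h <- hatvalues(lm(rnorm(6) ~ x))  # leverage is independent of y
round(h, 2)                       # the highest calibrator dominates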
Helmut ★★★ Vienna, Austria, 2017-10-10 20:15 (2769 d 03:03 ago) @ ElMaestro Posting: # 17885 Views: 17,873
Hi ElMaestro,

❝ I would like to know how to judge if the correlation is quantitatively appearing to be best for the first line or for the second line.

Sumfink along this?
Fits linear models to the sets and gives results sorted by R²adj (descending) and n (ascending). y-values of set 2 and 3 are identical. Shows that NA gives the same result as a completely missing row due to the default na.action=na.omit in lm() .
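A minimal sketch along those lines (a reconstruction matching the description above, not the original code; the data values are assumed, taken from the example earlier in the thread):

x1 <- c(1, 2, 3, 4, 5); y1 <- c(10.5, 11.4, 12.6, 13.3, 14.6)
x2 <- c(1, 2, 3, 4, 5); y2 <- c(10.3, 11.4, NA, 13.5, 14.6)
x3 <- c(1, 2, 4, 5);    y3 <- c(10.3, 11.4, 13.5, 14.6)  # set 2 minus the NA row
sets <- list(data.frame(x = x1, y = y1),
             data.frame(x = x2, y = y2),
             data.frame(x = x3, y = y3))
res <- data.frame(set = seq_along(sets), n = NA, R2adj = NA)
for (i in seq_along(sets)) {
  m <- lm(y ~ x, data = sets[[i]])      # default na.action = na.omit drops the NA
  res$n[i]     <- length(residuals(m))  # points actually used in the fit
  res$R2adj[i] <- summary(m)$adj.r.squared
}
res[order(-res$R2adj, res$n), ]         # R²adj descending, n ascending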
Edit: Toys for boys:
—
Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz
The quality of responses received is directly proportional to the quality of the question asked. 🚮 Science Quotes