Model check in R [Software]

posted by Helmut – Vienna, Austria, 2008-06-05 17:12 – Posting: # 1925

Dear Vishal!

❝ I got y=0.1314x+0.1217 (y= mx+c) equation and R2 0.9925 using excel sheet.

  1. Hopefully it’s just a typo; I get 0.1317 for the slope.
  2. R² is – almost – irrelevant.

❝ Now i want […] in excel sheet.


Have you read my last post – especially about validation?

❝ Is it possible in EXCEl?

There is no function for weighted regression built into Excel – stop searching… :sleeping:
Why do you want to stick with this awful piece of software? Just to give you an idea of how your problem would look in R:
x         <- c( 1,   2,   4,   8,  10   ) # concentrations
y         <- c( 0.2, 0.4, 0.7, 1.2, 1.4 ) # responses
yu        <- c( 0.3, 0.45,0.71,1.3, 1.35) # unknown samples
plot(x, y)                                # simple plot
model1    <- lm(y ~ x)                    # unweighted regression
model2    <- lm(y ~ x, weights = 1/x)     # w = 1/x
model3    <- lm(y ~ x, weights = 1/x^2)   # w = 1/x^2
# results section #
summary(model1)                           # parameters
confint(model1, level = 0.95)             # confidence intervals
extractAIC(model1, k = 2)                 # AIC
fitted(model1)                            # fitted values
residuals(model1)                         # residuals
((y-coef(model1)[1])/coef(model1)[2]
  -x)/x*100                               # bias (back-calculated values)
xu        <- (yu-coef(model1)[1])/
             coef(model1)[2]              # concentration of samples
xu                                        # show results
abline(coef=coef(model1))                 # unweighted
abline(coef=coef(model2), col="blue")     # 1/x
abline(coef=coef(model3), col="red")      # 1/x²


In the first three lines, input from external files is possible as well (plain text, Excel :-D, any ODBC application, a MySQL database, and many others).
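As a minimal sketch of the plain-text case (the file name calibration.csv and its column names conc / resp are only placeholders for your own file):
cal       <- read.csv("calibration.csv")  # placeholder file with columns conc, resp
x         <- cal$conc                     # concentrations
y         <- cal$resp                     # responses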
If you replace model1 in the results section with model2 and model3, respectively, you'll get the results for w = 1/x and w = 1/x².
Output looks like this:
Call:
lm(formula = y ~ x)
Residuals:
       1        2        3        4        5
-0.05333  0.01500  0.05167  0.02500 -0.03833
Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) 0.121667   0.040127   3.032 0.056221 . 
x           0.131667   0.006597  19.959 0.000275 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.0511 on 3 degrees of freedom
Multiple R-squared: 0.9925,     Adjusted R-squared:  0.99
F-statistic: 398.4 on 1 and 3 DF,  p-value: 0.0002749
                   2.5 %    97.5 %
(Intercept) -0.006035715 0.2493690
x            0.110672524 0.1526608
[1]   2.00000 -28.29403
        1         2         3         4         5
0.2533333 0.3850000 0.6483333 1.1750000 1.4383333
          1           2           3           4           5
-0.05333333  0.01500000  0.05166667  0.02500000 -0.03833333
[1] -40.506329   5.696203   9.810127   2.373418  -2.911392
[1] 1.354430 2.493671 4.468354 8.949367 9.329114


If you want to go with a quadratic model, use
model4    <- lm(y ~ x + I(x^2))
model5    <- lm(y ~ x + I(x^2), weights = 1/x)
model6    <- lm(y ~ x + I(x^2), weights = 1/x^2)

and in the results section, e.g.
if(coef(model4)[3]<0) {
  print((-(coef(model4)[2]/2/coef(model4)[3]+
    sqrt((coef(model4)[2]/2/coef(model4)[3])^2-
    (coef(model4)[1]-y)/coef(model4)[3]))-x)/x*100)   # bias (back-calculated values)
  xu      <- -(coef(model4)[2]/2/coef(model4)[3]+
             sqrt((coef(model4)[2]/2/coef(model4)[3])^2-
             (coef(model4)[1]-yu)/coef(model4)[3]))   # concentration of samples
} else {
  print((-(coef(model4)[2]/2/coef(model4)[3]-
    sqrt((coef(model4)[2]/2/coef(model4)[3])^2-
    (coef(model4)[1]-y)/coef(model4)[3]))-x)/x*100)   # bias (back-calculated values)
  xu      <- -(coef(model4)[2]/2/coef(model4)[3]-
             sqrt((coef(model4)[2]/2/coef(model4)[3])^2-
             (coef(model4)[1]-yu)/coef(model4)[3]))   # concentration of samples
}
xu                                                    # show results

There are two possible solutions of the quadratic equation depending on the sign of the quadratic parameter (the third coefficient); if that coefficient is exactly zero, the model reduces to the linear case. Since this snippet is just quick and dirty, I leave the homework to somebody else.
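Just as a sketch of how that homework might look – the function name back.calc is mine, not part of R – one could wrap the back-calculation into a small function which picks the proper root and falls back to the linear solution if the quadratic term vanishes:
back.calc <- function(model, y) {         # solves a + b*x + c*x^2 = y for x
  cf <- coef(model)
  a  <- cf[1]; b <- cf[2]
  cc <- if (length(cf) > 2) cf[3] else 0
  if (abs(cc) < .Machine$double.eps)
    return((y - a)/b)                     # (almost) no curvature: linear case
  root <- sqrt((b/(2*cc))^2 - (a - y)/cc)
  if (cc < 0) -(b/(2*cc) + root) else -(b/(2*cc) - root)
}
back.calc(model4, yu)                     # concentrations of the unknowns
(back.calc(model4, y) - x)/x*100          # bias of back-calculated standards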

❝ If possible, which equation obtain using above data and How can i obtain this equation?


Even in Excel it’s possible to get weighted regressions running. For the formulas see e.g. Draper/Smith (Applied Regression Analysis, Wiley, 3rd ed., 1998) or Miller/Miller (Statistics and Chemometrics for Analytical Chemistry, Pearson, 5th ed., 2005). Of course you will have to struggle with a spreadsheet, which is not suitable for this kind of job (it’s like writing a letter in Adobe’s Photoshop instead of a word processor).
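To give an idea of what those textbook formulas boil down to (and of what you would have to set up in spreadsheet cells), here is a sketch of the closed-form weighted least-squares estimates in R, reusing x and y from above; for w = 1/x² it should reproduce coef(model3):
w    <- 1/x^2                             # weights, e.g. 1/x^2
xw   <- sum(w*x)/sum(w)                   # weighted mean of x
yw   <- sum(w*y)/sum(w)                   # weighted mean of y
b1   <- sum(w*(x - xw)*(y - yw))/
        sum(w*(x - xw)^2)                 # weighted slope
b0   <- yw - b1*xw                        # weighted intercept
c(intercept = b0, slope = b1)             # compare with coef(model3)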

BTW, a linear model is wrong for your example’s data.
You have to justify the model, and for your data it is simply not linear at all. A plot of the residuals (yobs - ycalc) vs. x would have shown a nice parabola – where we expect a random scatter independent of x. The residuals show a pattern [–+++–] of only 3 runs (two changes of sign). Back-calculated x are biased within the range -40.5% to +9.81%. Weighting improves this mess a little (1/x: -18.6%, +12.9%; 1/x²: -9.83%, +13.8%), but according to the residual analysis it is still the wrong model. Guidelines call for a justification of the model, because applying the wrong one biases your results.
Just compare the linear with the quadratic model y = A + Bx + Cx²: 4 runs (of a possible 5), no trend left in the residuals, back-calculated x: unweighted (-6.07%, +4.46%), 1/x (-2.61%, +4.32%), 1/x² (-3.07%, +3.36%).
According to AIC (Akaike’s Information Criterion) and the back-calculated x, the best model is the quadratic one with w = 1/x²: y = 0.001814 + 0.2070x - 0.006936x².
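A minimal sketch of that residual check (reusing the models defined above; counting runs via sign() is just one of many ways):
plot(x, residuals(model1)); abline(h = 0) # linear: parabolic pattern
plot(x, residuals(model4)); abline(h = 0) # quadratic: no trend left
sum(abs(diff(sign(residuals(model1)))) > 0) + 1   # runs of the linear model
sum(abs(diff(sign(residuals(model4)))) > 0) + 1   # runs of the quadratic model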

You can easily compare models by an F-test, e.g.
anova(model3, model6, test="F")
getting
Analysis of Variance Table
Model 1: y ~ x
Model 2: y ~ x + I(x^2)
  Res.Df        RSS Df  Sum of Sq      F  Pr(>F) 
1      3 0.00090656                               
2      2 0.00006905  1 0.00083751 24.258 0.03884 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


But even the unweighted quadratic model is by far better than any weighting scheme applied to a linear one. Your example shows the importance of analyzing residuals – not just plotting x vs. y as you suggested in your second post.
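If you want to collect the AIC of all six models in one go, a little sketch (reusing the model objects from above; extractAIC() returns the equivalent degrees of freedom and the AIC, so the second element is picked):
models <- list(lin       = model1, lin.1.x  = model2, lin.1.x2  = model3,
               quad      = model4, quad.1.x = model5, quad.1.x2 = model6)
sapply(models, function(m) extractAIC(m, k = 2)[2])  # smaller = better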

Dif-tor heh smusma 🖖🏼 Long live Ukraine!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
