Shuanghe
★★  

Spain,
2019-12-17 13:01

Posting: # 20981
Views: 607
 

 Possible Excel bug? [Software]

Hi,

I just found out something interesting when compare Excel and R function for t distribution when I was calculating p value.

I was writing a function for a friend to do some calculation but then decided to transfer it to Excel since my friend doesn't know much about R. When degree of freedom is integer, R and Excel give very similar result but when df is not integer, the difference is noticeable. I don't know if that's kind of bug or my installation of MS office has some problem. Could anyone check it out?

Example,
options(digits=16)
pt(4.5, 10)
gives result of 0.999428447456598, and Excel T.DIST(4.5, 10, TRUE) gives 0.999428447456598. identical. However, pt(4.5, 10.1) gives 0.9994425014527133 but Excel T.DIST(4.5, 10.1, TRUE) gives 0.999428447456598. The difference is at 5th decimal. Comparing to identical result at 15th decimal point when df is integer, this is a big difference!

Obviously, i trust R much more than Excel. however, I'd appreciate if anyone can verify it and maybe offer some explanation if that's not a bug?

Thanks.

Edit:

Forgot to mention that the Excel I'm using is 2013, on Windows 7 Pro machine.

All the best,
Shuanghe
ElMaestro
★★★

Belgium?,
2019-12-17 13:49

@ Shuanghe
Posting: # 20982
Views: 334
 

 Possible Excel bug?

Hi Shuanghe,


» Obviously, i trust R much more than Excel. however, I'd appreciate if anyone can verify it and maybe offer some explanation if that's not a bug?

I do not have Excel, but it sounds like an embarrassing implementation.
Can you try, just for the fun of it, to ask Excel what it thinks the critical value at df=10.1 is for:

a. p=0.9994425014527133
b. p=0.999428447456598

Le tits now.

Best regards,
ElMaestro
Shuanghe
★★  

Spain,
2019-12-17 14:15

@ ElMaestro
Posting: # 20983
Views: 328
 

 Possible Excel bug?

Hi ElMaestro,

» I do not have Excel, but it sounds like an embarrassing implementation.

I guess that I shouldn't be surprised given its history...

» Can you try, just for the fun of it, to ask Excel what it thinks the critical value at df=10.1 is for:
»
» a. p=0.9994425014527133
» b. p=0.999428447456598

Good idea.
T.INV(0.999442501452713, 10.1) = 4.51612251417097
T.INV(0.999428447456598, 10.1) = 4.50000000000000

In fact, I just checked that T.DIST(4.5, df, TRUE) will give exactly the same result for df = 10.1, 10.2, ... 10.9, (increase df but let df < 11). The result only changed when df = 11!

Nice work, M$.

All the best,
Shuanghe
mittyri
★★  

Russia,
2019-12-17 14:23

@ Shuanghe
Posting: # 20984
Views: 327
 

 Possible Excel bug?

Hi Shuanghe,

try this
=T.DIST(4.5, 10.1,TRUE )-T.DIST(4.5, 10,TRUE)

If you want more pain, see here
  1. Heiser 2006. Statistical tests, tests of significance, and tests of a hypothesis using Excel. Journal of Applied Statistical Methods, 5:155–171
  2. McCullough and Heiser 2008. On the accuracy of statistical procedures in Microsoft Excel 2007. Computational Statistics & Data Analysis 52:4570–4578

even in the latest Office 365...

Kind regards,
Mittyri
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2019-12-17 15:00

@ mittyri
Posting: # 20985
Views: 313
 

 M$: It’s not a bug, it’s a feature!

Hi all,

known for ages. All (‼) versions of Excel round the degrees of freedom down to the nearest integer.

df <- seq(10, 11, 0.2)
q  <- 4.5
res <- data.frame(df = df, correct = pt(q, df), Excel = pt(q, floor(df)))
print(res, row.names = FALSE)

[image]  df   correct     Excel
10.0 0.9994284 0.9994284
10.2 0.9994561 0.9994284
10.4 0.9994818 0.9994284
10.6 0.9995059 0.9994284
10.8 0.9995284 0.9994284
11.0 0.9995494 0.9995494


Quoting Martin:

Never never never never use Excel. Not even for calculation of arithmetic means.


[image]

Congratulations to M$ for making a continuous function discrete. :thumb down:

I had fun comparing papers by regulators based on Luther Gwaza’s Excel-Sheet for adjusted indirect comparisons with my R-package. Satterthwaite’s degrees of freedom are practically never* integers. Remember this goody of the EMA?

Results obtained by alternative, validated statistical programs are also acceptable except spreadsheets because outputs of spreadsheets are not suitable for secondary assessment.



  • Only if variances are equal and sequences (crossovers) or group sizes (parallel) are equal.

Cheers,
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
Shuanghe
★★  

Spain,
2019-12-17 15:41

@ Helmut
Posting: # 20986
Views: 291
 

 M$: It’s not a bug, it’s a feature!

Thanks to all!

I knew that many years ago we had some discussion about some "features" in Excel (though I didn't recall it's the t distribution) but that's like 7, 8 years ago so I thought M$ fixed the problems.

How can they screwed up like this? ...

Anyway, I'll tel my friend use my R function instead.

All the best,
Shuanghe
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2019-12-17 16:00

@ Shuanghe
Posting: # 20987
Views: 293
 

 M$: It’s not a bug, it’s a feature!

Hi Shuanghe,

» I knew that many years ago we had some discussion about some "features" in Excel (though I didn't recall it's the t distribution)…

It was the function TINV(alpha, df) in all versions up to 2003. In order to get the correct value people had to use the workaround TINV(2*alpha, df). M$ could not change the implementation because it would break all existing spreadsheets. Hence in 2007 the new function T.INV(alpha, df) was introduced. However, the degrees of freedom are still flawed.
Of course, the same crap in CHI.INV(), F.INV().

» How can they screwed up like this? ...

Too big to fail?

     A refund for defective software might be nice,
except it would bankrupt the entire software industry
in the first year.
      [image] Andrew S. Tanenbaum

Cheers,
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
Activity
 Admin contact
20,246 posts in 4,261 threads, 1,397 registered users;
online 12 (0 registered, 12 guests [including 6 identified bots]).
Forum time (Europe/Vienna): 03:56 CET

To call the statistician after the experiment is done
may be no more than asking him to perform a postmortem examination:
he may be able to say what the experiment died of.    R.A. Fisher

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz
HTML5