Helmut
★★★  Vienna, Austria,
2019-04-26 16:03
(748 d 04:17 ago)

Posting: # 20244
Views: 5,456

## R Inferno [R for BE/BA]

Dear R-Users,

I can’t get my head around this:

loc.stat <- function(x, type, na.rm, distr) {  non.numerics    <- which(is.na(suppressWarnings(as.numeric(x))))  x[non.numerics] <- NA  x <- as.numeric(x)  if (distr == "nonpar") {    switch (type,      pct2.5  = as.numeric(quantile(x, probs=0.025, na.rm=na.rm)),      pct25   = as.numeric(quantile(x, probs=0.25, na.rm=na.rm)),      median  = as.numeric(median(x, na.rm=na.rm)),      pct75   = as.numeric(quantile(x, probs=0.75, na.rm=na.rm)),      pct97.5 = as.numeric(quantile(x, probs=0.975, na.rm=na.rm)))  }  if (distr == "normal") {    mean.x <-mean(x, na.rm=na.rm)    sd.x   <- sd(x, na.rm=na.rm)    n.not.nas <- length(x)-length(which(is.na(x)))    se <- sd.x/sqrt(n.not.nas)    switch (type,      cla2.5  = mean.x+qt(0.025, df=n.not.nas-1, lower.tail=TRUE)*se,      ar.mean = mean.x,      cla97.5 = mean.x+qt(0.975, df=n.not.nas-1, lower.tail=TRUE)*se)  } } set.seed(123456) x <- rnorm(12, 100, 20) median <- loc.stat(x, "median", TRUE, "nonpar") mean   <- loc.stat(x, "ar.mean", TRUE, "normal")

Gives

R> median NULL R> mean  109.7066

Why the heck? Quoting the R-Inferno:

Curly braces are also useful with loops, switch and if.

What? How? Google and R-help were not my friends.

Dif-tor heh smusma 🖖
Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
mittyri
★★

Russia,
2019-04-26 16:20
(748 d 04:00 ago)

@ Helmut
Posting: # 20245
Views: 4,965

## full conditions!

Dear Helmut,

the code is worth a thousand sentences
loc.stat <- function(x, type, na.rm, distr) {   non.numerics    <- which(is.na(suppressWarnings(as.numeric(x))))   x[non.numerics] <- NA   x <- as.numeric(x)   if (distr == "nonpar") {     switch (type,             pct2.5  = as.numeric(quantile(x, probs=0.025, na.rm=na.rm)),             pct25   = as.numeric(quantile(x, probs=0.25, na.rm=na.rm)),             median  = as.numeric(median(x, na.rm=na.rm)),             pct75   = as.numeric(quantile(x, probs=0.75, na.rm=na.rm)),             pct97.5 = as.numeric(quantile(x, probs=0.975, na.rm=na.rm)))   } else if (distr == "normal") {     mean.x <-mean(x, na.rm=na.rm)     sd.x   <- sd(x, na.rm=na.rm)     n.not.nas <- length(x)-length(which(is.na(x)))     se <- sd.x/sqrt(n.not.nas)     switch (type,             cla2.5  = mean.x+qt(0.025, df=n.not.nas-1, lower.tail=TRUE)*se,             ar.mean = mean.x,             cla97.5 = mean.x+qt(0.975, df=n.not.nas-1, lower.tail=TRUE)*se)   } else {     NULL   } } set.seed(123456) x <- rnorm(12, 100, 20) median <- loc.stat(x, "median", TRUE, "nonpar") mean   <- loc.stat(x, "ar.mean", TRUE, "normal")

Kind regards,
Mittyri
Helmut
★★★  Vienna, Austria,
2019-04-26 16:41
(748 d 03:39 ago)

@ mittyri
Posting: # 20247
Views: 4,962

## full conditions!

Hi mittyri,

you made my day! Goofy and R-help were not my friends…

Dif-tor heh smusma 🖖
Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
d_labes
★★★

Berlin, Germany,
2019-04-27 14:29
(747 d 05:51 ago)

@ mittyri
Posting: # 20252
Views: 4,893

## full conditions!

Dear mittyri!

May I ask: Whats goin on here?
Don't get it.

Do you have the thousand sentences for me?

Regards,

Detlew
mittyri
★★

Russia,
2019-04-27 22:23
(746 d 21:57 ago)

(edited by mittyri on 2019-04-27 23:47)
@ d_labes
Posting: # 20253
Views: 4,916

## magik of R implicit return

Dear Detlew!

» Do you have the thousand sentences for me?
I have many of those, but for clarity I prefer a simplified version what was going on:

veryspecialfunction <- function(flag) {   if (flag == 1) {     "one"   }     if (flag == 2){     "two"   }  } 
Now running:
veryspecialfunction(2)
 "two"

veryspecialfunction(1)
# nothing here!
typeof(veryspecialfunction(1))
 "NULL"

Kind regards,
Mittyri
ElMaestro
★★★

Denmark,
2019-04-27 22:34
(746 d 21:46 ago)

@ mittyri
Posting: # 20254
Views: 4,897

## magik of R implicit return

Hi all,

"smart" lost this round.

I always use explicit return(something) in my functions, when they are supposed to return a value regardless of whether it is a list, numeric, NULL or data frame some other thingy.

Yes, that involves tapping a few times more on the keyboard but it does solve a lot of problems and best of all, it makes code readable.

In language like C you would never run into this phenomenon. You declare explicitly which type the function returns and you will be bombarded with errors or warnings if you fail to return the right type of if you forget the return statement. I find it very comfortable.

Pass or fail!
ElMaestro
d_labes
★★★

Berlin, Germany,
2019-04-28 19:36
(746 d 00:45 ago)

@ ElMaestro
Posting: # 20255
Views: 4,837

## implicit return?

Hi Öberster Größter Meister,

» "smart" lost this round.

Totally.

» I always use explicit return(something) in my functions, when they are supposed to return a value regardless of whether it is a list, numeric, NULL or data frame some other thingy.

A good idea. But if you use R you are told that doing so has the penalty of a longer run-time.
Bloody Hell!

» Yes, that involves tapping a few times more on the keyboard but it does solve a lot of problems and best of all, it makes code readable.

Full ACK!

» In language like C you would never run into this phenomenon. You declare explicitly which type the function returns and you will be bombarded with errors or warnings if you fail to return the right type of if you forget the return statement. I find it very comfortable.

Weakly typed (R) against strong typed language (C or C++).
But IIRC has C or C++ also some weaknesses.

Regards,

Detlew
ElMaestro
★★★

Denmark,
2019-04-28 20:36
(745 d 23:44 ago)

@ d_labes
Posting: # 20256
Views: 4,833

## implicit return?

Hi d_labes,

» But IIRC has C or C++ also some weaknesses.

That's right.
For example, the number of built in functions in C is small. You need to write most functionality from scratch, so what you can do with three lines of code in R can easily be 100 lines of code in C. Much more if you think of e.g. the plot function which opens up a graphic device and with one line gives you a graph. Horrendously many hours of programming is needed in C to do such a thing.

The biggest disadvantage I can think of in C is clearly: Pointers (especially character pointers). They were hard to grasp initially. C does not have strings.
The biggest advantage I can think of in C is: Pointers. They speed things up immensely. Like in the bootstrap code for dissolution posted here in this forum some time ago. Infinitely faster than any implementation seen out there. But then again, when you are working on a dossier it usually does not matter for real if you get the output in 1 or 10000 seconds as long as you meet your submission deadline I am not well versed with C++ as I never really had a need for classes/objects. They are not so necessary for numerical purposes.

Pass or fail!
ElMaestro
Helmut
★★★  Vienna, Austria,
2019-04-28 20:50
(745 d 23:31 ago)

@ d_labes
Posting: # 20257
Views: 4,847

## implicit return?

Dear Detlew,

» » I always use explicit return(something) in my functions,…
»
» A good idea.

Yep.

» But if you use R you are told that doing so has the penalty of a longer run-time.
» Bloody Hell!

Does that really matter?
library(microbenchmark) impl <- function(x) {   if (x == 0) {     "foo"   } else {     "bar"   } } expl <- function(x) {   if (x == 0) {     return("foo")   } else {     return("bar")   } } res <- microbenchmark(impl(round(runif(1), 0)),                       expl(round(runif(1), 0)), times=2000L) print(res) Unit: microseconds                      expr   min    lq     mean median    uq      max neval cld  impl(round(runif(1), 0)) 1.811 1.812 2.686167  1.813 2.114 1383.479  2000   a  expl(round(runif(1), 0)) 1.811 1.812 2.955002  1.813 2.114 1879.152  2000   a

» Weakly typed (R) against strong typed language (C or C++).

Exactly. You know that I’m facing a similar story with the forum scripts (PHP: weakly typed) generating database-queries (SQL: strongly typed). Bloody scavenger hunt.

Dif-tor heh smusma 🖖
Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2019-04-28 21:16
(745 d 23:04 ago)

(edited by ElMaestro on 2019-04-28 21:40)
@ Helmut
Posting: # 20258
Views: 4,840

## implicit return?

Hi Hotzi,

I even go a step further, perhaps only because I am rather dumb.

My functions almost always look like this:
 Foo=function(bar) {   ##blah blah, conditions and functionality and loops and loads of curly braces here.   ##whatever happens in loops and conditions it sets a variable called e.g. rslt   return(rslt) ##I always have a line at the end that looks like this! }

I do that to make sure that whatever happens with pairs of curly braces I know that my function return is captured at the end. When finding errors and bugs I just need to look at rslt and work backwards. print(rslt) immediately before the return statement is often a very good beginning to that. Not sure if this is smarter (in terms of error finding) but it is my solution to a lot of trouble. Possibly just owing to my own lack of skills.

In C you can also exit with curly braces in the middle of a function. Debugging that sh!t is impossible for me .

Pass or fail!
ElMaestro
Helmut
★★★  Vienna, Austria,
2019-04-28 23:38
(745 d 20:42 ago)

@ ElMaestro
Posting: # 20259
Views: 4,826

## implicit return?

Hi ElMaestro,

» My functions almost always look like this:
» » Foo=function(bar)» { »   ##blah blah, conditions and functionality and loops and loads of curly braces here. »   ##whatever happens in loops and conditions it sets a variable called e.g. rslt» »   return(rslt) ##I always have a line at the end that looks like this!» }

Good practice. I use it in longer functions as well and in ones which become part of a package, always. Only very, very slightly slower:
library(microbenchmark) impl <- function(x) {   if (x == 0) {     "foo"   } else {     "bar"   } } expl <- function(x) {   if (x == 0) {     return("foo")   } else {     return("bar")   } } braces.with.suspenders <- function(x) {   if (x == 0) {     res <- "foo"   } else {     res <- "bar"   }   return(res) } res <- microbenchmark(impl(round(runif(1), 0)),                       expl(round(runif(1), 0)),                       braces.with.suspenders(round(runif(1), 0)),                       times=3000L) print(res) Unit: microseconds                                       expr   min    lq     mean median    uq    max neval cld                   impl(round(runif(1), 0)) 1.811 1.812 1.976830  1.813 2.114  5.434  3000  a                   expl(round(runif(1), 0)) 1.811 1.812 1.976926  1.813 2.114  6.340  3000  a braces.with.suspenders(round(runif(1), 0)) 1.811 1.812 2.027029  2.114 2.114 16.905  3000   b

Dif-tor heh smusma 🖖
Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2019-04-29 09:40
(745 d 10:40 ago)

@ Helmut
Posting: # 20260
Views: 4,764

## implicit return?

Hi Hötzi,

Interestingly (at least to moi!), I tried out of curiosity to do this:

QWERTY=function(x) {  ifelse ((x==0), "foo", "bar") }

The benchmark for this version is much, much slower than any of the other proposals (at least on my system). Another example that shows that condensing code towards fewer keystrokes is not always fastest. I wonder what goes on internally since this is much slower? Perhaps space is allocated for both "foo" and "bar" in memory and then there is a decision as to which one to throw away and which one to keep on the CPU stack? I don't have a very good at understanding of these things.

This one is appearing even worse:
 QWERTY2=function(x) {  ifelse (x, "bar", "foo") } 

I have no idea why QWERTY2 would not outperform QWERTY or at least be the same if R has a kind of optimiser in the interpreter. Would you happen know????

Pass or fail!
ElMaestro
Helmut
★★★  Vienna, Austria,
2019-04-29 10:41
(745 d 09:39 ago)

@ ElMaestro
Posting: # 20261
Views: 4,766

## implicit return?

Hi ElMaestro,

» Interestingly (at least to moi!), I tried out of curiosity to do this:
»
» QWERTY=function(x)» { »  ifelse ((x==0), "foo", "bar")» }
»
» The benchmark for this version is much, much slower than any of the other proposals (at least on my system).

On any system. » Another example that shows that condensing code towards fewer keystrokes is not always fastest. I wonder what goes on internally since this is much slower?

Use ifelse() for vectorized conditions. See this thread.

Try this:
library(microbenchmark) fun1 <- function(x, cond, print=FALSE) {   if (x == cond) {     "foo"     if (print) cat("foo\n")   } else {     "bar"      if (print) cat("foo\n")  } } fun2 <- function(x, cond, print=FALSE) {   if (x == cond) {     if (print) cat("foo\n")     return("foo")   } else {     if (print) cat("bar\n")     return("bar")   } } fun3 <- function(x, cond, print=FALSE) {   if (x == cond) {     res <- "foo"   } else {     res <- "bar"   }   if (print) cat(res, "\n")   return(res) } fun4 <- function(x, cond, print=FALSE) {   ifelse (x == cond, "foo", "bar") } fun5 <- function(x, cond, print=FALSE) {   ifelse ((x == cond), res <- "foo", res <- "bar")   if (print) cat(res, "\n")   return(res) } fun6 <- function(x, cond, print=FALSE) {   ifelse (x == cond, res <- "foo", res <- "bar")   if (print) cat(res, "\n")   return(res) } res1 <- microbenchmark(fun1(round(runif(1, 0, 1), 0), 0),                        fun2(round(runif(1, 0, 1), 0), 0),                        fun3(round(runif(1, 0, 1), 0), 0),                        fun4(round(runif(1, 0, 1), 0), 0),                        fun5(round(runif(1, 0, 1), 0), 0),                        fun6(round(runif(1, 0, 1), 0), 0),                        times=1000L) res2 <- microbenchmark(fun1(round(c(runif(1, 0, 1), runif(1, 1, 2)), 0),                             c(1, 2)),                        fun2(round(c(runif(1, 0, 1), runif(1, 1, 2)), 0),                             c(1, 2)),                        fun3(round(c(runif(1, 0, 1), runif(1, 1, 2)), 0),                             c(1, 2)),                        fun4(round(c(runif(1, 0, 1), runif(1, 1, 2)), 0),                             c(1, 2)),                        fun5(round(c(runif(1, 0, 1), runif(1, 1, 2)), 0),                             c(1, 2)),                        fun6(round(c(runif(1, 0, 1), runif(1, 1, 2)), 0),                             c(1, 2)),                        times=1000L) print(res1) print(res2)
Shortended output of res1 and res2:
  expr  median cld fun1()   2.415   a fun2()   2.415   a fun3()   2.416   a fun4()   3.623   a fun5()   3.624   a fun6()   3.624   a   expr  median cld fun1() 54.0350   b fun2() 53.7340   b fun3() 54.0360   b fun4() 12.2265  a  fun5() 12.6800  a  fun6() 12.3780  a  There were 50 or more warnings (use warnings() to see the first 50)
Hey, were are the warnings coming from? Try fun1() to fun3() with a vector-condition and print=TRUE.

Dif-tor heh smusma 🖖
Helmut Schütz The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2019-04-29 12:55
(745 d 07:25 ago)

@ Helmut
Posting: # 20262
Views: 4,720

## implicit return?

Hi Hötzi,

» Use ifelse() for vectorized conditions. See this thread.

Try this:
library(microbenchmark) x=round(runif(100),0) Len=100 f1=function(x, Len) {  ifelse ((x==0), "foo", "bar") } f1(x, Len) f2=function(x, Len) {  rslt=rep("foo", Len)  for (i in 1:Len)   if (x[i]) rslt[i]="bar"  return(rslt) } f2(x, Len) res <- microbenchmark(                       f1(round(runif(Len),0), Len),                       f2(round(runif(Len),0), Len),                       times=3000L) print(res) 

note I am passing Len to f1 even though it is not used there in order to give the function call the same overhead, otherwise the comparison might be called unfair.

On my system f2 is a lot faster, so the advantage of runif may be solely syntactic and not in any way a true vectorisation advantage (easier but not more efficient).

Pass or fail!
ElMaestro
mittyri
★★

Russia,
2019-04-30 13:05
(744 d 07:15 ago)

(edited by mittyri on 2019-04-30 13:17)
@ ElMaestro
Posting: # 20263
Views: 4,668

## built-in ifelse

Hi ElMaestro,

 function (test, yes, no) {   if (is.atomic(test)) {     if (typeof(test) != "logical")       storage.mode(test) <- "logical"     if (length(test) == 1 && is.null(attributes(test))) {       if (is.na(test))         return(NA)       else if (test) {         if (length(yes) == 1) {           yat <- attributes(yes)           if (is.null(yat) || (is.function(yes) && identical(names(yat),             "srcref")))             return(yes)         }       }       else if (length(no) == 1) {         nat <- attributes(no)         if (is.null(nat) || (is.function(no) && identical(names(nat),           "srcref")))           return(no)       }     }   }   else test <- if (isS4(test))     methods::as(test, "logical")   else as.logical(test)   ans <- test   ok <- !is.na(test)   if (any(test[ok]))     ans[test & ok] <- rep(yes, length.out = length(ans))[test & ok]   if (any(!test[ok]))     ans[!test & ok] <- rep(no, length.out = length(ans))[!test & ok]   ans }

That's why Hadley implemented his own if_else function. Your one if in f2 is substituted to many of in f1. And the last part is two separated actions for 'No' and 'Yes'.
So you can write your own ifelse function (for your own purposes) which will be much faster than the built-in

And the winner is
f3 = function(x, Len) {   rslt=rep("foo", Len)   rslt[which(x==1, arr.ind = T)]="bar"   return(rslt) }

Kind regards,
Mittyri Ing. Helmut Schütz 