Helmut
★★★
avatar
Homepage
Vienna, Austria,
2019-04-26 16:03

Posting: # 20244
Views: 2,734
 

 R Inferno [R for BE/BA]

Dear R-Users,

I can’t get my head around this:

loc.stat <- function(x, type, na.rm, distr) {
 non.numerics    <- which(is.na(suppressWarnings(as.numeric(x))))
 x[non.numerics] <- NA
 x <- as.numeric(x)
 if (distr == "nonpar") {
   switch (type,
     pct2.5  = as.numeric(quantile(x, probs=0.025, na.rm=na.rm)),
     pct25   = as.numeric(quantile(x, probs=0.25, na.rm=na.rm)),
     median  = as.numeric(median(x, na.rm=na.rm)),
     pct75   = as.numeric(quantile(x, probs=0.75, na.rm=na.rm)),
     pct97.5 = as.numeric(quantile(x, probs=0.975, na.rm=na.rm)))
 }
 if (distr == "normal") {
   mean.x <-mean(x, na.rm=na.rm)
   sd.x   <- sd(x, na.rm=na.rm)
   n.not.nas <- length(x)-length(which(is.na(x)))
   se <- sd.x/sqrt(n.not.nas)
   switch (type,
     cla2.5  = mean.x+qt(0.025, df=n.not.nas-1, lower.tail=TRUE)*se,
     ar.mean = mean.x,
     cla97.5 = mean.x+qt(0.975, df=n.not.nas-1, lower.tail=TRUE)*se)
 }
}
set.seed(123456)
x <- rnorm(12, 100, 20)
median <- loc.stat(x, "median", TRUE, "nonpar")
mean   <- loc.stat(x, "ar.mean", TRUE, "normal")


Gives

R> median
NULL
R> mean
[1] 109.7066


Why the heck? Quoting the R-Inferno:

Curly braces are also useful with loops, switch and if.

What? How? Google and R-help were not my friends.

Cheers,
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
mittyri
★★  

Russia,
2019-04-26 16:20

@ Helmut
Posting: # 20245
Views: 2,576
 

 full conditions!

Dear Helmut,

the code is worth a thousand sentences
loc.stat <- function(x, type, na.rm, distr) {
  non.numerics    <- which(is.na(suppressWarnings(as.numeric(x))))
  x[non.numerics] <- NA
  x <- as.numeric(x)
  if (distr == "nonpar") {
    switch (type,
            pct2.5  = as.numeric(quantile(x, probs=0.025, na.rm=na.rm)),
            pct25   = as.numeric(quantile(x, probs=0.25, na.rm=na.rm)),
            median  = as.numeric(median(x, na.rm=na.rm)),
            pct75   = as.numeric(quantile(x, probs=0.75, na.rm=na.rm)),
            pct97.5 = as.numeric(quantile(x, probs=0.975, na.rm=na.rm)))
  } else if (distr == "normal") {
    mean.x <-mean(x, na.rm=na.rm)
    sd.x   <- sd(x, na.rm=na.rm)
    n.not.nas <- length(x)-length(which(is.na(x)))
    se <- sd.x/sqrt(n.not.nas)
    switch (type,
            cla2.5  = mean.x+qt(0.025, df=n.not.nas-1, lower.tail=TRUE)*se,
            ar.mean = mean.x,
            cla97.5 = mean.x+qt(0.975, df=n.not.nas-1, lower.tail=TRUE)*se)
  } else {
    NULL
  }
}
set.seed(123456)
x <- rnorm(12, 100, 20)
median <- loc.stat(x, "median", TRUE, "nonpar")
mean   <- loc.stat(x, "ar.mean", TRUE, "normal")

Kind regards,
Mittyri
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2019-04-26 16:41

@ mittyri
Posting: # 20247
Views: 2,573
 

 full conditions!

Hi mittyri,

you made my day! Goofy and R-help were not my friends…

Cheers,
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
d_labes
★★★

Berlin, Germany,
2019-04-27 14:29

@ mittyri
Posting: # 20252
Views: 2,508
 

 full conditions!

Dear mittyri!

May I ask: Whats goin on here?
Don't get it.

Do you have the thousand sentences for me?

Regards,

Detlew
mittyri
★★  

Russia,
2019-04-27 22:23
(edited by mittyri on 2019-04-27 23:47)

@ d_labes
Posting: # 20253
Views: 2,503
 

 magik of R implicit return

Dear Detlew!

» Do you have the thousand sentences for me?
I have many of those, but for clarity I prefer a simplified version what was going on:

veryspecialfunction <- function(flag) {
  if (flag == 1) {
    "one"
  }
 
  if (flag == 2){
    "two"
  } 
}

Now running:
veryspecialfunction(2)
[1] "two"

veryspecialfunction(1)
# nothing here!
typeof(veryspecialfunction(1))
[1] "NULL"

Kind regards,
Mittyri
ElMaestro
★★★

Belgium?,
2019-04-27 22:34

@ mittyri
Posting: # 20254
Views: 2,502
 

 magik of R implicit return

Hi all,

"smart" lost this round.

I always use explicit return(something) in my functions, when they are supposed to return a value regardless of whether it is a list, numeric, NULL or data frame some other thingy.

Yes, that involves tapping a few times more on the keyboard but it does solve a lot of problems and best of all, it makes code readable.

In language like C you would never run into this phenomenon. You declare explicitly which type the function returns and you will be bombarded with errors or warnings if you fail to return the right type of if you forget the return statement. I find it very comfortable.

I could be wrong, but...
Best regards,
ElMaestro
d_labes
★★★

Berlin, Germany,
2019-04-28 19:36

@ ElMaestro
Posting: # 20255
Views: 2,444
 

 implicit return?

Hi Öberster Größter Meister,

» "smart" lost this round.

Totally.

» I always use explicit return(something) in my functions, when they are supposed to return a value regardless of whether it is a list, numeric, NULL or data frame some other thingy.

A good idea. But if you use R you are told that doing so has the penalty of a longer run-time.
Bloody Hell!

» Yes, that involves tapping a few times more on the keyboard but it does solve a lot of problems and best of all, it makes code readable.

Full ACK!

» In language like C you would never run into this phenomenon. You declare explicitly which type the function returns and you will be bombarded with errors or warnings if you fail to return the right type of if you forget the return statement. I find it very comfortable.

Weakly typed (R) against strong typed language (C or C++).
But IIRC has C or C++ also some weaknesses.

Regards,

Detlew
ElMaestro
★★★

Belgium?,
2019-04-28 20:36

@ d_labes
Posting: # 20256
Views: 2,446
 

 implicit return?

Hi d_labes,

» But IIRC has C or C++ also some weaknesses.

That's right.
For example, the number of built in functions in C is small. You need to write most functionality from scratch, so what you can do with three lines of code in R can easily be 100 lines of code in C. Much more if you think of e.g. the plot function which opens up a graphic device and with one line gives you a graph. Horrendously many hours of programming is needed in C to do such a thing.

The biggest disadvantage I can think of in C is clearly: Pointers (especially character pointers). They were hard to grasp initially. C does not have strings.
The biggest advantage I can think of in C is: Pointers. :-D They speed things up immensely. Like in the bootstrap code for dissolution posted here in this forum some time ago. Infinitely faster than any implementation seen out there. But then again, when you are working on a dossier it usually does not matter for real if you get the output in 1 or 10000 seconds as long as you meet your submission deadline :-)

I am not well versed with C++ as I never really had a need for classes/objects. They are not so necessary for numerical purposes.

I could be wrong, but...
Best regards,
ElMaestro
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2019-04-28 20:50

@ d_labes
Posting: # 20257
Views: 2,438
 

 implicit return?

Dear Detlew,

» » I always use explicit return(something) in my functions,…
»
» A good idea.

Yep.

» But if you use R you are told that doing so has the penalty of a longer run-time.
» Bloody Hell!

Does that really matter?
library(microbenchmark)
impl <- function(x) {
  if (x == 0) {
    "foo"
  } else {
    "bar"
  }
}
expl <- function(x) {
  if (x == 0) {
    return("foo")
  } else {
    return("bar")
  }
}
res <- microbenchmark(impl(round(runif(1), 0)),
                      expl(round(runif(1), 0)), times=2000L)
print(res)

Unit: microseconds
                     expr   min    lq     mean median    uq      max neval cld
 impl(round(runif(1), 0)) 1.811 1.812 2.686167  1.813 2.114 1383.479  2000   a
 expl(round(runif(1), 0)) 1.811 1.812 2.955002  1.813 2.114 1879.152  2000   a


» Weakly typed (R) against strong typed language (C or C++).

Exactly. You know that I’m facing a similar story with the forum scripts (PHP: weakly typed) generating database-queries (SQL: strongly typed). Bloody scavenger hunt.

Cheers,
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Belgium?,
2019-04-28 21:16
(edited by ElMaestro on 2019-04-28 21:40)

@ Helmut
Posting: # 20258
Views: 2,443
 

 implicit return?

Hi Hotzi,

I even go a step further, perhaps only because I am rather dumb.

My functions almost always look like this:
Foo=function(bar)
{
  ##blah blah, conditions and functionality and loops and loads of curly braces here.
  ##whatever happens in loops and conditions it sets a variable called e.g. rslt

  return(rslt) ##I always have a line at the end that looks like this!
}


I do that to make sure that whatever happens with pairs of curly braces I know that my function return is captured at the end. When finding errors and bugs I just need to look at rslt and work backwards. print(rslt) immediately before the return statement is often a very good beginning to that. Not sure if this is smarter (in terms of error finding) but it is my solution to a lot of trouble. Possibly just owing to my own lack of skills.

In C you can also exit with curly braces in the middle of a function. Debugging that sh!t is impossible for me:-).

I could be wrong, but...
Best regards,
ElMaestro
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2019-04-28 23:38

@ ElMaestro
Posting: # 20259
Views: 2,425
 

 implicit return?

Hi ElMaestro,

» My functions almost always look like this:
» » Foo=function(bar)
» {
»   ##blah blah, conditions and functionality and loops and loads of curly braces here.
»   ##whatever happens in loops and conditions it sets a variable called e.g. rslt
»
»   return(rslt) ##I always have a line at the end that looks like this!
» }


Good practice. I use it in longer functions as well and in ones which become part of a package, always. Only very, very slightly slower:
library(microbenchmark)
impl <- function(x) {
  if (x == 0) {
    "foo"
  } else {
    "bar"
  }
}
expl <- function(x) {
  if (x == 0) {
    return("foo")
  } else {
    return("bar")
  }
}
braces.with.suspenders <- function(x) {
  if (x == 0) {
    res <- "foo"
  } else {
    res <- "bar"
  }
  return(res)
}
res <- microbenchmark(impl(round(runif(1), 0)),
                      expl(round(runif(1), 0)),
                      braces.with.suspenders(round(runif(1), 0)),
                      times=3000L)
print(res)

Unit: microseconds
                                      expr   min    lq     mean median    uq    max neval cld
                  impl(round(runif(1), 0)) 1.811 1.812 1.976830  1.813 2.114  5.434  3000  a
                  expl(round(runif(1), 0)) 1.811 1.812 1.976926  1.813 2.114  6.340  3000  a
braces.with.suspenders(round(runif(1), 0)) 1.811 1.812 2.027029  2.114 2.114 16.905  3000   b

Cheers,
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Belgium?,
2019-04-29 09:40

@ Helmut
Posting: # 20260
Views: 2,389
 

 implicit return?

Hi Hötzi,


Interestingly (at least to moi!), I tried out of curiosity to do this:

QWERTY=function(x)
{
 ifelse ((x==0), "foo", "bar")
}


The benchmark for this version is much, much slower than any of the other proposals (at least on my system). Another example that shows that condensing code towards fewer keystrokes is not always fastest. I wonder what goes on internally since this is much slower? Perhaps space is allocated for both "foo" and "bar" in memory and then there is a decision as to which one to throw away and which one to keep on the CPU stack? I don't have a very good at understanding of these things.

This one is appearing even worse:
QWERTY2=function(x)
{
 ifelse (x, "bar", "foo")
}


I have no idea why QWERTY2 would not outperform QWERTY or at least be the same if R has a kind of optimiser in the interpreter. Would you happen know????

I could be wrong, but...
Best regards,
ElMaestro
Helmut
★★★
avatar
Homepage
Vienna, Austria,
2019-04-29 10:41

@ ElMaestro
Posting: # 20261
Views: 2,370
 

 implicit return?

Hi ElMaestro,


» Interestingly (at least to moi!), I tried out of curiosity to do this:
»
» QWERTY=function(x)
» {
»  ifelse ((x==0), "foo", "bar")
» }

»
» The benchmark for this version is much, much slower than any of the other proposals (at least on my system).

On any system. ;-)

» Another example that shows that condensing code towards fewer keystrokes is not always fastest. I wonder what goes on internally since this is much slower?

Use ifelse() for vectorized conditions. See this thread.

Try this:
library(microbenchmark)
fun1 <- function(x, cond, print=FALSE) {
  if (x == cond) {
    "foo"
    if (print) cat("foo\n")
  } else {
    "bar"
     if (print) cat("foo\n")
 }
}
fun2 <- function(x, cond, print=FALSE) {
  if (x == cond) {
    if (print) cat("foo\n")
    return("foo")
  } else {
    if (print) cat("bar\n")
    return("bar")
  }
}
fun3 <- function(x, cond, print=FALSE) {
  if (x == cond) {
    res <- "foo"
  } else {
    res <- "bar"
  }
  if (print) cat(res, "\n")
  return(res)
}
fun4 <- function(x, cond, print=FALSE) {
  ifelse (x == cond, "foo", "bar")
}
fun5 <- function(x, cond, print=FALSE) {
  ifelse ((x == cond), res <- "foo", res <- "bar")
  if (print) cat(res, "\n")
  return(res)
}
fun6 <- function(x, cond, print=FALSE) {
  ifelse (x == cond, res <- "foo", res <- "bar")
  if (print) cat(res, "\n")
  return(res)
}
res1 <- microbenchmark(fun1(round(runif(1, 0, 1), 0), 0),
                       fun2(round(runif(1, 0, 1), 0), 0),
                       fun3(round(runif(1, 0, 1), 0), 0),
                       fun4(round(runif(1, 0, 1), 0), 0),
                       fun5(round(runif(1, 0, 1), 0), 0),
                       fun6(round(runif(1, 0, 1), 0), 0),
                       times=1000L)
res2 <- microbenchmark(fun1(round(c(runif(1, 0, 1), runif(1, 1, 2)), 0),
                            c(1, 2)),
                       fun2(round(c(runif(1, 0, 1), runif(1, 1, 2)), 0),
                            c(1, 2)),
                       fun3(round(c(runif(1, 0, 1), runif(1, 1, 2)), 0),
                            c(1, 2)),
                       fun4(round(c(runif(1, 0, 1), runif(1, 1, 2)), 0),
                            c(1, 2)),
                       fun5(round(c(runif(1, 0, 1), runif(1, 1, 2)), 0),
                            c(1, 2)),
                       fun6(round(c(runif(1, 0, 1), runif(1, 1, 2)), 0),
                            c(1, 2)),
                       times=1000L)
print(res1)
print(res2)

Shortended output of res1 and res2:
  expr  median cld
fun1()   2.415   a
fun2()   2.415   a
fun3()   2.416   a
fun4()   3.623   a
fun5()   3.624   a
fun6()   3.624   a

  expr  median cld
fun1() 54.0350   b
fun2() 53.7340   b
fun3() 54.0360   b
fun4() 12.2265  a 
fun5() 12.6800  a 
fun6() 12.3780  a 
There were 50 or more warnings (use warnings() to see the first 50)

Hey, were are the warnings coming from? Try fun1() to fun3() with a vector-condition and print=TRUE.

Cheers,
Helmut Schütz
[image]

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Belgium?,
2019-04-29 12:55

@ Helmut
Posting: # 20262
Views: 2,350
 

 implicit return?

Hi Hötzi,

» Use ifelse() for vectorized conditions. See this thread.

Try this:
library(microbenchmark)

x=round(runif(100),0)
Len=100

f1=function(x, Len)
{
 ifelse ((x==0), "foo", "bar")
}

f1(x, Len)


f2=function(x, Len)
{
 rslt=rep("foo", Len)
 for (i in 1:Len)
  if (x[i]) rslt[i]="bar"
 return(rslt)
}

f2(x, Len)



res <- microbenchmark(
                      f1(round(runif(Len),0), Len),
                      f2(round(runif(Len),0), Len),
                      times=3000L)
print(res)


note I am passing Len to f1 even though it is not used there in order to give the function call the same overhead, otherwise the comparison might be called unfair.

On my system f2 is a lot faster, so the advantage of runif may be solely syntactic and not in any way a true vectorisation advantage (easier but not more efficient).

I could be wrong, but...
Best regards,
ElMaestro
mittyri
★★  

Russia,
2019-04-30 13:05
(edited by mittyri on 2019-04-30 13:17)

@ ElMaestro
Posting: # 20263
Views: 2,265
 

 built-in ifelse

Hi ElMaestro,

please see ifelse internals:
function (test, yes, no)
{
  if (is.atomic(test)) {
    if (typeof(test) != "logical")
      storage.mode(test) <- "logical"
    if (length(test) == 1 && is.null(attributes(test))) {
      if (is.na(test))
        return(NA)
      else if (test) {
        if (length(yes) == 1) {
          yat <- attributes(yes)
          if (is.null(yat) || (is.function(yes) && identical(names(yat),
            "srcref")))
            return(yes)
        }
      }
      else if (length(no) == 1) {
        nat <- attributes(no)
        if (is.null(nat) || (is.function(no) && identical(names(nat),
          "srcref")))
          return(no)
      }
    }
  }
  else test <- if (isS4(test))
    methods::as(test, "logical")
  else as.logical(test)
  ans <- test
  ok <- !is.na(test)
  if (any(test[ok]))
    ans[test & ok] <- rep(yes, length.out = length(ans))[test & ok]
  if (any(!test[ok]))
    ans[!test & ok] <- rep(no, length.out = length(ans))[!test & ok]
  ans
}


That's why Hadley implemented his own if_else function. Your one if in f2 is substituted to many of in f1. And the last part is two separated actions for 'No' and 'Yes'.
So you can write your own ifelse function (for your own purposes) which will be much faster than the built-in

And the winner is
f3 = function(x, Len)
{
  rslt=rep("foo", Len)
  rslt[which(x==1, arr.ind = T)]="bar"
  return(rslt)
}

Kind regards,
Mittyri
Activity
 Admin contact
20,138 posts in 4,246 threads, 1,387 registered users;
online 3 (0 registered, 3 guests [including 3 identified bots]).
Forum time (Europe/Vienna): 23:34 CET

The fundamental cause of trouble in the world today is
that the stupid are cocksure
while the intelligent are full of doubt.    Bertrand Russell

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz
HTML5