ElMaestro
★★★

Denmark,
2016-12-17 20:40
(2657 d 16:44 ago)

Posting: # 16850
Views: 4,635
 

 Data frame challenge [R for BE/BA]

Hi all,

I am struggling with importing data from a file that has a structure like:
[image]

Now, I would like R to understand that
1. we are not dealing with factors, the data frame is a series of numeric vectors.
2. Whenever we encounter something non-numeric such as Apple and Banana we will set it to zero.

So I have this solution:
A=read.table("SomeFile.csv", header=T)
for (i in 1:ncol(A))
  {
    A[,i]=as.numeric(as.character(A[,i]))
    A[,i][is.na(A[,i])]=0
  }


A is now effectively a matrix, or we can as.matrix or matrix it directly.
And although it works, it is painfully unreadable and it also throws warnings (coercion).

I am looking for a solution that is much more readable and does not throw warnings (and we are not going to discuss suppression of warnings, please; it sends shivers down my spine to do it). Regarding readability: I am fine with a solution that takes more coding/more lines, so this is not a competition in doing a lot of stuff in a single line. Single line = more elegance to some = less readability to a simpleton like me.

Can anyone help here?

Many thanks.

Pass or fail!
ElMaestro
mittyri
★★  

Russia,
2016-12-18 02:41
(2657 d 10:44 ago)

@ ElMaestro
Posting: # 16851
Views: 3,949
 

 Data frame challenge

Hi ElMaestro,

I'd like
A=read.table("SomeFile.csv", header=T, stringsAsFactors = F, sep=",")
apply(A, 2, function(x){
       as.numeric(replace(x,
         !grepl("[-]?[0-9]+[.]?[0-9]*|[-]?[0-9]+[L]?|[-]?[0-9]+[.]?[0-9]*[eE][0-9]+",x), 0))})


But more readable would be
A=read.table("SomeFile.csv", header=T, stringsAsFactors = F, sep=",")
checkandreplace <- function(x){
  # which values are looking not as a numeric?
  isnotnumeric <- !grepl("[-]?[0-9]+[.]?[0-9]*|[-]?[0-9]+[L]?|[-]?[0-9]+[.]?[0-9]*[eE][0-9]+",x)
  # we need to replace them with 0
  replaced0 <- replace(x, isnotnumeric, 0)
  # and to convert it all to numeric
  return(as.numeric(replaced0))
}
apply(A, 2, checkandreplace)
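
One caveat with the pattern above: it is not anchored, so grepl() returns TRUE for any string that merely contains a digit (e.g. "BLQ/2" or "1,25"), and as.numeric() would then still warn on those. A minimal sketch of an anchored variant follows, using lapply() so each column is processed as-is. The helper name to_numeric_or_zero and the sample data are made up for illustration; the accepted number format is an assumption and may need adjusting.

```r
# Hypothetical helper: anything that does not look like a plain number becomes 0.
to_numeric_or_zero <- function(x) {
  x <- trimws(as.character(x))
  # anchored pattern: optional sign, digits with optional decimals, optional exponent
  looks_numeric <- grepl("^[-+]?([0-9]+[.]?[0-9]*|[.][0-9]+)([eE][-+]?[0-9]+)?$", x)
  out <- numeric(length(x))                          # start with all zeros
  out[looks_numeric] <- as.numeric(x[looks_numeric]) # convert only the safe strings
  out
}

# Made-up data standing in for the content of SomeFile.csv
A <- data.frame(time = c("0", "0.5", "1"),
                conc = c("BLQ", "12.3", "BLQ/2"),
                stringsAsFactors = FALSE)
A[] <- lapply(A, to_numeric_or_zero)   # A[] keeps the data frame structure
```

Because as.numeric() only ever sees strings that passed the check, no coercion warning should be raised.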


Enjoy!

Kind regards,
Mittyri
ElMaestro
★★★

Denmark,
2016-12-18 12:47
(2657 d 00:38 ago)

@ mittyri
Posting: # 16854
Views: 3,940
 

 Data frame challenge

Hi Mittyri,

❝ A=read.table("SomeFile.csv", header=T, stringsAsFactors = F, sep=",")
❝ apply(A, 2, function(x){
❝        as.numeric(replace(x,
❝          !grepl("[-]?[0-9]+[.]?[0-9]*|[-]?[0-9]+[L]?|[-]?[0-9]+[.]?[0-9]*[eE][0-9]+",x), 0))})

❝ But more readable would be

❝ A=read.table("SomeFile.csv", header=T, stringsAsFactors = F, sep=",")
❝ checkandreplace <- function(x){
❝   # which values are looking not as a numeric?
❝   isnotnumeric <- !grepl("[-]?[0-9]+[.]?[0-9]*|[-]?[0-9]+[L]?|[-]?[0-9]+[.]?[0-9]*[eE][0-9]+",x)
❝   # we need to replace them with 0
❝   replaced0 <- replace(x, isnotnumeric, 0)
❝   # and to convert it all to numeric
❝   return(as.numeric(replaced0))
❝ }
❝ apply(A, 2, checkandreplace)



My eyes! My eyes! It hurts. :lol3::lol3:

Pass or fail!
ElMaestro
wligtenberg
☆    

The Netherlands,
2016-12-19 15:29
(2655 d 21:56 ago)

@ ElMaestro
Posting: # 16856
Views: 3,898
 

 Data frame challenge

I am a fan of the data.table package. So my answer would be this:

library(data.table)
a <- fread("./test.csv", colClasses = "character")
a <- a[, lapply(.SD, as.numeric)]
a <- a[, lapply(.SD, function(x){ifelse(is.na(x), 0, x)})]


You need to do it in two steps, else the class stays character.
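
Why the two steps matter can be seen in base R with a made-up vector (no data.table needed for the illustration): ifelse() applied to a character vector returns character, because the untouched elements are still strings.

```r
x <- c("1.5", NA, "3")                 # made-up character column

# One step: replace NA directly on the character vector
one_step <- ifelse(is.na(x), 0, x)
class(one_step)                        # "character"

# Two steps: convert first, then replace NA by 0
y <- as.numeric(x)
two_step <- ifelse(is.na(y), 0, y)
class(two_step)                        # "numeric"
```

Note that as.numeric() in the first step still emits the coercion warning when the column contains text such as "Banana".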
zizou
★    

Plzeň, Czech Republic,
2016-12-19 19:41
(2655 d 17:44 ago)

@ ElMaestro
Posting: # 16857
Views: 3,863
 

 Data frame challenge

Dear ElMaestro.

I think there should be a warning message when you have apples and bananas in your data. x)

The function as.numeric() produces a warning because "Banana" (etc.) has no numeric value, so the result is NA.

If you expect numeric values with some text known before you can use:
A=read.table("SomeFile.csv", colClasses="numeric", na.strings=c("Banana","Apple"), header=T)
If colClasses is set to "numeric" and there is a string not listed in na.strings (e.g. "Orange"), read.table() will stop with an error.
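
A small sketch of this behaviour, with made-up content passed via the text= argument in place of SomeFile.csv (the column names time and conc are assumptions):

```r
# Made-up file content; text= stands in for "SomeFile.csv"
csv_text <- "time conc\n0 Banana\n1 12.5\n2 Apple"

A <- read.table(text = csv_text, header = TRUE,
                colClasses = "numeric",
                na.strings = c("Banana", "Apple"))
A[is.na(A)] <- 0      # the known strings arrive as NA; set them to zero

# A string that is not listed in na.strings makes read.table() stop
bad_text <- "time conc\n0 Orange\n1 12.5"
result <- tryCatch(
  read.table(text = bad_text, header = TRUE, colClasses = "numeric",
             na.strings = c("Banana", "Apple")),
  error = function(e) "error"
)
```

Here A ends up fully numeric without any coercion warning, while result holds "error" for the file containing "Orange".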

If Bananas and Apples should be handled differently, I would read the data as character (without specifying colClasses), then change e.g. "Apple" to "0" and "Banana" to NA, and convert to numeric using as.numeric().
text=c("Apple","Banana")
#text=c("Apple","Banana","Orange")
for (i in 1:length(text)){
 if(text[i]=="Apple"){text[i]="0"}
 if(text[i]=="Banana"){text[i]=NA}
}
as.numeric(text)
#[1]  0 NA # no error and no warning

If "Orange" is also present in the text/data, it will be coerced to NA with a warning.
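
The same recoding can also be written without a loop. This is only a sketch; the lookup vector recode and the sample values are made up for illustration:

```r
text <- c("Apple", "Banana", "1.25")     # made-up values
recode <- c(Apple = "0", Banana = NA)    # hypothetical lookup: string -> replacement

known <- text %in% names(recode)         # which elements have a known replacement?
text[known] <- recode[text[known]]       # swap them in one go
values <- as.numeric(text)               # 0, NA, 1.25 without a warning
```

Unlike the loop, this version does not fail if text already contains NA, since %in% returns FALSE rather than NA for missing values.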

For data with numeric and character values mixed in one variable, I would store it as character only (with no statistics or whatever).

The principle of changing all string values (among many numeric values) to zero without warnings seems strange to me (a simple example is a wrong value 1,25 in the data instead of the correct value 1.25). It seems OK to have a warning about changing the value to NA. But of course, when all data are correct and there are many string values which should be zero, the warning is unwanted.
To avoid suppressing warnings with suppressWarnings(as.numeric(text)), it must be done as mittyri wrote, i.e. replace all character values with "0" before calling as.numeric(). (Nevertheless the result is similar: values such as "1,25" or "1.25*" will be changed to zero without warnings.)

Best regards,
zizou

No one likes warnings even though they can help!
ElMaestro
★★★

Denmark,
2016-12-19 22:06
(2655 d 15:18 ago)

@ zizou
Posting: # 16859
Views: 3,826
 

 Data frame challenge

Dear zizou,

❝ If you expect numeric values with some text known before you can use:
❝ A=read.table("SomeFile.csv", colClasses="numeric", na.strings=c("Banana","Apple"), header=T)


Thanks. Actually, what I have are vectors of times for plasma samples, and vectors with corresponding concentrations. Can use that to extract a bunch of info, including AUCt and Cmax and much else.
Sometimes unquantified (unquantifiable) concentrations for a variety of reasons are marked like "BLQ", "BLQ/2", "?", "???", "NA", "Missing", "M", "Note", "UNQ", and any other variation you can think of in English, German, Swahili, French, Inuit and 56 other languages.
I may be exaggerating ever so slightly, what I am trying to say is that it is a wee bit open-ended with those non-numeric strings. :-)
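
For such an open-ended set of strings, one possible approach is to zero everything that does not look like a number, but keep a tally of what was zeroed so that typos such as "1,25" can still be spotted. A rough sketch with a made-up concentration vector; the pattern for "looks like a number" is an assumption:

```r
# Made-up concentration column with assorted non-numeric entries
conc <- trimws(c("BLQ", "1.25", "BLQ/2", "???", "3.4e1", "1,25", "Missing"))

# Anchored check: optional sign, digits with optional decimal part, optional exponent
is_num <- grepl("^[-+]?[0-9]*[.]?[0-9]+([eE][-+]?[0-9]+)?$", conc)

values <- numeric(length(conc))              # zeros by default
values[is_num] <- as.numeric(conc[is_num])   # convert only what passed the check

replaced <- table(conc[!is_num])             # tally of the strings that became 0
```

Printing replaced gives a quick overview of every distinct string that was set to zero, without relying on coercion warnings.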

Pass or fail!
ElMaestro

The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz