Bioequivalence and Bioavailability Forum

ElMaestro
Hero

Denmark,
2016-12-17 19:40

Posting: # 16850
Views: 1,694
 

 Data frame challenge [R for BE/BA]

Hi all,

I am struggling with importing data from a file with a structure like this:
[image]

Now, I would like R to understand that
1. we are not dealing with factors, the data frame is a series of numeric vectors.
2. Whenever we encounter something non-numeric such as Apple and Banana we will set it to zero.

So I have this solution:
A=read.table("SomeFile.csv", header=T)
for (i in 1:ncol(A))
  {
    A[,i]=as.numeric(as.character(A[,i]))
    A[,i][is.na(A[,i])]=0
  }


A is now effectively a matrix, or we can as.matrix or matrix it directly.
And although it works, it is painfully unreadable and it also throws warnings (coercion).

I am looking for a solution that is much more readable and does not throw warnings (and we are not going to discuss suppression of warnings, please; it sends shivers down my spine to do it). Regarding readability: I am fine with a solution that takes more coding/more lines, so this is not a competition in doing a lot of stuff in a single line. A single line means more elegance to some, but less readability to a simpleton like me.

Can anyone help here?

Many thanks.

I could be wrong, but…


Best regards,
ElMaestro

- since June 2017 having an affair with the bootstrap.
mittyri
Senior

Russia,
2016-12-18 01:41

@ ElMaestro
Posting: # 16851
Views: 1,446
 

 Data frame challenge

Hi ElMaestro,

I'd like
A=read.table("SomeFile.csv", header=T, stringsAsFactors = F, sep=",")
# the pattern is anchored (^...$): the whole entry must be a number, so "BLQ/2" or "1,25" count as non-numeric
# [[:space:]]* allows for the padding apply() may add when it turns the data frame into a character matrix
apply(A, 2, function(x){
       as.numeric(replace(x,
         !grepl("^[[:space:]]*[-]?[0-9]+[.]?[0-9]*([eE][-+]?[0-9]+)?[[:space:]]*$",x), 0))})


But more readable would be
A=read.table("SomeFile.csv", header=T, stringsAsFactors = F, sep=",")
checkandreplace <- function(x){
  # which values do not look numeric as a whole (anchored with ^ and $)?
  isnotnumeric <- !grepl("^[[:space:]]*[-]?[0-9]+[.]?[0-9]*([eE][-+]?[0-9]+)?[[:space:]]*$",x)
  # we need to replace them with 0
  replaced0 <- replace(x, isnotnumeric, 0)
  # and to convert it all to numeric
  return(as.numeric(replaced0))
}
# apply() returns a numeric matrix with one column per column of A
apply(A, 2, checkandreplace)
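
A quick way to see what such a pattern accepts is to run it on a handful of typical entries. The sketch below is only an illustration: the entries are made up, and the names entries, loose and strict are not from the thread. It compares a pattern without anchors with one that must match the whole entry.

```r
# Made-up entries of the kind found in a concentration column
entries <- c("12.5", "-3", "1e-3", "BLQ", "BLQ/2", "1,25", "1.25*", "???")

# Unanchored: TRUE as soon as a number occurs anywhere in the entry
loose <- grepl("[-]?[0-9]+[.]?[0-9]*", entries)

# Anchored with ^ and $: TRUE only if the whole entry is a number
strict <- grepl("^[-]?[0-9]+[.]?[0-9]*([eE][-+]?[0-9]+)?$", entries)

print(data.frame(entries, loose, strict))

# Only entries passing the strict test are kept; the rest become "0",
# so as.numeric() has nothing left to coerce to NA
values <- as.numeric(replace(entries, !strict, "0"))
print(values)
```

With the unanchored pattern, entries such as "BLQ/2" or "1,25" count as numeric because they contain a digit; they would then reach as.numeric() unchanged and come back as NA with a warning.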


Enjoy!

Kind regards,
Mittyri
ElMaestro
Hero

Denmark,
2016-12-18 11:47

@ mittyri
Posting: # 16854
Views: 1,428
 

 Data frame challenge

Hi Mittyri,

» A=read.table("SomeFile.csv", header=T, stringsAsFactors = F, sep=",")
» apply(A, 2, function(x){
»        as.numeric(replace(x,
»          !grepl("^[[:space:]]*[-]?[0-9]+[.]?[0-9]*([eE][-+]?[0-9]+)?[[:space:]]*$",x), 0))})
»
» But more readable would be
» A=read.table("SomeFile.csv", header=T, stringsAsFactors = F, sep=",")
» checkandreplace <- function(x){
»   # which values do not look numeric as a whole (anchored with ^ and $)?
»   isnotnumeric <- !grepl("^[[:space:]]*[-]?[0-9]+[.]?[0-9]*([eE][-+]?[0-9]+)?[[:space:]]*$",x)
»   # we need to replace them with 0
»   replaced0 <- replace(x, isnotnumeric, 0)
»   # and to convert it all to numeric
»   return(as.numeric(replaced0))
» }
» apply(A, 2, checkandreplace)



My eyes! My eyes! It hurts.


Best regards,
ElMaestro

wligtenberg
Junior

The Netherlands,
2016-12-19 14:29
(edited by wligtenberg on 2016-12-19 15:13)

@ ElMaestro
Posting: # 16856
Views: 1,360
 

 Data frame challenge

I am a fan of the data.table package. So my answer would be this:

library(data.table)
a <- fread("./test.csv", colClasses = "character")
a <- a[, lapply(.SD, as.numeric)]
a <- a[, lapply(.SD, function(x){ifelse(is.na(x), 0, x)})]


You need to do it in two steps, else the class stays character.
zizou
Junior

Plzeň, Czech Republic,
2016-12-19 18:41

@ ElMaestro
Posting: # 16857
Views: 1,340
 

 Data frame challenge

Dear ElMaestro.

I think there should be a warning message when you have apples and bananas in your data. x)

The function as.numeric() produces a warning message because "Banana" (etc.) has no numeric value, so the result is NA.

If you expect numeric values plus a few text entries that are known in advance, you can use:
A=read.table("SomeFile.csv", colClasses="numeric", na.strings=c("Banana","Apple"), header=T)
If colClasses is set to numeric and there is a string not listed in na.strings (e.g. "Orange"), then read.table() will stop with an error.
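
A minimal sketch of both cases, assuming a comma-separated file with made-up content (the text= argument stands in for SomeFile.csv, and the names csv_text, bad_text and result are only for illustration):

```r
# Made-up file content; in practice this would sit in SomeFile.csv
csv_text <- "Time,Conc
0,Apple
1,12.5
2,Banana
4,3.1"

A <- read.table(text = csv_text, header = TRUE, sep = ",",
                colClasses = "numeric", na.strings = c("Banana", "Apple"))

# The listed strings arrive as NA; set them to zero afterwards
A[is.na(A)] <- 0
print(A)

# A string that is not listed makes read.table() stop with an error
bad_text <- sub("Banana", "Orange", csv_text)
result <- tryCatch(
  read.table(text = bad_text, header = TRUE, sep = ",",
             colClasses = "numeric", na.strings = c("Banana", "Apple")),
  error = function(e) "error"
)
print(result)
```

No warning is involved here: known strings become NA silently, and unknown ones stop the import.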

If Bananas and Apples should be handled differently, I would read the data as character (no specification of colClasses), then change e.g. "Apple" to "0" and "Banana" to NA, and convert the type to numeric using as.numeric().
text=c("Apple","Banana")
#text=c("Apple","Banana","Orange")
for (i in 1:length(text)){
 if(text[i]=="Apple"){text[i]="0"}
 if(text[i]=="Banana"){text[i]=NA}
}
as.numeric(text)
#[1]  0 NA # no error and no warning

If "Orange" is also in the text/data, it will be turned into NA by coercion, with the warning message.

With numeric and character values mixed in one variable, I would store it as character only (with no statistics or whatever).

Changing all string values (scattered among many numeric values) to zero without warnings seems strange to me (a simple example is a wrong value 1,25 in the data instead of the correct value 1.25). It seems OK to have a warning about changing the value to NA. But of course, when all data are correct and there are many string values which should be zero, the warning is unwanted.
To avoid suppressing warnings with suppressWarnings(as.numeric(text)), it must be done as mittyri wrote, i.e. replace all character values with "0" before using the function as.numeric(). (Nevertheless the result is similar: values like "1,25" or "1.25*" will be changed to zero without warnings.)
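
One possible middle way, sketched here under my own assumptions, is to keep the conversion free of warnings but return the distinct strings that were set to zero, so that something like "1,25" can still be spotted. The helper zero_with_report and the example vector are hypothetical, not part of anyone's code above.

```r
# Hypothetical helper: converts to numeric, sets everything else to zero,
# and returns the distinct strings that were replaced so they can be inspected
zero_with_report <- function(x) {
  x <- trimws(as.character(x))
  # TRUE only if the whole entry is a number; NA entries give FALSE
  is_number <- grepl("^[-]?[0-9]+[.]?[0-9]*([eE][-+]?[0-9]+)?$", x)
  replaced <- unique(x[!is_number & !is.na(x)])
  x[!is_number] <- "0"
  list(values = as.numeric(x), replaced = replaced)
}

conc <- c("BLQ", "1.25", "1,25", "2.5", "1.25*", "BLQ")
out <- zero_with_report(conc)
print(out$values)
print(out$replaced)
```

Looking through out$replaced after the import takes the place of the warning: "BLQ" is expected there, "1,25" is a reason to go back to the data.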

Best regards,
zizou

No one likes warnings even though they can help!
ElMaestro
Hero

Denmark,
2016-12-19 21:06

@ zizou
Posting: # 16859
Views: 1,316
 

 Data frame challenge

Dear zizou,

» If you expect numeric values with some text known before you can use:
» A=read.table("SomeFile.csv", colClasses="numeric", na.strings=c("Banana","Apple"), header=T)

Thanks. Actually, what I have are vectors of times for plasma samples, and vectors with corresponding concentrations. I can use these to extract a bunch of info, including AUCt and Cmax and much else.
Sometimes unquantified (unquantifiable) concentrations are, for a variety of reasons, marked like "BLQ", "BLQ/2", "?", "???", "NA", "Missing", "M", "Note", "UNQ", and any other variation you can think of in English, German, Swahili, French, Inuit and 56 other languages.
I may be exaggerating ever so slightly; what I am trying to say is that it is a wee bit open-ended with those non-numeric strings. :-)
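
To make the use case concrete, here is a rough sketch with one made-up profile. It assumes that anything which is not a number as a whole is set to zero, and that AUCt is taken by the linear trapezoidal rule up to the last non-zero concentration. The variable names and values are invented for illustration only.

```r
# Made-up profile for one subject; non-numeric entries are set to zero
time     <- c(0, 0.5, 1, 2, 4, 8)
conc_raw <- c("BLQ", "4.0", "10.0", "6.0", "2.0", "Missing")

is_number <- grepl("^[-]?[0-9]+[.]?[0-9]*([eE][-+]?[0-9]+)?$", conc_raw)
conc <- as.numeric(replace(conc_raw, !is_number, "0"))

# Cmax and the time at which it occurs
Cmax <- max(conc)
Tmax <- time[which.max(conc)]

# AUCt by the linear trapezoidal rule, up to the last non-zero concentration
last <- max(which(conc > 0))
idx  <- seq_len(last)
AUCt <- sum(diff(time[idx]) * (head(conc[idx], -1) + tail(conc[idx], -1)) / 2)

print(c(Cmax = Cmax, Tmax = Tmax, AUCt = AUCt))
```

Testing for "is a number" instead of listing the markers keeps it open-ended: no list of strings has to be maintained. A marker in the middle of a profile also becomes zero here, which pulls the trapezoids down, so that case may need its own rule.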


Best regards,
ElMaestro

The BIOEQUIVALENCE / BIOAVAILABILITY FORUM is hosted by
BEBAC Ing. Helmut Schütz