ElMaestro
★★★

Denmark,
2025-11-10 13:26
(205 d 11:08 ago)

(edited on 2025-11-10 13:42)
Posting: # 24483
Views: 2,916
 

 September trouble [🇷 for BE/BA]

Hi all,

can someone try this:

Date.str=paste0("01/", month.abb[c(1:12)], "/2024")
Date.I=as.Date(Date.str,format='%d/%b/%Y')
print(data.frame(Date.str, Date.I))

The code creates some dates as string for first day of all months and then tries to interpret the strings as dates. I am getting an NA for September.

Oddly, if I abbreviate September as "Sept" rather than "Sep"
as.Date("01/Sept/2024",format='%d/%b/%Y')

then it works, in spite of the fact that month.abb[9] is Sep and not Sept on my system.:-|

Does it behave the same way on your system? Mine is Win11 installed last week along with a fresh R install.

Add.: Oh dear, is %b locale-dependent? Shoot me, please. Put me out of my misery. This is not good.

Pass or fail!
ElMaestro
mittyri
★★  

Russia,
2025-11-10 21:38
(205 d 02:56 ago)

@ ElMaestro
Posting: # 24485
Views: 2,553
 

 September trouble

Hi ElMaestro,

On my system
Date.str=paste0("01/", month.abb[c(1:12)], "/2024")
Date.I=as.Date(Date.str,format='%d/%b/%Y')
print(data.frame(Date.str, Date.I))
      Date.str     Date.I
1  01/Jan/2024 2024-01-01
2  01/Feb/2024 2024-02-01
3  01/Mar/2024 2024-03-01
4  01/Apr/2024 2024-04-01
5  01/May/2024 2024-05-01
6  01/Jun/2024 2024-06-01
7  01/Jul/2024 2024-07-01
8  01/Aug/2024 2024-08-01
9  01/Sep/2024 2024-09-01
10 01/Oct/2024 2024-10-01
11 01/Nov/2024 2024-11-01
12 01/Dec/2024 2024-12-01


but
month.abb[9]
[1] "Sep"
Sys.getlocale("LC_TIME")
[1] "English_United States.utf8"
as.Date("01/Sep/2024", format = '%d/%b/%Y')
[1] "2024-09-01"
as.Date("01/Sept/2024", format = '%d/%b/%Y')
[1] NA


❝ Add.: Oh dear, is %b locale-dependent? Shoot me, please. Put me out of my misery. This is not good.


You nailed it down.
Yes, %b (and thus as.Date()) is indeed locale-dependent in R, as it relies on the C strptime() function, which pulls month abbreviations from your system's locale settings.
On your setup (en_GB locale?), the system's abbreviated name for September is "Sept" (four letters). So when you feed "01/Sep/2024" to as.Date(format = '%d/%b/%Y'), it returns NA because the parser does not recognize "Sep" as matching the expected "Sept".
However, month.abb (and month.name) are hardcoded constants in R's base package and always use the English three-letter forms regardless of your locale. This creates exactly the mismatch you are seeing.
I know it is painful; locales drive me crazy every time (so I switched to the US locale years ago).
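
A minimal sketch of one way around the mismatch, assuming the input always carries the English three-letter abbreviations: switch LC_TIME to the "C" locale only for the parsing step and restore the original setting afterwards. The helper name parse_en_date is hypothetical and not part of base R.

```r
# Hypothetical helper: parse dates with English month abbreviations
# regardless of the session locale.
parse_en_date <- function(x, format = "%d/%b/%Y") {
  old <- Sys.getlocale("LC_TIME")                     # remember the current setting
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)  # restore it, even after an error
  Sys.setlocale("LC_TIME", "C")                       # "C" uses Jan, Feb, ..., Dec
  as.Date(x, format = format)
}

Date.str <- paste0("01/", month.abb, "/2024")
Date.I   <- parse_en_date(Date.str)
print(data.frame(Date.str, Date.I))
```

The locale is changed only inside the function, so the rest of the session keeps its usual settings.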

Kind regards,
Mittyri
Helmut
★★★
Vienna, Austria,
2025-11-11 10:37
(204 d 13:57 ago)

@ ElMaestro
Posting: # 24487
Views: 2,561
 

 More months trouble

Hi ElMaestro,

Mittyri explained it already. With my locale (German_Austria) it is even worse…
      Date.str     Date.I
1  01/Jan/2024       <NA>
2  01/Feb/2024 2024-02-01
3  01/Mar/2024       <NA>
4  01/Apr/2024 2024-04-01
5  01/May/2024       <NA>
6  01/Jun/2024 2024-06-01
7  01/Jul/2024 2024-07-01
8  01/Aug/2024 2024-08-01
9  01/Sep/2024 2024-09-01
10 01/Oct/2024       <NA>
11 01/Nov/2024 2024-11-01
12 01/Dec/2024       <NA>

… because the constants month.name and month.abb give the English month names and their three letter abbreviations irrespective of the locale.
month.name
 [1] "January"   "February"  "March"     "April"     "May"       "June"     
 [7] "July"      "August"    "September" "October"   "November"  "December"

month.abb
 [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"

Therefore, with your code I get NAs for January, March, May, and October because they do not match my locale: Jänner (Jän), März (Mär), Mai (Mai), Oktober (Okt). With German_Germany January would match: Januar (Jan).

Does that help?

ol        <- Sys.getlocale("LC_TIME")   # save original locale
ol                                      # show it
#[1] "German_Austria.utf8"              # on my system (Win11, R 4.5.2)
loc.month <- format(seq.Date(from = as.Date("2024-01-01"),
                             to = as.Date("2024-12-01"),
                             by = "m"), "%b")
local     <- paste0("01/", loc.month[c(1:12)], "/2024")
invisible(Sys.setlocale("LC_TIME", "English"))
en        <- paste0("01/", month.abb, "/2024")
Date.I    <- as.Date(en, format = "%d/%b/%Y")
invisible(Sys.setlocale("LC_TIME", ol)) # restore original
print(data.frame(local, en, Date.I), row.names = FALSE)

       local          en     Date.I
 01/Jän/2024 01/Jan/2024 2024-01-01
 01/Feb/2024 01/Feb/2024 2024-02-01
 01/Mär/2024 01/Mar/2024 2024-03-01
 01/Apr/2024 01/Apr/2024 2024-04-01
 01/Mai/2024 01/May/2024 2024-05-01
 01/Jun/2024 01/Jun/2024 2024-06-01
 01/Jul/2024 01/Jul/2024 2024-07-01
 01/Aug/2024 01/Aug/2024 2024-08-01
 01/Sep/2024 01/Sep/2024 2024-09-01
 01/Okt/2024 01/Oct/2024 2024-10-01
 01/Nov/2024 01/Nov/2024 2024-11-01
 01/Dez/2024 01/Dec/2024 2024-12-01


I don’t know what you want to achieve, but in my code I only use the ISO date YYYY-MM-DD (in R: "%Y-%m-%d" or simply "%F") without caring about the names of months.

format(Sys.Date(), "%Y-%m-%d"); format(Sys.Date(), "%F")
[1] "2025-11-11"
[1] "2025-11-11"


Month names and abbreviations can be interesting (check esp. the French ones):

ol   <- Sys.getlocale("LC_TIME")
from <- as.Date("2025-01-01")
to   <- as.Date("2025-12-01")
by   <- "m"
en   <- data.frame(English = month.name, abbr = month.abb)
invisible(Sys.setlocale("LC_TIME", "German_Austria"))
at   <- data.frame(Austrian = format(seq.Date(from, to, by), "%B"),
                   abbr = format(seq.Date(from, to, by), "%b"))
invisible(Sys.setlocale("LC_TIME", "German_Germany"))
de   <- data.frame(German = format(seq.Date(from, to, by), "%B"),
                   abbr = format(seq.Date(from, to, by), "%b"))
invisible(Sys.setlocale("LC_TIME", "French_France"))
fr   <- data.frame(French = format(seq.Date(from, to, by), "%B"),
                   abbr = format(seq.Date(from, to, by), "%b"))
invisible(Sys.setlocale("LC_TIME", "Danish_Denmark"))
da   <- data.frame(Danish = format(seq.Date(from, to, by), "%B"),
                   abbr = format(seq.Date(from, to, by), "%b"))
invisible(Sys.setlocale("LC_TIME", ol))
months <- cbind(en, at, de, fr, da)
print(months, row.names = FALSE, right= FALSE)

 English   abbr Austrian  abbr German    abbr French    abbr  Danish    abbr
 January   Jan  Jänner    Jän  Januar    Jan  janvier   janv. januar    jan
 February  Feb  Februar   Feb  Februar   Feb  février   févr. februar   feb
 March     Mar  März      Mär  März      Mrz  mars      mars  marts     mar
 April     Apr  April     Apr  April     Apr  avril     avr.  april     apr
 May       May  Mai       Mai  Mai       Mai  mai       mai   maj       maj
 June      Jun  Juni      Jun  Juni      Jun  juin      juin  juni      jun
 July      Jul  Juli      Jul  Juli      Jul  juillet   juil. juli      jul
 August    Aug  August    Aug  August    Aug  août      août  august    aug
 September Sep  September Sep  September Sep  septembre sept. september sep
 October   Oct  Oktober   Okt  Oktober   Okt  octobre   oct.  oktober   okt
 November  Nov  November  Nov  November  Nov  novembre  nov.  november  nov
 December  Dec  Dezember  Dez  Dezember  Dez  décembre  déc.  december  dec


Dif-tor heh smusma 🖖🏼 Довге життя Україна!
Helmut Schütz

The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
ElMaestro
★★★

Denmark,
2025-11-13 06:20
(202 d 18:14 ago)

@ Helmut
Posting: # 24495
Views: 2,389
 

 More months trouble

Hi Hötzi,
hi all,

❝ I don’t know what you want to achieve but (...)


Well, some degree of sanity is part of my general goals. But obviously, in that case I should not have gone into bioequivalence to start with.

Some days ago I received a csv file with dates like "03/Apr/2024" for ~5000 data points or so (cancer PD marker level, subjects, visits, timings, and a Brazilian other columns). There are repeats in the data and I need to extract the latest data for each set of such repeats, as these will be the ones used for stats. So, I have to sort the data by subject and by date.
Since %b, as Mittyri says, uses a deep C call which queries the locale, there is no easy way of doing this.
The solution I see is to translate every month abbrev in the data to the corresponding month abbrev of my locale and only then can it be sorted. Same thing if I convert the month abbrev to integer. Both options are equally viable. Clumsy, but I will get there. :-)

I do not in any way have influence on the way the data file is generated or which date format the other party decides to use before sending it to me.
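
The "convert the month abbrev to integer" route can be sketched as follows. It avoids %b altogether by looking each abbreviation up in month.abb, then keeps the latest record per subject. The toy data and the column names (subject, date, value) are assumptions for illustration; the real file will differ.

```r
# Toy data standing in for the real file; column names are assumptions.
dat <- data.frame(
  subject = c(1, 1, 2, 2, 2),
  date    = c("03/Apr/2024", "17/Sep/2024", "01/Jan/2024",
              "28/Oct/2024", "15/May/2024"),
  value   = c(10.1, 9.7, 5.2, 4.8, 5.0),
  stringsAsFactors = FALSE
)

# Split "dd/Mon/yyyy" and translate the abbreviation to a month number
# by its position in month.abb, which is the same in every locale.
parts <- do.call(rbind, strsplit(dat$date, "/", fixed = TRUE))
mon   <- match(parts[, 2], month.abb)
if (anyNA(mon)) stop("unknown month abbreviation in the data")
dat$iso <- as.Date(sprintf("%s-%02d-%s", parts[, 3], mon, parts[, 1]))

# Sort by subject and date, then keep the last (latest) row per subject.
dat    <- dat[order(dat$subject, dat$iso), ]
latest <- dat[!duplicated(dat$subject, fromLast = TRUE), ]
print(latest, row.names = FALSE)
```

Because the ISO string contains only digits, the final as.Date() call does not depend on the locale.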

Pass or fail!
ElMaestro
Helmut
★★★
Vienna, Austria,
2025-11-13 13:45
(202 d 10:48 ago)

@ ElMaestro
Posting: # 24496
Views: 2,374
 

 Know the month abbreviations

Hi ElMaestro,

❝ Well, some degree of sanity is part of my general goals. But obviously, in that case I should not have gone into bioequivalence to start with.


:-D

❝ […] a csv file with dates like "03/Apr/2024" …

❝ The solution I see is to translate every month abbrev in the data to the corresponding month abbrev of my locale and only then can it be sorted. Same thing if I convert the month abbrev to integer. Both options are equally viable. Clumsy, but I will get there. :-)


Do you import the data with read.csv()? I would suggest read.table() instead. Then you can make sure the ugly column is read as character and it’s easy to split it into day, month, year, and convert to an ISO-date.
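
The import step might look roughly like this. It is only a sketch: the inline text stands in for the real file, and the column names are made up.

```r
# A two-row stand-in for the real file; 'text =' avoids needing a file on disk.
csv <- "subject,date,value
1,03/Apr/2024,10.1
2,17/Sep/2024,9.7"

# Force the date column to character so nothing is guessed or converted.
dat <- read.table(text = csv, header = TRUE, sep = ",",
                  colClasses = c(date = "character"))

# Split into day, month and year; the pieces stay plain strings.
parts     <- strsplit(dat$date, "/", fixed = TRUE)
dat$day   <- vapply(parts, `[`, character(1), 1)
dat$month <- vapply(parts, `[`, character(1), 2)
dat$year  <- vapply(parts, `[`, character(1), 3)
str(dat)
```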

months <- function(lang = "en") {
  if (!lang %in% c("en", "at", "de", "da", "fr"))
    stop ("language \"", lang, "\" not implemented.")
  switch(lang,
    "en" = setNames(1:12, month.abb),
    "at" = setNames(1:12, c("Jän", "Feb", "Mär", "Apr", "Mai", "Jun",
                            "Jul", "Aug", "Sep", "Okt", "Nov", "Dez")),
    "de" = setNames(1:12, c("Jan", "Feb", "Mrz", "Apr", "Mai", "Jun",
                            "Jul", "Aug", "Sep", "Okt", "Nov", "Dez")),
    "da" = setNames(1:12, c("jan", "feb", "mar", "apr", "maj", "jun",
                            "jul", "aug", "sep", "okt", "nov", "dec")),
    "fr" = setNames(1:12, c("janv.", "févr.", "mars", "avr.", "mai", "juin",
                            "juil.", "août", "sept.", "oct.", "nov.", "déc.")))
}
iso.date <- function(lang = "en", datum, sep = "/") {
  if (!lang %in% c("en", "at", "de", "da", "fr"))
    stop ("language \"", lang, "\" not implemented.")
  sep.locs <- gregexec(sep, datum, perl = TRUE)[[1]][1, ]
  if (!length(sep.locs) == 2)
    stop ("sep \"", sep, "\" must occur exactly twice in each date.")
  d <- substr(datum, 1, sep.locs[1] - 1)
  m <- substr(datum, sep.locs[1] + 1, sep.locs[2] - 1)
  m <- sprintf("%02d", which(names(months(lang)) == m))
  y <- substr(datum, sep.locs[2] + 1, nchar(datum))
  paste0(y, "-", m, "-", d)
}
# Example with random dates
set.seed(123456)
days    <- 7
mons    <- 6
day     <- sprintf("%02d", round(runif(days, 1, 28), 0))
month   <- round(runif(mons, 1, 12))
en.date <- paste0(day, "/", names(months("en")[month]), "/", "2025")
de.date <- paste0(day, "/", names(months("de")[month]), "/", "2025")
da.date <- paste0(day, "/", names(months("da")[month]), "/", "2025")
fr.date <- paste0(day, "/", names(months("fr")[month]), "/", "2025")
dates   <- data.frame(lang = rep(c("en", "de", "da", "fr"), each = days),
                      date = c(en.date, de.date, da.date, fr.date), iso = NA)
for (j in 1:nrow(dates)) {
 dates$iso[j] <- iso.date(lang = dates$lang[j], dates$date[j])
}
dates$iso <- as.Date(dates$iso, format = "%F")
str(dates); print(dates[with(dates, order(lang, iso)), ], row.names = FALSE, right = FALSE)

'data.frame':   28 obs. of  3 variables:
 $ lang: chr  "en" "en" "en" "en" ...
 $ date: chr  "23/Feb/2025" "21/Dec/2025" "12/Mar/2025" "10/Oct/2025" ...
 $ iso : Date, format: "2025-02-23" "2025-12-21" ...
 lang date          iso       
 da   15/feb/2025   2025-02-15
 da   23/feb/2025   2025-02-23
 da   12/mar/2025   2025-03-12
 da   11/aug/2025   2025-08-11
 da   10/okt/2025   2025-10-10
 da   06/nov/2025   2025-11-06
 da   21/dec/2025   2025-12-21
 de   15/Feb/2025   2025-02-15
 de   23/Feb/2025   2025-02-23
 de   12/Mrz/2025   2025-03-12
 de   11/Aug/2025   2025-08-11
 de   10/Okt/2025   2025-10-10
 de   06/Nov/2025   2025-11-06
 de   21/Dez/2025   2025-12-21
 en   15/Feb/2025   2025-02-15
 en   23/Feb/2025   2025-02-23
 en   12/Mar/2025   2025-03-12
 en   11/Aug/2025   2025-08-11
 en   10/Oct/2025   2025-10-10
 en   06/Nov/2025   2025-11-06
 en   21/Dec/2025   2025-12-21
 fr   15/févr./2025 2025-02-15
 fr   23/févr./2025 2025-02-23
 fr   12/mars/2025  2025-03-12
 fr   11/août/2025  2025-08-11
 fr   10/oct./2025  2025-10-10
 fr   06/nov./2025  2025-11-06
 fr   21/déc./2025  2025-12-21

Helmut Schütz
ElMaestro
★★★

Denmark,
2025-11-14 17:34
(201 d 07:00 ago)

@ Helmut
Posting: # 24497
Views: 2,315
 

 Know the month abbreviations

Hi Hötzi


❝ Do you import the data with read.csv()? I would suggest read.table() instead.


Once I understand the underlying issue I can usually solve it; that part tends to be no problem. Yes, I use either read.csv or read.table when importing tables, but probably not in the smartest fashion, however that is defined.

It is a little beyond me that if I want abbreviated months, then they come from a single fixed locale which may differ from the user's. And when we work with the abbreviations through %b, then the corresponding abbreviations become locale-dependent. The logic escapes me and confuses me. But once the dust settles in the back of my head, I can work out a solution, clumsy as it may be, with just a modest amount of effort. :-)
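
A quick way to see the two sources side by side on any given machine is sketched below. It only compares the fixed constant month.abb with whatever %b produces in the current session; the variable names are arbitrary.

```r
# Abbreviations as %b produces them in the current session locale
loc.abb <- format(as.Date(sprintf("2024-%02d-01", 1:12)), "%b")

# month.abb is a fixed English constant; flag the months where the two disagree
cmp <- data.frame(month.abb, loc.abb, same = month.abb == loc.abb)
print(cmp, row.names = FALSE)
```

Every row with same = FALSE is a month for which as.Date(..., format = "%d/%b/%Y") cannot parse the English abbreviation in that locale.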

Pass or fail!
ElMaestro
Ohlbe
★★★

France,
2025-11-12 16:13
(203 d 08:21 ago)

@ ElMaestro
Posting: # 24493
Views: 2,459
 

 Worse in France

Dear ElMaestro,

As expected from Helmut's post, it is worse with the French locale "French_France.utf8":

   Date.str      Date.I
1  01/Jan/2024   <NA>
2  01/Feb/2024   <NA>
3  01/Mar/2024   <NA>
4  01/Apr/2024   <NA>
5  01/May/2024   <NA>
6  01/Jun/2024   <NA>
7  01/Jul/2024   <NA>
8  01/Aug/2024   <NA>
9  01/Sep/2024   <NA>
10 01/Oct/2024   <NA>
11 01/Nov/2024   <NA>
12 01/Dec/2024   <NA>

Regards
Ohlbe
ElMaestro
★★★

Denmark,
2025-11-14 17:38
(201 d 06:56 ago)

@ Ohlbe
Posting: # 24498
Views: 2,331
 

 Worse in France

Hi Ohlbe and all,

thanks for all input.

In short, the answers to my questions are
1. "yes" (or "yes, principally")
2. "yes"

:-D

Pass or fail!
ElMaestro
The Bioequivalence and Bioavailability Forum is hosted by
BEBAC Ing. Helmut Schütz