## Documents in the Internet Archive [Tips / Tricks]

Dear all,

Sometimes one needs a previous version of a document which is no longer available on an agency’s website. Why don’t they have the version control / audit trail they require from us?

Example: A BE study on alendronate was performed in August 2008 according to FDA’s draft guidance (January 2008). Due to improvements in bioanalytical technology, FDA’s October 2011 revision requires plasma data instead of urine data. The study was submitted to Oman’s authority, which required a copy of the old guidance before accepting the study. Procedure:
Problems:
• FDA redesigned their website in 2009. Now the file names of all guidances start with “ucm” followed by a six-digit number. Before mid-2009 it was a four-digit number followed by “dft” or “fnl”. You can only search the Internet Archive for a URL, not for a document’s contents. In other words, if you don’t know the old URL, you will not find the document, although it might exist.
• If a site / directory has a low number of visits or backlinks, the Alexa crawler visits it infrequently and may miss intermediate revisions. That is why only three previous versions of EMA’s Q&A document are archived. Furthermore, the archive lags 6–24 months behind.
• The site’s owner might decide to prevent archiving. This works even retroactively. If
  User-agent: ia_archiver
  Disallow: /
is added to the site’s robots.txt, the site will not be crawled anymore and previous versions will be removed from the archive. Bad luck.
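
For those who prefer a script over clicking around, here is a minimal Python sketch of the points above. It is not the procedure I used, just an illustration. It assumes the Wayback Machine’s usual address pattern (`https://web.archive.org/web/<timestamp>/<old URL>`) and its availability API at `https://archive.org/wayback/available`. All function names are made up for this post, and you still have to know the old URL.

```python
from urllib.parse import urlencode
from urllib.robotparser import RobotFileParser


def snapshot_url(original_url: str, timestamp: str) -> str:
    """Address of the capture closest to `timestamp` (YYYYMMDD or YYYYMMDDhhmmss)."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


def availability_query(original_url: str, timestamp: str) -> str:
    """Address of the availability API call; the archive answers with JSON."""
    query = urlencode({"url": original_url, "timestamp": timestamp})
    return "https://archive.org/wayback/available?" + query


def closest_capture(api_response: dict):
    """Return (timestamp, url) of the closest capture, or None if nothing is archived."""
    closest = api_response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def archiver_blocked(robots_txt: str, path: str = "/") -> bool:
    """True if the given robots.txt text forbids ia_archiver from fetching `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("ia_archiver", path)


if __name__ == "__main__":
    # Hypothetical old-style URL; replace it with the one you are looking for.
    old = "http://www.example.org/guidance/1234dft.pdf"
    print(snapshot_url(old, "20080815"))
    print(availability_query(old, "20080815"))
    print(archiver_blocked("User-agent: ia_archiver\nDisallow: /"))
```

The first two functions only build addresses; fetch the second one with your browser or `urllib.request` and feed the parsed JSON to `closest_capture()`. If `archiver_blocked()` returns `True` for a site’s current robots.txt, don’t expect to find anything.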

Dif-tor heh smusma 🖖
Helmut Schütz
