Documents in the Internet Archive [Tips / Tricks]
Dear all,
sometimes one needs a previous version of a document which is no longer available on an agency’s website. Why don’t they have the version control / audit trail they require from us?
Example: A BE study on alendronate was performed in August 2008 according to FDA’s draft guidance (January 2008). Owing to improvements in bioanalytical technology, the October 2011 revision requires plasma data instead of urine data. The study was submitted to Oman’s authority, which required a copy of the old guidance before accepting the study. Procedure:
- Find the URL of the current guidance at FDA’s site:
http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm082421.pdf
- Fire up the Internet Archive and paste the URL into the WayBackMachine’s BROWSE HISTORY field.
- Move in the timeline to the earliest year (2010) and click on the earliest snapshot (March 9th): http://web.archive.org/web/20100309050736/http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm082421.pdf
- FDA redesigned their website in 2009. Since then the file names of all guidances start with “ucm” followed by a six-digit number. Before mid-2009 it was a four-digit number followed by “dft” or “fnl”. You can only search the Internet Archive for a URL, not for a document’s contents. In other words, if you don’t know the old URL, you will not find the document, although it might exist.
- If a site / directory has a low number of visits or backlinks, the Alexa crawler will visit it infrequently and may miss intermediate revisions. That is why only three previous versions of EMA’s Q&A-document are archived. Furthermore, the archive lags behind by 6–24 months.
- The site’s owner might decide to prevent archiving. This works even retrospectively. If
  User-agent: ia_archiver
  Disallow: /
  is added to the site’s robots.txt, the site will not be crawled any more and previous versions will be removed from the archive. Bad luck.
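For those who prefer a script to clicking through the timeline, here is a minimal Python sketch of the steps above. The function names are my own (hypothetical). The snapshot URL pattern is simply the one visible in the link from the procedure. The availability endpoint is the Internet Archive's public JSON service; the code only builds the query URL and performs no network access. The last helper merely checks whether a given robots.txt text disallows the ia_archiver user agent; whether the archive then removes old snapshots is as described above, not something the code can verify.

```python
from urllib.parse import urlencode
from urllib.robotparser import RobotFileParser

# Prefix taken from the snapshot link in the procedure above.
WAYBACK_PREFIX = "http://web.archive.org/web"


def snapshot_url(original_url, timestamp):
    """Build the address of a snapshot; timestamp is YYYYMMDDhhmmss."""
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


def availability_query(original_url, timestamp):
    """Build a query URL asking for the snapshot closest to timestamp.

    Only the URL is constructed here; fetching it is left to the reader.
    """
    params = urlencode({"url": original_url, "timestamp": timestamp})
    return "https://archive.org/wayback/available?" + params


def archiver_blocked(robots_txt, path="/"):
    """Return True if the robots.txt text disallows ia_archiver for path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("ia_archiver", path)


if __name__ == "__main__":
    guidance = ("http://www.fda.gov/downloads/Drugs/"
                "GuidanceComplianceRegulatoryInformation/Guidances/ucm082421.pdf")
    print(snapshot_url(guidance, "20100309050736"))
    print(availability_query(guidance, "20100309"))
    print(archiver_blocked("User-agent: ia_archiver\nDisallow: /"))
```

The first helper reproduces the March 9th, 2010 link given in the procedure. As noted above, all of this only helps if you know the old URL.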
—
Dif-tor heh smusma 🖖🏼 Довге життя Україна!
![[image]](https://static.bebac.at/pics/Blue_and_yellow_ribbon_UA.png)
Helmut Schütz
![[image]](https://static.bebac.at/img/CC by.png)
The quality of responses received is directly proportional to the quality of the question asked. 🚮
Science Quotes
Complete thread:
- Documents in the Internet Archive Helmut 2013-11-21 16:45
- Documents in the Internet Archive Mahesh M 2015-12-24 13:37