Downes.ca ~ Stephen's Web ~ Measuring data rot: An analysis of the continued availability of shared data from a Single University

Stephen Downes

Knowledge, Learning, Community
To determine where data is shared and what data is no longer available, this study analyzed data shared by researchers at a single university. 2166 supplemental data links were harvested from the university’s institutional repository and scraped using R. All links that failed to scrape or could not be tested algorithmically were tested for availability by hand. Trends in data availability by link type, age of publication, and data source were examined for patterns. Results show that researchers shared data in hundreds of places. About two-thirds of links to shared data were in the form of URLs and one-third were DOIs, with several FTP links and links directly to files. A surprising 13.4% of shared URL links pointed to a website homepage rather than a specific record on a website. After testing, 5.4% of the 2166 supplemental data links were found to be no longer available. DOIs were the type of shared link least likely to disappear, with a 1.7% loss, compared with a URL loss of 5.9% averaged over time. Links from older publications were more likely to be unavailable, with a data disappearance rate estimated at 2.6% per year, as were links to data hosted on journal websites. The results support best practice guidance to share data in a data repository using a permanent identifier.
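The link-type breakdown described above (URLs, DOIs, FTP links, direct file links, and URLs that point only to a homepage) can be illustrated with a small sketch. The study itself used R and its code is not shown here, so what follows is a hypothetical Python illustration, not the authors' method. The function names `classify_link` and `tally`, the category labels, the file-extension list, and the matching rules are all assumptions made for the example. It does no network access; testing availability would additionally require requesting each link.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed list of extensions that mark a link as pointing directly to a file.
FILE_EXTENSIONS = (".csv", ".zip", ".xlsx", ".txt", ".pdf")

# Assumed hosts of the DOI resolver.
DOI_HOSTS = ("doi.org", "dx.doi.org")


def classify_link(link: str) -> str:
    """Return 'doi', 'ftp', 'file', 'homepage', or 'url' for one shared-data link."""
    text = link.strip()
    lowered = text.lower()
    parsed = urlparse(text)
    host = parsed.netloc.lower()

    # DOIs may appear bare ("10.xxxx/..."), prefixed ("doi:..."), or as resolver URLs.
    if lowered.startswith("doi:") or lowered.startswith("10.") or host in DOI_HOSTS:
        return "doi"
    if parsed.scheme == "ftp":
        return "ftp"
    if parsed.path.lower().endswith(FILE_EXTENSIONS):
        return "file"
    # A URL with no path, query, or fragment points at a site's homepage.
    if parsed.path in ("", "/") and not parsed.query and not parsed.fragment:
        return "homepage"
    return "url"


def tally(links):
    """Count how many links fall into each category."""
    return Counter(classify_link(link) for link in links)
```

Calling `tally` on a list of harvested links returns a `Counter` keyed by category, which is one way to arrive at proportions like those reported in the abstract. The example hosts used to exercise it (such as `example.org`) are placeholders, not sources from the study.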




Stephen Downes, Casselman, Canada
stephen@downes.ca

Copyright 2024
Last Updated: Nov 21, 2024 06:57 a.m.

Creative Commons License.
