Medical Market Report

Digital Decay Has Claimed Nearly 40 Percent Of Webpages From 2013

Have you been looking for an article you read several years ago but just can’t find it? If it was written in 2013, there is a good chance it has simply disappeared from the internet. That’s according to new research from the Pew Research Center, which found that nearly 40 percent of all webpages that existed in 2013 are no longer accessible because of “digital decay”.

Far from being indelible, online content turns out to be remarkably fleeting, as the new analysis demonstrates. Digital decay is the gradual degradation, corruption or obsolescence of digital information over time.

According to the results, 38 percent of content that existed in 2013 is no longer available. When the researchers expanded the scope of the analysis, they found that a quarter of all webpages that existed at some point between 2013 and 2023 were now inaccessible. In most cases, this was because individual pages had been deleted or removed from otherwise functional websites.

In this context, the team defined “inaccessible” as a page that is no longer on the host server, the kind of failure that usually produces a 404 message or another error code.

To gather the data for their analysis, the researchers used random samples of just under 1 million webpages (around 90,000 pages per year) from the Common Crawl archives, an internet repository that periodically takes snapshots of the web as it exists at different times. They gathered this information for the years between 2013 and 2023 and then checked to see if those pages still existed.

Around 25 percent of the pages collected in this period were no longer accessible as of October 2023. That figure combines two types of defunct content: 16 percent of pages were “individually inaccessible” but sat on otherwise accessible root-level domains, while the other 9 percent were inaccessible because the entire root domain no longer existed.
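To make the two categories concrete, here is a minimal Python sketch of how a single archived URL might be sorted into them. It is not Pew’s code: the function name `classify_page` and its two inputs (the page’s HTTP status and whether the root domain still responds) are hypothetical simplifications, and treating any 4xx/5xx status or a missing response as “inaccessible” is an assumption based on the report’s description of 404-style errors.

```python
from typing import Optional


def classify_page(page_status: Optional[int], domain_reachable: bool) -> str:
    """Sort one archived URL into the report's categories (hypothetical simplification).

    page_status: final HTTP status returned for the page, or None if nothing answered.
    domain_reachable: whether the root-level domain still responds at all.
    """
    if not domain_reachable:
        # The whole root domain is gone: the report's 9 percent bucket.
        return "domain inaccessible"
    if page_status is None or page_status >= 400:
        # Domain still works, but this page returns 404 or another error: the 16 percent bucket.
        return "individually inaccessible"
    # Anything below 400 (including 3xx redirects) is counted as accessible in this sketch.
    return "accessible"


# The two buckets reported by the researchers add up to the ~25 percent total.
individually_inaccessible = 16
domain_inaccessible = 9
print(individually_inaccessible + domain_inaccessible)  # 25
```

Separating the domain check from the page check mirrors the report’s split: a dead domain takes every page with it, whereas an individually missing page implies the site is still running but the content was removed.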

“Not surprisingly, the older snapshots in our collection had the largest share of inaccessible links”, the report’s authors explained.

By October 2023, 38 percent of the pages collected in the 2013 snapshot were gone. Even pages from the 2021 snapshot had suffered from this decay, with about one in five lost.

There were also some interesting comparative results for different types of webpages. For instance, the analysis examined the reference links on 50,000 English-language Wikipedia pages. The researchers found that 82 percent of the sampled pages had at least one reference link pointing to a non-Wikipedia page, and that 11 percent of “all references linked on Wikipedia” are no longer accessible.

On around 2 percent of the sampled source pages, every reference link was broken, while around 53 percent contained at least one broken link.
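These figures mix two different measures: the share of all links that are broken, and the share of pages with at least one (or only) broken links. The same page-level measure recurs below for government and news sites. A minimal sketch with made-up toy data and a hypothetical `link_rot_summary` helper (not from the report) shows how a small link-level rate can coexist with a much larger page-level rate.

```python
from typing import Dict, List


def link_rot_summary(pages: Dict[str, List[bool]]) -> Dict[str, float]:
    """Compute link-level and page-level link-rot rates (hypothetical helper).

    pages maps a page URL to one boolean per outbound link, True meaning the link is broken.
    Pages with no links are skipped, since they cannot contain broken links.
    """
    linked = {url: links for url, links in pages.items() if links}
    if not linked:
        raise ValueError("no pages with links to summarise")
    total_links = sum(len(links) for links in linked.values())
    broken_links = sum(sum(links) for links in linked.values())  # True counts as 1
    n_pages = len(linked)
    return {
        # Share of all links that are broken (the kind of figure behind "11 percent").
        "broken_link_share": broken_links / total_links,
        # Share of pages with at least one broken link (the kind behind "around 53 percent").
        "pages_with_any_broken": sum(any(links) for links in linked.values()) / n_pages,
        # Share of pages whose links are all broken (the kind behind "around 2 percent").
        "pages_all_broken": sum(all(links) for links in linked.values()) / n_pages,
    }


# Toy data, invented for illustration: 12 links, only 2 of them broken.
sample = {
    "page_a": [False, False, False, True],
    "page_b": [False, False, False, False],
    "page_c": [True],
    "page_d": [False, False, False],
}
summary = link_rot_summary(sample)
print(summary)  # broken_link_share ≈ 0.17, pages_with_any_broken = 0.5, pages_all_broken = 0.25
```

In this toy example only about one link in six is broken, yet half of the pages are affected, which is why a modest share of dead references can still touch a majority of pages.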

Government websites also offered some curiosities. Around three-quarters of the 500,000 government webpages the team sampled contained at least one link. The median page contained 50 links, though many contained more. The vast majority of these links pointed to secure (HTTPS) pages, and 16 percent redirected to other pages.

But around 21 percent of the examined government pages also contained at least one broken link. City government pages, it seems, were the worst offenders.

Even news sites were not free from the issue. On the news sites they sampled, the researchers found that around 94 percent of pages contained at least one link that took readers away from the website. The median page contained around 20 links, and pages in the top 10 percent had around 56 links.

As with government websites, the vast majority of these links pointed to secure (HTTPS) pages. Around 32 percent of the links on these news sites redirected users to a different URL from the one originally linked. Around 5 percent of news website links were inaccessible, and around 23 percent of all the pages had at least one broken link.

Finally, on Twitter (now X), the researchers found that, out of 5 million tweets posted between March and April 2023, 18 percent were no longer available.

“In a majority of cases, this was because the account that originally posted the tweet was made private, suspended or deleted entirely,” the researchers explain. “For the remaining tweets, the account that posted the tweet was still visible on the site, but the individual tweet had been deleted.”

They also found that tweets written in certain languages were particularly prone to disappearing. For instance, half of all Turkish-language tweets, and a smaller share of those in Arabic, were no longer available.

Overall, the researchers found that most “tweets that are removed from the site tend to disappear soon after being posted.”

The report is published on the Pew Research Center website.
