Is It Open Access If No One Can Find It? The Loss of 100,000+ Archaeology & History Articles

Posted on November 7, 2013


Here is a question for you: if I put something up on the internet for free but no one can find it, have I actually done anything?

A little background on how I came to this question. Over the last couple of weeks I have been running some analysis for the Society of Antiquaries of Scotland (SAS) on their publications: basic web analytics about who reads their Proceedings, how much gets read, etc. For this I got access to the Archaeology Data Service (ADS) web stats, as they host the SAS Proceedings. After going through a bunch of the numbers I found an odd trend: people who came to the Proceedings were searching only for the Proceedings themselves. That is, they only entered terms like ‘SAS Proceedings’, ‘Society of Antiquaries of Scotland publications’, etc. into search engines like Google and Bing.

I thought that was a bit odd. The SASP has thousands of articles on the history and archaeology of Scotland, covering hundreds of topics. I thought for sure someone would have put a term into a search engine like ‘iron age Scotland’ or ‘brochs of Scotland’, or at least the title of an article to find an online version, but there was nothing. So I went to the ADS homepage for the SAS Proceedings and started taking article titles and putting them into Google. What I found was... NOTHING. Google, Bing, Yahoo, etc. do not know that the 100,000+ articles on ADS exist. Go ahead and try it. Find article titles on ADS and search for them on Google or Bing (fair warning: this is a technical error that might be fixed by the time you read this). Edit- ADS has now fixed this.

I want to be 100% clear: this is not a ‘ha, gotcha’ moment for ADS, and I am not trying to smear ADS. This is a technical error. All the search engines used to index ADS. ADS requires people to agree to its terms of service before looking at articles. However, it used to be that if you found an article through something like a Google search you could bypass this step. So recently ADS set up a system to catch people arriving that way and make them agree to the terms of service, BUT let the search engine bots through to index the articles. This system is currently not working how it is supposed to. ADS knows this, and they were the ones who confirmed it for me after I brought the problem to their attention. They have been absolutely brilliant in helping me figure out what was happening with the numbers I was seeing.

This slight problem means that bots try to crawl and index the articles but get redirected, so they think the link is broken. If Google or Bing think a link is broken they remove it from their search results. Effectively, most of ADS is dead to search engines. That is 100,000+ Archaeology and History articles that are dead to the internet, for those keeping count at home.
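For anyone who wants to picture the sort of gate I am describing, here is a very rough sketch in Python. To be clear, this is NOT how ADS actually does it. The bot names, the function names and the status codes are all my own assumptions for illustration. The point is only that a bot the gate fails to recognise gets the same redirect as a person who has not agreed to the terms.

```python
# Hypothetical sketch of a terms-of-service gate. Nothing here comes from ADS;
# the crawler list and function names are assumptions for illustration only.
CRAWLER_TOKENS = ("googlebot", "bingbot", "slurp")


def is_crawler(user_agent):
    """Guess whether a request comes from a search engine bot."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def respond(user_agent, accepted_terms):
    """Return (status, target) for a request to an article page."""
    if accepted_terms or is_crawler(user_agent):
        return 200, "article"
    # Everyone else is sent to the terms page first. A bot that lands here
    # never sees the article, so it has nothing to index.
    return 302, "terms-of-service"


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(respond(googlebot, accepted_terms=False))
print(respond("SomeOtherBot/1.0", accepted_terms=False))
```

If the recognition step goes wrong for any reason, every bot ends up on the second path, which is roughly the situation described above.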

Edit- ADS explains the technical problem 10x better than I can in their own post on it. It is a brilliant examination of trying to expose work to Google. Read it!

I bring up this technical problem for two reasons. One, to make other people aware that such problems exist. Two, it got me thinking about how much stuff we put on the internet and call Open Access or free to view. Is it really Open Access and free to view if no one can find it? A sort of take on the whole ‘if a tree falls in the forest and no one is around to hear it, does it make a sound?’ question. In the case of ADS it was a technical error and I imagine they will fix the problem soon. However, what happens if this occurs with other websites? What if these documents are born digital and the only copy is online?

I can think of several publications that have been put online but are only images in PDFs. That is, they have no associated text for search bots to read, so they cannot be indexed. In other words, search engines, and thus the vast majority of the people interested in the topics, cannot read the articles, and the articles are essentially dead to them. I worry that more and more people are digitizing their old publications and putting them online in the belief that they will reach a larger audience, when effectively only they know the publications exist. Basically, we have moved our old publications from a box in a closet, where no one reads them, to another box in another closet, though a digital one, where no one reads them. Only now we actually think we are doing something good.
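If you want a quick and dirty way to check your own scanned PDFs, something like the following Python sketch might help. It is a crude heuristic of my own, not a proper test: it just looks for the `/Font` marker that PDFs with real text normally contain. Newer PDFs can compress the parts of the file where that marker lives, so a ‘no’ from this check means ‘go and look properly with a real PDF tool’, not ‘definitely image only’. The function name is made up for this example.

```python
def probably_has_text_layer(pdf_bytes):
    """Crude heuristic: PDFs with real text usually declare font resources.

    Image-only scans often contain no /Font entries at all. This can miss
    fonts hidden inside compressed object streams, so treat False as a
    prompt to check further rather than as proof.
    """
    return b"/Font" in pdf_bytes


# Tiny made-up byte strings standing in for real files.
with_text = b"%PDF-1.4 1 0 obj << /Type /Page /Resources << /Font << /F1 2 0 R >> >> >>"
scan_only = b"%PDF-1.4 1 0 obj << /Type /Page /Resources << /XObject << /Im1 2 0 R >> >> >>"

print(probably_has_text_layer(with_text))
print(probably_has_text_layer(scan_only))
```

For a real file you would read it in with `open(path, "rb").read()` and pass the bytes to the function.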

Acknowledgements- Thanks to ADS for letting me access the data and for helping me make sense of the issues I came across.

Posted in: Publishing