Data for Google Scholar paper availability for #oaweek

For Open Access Week, I wanted to share some preliminary data that I have gathered over the last couple of months from Lazy Scholar use. While this is not strictly “open access” related, it says a little about the availability of research.

For those who do not know it or do not use it: Lazy Scholar is a Chrome extension that I built out of frustration with having to check every scientific abstract I want to read against Google Scholar to see if a free full text is available. With a single click, or an optional automated popup, you can check if one exists. If you link Google Scholar to an institution, it will grab that link for you too. You can also enter an EZproxy link in the extension options to quickly get links for checking institutional access. You can link your Twitter account and send a request to the #icanhazpdf hashtag with one click. It can also provide paper altmetrics (from altmetric.com), and I have many more plans for automating the discovery of useful information about a paper, which I hope to implement soon.

Each Lazy Scholar query is (anonymously) added to a database, yielding a growing collection of interesting information that I have mined a bit.

Total Queries (as of 10.21.2013): 13,430 (8,270 after removing duplicate queries)

Lazy Scholar queries are made in one of two ways: by clicking a button, or through an optional setting that automatically queries on each page load when the extension recognizes a scientific website. Here are data on the proportion of queries for which a full text was found through Google Scholar. The button data are more likely to be biased, because a person who is already on an open access paper may not click the button. But it turns out the two are pretty close anyway.

Button proportion of full texts found = 16%
Button proportion of full texts found, removing duplicate queries from users = 17%

Auto-scan proportion of full texts found = 21%
Auto-scan proportion of full texts found, removing duplicate queries from users = 18%*
*This is likely the most accurate measure.
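
For anyone curious how proportions like these can be computed, here is a minimal Python sketch. It is not the actual Lazy Scholar code: the record layout (`user_id`, `paper_id`, `mode`, `full_text_found`), the function name `full_text_rate`, and the sample rows are all made up for illustration, and I am assuming a "duplicate" means the same anonymous user querying the same paper more than once.

```python
from collections import namedtuple

# Hypothetical record layout; the real database schema is not described in this post.
Query = namedtuple("Query", ["user_id", "paper_id", "mode", "full_text_found"])


def full_text_rate(queries, mode, dedupe=False):
    """Return the fraction of queries made in `mode` that found a full text.

    With dedupe=True, only the first query per (user, paper) pair is counted.
    Returns None if no queries match.
    """
    rows = [q for q in queries if q.mode == mode]
    if dedupe:
        seen = set()
        unique = []
        for q in rows:
            key = (q.user_id, q.paper_id)
            if key not in seen:
                seen.add(key)
                unique.append(q)
        rows = unique
    if not rows:
        return None
    return sum(1 for q in rows if q.full_text_found) / len(rows)


# Made-up sample data, not real Lazy Scholar queries.
sample = [
    Query("u1", "p1", "auto", True),
    Query("u1", "p1", "auto", True),    # same user, same paper: a duplicate
    Query("u1", "p2", "auto", False),
    Query("u2", "p3", "auto", False),
    Query("u2", "p4", "button", False),
    Query("u3", "p5", "button", True),
]

print(full_text_rate(sample, "auto"))
print(full_text_rate(sample, "auto", dedupe=True))
print(full_text_rate(sample, "button"))
```

In this toy sample, a repeated successful query inflates the raw auto-scan rate, and removing it lowers the rate. That is one way repeated lookups of the same available paper could push a raw figure above its deduplicated counterpart.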

Here is another interesting one that I have been tracking: the publication years of the articles people are looking for. For the vast majority of queries, no year is listed in Google Scholar, but for those that have one there is an expected trend toward newer papers, except for 1991, which is abnormally high. This may be because I have not yet removed duplicate queries for this analysis. This is for the top 10 years:

[Chart: number of queries by publication year, top 10 years]

So the lesson is: always check Google Scholar to see if it has indexed a full text. About one in five of the papers that people are looking for can be found there, assuming no dramatic discipline biases among the people currently using the extension.
