Academic publishing is sometimes referred to as a triple-payer system because the publisher is paid three times over the product's life cycle: first by the governments who foot the majority of a research group's operating costs, then by the institutions who pay the publisher for the right to submit a manuscript, and a third time by the readers, who pay for work their peers created on (mostly) the government's dime. I believe academics tolerate this because it has produced clear quantitative metrics by which we can judge an individual work (i.e. citations), publishing venues (i.e. impact factor), and researchers (i.e. h-index).
The biggest issue is that I don't think good, novel research benefits from constantly being measured by its popularity. A citation-centric approach often produces small, incremental work and groupthink driven by whoever has managed to become the loudest voice in the room. It also reinforces an emphasis on publishing only the experiments that end in some form of objective success.
I should clarify: this system is as much a product of academics' own behavior as it is of publishers' desire to maximize profit. It's quite common for people to self-organize into hierarchies and defer to authority or popularity as a metric, because doing so usually works pretty well.
But what if we could make everything but the peer review part of academic publishing completely automated?
I want to spend the next couple of years working on this problem. I'd like to see a ground-up, technology-driven approach to academic publishing whose objective is to maximize accessibility and quality, in that order.
Here is my current analysis of steps in the academic publishing process, and what I think of them:
Challenge: Make our corpus of scientific knowledge queryable and searchable. Some really smart people are working on this problem on the Semantic Scholar team and in many of its associated projects (shoutout to Connected Papers!). It goes without saying that Google Scholar is also an amazing tool. In the second part of this post, I try setting up my own NLP pipeline to tackle this challenge.
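To make the "queryable and searchable" goal concrete, here is a minimal sketch of one way such a pipeline could start: TF-IDF vectors over paper abstracts, ranked by cosine similarity. This is an illustration, not the pipeline from the post; it assumes scikit-learn is available, and the abstracts below are made-up placeholders.

```python
# Minimal abstract-search sketch: TF-IDF + cosine similarity.
# The corpus here is a toy placeholder, not real papers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

abstracts = {
    "paper-a": "We compute sequence alignment with a dynamic programming algorithm.",
    "paper-b": "A neural network approach to image classification benchmarks.",
    "paper-c": "Sequence alignment heuristics for large genomic datasets.",
}

# Fit one vocabulary over the whole corpus; queries reuse it via transform().
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(abstracts.values())

def search(query, top_k=2):
    """Return up to top_k paper ids ranked by similarity to the query."""
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, matrix).ravel()
    ranked = sorted(zip(abstracts.keys(), scores), key=lambda pair: -pair[1])
    # Drop papers with zero overlap with the query.
    return [pid for pid, score in ranked[:top_k] if score > 0]

print(search("sequence alignment"))
```

A real version would need stemming or embeddings (plain TF-IDF won't match "align" to "alignment"), plus citation-graph signals, which is roughly where tools like Semantic Scholar pick up.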
Challenge: Determine whether a submitted manuscript is actually good science. This is mostly outsourced by the publisher through the peer review system. Scientists far more experienced than I am have discussed it at length, so I don't think this is where I should add my two cents: Dr. Yann LeCun's notes on changes to the process are here, and Dr. Peter Norvig's are here. I do think there is still work to be done to make the process more transparent and fair.
Challenge: Disseminate information to a mass audience, and do it well. Every STEM field worth a damn has research that's accessible over the Internet in some way or another. Printed media absolutely still has its place, but I'm focusing on the conference proceedings / journal volume format for now. Furthermore, projects like arXiv Vanity and arXiv sanity are strong arguments for a new generation of publishers that rely on an open ecosystem to solve their problems at a cost orders of magnitude smaller than the current crop of publishers (Source).
I've mentioned small projects here and there that are already well on their way to making the landscape of academic publishing a better place, but I'd like to reiterate that the technology to automate most of the process already exists. It's largely a social problem. If I launched an open-access bioinformatics journal tomorrow, who would want to publish in it? Who wants to abandon a business model where the product's costs are mostly covered by the users instead of the company? Hopefully, by demonstrating a better solution to each sub-problem (and with just enough PR), I'll make something other people actually use.
If you’d like to see what I’ve made so far, I’ve set up some code here and talked about the process of yelling that code into existence here.