Note: I don’t have a background in backend web development (or programming in general, for that matter), but I’ve had to do some quick learning recently to mitigate the growing cost of running Lazy Scholar. I hope this post will help others who are running passion projects without funding sources save time and expense. Sorry to those who are subscribed only for feature updates :).
Cloudant + Heroku
Although Lazy Scholar was designed from the start to be a distributed, low-cost way for the browser to automatically find free full texts, citation metrics, etc., I always needed a backend server to store records so I could improve its ability to recognize scientific articles. Cloudant (a DBaaS) served this purpose nicely for years, especially since it didn’t charge until you hit a certain usage limit, which I never did while the project was small. It has a user-friendly interface and great features that allow for very flexible searching of your database. Eventually I added a security layer by routing requests through a Flask server hosted on Heroku. I started with Heroku because it similarly has a nice interface and is well documented for beginners. But with around 7,000 active users of the extension, new features such as a recommendation system and the ability to search paper histories, and the desire to keep adding complexity, I eventually hit Cloudant’s limit, and Lazy Scholar started burning holes in my pockets. In addition, I set up a web dyno and a worker dyno at Heroku to handle rapid incoming requests, and set up a HireFire listener to scale the worker so that requests were processed quickly and returned to the user. Coupled with the cost of the Redis data store used by the worker, the costs were not sustainable. Enter solution 2.
AWS DynamoDB + AWS Elastic Beanstalk + AWS SQS
I decided to move to Amazon Web Services (AWS) after reading about the potential cost savings there. The learning curve is steeper than Heroku’s, but it was well worth it in my case. The second iteration of my solution started with moving the database from Cloudant to DynamoDB. This was no simple task: it meant downloading over 1 million records and converting them to be compatible. For example, I couldn’t store float values as they were, so I had to convert them to decimals and change the relevant scripts to read them accordingly.
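To illustrate the float-to-decimal conversion mentioned above, here is a minimal sketch. It assumes the records are handled in Python and written with boto3, which rejects Python float values and asks for Decimal instead. The function name to_dynamo_numbers and the sample field names are hypothetical, not taken from Lazy Scholar's actual code.

```python
import json
from decimal import Decimal


def to_dynamo_numbers(value):
    """Recursively replace floats with Decimal in a nested record.

    Going through str() avoids carrying over binary floating-point
    noise (Decimal(0.1) is not the same as Decimal("0.1")).
    """
    if isinstance(value, float):
        return Decimal(str(value))
    if isinstance(value, dict):
        return {key: to_dynamo_numbers(item) for key, item in value.items()}
    if isinstance(value, list):
        return [to_dynamo_numbers(item) for item in value]
    return value  # strings, ints, bools, None pass through unchanged


# Hypothetical exported record; the field names are made up for illustration.
exported = {"doi": "10.1000/example", "score": 0.75, "metrics": [1.5, 2]}
converted = to_dynamo_numbers(exported)

# When the export is JSON text, the standard library can do the same
# conversion while parsing:
parsed = json.loads('{"score": 0.75}', parse_float=Decimal)
```

Scripts that read the records back then need to expect Decimal values (for example, converting with float() before doing arithmetic that mixes in ordinary floats).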
Next, I set up a web server and worker on AWS Elastic Beanstalk. This involved some minor rewrites of the previous Flask server code and setting it up to use AWS SQS instead of Redis to communicate with the worker. When all was said and done, I was still frustrated by how hard it was to quickly scale instances in response to rapid influxes of requests. In the end, I cut my costs by ~75% by switching to DynamoDB and Elastic Beanstalk (DynamoDB reduced the database cost to $0 thanks to its free tier, but Elastic Beanstalk still cost money). If I couldn’t find another solution, I would have to start soliciting external funding to sustain the project, or reduce functionality.
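As a rough idea of what handing work to the worker through SQS can look like on the web-server side, here is a small sketch. It is not the actual Lazy Scholar code. The function name enqueue_job and the job fields are hypothetical, and the SQS client is passed in so the function itself has no AWS dependency. The only SQS call used is the boto3 client's send_message.

```python
import json


def enqueue_job(sqs_client, queue_url, job):
    """Send one job to the queue as a JSON message and return its message ID.

    sqs_client is expected to behave like boto3.client("sqs").
    """
    response = sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps(job),
    )
    return response["MessageId"]


# Example wiring (requires boto3 and AWS credentials, so it is left commented out;
# the queue URL is a placeholder):
# import boto3
# sqs = boto3.client("sqs")
# enqueue_job(sqs, "https://sqs.<region>.amazonaws.com/<account-id>/<queue-name>",
#             {"doi": "10.1000/example"})
```

Passing the client in as an argument also makes it easy to swap in a stand-in object when testing locally without touching AWS.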
AWS Lambda + AWS API Gateway + AWS DynamoDB – finally the perfect solution
I finally stumbled across AWS Lambda among Amazon’s many services. It is a serverless backend that simply executes functions you write in response to incoming requests. It only charges for the time your code is executing (vs. servers that are always on), and it can run functions in parallel to scale exactly to the requests coming in. No more worrying about scaling in case someone opens 100 tabs of scientific articles at once! And there is a generous free tier, so generous that it now costs me only about $1 per month to run Lazy Scholar. That cost comes only from using the AWS API Gateway. To serve requests over HTTPS, you set up an API with the AWS API Gateway and link it to trigger your Lambda function. Thankfully, the Lambda function didn’t require much modification from the Flask server code I used previously. For now, I still have some functionality on Heroku that fits easily within its free tier; it will eventually be transitioned as well.
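For readers who haven't seen one, a Lambda function behind the API Gateway can be as small as the sketch below. It assumes the API uses the Lambda proxy integration, where the HTTP request arrives as the event dictionary and the function returns a dictionary with statusCode, headers, and body. The find_article helper and the doi query parameter are hypothetical stand-ins, not Lazy Scholar's real logic.

```python
import json


def find_article(doi):
    """Hypothetical placeholder for the real lookup (e.g. a DynamoDB query)."""
    return {"doi": doi, "found": False}


def json_response(status_code, payload):
    """Build a response in the shape the proxy integration expects."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def lambda_handler(event, context):
    # With the proxy integration, query parameters arrive under this key,
    # which is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    doi = params.get("doi")
    if not doi:
        return json_response(400, {"error": "missing 'doi' query parameter"})
    return json_response(200, find_article(doi))
```

Because the handler is just a function that takes a dictionary, it can be exercised locally by calling lambda_handler({"queryStringParameters": {"doi": "10.1000/example"}}, None) before deploying anything.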
Now that I don’t have to sweat over costs any longer, on to new features!