Building the Shepherd.com Web Engine

Some time in late 2020 I was approached by Ben Fox, founder and entrepreneur, who needed help building his new venture, Shepherd.com: “A better way to discover amazing books”. It is a website of book recommendations curated by authors and famous people and organized around topics.

As a book lover, I quickly got excited and agreed to build the website and the content management system (CMS) behind it. The website launched publicly today (20 April 2021), and what follows is the story of building the software that powers it. Ben runs a diary and newsletter from the founder’s perspective, and if you happen to be a book author interested in promoting a book or project, get in touch here.

Screenshot of Shepherd.com Home
Landing page of Shepherd.com on the day of the official launch

How it started

When Ben reached out to me, he already had a clear picture of what he wanted and had elaborate design wireframes. The initial requirements were written down in Google Docs, which allowed us to quickly come up with a roadmap with estimated costs.

The first plan was quite simple:

  1. Build a headless CMS so that creating website content can be started as soon as possible.
  2. Build a “minimum viable product” (MVP) website and launch as soon as possible.
  3. Elaborate advanced website and CMS features after launch.

Research on book metadata and cover image APIs

At the beginning it seemed like we would want an external data source for book metadata and high-quality book covers. Even though various options exist, this is a surprisingly hard problem to solve for several reasons:

Based on Ben’s original shortlist, here is a brief summary of the options we looked at:

In the end we decided not to use any of these APIs and rely on manual book data entry until later when it becomes clearer what the business demands.

Project management and collaboration

Since it has been a two-person team of Ben and me for most of the time, we kept tooling lightweight. Ben writes down the initial requirements for each milestone or major group of features in Google Docs, where we use comments and collaborative editing until everything is clarified and agreed on.

We initially used checklists within Google Docs for keeping track of tasks and progress, but that soon turned out not to be flexible enough because it lacks usable long conversation threads. We now keep track of everything in GitHub Issues, which does the job perfectly. I also briefly considered using GitHub project boards but found them a little awkward to use and probably overkill for this team size. We do use issue milestones, though, to organize and schedule stories, bugs and other tasks into larger buckets.

Our day-to-day communication is also very lightweight. We hold conversations on GitHub Issues when a relevant issue exists and exchange emails regularly on other topics. We have not seen the need for any sort of real-time, synchronous collaboration so far. I also find the GitHub notification inbox very useful for keeping up with conversations happening on issues.

Software stack

I used the following inputs to decide on what software stack to use:

Given these criteria I decided on the Django web framework:

So far I have no regrets about choosing Django and using the built-in admin for managing the content. There are definitely a few places where I stretched how far one can go in customizing the admin and had to dig deeper into Django’s source code than I wished, but so far this has been the exception rather than the rule.

Screenshot of Shepherd.com admin
Django Admin – perfectly does the job of a simple CMS

Infrastructure

Given that the system was to be built by me alone (collaborating with Ben on the requirements and content side, and with a designer responsible for the look of the public website), spending time on anything not closely related to the main goal was out of the question. One area where time and money can be saved is setting up and managing infrastructure, also known as DevOps.

For this reason I decided to deploy the application to Heroku. It is not considered to be the “hottest thing” nowadays but it does one thing well: it runs applications with little to no setup or maintenance. The price can go up steeply if your capacity needs grow, but that’s a good problem to have. As Heroku did not require me to write my application in any vendor-specific way (except for following the twelve-factor principles, which are useful on most competing platforms), it will be easy to move elsewhere once the extra effort is financially justified.
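To make the twelve-factor point concrete, here is a minimal sketch of reading configuration from environment variables rather than from code. It is an illustration, not the actual Shepherd settings: the helper names `database_settings` and `env_flag` are made up for this example, and it assumes the database is described by a `DATABASE_URL` variable, which Heroku sets for its managed PostgreSQL.

```python
import os
from urllib.parse import urlparse


def database_settings(environ=os.environ):
    """Build a Django-style database dict from DATABASE_URL (hypothetical helper)."""
    url = urlparse(environ["DATABASE_URL"])
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": url.path.lstrip("/"),
        "USER": url.username or "",
        "PASSWORD": url.password or "",
        "HOST": url.hostname or "",
        "PORT": str(url.port or ""),
    }


def env_flag(name, default=False, environ=os.environ):
    """Read a boolean switch such as DEBUG from the environment (hypothetical helper)."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}
```

Because nothing here is Heroku-specific, the same settings module would work on any platform that can set environment variables. Existing packages such as dj-database-url handle the URL parsing more thoroughly than this sketch does.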

Heroku also offers managed PostgreSQL, which frees me from worrying about database security and backups. It even comes with a 4-day window of point-in-time rollback, which drastically improves my quality of sleep.

Uploading, storing, resizing and serving images for book covers and author portraits is an important part of the software but not something worth building ourselves. Heroku does not offer any sort of static hosting itself, but they have partnered with Cloudinary, which is a CDN and a media management service in one. Setting up Cloudinary with Django integration took me about 30 minutes, and we had working uploads and images resized for all our responsive design needs.
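As an illustration of how on-the-fly resizing covers responsive design, the sketch below builds Cloudinary delivery URLs with a width transformation and combines them into an HTML `srcset` value. This is not the integration code used on the site: the function names, the cloud name `demo` and the public ID are placeholders, and the widths are arbitrary.

```python
def cloudinary_url(cloud_name, public_id, width, fmt="jpg"):
    """Return a delivery URL that asks Cloudinary to scale the image to `width` pixels."""
    transformation = f"c_scale,w_{width}"
    return (
        f"https://res.cloudinary.com/{cloud_name}/image/upload/"
        f"{transformation}/{public_id}.{fmt}"
    )


def srcset(cloud_name, public_id, widths):
    """Join one URL per width into a value usable in an <img srcset="..."> attribute."""
    return ", ".join(
        f"{cloudinary_url(cloud_name, public_id, width)} {width}w" for width in widths
    )
```

For example, `srcset("demo", "covers/example", [200, 400])` yields two candidates the browser can choose between, while only one original image has to be uploaded and stored.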

We use Cloudflare as a CDN, caching and security layer in front of the Django application. Because Heroku offers no static file hosting and Django also recommends solving the problem externally, Cloudflare’s CDN came in handy for serving stylesheets, fonts, images and other static assets. Another selling point for Cloudflare was its cache, which saves us a huge amount of Heroku resources by serving cached versions of pages that are dynamically generated but change infrequently.
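One common way to let a CDN cache dynamically generated pages is to send a `Cache-Control` header with a long `s-maxage` (for shared caches such as the CDN edge) and a short `max-age` (for browsers). The sketch below shows the idea only; the helper name and the lifetimes are assumptions, not the values used on the site. Note that Cloudflare does not cache HTML responses by default, so a matching cache rule on the Cloudflare side would also be needed.

```python
def edge_cache_headers(edge_seconds, browser_seconds=0):
    """Build a Cache-Control header that lets the CDN keep a page longer than browsers do."""
    if edge_seconds < 0 or browser_seconds < 0:
        raise ValueError("cache lifetimes must be non-negative")
    directives = [
        "public",
        f"max-age={browser_seconds}",  # lifetime in the visitor's browser
        f"s-maxage={edge_seconds}",    # lifetime in shared caches (the CDN edge)
    ]
    return {"Cache-Control": ", ".join(directives)}
```

In a Django view the resulting value could be assigned to the response, for example `response["Cache-Control"] = edge_cache_headers(3600)["Cache-Control"]`, so that the page is rendered on Heroku at most about once an hour per edge location.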

I am positive that this infrastructure will serve the project for long enough after the public launch that we can identify growth patterns and make a plan for evolving it (or not) while also building the next round of features.

Continuous delivery

I strongly believe in continuous delivery from day zero of any project. We have a decent amount of automated tests, mostly integration and unit tests, which are ridiculously fast in Python/Django. Most tests exercise the whole stack from the HTTP middleware all the way down to an in-memory SQLite database, and the entire test suite still completes within a few seconds.
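For reference, pointing Django at an in-memory SQLite database takes only a small settings fragment like the one below. Where it lives (for example a hypothetical test-only settings module) and how it is selected are assumptions; the project's real settings are not shown in this post.

```python
# Hypothetical test-only settings fragment: an in-memory SQLite database
# is created fresh for each test run and never touches the disk.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": ":memory:",
    }
}
```

The trade-off of this approach is that tests run against SQLite while production runs on PostgreSQL, so database-specific behaviour is only exercised once the code reaches a real PostgreSQL instance.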

Code quality checks (tests and Pylint rules) and a code formatting check with Black are run on each commit using GitHub Actions. If the checks succeed, the code is automatically deployed to Heroku.
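The GitHub Actions workflow itself is not reproduced here. As a rough local equivalent of the same gate, the sketch below runs the three kinds of checks in order and reports success only if all of them pass. The package name `shepherd` and the exact command lines are assumptions for illustration.

```python
import subprocess

# Commands mirroring the checks described above; "shepherd" is a hypothetical package name.
CHECKS = [
    ["black", "--check", "."],        # formatting conformity, no files are rewritten
    ["pylint", "shepherd"],           # lint rules
    ["python", "manage.py", "test"],  # Django test suite
]


def run_checks(checks=CHECKS, runner=subprocess.run):
    """Run each check in order; return False at the first non-zero exit code."""
    for command in checks:
        result = runner(command)
        if result.returncode != 0:
            return False
    return True
```

A deploy step would then run only when `run_checks()` returns `True`, which is the same "deploy if green" rule the pipeline applies on each commit.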

We do have a staging environment as a separate Heroku app with a dedicated database. Just like the production environment, it is automatically deployed on each commit. We use staging to make sure database migrations apply without issues, to play around with content without messing up the public website, and sometimes to preview features.

Alternatives considered

There are a few alternatives we have or could have considered:

There are probably plenty of others worth considering, but these were the ones that briefly crossed my mind at the beginning.

The future

There is a long list of features we will keep shipping in the near future. Time will tell how well the infrastructure holds up under load, but I don’t expect surprises here. Once we start adding more dynamic or user-generated content that does not play nicely with heavy caching, we might need to reconsider some parts and even move from Heroku to a cheaper hosting provider once we can justify the extra DevOps cost.

If everything goes well, I will write a follow-up post here. (Maybe also when things go bad, but that “should not happen”;)

Have comments or questions? Let’s discuss on Hacker News!