Moving a Website to Netlify from GitHub Pages

The Need

I recently became curious about how my blog and professional portfolio (this very website you are looking at) is performing. The goal was to see some basic numbers: total visitors, a breakdown by page, and maybe referrers as well. I removed Google Analytics a while ago to protect the privacy of my visitors (and to not feed the giant with even more tracking data). I also do not think that any client-side tracking can be accurate for a low-traffic website like mine, because blockers skew the numbers. So I looked into server-side traffic analytics options.

The website was hosted on GitHub Pages with a CloudFlare CDN in front of it. GitHub does not offer any sort of analytics, and even if it did, the numbers would not be accurate because most of the requests are served by CloudFlare. CloudFlare does have analytics, but it lacks a page-level breakdown. I even talked to a CloudFlare sales representative, who confirmed that more detailed traffic data is only offered in the enterprise package, which costs several thousand dollars a month. Not my budget.

Enter Netlify

A quick ask-around resulted in several recommendations pointing to Netlify. The service had been on my radar for a while, but I had no excuse to try it out until today. Netlify Analytics for $9/month/site looked like exactly what I was searching for!

Step One: Add Site to Netlify

Because the source code of my website is in a GitHub repository and uses Jekyll (the static site generator behind GitHub Pages), and Netlify supports this setup out of the box, the first step was very easy. I signed up for Netlify with my GitHub account and followed the wizard to deploy a website from a repository. After allowing Netlify to access my repos and selecting the one named salomvary.github.com, I had the website up and running at https://some-random-name-6a18c6.netlify.app within a few minutes. Looks like we are done!

If you do not use a custom domain, change some-random-name-6a18c6 to something you like in the site settings on Netlify, and you can stop reading now.

Step Two: Set up Your Own Domain

If you are using your own domain on GitHub Pages like I was, the domain needs to be migrated too. The Domain Management area on Netlify explains the options nicely. Unless you want to switch to Netlify's own DNS (which is a feasible option), you will have to edit the records at your DNS provider, for which Netlify also offers guidance. In my case the DNS provider is CloudFlare, and all I had to do was change the salomvary.com CNAME record to point to some-random-name-6a18c6.netlify.app. This is slightly different from what Netlify recommends, but CloudFlare does CNAME flattening, which is roughly equivalent to adding an A record.

Almost immediately after changing the DNS records, the website was served over HTTP by Netlify, and the automatic setup of the free Let’s Encrypt HTTPS certificate took no more than 10 minutes. You can verify the switch-over by looking for the server: Netlify HTTP response header, either in the browser’s developer tools or in the output of the curl -vL http://mydomain.com command.
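
If you would rather script that check than eyeball it, here is a minimal bash sketch. The function name served_by_netlify is my own hypothetical helper, not part of any tool; it only inspects the headers you pipe into it.

```shell
#!/usr/bin/env bash
# served_by_netlify (hypothetical helper): succeeds (exit 0) if the HTTP
# response headers read from stdin contain a "server: Netlify" header.
# Header names are case-insensitive, hence grep -i. Trailing carriage
# returns in curl output do not matter because the pattern is not
# anchored at the end of the line.
served_by_netlify() {
  grep -qiE '^server:[[:space:]]*netlify'
}
```

For example: curl -sI https://mydomain.com | served_by_netlify && echo "Netlify is serving the site". The -I flag sends a HEAD request, so only the headers are printed.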

Note that during these 10 minutes the website was effectively unavailable, because browsers showed certificate errors. This was expected and acceptable in my case.

Step Three: Fixing GitHub Project Pages

This part only applies if you are switching from a GitHub user or organization site. You can recognize one by the repo name: myusername.github.io (or .com for old sites) or myorgname.github.io. In this case a lesser-known piece of magic is at work, and moving to Netlify just broke it.

The magic is the following. If you do not use your own domain on GitHub Pages, your user/org site is at https://myusername.github.io and your repo pages are at https://myusername.github.io/myproject.

However, if you do use your own domain with GitHub, all project pages will be automagically served under https://mydomain.com/myproject and the .github.io variants will redirect to your domain.

But since switching to Netlify, https://mydomain.com/myproject no longer serves the project page, nor does it redirect anywhere: Netlify simply serves a “404 not found” error page. Even worse, https://myusername.github.io/myproject still redirects to https://mydomain.com/myproject, resulting in the same error. The only exceptions are project repos (neither user nor organization page repos) that are configured with their own custom domain. These keep working as before.

As long as you are fine with the project pages being served from https://myusername.github.io/myproject, with a redirect from https://mydomain.com/myproject, the fix is not very complicated: Netlify supports configuring redirects in a file named _redirects placed in the root of the website repo. It should look like the example below, where myusername is your GitHub username and myproject is the name of the GitHub repo you want the redirect for:

/myproject/* https://myusername.github.io/myproject/:splat
/otherproject/* https://myusername.github.io/otherproject/:splat
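
Each rule takes anything under /myproject/ (the * wildcard) and sends it to the same path on GitHub Pages, with :splat replaced by whatever * matched. Without an explicit status code, Netlify uses a 301 redirect. If you would rather not type the lines, a small bash sketch can generate them from a list of repo names; redirect_rules is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# redirect_rules (hypothetical helper): prints one Netlify _redirects rule
# per project. First argument: GitHub username. Remaining arguments: repo
# names. "*" captures the rest of the path; ":splat" re-inserts it.
redirect_rules() {
  local user=$1 project
  shift
  for project in "$@"; do
    printf '/%s/* https://%s.github.io/%s/:splat\n' "$project" "$user" "$project"
  done
}
```

Running redirect_rules myusername myproject otherproject >> _redirects appends exactly the two lines shown above.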

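When using Jekyll, do not forget to add the following to _config.yml. Jekyll leaves out files whose names start with an underscore unless they are explicitly included, so otherwise _redirects never reaches Netlify: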

include:
  - _redirects
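
To confirm that the include took effect, you can build the site locally and look at the output directory. A minimal bash sketch, assuming Jekyll's default _site destination; redirects_published is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# redirects_published (hypothetical helper): checks that a Jekyll build
# copied _redirects into the output directory (default: _site).
# A missing file usually means the "include" entry is not being applied.
redirects_published() {
  local site_dir=${1:-_site}
  if [ -f "$site_dir/_redirects" ]; then
    echo "ok: $site_dir/_redirects"
  else
    echo "missing: $site_dir/_redirects" >&2
    return 1
  fi
}
```

For example, run bundle exec jekyll build (assuming a Bundler-managed Jekyll install) followed by redirects_published from the root of the website repo.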

If you only have a handful of repos to redirect to, the rules can be written by hand. If you have many repos, are lazy, or like shaving yaks, you can automate creating _redirects. All you need is curl, jq, and a GitHub personal access token.

Run this command from a terminal window in the root of the website project:

curl -v -H 'Authorization: token <your GitHub token>' \
  'https://api.github.com/user/repos?affiliation=owner&per_page=100&page=1' \
  | jq -r '.[]
    | select(.has_pages)
    | "/" + .name + "/* https://yourusername.github.io/" + .name + "/:splat"' \
  >> _redirects

Do not forget to change yourusername to your GitHub username! If you have more than 100 repos on GitHub (check the Link response header for the presence of rel="next"), repeat the command with page=1 changed to page=2, and so on.
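
The page-by-page repetition can also be scripted. This bash sketch is a variation on the command above: instead of checking the Link header for rel="next", it keeps requesting pages until GitHub returns an empty array. The variables GITHUB_TOKEN and GITHUB_USER and both function names are my own assumptions; taking the username from GITHUB_USER means there is nothing to edit inside the jq filter.

```shell
#!/usr/bin/env bash
# Assumptions (hypothetical names): GITHUB_TOKEN holds a personal access
# token, GITHUB_USER holds your GitHub username; curl and jq are installed.
# Stops at the first empty page instead of parsing the Link header.

fetch_repos_page() {
  # $1 = page number; prints the JSON array GitHub returns for that page
  curl -sS -H "Authorization: token ${GITHUB_TOKEN}" \
    "https://api.github.com/user/repos?affiliation=owner&per_page=100&page=$1"
}

append_pages_redirects() {
  local page=1 json
  while true; do
    json=$(fetch_repos_page "$page") || return 1
    # Error responses (bad token, rate limit) are JSON objects, not arrays
    if ! jq -e 'type == "array"' <<<"$json" >/dev/null; then
      echo "unexpected API response on page $page" >&2
      return 1
    fi
    # An empty array means the previous page was the last one
    if [ "$(jq 'length' <<<"$json")" -eq 0 ]; then
      break
    fi
    # Same filter as the one-off command, with the username passed in
    jq -r --arg user "$GITHUB_USER" '.[]
      | select(.has_pages)
      | "/" + .name + "/* https://" + $user + ".github.io/" + .name + "/:splat"' \
      <<<"$json" >> _redirects
    page=$((page + 1))
  done
}
```

Run append_pages_redirects from the root of the website project, just like the one-off command.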

There is one more important thing before celebrating success: you need to tell GitHub to stop using your custom domain. Otherwise myusername.github.io/myproject will keep redirecting to mydomain.com/myproject, which we just configured to redirect back to myusername.github.io/myproject, creating an infinite redirect loop.

This can be fixed by deleting the file named CNAME from the root of the Git repo and pushing the changes to GitHub.
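
To double-check that the loop is gone, you can let curl follow the redirects with a hop limit. curl exits with code 47 (too many redirects) when it hits --max-redirs, which is what a redirect loop looks like from the outside. A bash sketch; check_redirects is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# check_redirects (hypothetical helper): follows redirects for a URL and
# reports where they end up, or that they never end (curl exit code 47).
check_redirects() {
  local url=$1 result rc
  result=$(curl -sS -o /dev/null -L --max-redirs 10 \
    -w '%{http_code} %{url_effective}' "$url")
  rc=$?
  if [ "$rc" -eq 47 ]; then
    echo "LOOP $url"
    return 1
  elif [ "$rc" -ne 0 ]; then
    echo "ERROR curl exit $rc for $url"
    return 1
  fi
  # Final status code and final URL after all redirects
  echo "OK $result"
}
```

Once everything is in place, check_redirects https://mydomain.com/myproject/ should report OK with a myusername.github.io URL.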

At this point, your website should be serving visitors as it was before.

Step Four: Cleaning Up SEO

There is one remaining problem, which might or might not bother you. With this setup, the website content is available under four different URLs:

  1. The source at https://github.com/username/username.github.com/blob/master/_posts/post.markdown
  2. The GitHub Pages site at https://username.github.io
  3. The Netlify website at https://some-random-name-6a18c6.netlify.app
  4. And https://yourdomain.com

This is OK as long as you do not care about search engine optimization (SEO), but Googlebot and other crawlers will treat these as duplicate content, which can hurt your rankings.

Unless the repo is made private (which, now that GitHub offers free private repos, is also an option), hiding the website source in the GitHub repo is allegedly only possible by not using the master branch. This is a solution for #1.

If the repo is not a user or organization repo, GitHub allows turning Pages off from the repo settings page. This can solve #2.

For a user or organization site, there is no option to turn Pages off (which kind of makes sense), and making the repo private does not disable the public website either. The only solution is renaming the repo from username.github.io to mydomain.com or whatever you like (from now on the name does not matter, as we no longer rely on GitHub Pages’ magic).
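
Renaming can be done on the repo's Settings page, or through the GitHub REST API's "update a repository" endpoint (PATCH /repos/{owner}/{repo}). A bash sketch of the latter, assuming a personal access token in GITHUB_TOKEN that is allowed to administer the repo; rename_repo is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# rename_repo (hypothetical helper): renames a GitHub repo via the REST API.
# Arguments: owner, current repo name, new repo name.
# Assumes GITHUB_TOKEN holds a token with admin rights on the repo.
rename_repo() {
  local owner=$1 old=$2 new=$3
  curl -sS -X PATCH \
    -H "Authorization: token ${GITHUB_TOKEN}" \
    -H 'Accept: application/vnd.github+json' \
    "https://api.github.com/repos/${owner}/${old}" \
    -d "{\"name\": \"${new}\"}"
}
```

For example: rename_repo username username.github.io mydomain.com. GitHub redirects the old repo URL to the new one, but pointing your local clone at the new name with git remote set-url is still tidier.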

Disabling the Netlify “default subdomain” (my-thing.netlify.app) does not seem to be possible, which means web crawlers might discover it too, resulting in duplicate content. Unless there is some robots.txt trickery I am not aware of, or domain-specific redirect rules on Netlify turn out to be possible (to be figured out), the only solution is adding canonical URLs to all pages of your website. With the help of the Jekyll SEO Tag plugin this is not very complicated (and is a good idea anyway). Problem #3 solved.
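
With the plugin, this amounts to listing jekyll-seo-tag under plugins in _config.yml, adding {% seo %} inside the <head> of your layout, and setting url: https://mydomain.com so that the generated canonical link points at your own domain. To spot-check a deployed page, a small bash sketch can pull the canonical URL out of the HTML. canonical_of is a hypothetical helper, and it assumes the tag is written with rel before href.

```shell
#!/usr/bin/env bash
# canonical_of (hypothetical helper): prints the canonical URL declared in
# an HTML page read from stdin. Assumes the form
# <link rel="canonical" href="...">; adjust the pattern if your generator
# orders the attributes differently.
canonical_of() {
  grep -o '<link rel="canonical" href="[^"]*"' \
    | head -n 1 \
    | sed -e 's/.*href="//' -e 's/"$//'
}
```

For example, curl -s https://some-random-name-6a18c6.netlify.app/ | canonical_of should print a URL on your own domain rather than the netlify.app one.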

Conclusion

As we saw, the trivial task of moving a static website from one hosting provider to another can turn into a deep rabbit hole. The good news is that it did not take as much time as it may seem; in fact, writing this blog post has taken much more time than the website migration itself. The other good news: I did not even have to pay for Netlify, because it is free, custom domains included.

Well, actually, I did end up paying for Netlify. If you still remember, the original motivation was to gather page-level analytics on the website traffic, so I pulled out my credit card and forked out $9 for Netlify Analytics.

Which, to my slight disappointment, is only a tiny bit better than CloudFlare's analytics: the pages section offers nothing more than the top few (5?) most visited URLs:

Screenshot of the Top pages in Netlify Analytics

(For the curious, here is a screenshot of the entire Netlify Analytics page a few hours after turning it on.)

It was an interesting rabbit hole anyway :)