Setting Up CI & CD

2020-12-29

Tags: hugo, development, technology, jenkins

Finishing Holiday Projects

I spent most of the end of this year working on a CD pipeline in Jenkins. If that doesn’t sound like your normal holiday activity… well I guess it really isn’t but maybe it should be. The holidays are a unique time of year where I get longer spans of uninterrupted free time. I had about a week of vacation left over so I decided that I’d spend the time setting up something I thought would be pretty nice: Hugo seems to be lightweight and simple, so to me it seemed like the perfect target to try automating.

It took me the whole week.

What is CI / CD?

Continuous Integration and Continuous Deployment are hot topics nowadays. They may not have been the standard a few years ago (at least if you weren’t working at a FANG company), but now everyone is doing them. Beyond simple definitions, I’ll try to summarize my (lack of) experience with CI and CD and set the stage for the rest of this project, while setting up some guardrails to keep you on track and help you avoid the pitfalls I seemed to find everywhere.

There is a lot of literature on the topic from actual professionals, like Red Hat. I’ll go out on a limb and say that Red Hat knows what they’re talking about when it comes to CI and CD. If you don’t want to read up too much, I’ll summarize: Continuous Integration is an automated build-and-test pipeline that produces artifacts, which a Continuous Deployment system then publishes to “the public”. What this means in practice depends primarily on the project itself. For traditional software, i.e. binaries, this means when your team pushes a change to your source control system (you are using source control, right?) then a new build is created and published. If you think of big software systems like Linux OS distributions, this is like the daily builds. If you’re familiar with modern video games, this is like when a certain company publishes an unfinished mess and later patches it within the week. Well, not precisely, but it’s close.

For websites, it becomes even more immediate. Sites with a quick turnaround and lots of users have a team of developers who are fixing various bugs or pushing new features. With a CI / CD pipeline, when one of those developers pushes her fixes or changes the CI system picks up the source code, builds it, tests it, and if it gets the green light it’s set up as the new website version immediately. It’s live in a matter of minutes, and all your users can use the new features!

If that sounds like magic, it might be. If you’re looking to get a zero-downtime release like I described, you need to set up a lot of things to do it successfully, and most of that (containerization, cloud computing, etc.) is out of the scope of this article. We’ll be looking at a single dedicated build server, which is more approachable if you’ve never done this before. I’ll probably dive into the more complicated stuff at a later date.

How Does Hugo Use CI & CD?

Hugo doesn’t come out of the box with a CI / CD platform installed. This is a good thing, since that would make Hugo an even more confusing mess. The Hugo documentation does have a few integrations with automation services, as seen here. You can also host your site on several different platforms, like GitHub or GitLab, which is very convenient, or on Azure, AWS, and Google Cloud. That’s all well and good, but I went a different route.

I (once again) decided to try something that takes a bit more hands-on knowledge. The upsides to my solution are:

I get complete control of the artifacts
I have complete control of the build environment
I get to learn about CI / CD, specifically Jenkins

The downsides…

I have to maintain the build environment
I have to debug the build process
It took all week

As I would quickly discover, fixing up a CI / CD pipeline for Hugo was not as simple as it’d seem, at least for a CI / CD greenhorn such as myself. I’ve never had to set up a Jenkins box or architect a CI pipeline, which was really the reason I wanted to look into this. What I found I needed to brush up on or learn was the “new” Jenkins Pipeline system, Groovy syntax, SSH authentication, Linux permissions structures, and a new distribution system (and why it sucks). Let’s crack this open.

Hugo CI / CD using Jenkins on Ubuntu

That title should satisfy the Google search goblins. My CI / CD system is as stated above: Hugo is my Static Site Generator, my CI / CD pipeline is a Jenkins installation on a junk PC running Ubuntu hosted on my home network. I’ll call it a “dedicated build server” but really it’s just another machine that I’m not using for anything else right now. Now that it is a Jenkins server, it will probably stay as my home’s dedicated build server until it totally breaks down.

The Build Server

If you’re strapped for computing platforms, you don’t need anything fancy - this machine I picked up for about $30 from a surplus store. This is pretty great: it came with a decent hard drive, enough RAM, and networking capabilities. If you’re lucky you might be able to set up your own lab on the cheap from similar surplus shops, or heck even a modern Raspberry Pi can probably do most of this. If you’re like me, you also tend to accumulate e-waste; I love using these machines as remote hosts for projects, so it never really goes to waste. I even have a DOS machine on a POS e-Machine that doesn’t even have enough power to run Ubuntu! Now that’s something.

Oh, this article is about CI / CD not recycling … back on track!

So if you have a PC lying around that can serve as your Jenkins box, great. Is Ubuntu the best distro for this? Probably not, but I am familiar enough with Ubuntu, and I had set this box up a while ago for myself to play with, so it works. I’ll come out and say, this is not an enterprise-level setup. Most CI / CD nowadays doesn’t even run on local machines and instead relies on cloud services (unless you’re running cloud services). I’d rather set up the whole system and get to know how it works, but using a cloud service is the current trend in the industry.

I’m building Hugo on a standalone Ubuntu box. At the time of writing, it’s running the current LTS version, 20.04.1 (Focal Fossa). It’s got 4 GB of RAM, and about 250 GB of space: not a powerful machine by any means. This system works pretty well for deploying a single Hugo website - as you can probably guess, running a significant enterprise CI / CD might not be possible from such a system. Good thing I’m not enterprise! The only major updates to this are a few apt packaging updates (which I explain in-depth at the end), and opening up SSH, since I’m running this machine headless.

The CI / CD Software

My build is using Jenkins, specifically 2.263.1 at the time of writing. I’ve been working near and around Jenkins for a good while now; I think it’s been 8 years. In that time, it’s really changed quite a lot. One of the key aspects of Jenkins CI / CD is that it’s plugin dependent. This is still the case, and in general if you want to automate anything in Jenkins you usually need a plugin. Unfortunately, checking for a Hugo plugin (at least presently), I don’t get the feeling that Hugo is a supported CI / CD target. There was one, but it’s a year stale, and it didn’t inspire me with confidence.

So I built a custom pipeline, and I’ll dive into how that works in the next major section.

Hugo

If you’ve been following my Build a Website series, then you know about Hugo. It’s a static site generator built in Go, ‘nuff said. Hugo is actually one of the major hangups that I ran into with this project. I’m running 0.76.3 (run hugo version if you want to know what you’re running). If you’re setting up CI/CD, it’s pretty important to keep the software versions on your build server in step with your development environment, for reasons that should be fairly obvious. If you’re in the cloud using Kubernetes (K8s) or Docker, you’ve got this covered with containerization.

For my dedicated build service, it’s important that I match my development environment with the build server’s version. This is a problem because of the available apt packages on Ubuntu. If you look at the hugo packages on Ubuntu, you’ll probably see that the focal version is 0.68! My Semver senses started tingling when I saw the nearly 10 version (albeit minor) difference. So I panicked and installed a more recent version by wrangling the apt system. That’s a deep dive, so I’ll cover that at the end of this article.
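One way to catch this kind of drift early is to have the build server compare its own Hugo version against the one on the development machine. The sketch below is only an illustration under a few assumptions: extract_version and expected_version are names I made up, the sample string in the fallback branch is a stand-in, and it assumes the output of hugo version contains a token that looks like v0.76.3.

```shell
#!/bin/sh
# Hypothetical helper: pull an "X.Y.Z" number that follows a "v" out of a line of text.
extract_version() {
  printf '%s\n' "$1" | sed -n 's/.*v\([0-9][0-9]*\.[0-9][0-9.]*\).*/\1/p'
}

# The version used on the development machine (from this article).
expected_version="0.76.3"

# Use the real binary when it is installed; otherwise fall back to a sample
# line so the sketch still runs. The sample format is an assumption.
if command -v hugo >/dev/null 2>&1; then
  reported=$(hugo version)
else
  reported="Hugo Static Site Generator v0.68.3 linux/amd64"
fi

found_version=$(extract_version "$reported")
if [ "$found_version" = "$expected_version" ]; then
  echo "hugo $found_version matches the development machine"
else
  echo "hugo version mismatch: expected $expected_version, found ${found_version:-unknown}"
fi
```

Inside a Jenkins sh step you would probably want the mismatch branch to exit 1 instead of just printing, so the job goes red before it builds anything with the wrong version.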

This GitHub issue crystallizes the problem I ran into using the Hugo-sanctioned install via snap packages. In short, snap packages can’t access any location other than the home folder of the user executing them. This is actually not much of a problem for the admin user that’s probably installing hugo, but it’s a real pain for Jenkins because I don’t want to give that automated user all the privileges that are apparently needed to do this. This problem crops up in the support forums, but I didn’t find the answer satisfying.

The bane of the Snap package setup was this message for me:

Error: Error building sites add site dependencies: create deps: failed to create file caches from configuration: mkdir /tmp/hugo_cache/check-yockyard: permission denied

If you’re getting something similar, either you’ll need to figure out the snap permissions for your system account, or you’ll do what I did and install via apt. This was probably a 2-3 day time sink, but I’ve summarized my trauma in the addendum.

Source Control

I’m using GitHub to host this site’s source. Jenkins has a pretty good generic SCM plugin. To get this working I needed an SSH key, which I generated with PuTTYgen. Making SSH keys is a bit out of scope for this article.

In GitHub you’ll want to go to your “Deploy Keys” and drop the public key in a new key, and name it something descriptive. Mine’s named “Jenkins”. If you’ve done it right, you can now give limited (and revocable) access to clients for your repository. With that let’s jump into Jenkins.

Jenkins: ol’ Reliable

Jenkins has been around since 2011; it’s been around the block. It’s also changed a bit since I first started working with it, but if anything it’s been streamlined to be less cryptic and more powerful. I’ll walk through the high-level steps of getting a Jenkins CI/CD pipeline going, and pique your interest in the topic.

The setup for this system isn’t unique; most of the default plugins are enough to get us going. Make sure that Jenkins has its own user and home directory in Linux.

Let’s add the credentials for GitHub as a credential for Jenkins to use. RTD. We’ll go to Manage Jenkins > Manage Credentials. If you already have a credential store for jobs, add a new credential. The most important thing to add is the key: we need to copy-paste the private key into Jenkins, and if you’re using a passphrase, drop that into Jenkins as well. The only other setting that’s imperative is the “scope,” which needs to be Global so that our jobs can get to it. I tend to use the auto-generated IDs, but you can set the ID to anything that’s easier to remember later on.

We’re ready to start building our pipelines.

Making the Hugo CI pipeline

Now that credentials are set up, our build box should be ready to go. Let’s get into making our pipelines. I like to source my CI / CD scripts from source control (GitHub). This keeps our orchestration and site in-sync, which is nice: we don’t have to worry about our website and orchestration being incompatible without us knowing.

You may have used Jenkins before, and used the “Freestyle project,” but I prefer to use the “Pipeline” project type. The benefit of this type is that we get one script, stored inside the project we’re building, with a set of instructions saying what our build is doing. I’ve found this to be neat and tidy, plus changes to the build process are tracked, just like the rest of the project. This makes fixing “layer 8” bugs that much easier: just revert the last change and re-run the build.

If we break down our software management process into distinct verbs, then it’s that much easier to debug errors in different parts of our build, and to re-run, skip, or stop individual stages as needed. Let’s get started with a “Check” job.

Check: Prepare, Check, and Request Deploy

To get started, we’ll make a new job (“New Item” from the Jenkins dashboard) and make a new “Pipeline,” allowing us to use Groovy scripts in the source control. We will start with a job to check whether or not to automatically build our site since it’s relatively easy to make.

Now we have a pipeline, the only thing we need to do is give it a name, configure the GitHub SCM connection, and tell Jenkins where the Groovy file is going to be located. Give the pipeline a descriptive name, like “check-mysite”. First things first, we need to tell Jenkins to take the build script from the git repository and give Jenkins the ability to grab the code. Under “Pipeline > Definition” we set it to “Pipeline script from SCM,” and pick “Git”. Use the SSH version of your GitHub repository, e.g. git@github.com:YourName/YourRepo.git, and your SSH key that you set up before should be in a drop-down. If you don’t see any red, then Jenkins can poll GitHub and it’s good to go. Under “Branch specifier” we only want to check the main branch (“master” or “main” etc). Finally, we’ll specify a build groovy file. For simple pipelines you’ll often have one Jenkinsfile, but in this scenario I’ll have 3 scripts that run different stages of my full pipeline. I’ve put most of my build scripts at the root so I have a CheckHugo.groovy file. This script’s path is in relation to your project path.

Now, to pen the script, let’s start with a good framework:

/* A pipeline to determine if hugo has any pages to publish
Notes:
only checks the main branch - development branches are not monitored
*/
pipeline {
  agent any
  options {
    timeout(time: 15, unit: "MINUTES")
    disableConcurrentBuilds()
    buildDiscarder(
        logRotator(
            artifactDaysToKeepStr: "",
            artifactNumToKeepStr: "1",
            daysToKeepStr: "",
            numToKeepStr: "350")
    )
  }
  triggers {
    cron('H/15 * * * *')
  }
  environment {
    res = false;
  }
}

Every Pipeline Script needs the outer pipeline block; otherwise Jenkins will complain. agent any tells Jenkins that any configured build agent can run this script: if you’ve set up a bunch of nodes with different software, you may not be able to do this. timeout and disableConcurrentBuilds are fairly self-explanatory. For my CheckHugo.groovy, I like to only keep 1 “artifact,” which we’ll configure later. I’m not sure why you’d want to know how your check 15 times back went, but you do you, as they say. numToKeepStr and daysToKeepStr are parameters to keep this job clean in the Jenkins build status page. If you do the math, 350 runs at one every 15 minutes works out to a bit more than 3.5 days’ worth of checks. The triggers I’ve got set up run this check every 15 minutes, which I think works for me. If you need to check every minute, then change that to cron('H/1 * * * *'). I wouldn’t do this, but it’s your compute time. Finally, I’ve set up a global (to this job) environment variable, res; this will be important later, but for now we’ve just initialized it to a boolean false value.
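If you want to pick a numToKeepStr that covers a particular stretch of history, the arithmetic is easy to script. This is just a sketch using the two numbers from the framework above; the variable names are my own.

```shell
#!/bin/sh
interval_min=15   # matches cron('H/15 * * * *')
keep=350          # matches numToKeepStr

# How many checks run per day, and how much history "keep" builds span.
runs_per_day=$(( 24 * 60 / interval_min ))
hours_covered=$(( keep * interval_min / 60 ))   # integer division, rounds down
builds_per_week=$(( 7 * runs_per_day ))

echo "runs per day: $runs_per_day"
echo "history covered by $keep builds: about $hours_covered hours"
echo "builds needed for a full week: $builds_per_week"
```

With a 15 minute interval that is 96 runs a day, so 350 kept builds span about 87 hours, and a full week of history would need numToKeepStr set to 672.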

Next, we’re going to set up some stages for the job.

stages {
  stage("prepare") {
    steps {
      checkout( [$class: "GitSCM",
         branches: [[name: 'origin/main']],
         doGenerateSubmoduleConfigurations: false,
         extensions: [
           [$class: "SubmoduleOption",
            disableSubmodules: false,
            parentCredentials: true,
            recursiveSubmodules: true,
            reference: "",
            trackingSubmodules: true]
         ],
         userRemoteConfigs: [
             [credentialsId: "867-5309",
             url: "git@github.com:YourName/YourRepo.git"]
           ]
         ]
      )
    }
  }
}

My prepare stage sets up the rest of the pipeline. The stages group will hold all the distinct stage groups which will execute the instructions in the steps group: pretty straightforward. You can name your stage whatever you want and this will show up on the Jenkins job status page. How do we know which steps we can do? And how do we know which variables to drop in? Well, I didn’t just pull these out of thin air: Jenkins provides a pipeline syntax generator out of the box. It’ll be somewhere like this address: https://jenkins:8080/job/check-mysite/pipeline-syntax/. To generate the checkout snippet I use the “checkout” sample step, and fill in the required parameters. Then I copy-paste the result into my Groovy script and BAM! Rinse and repeat until the script works!

Let’s look at the next step:

stage("check") {
  steps {
    sh 'hugo list all > listfile'
    // only include these if you will use these features
    sh 'hugo list drafts > draft'
    sh 'hugo list expired > expired'
    sh 'hugo list future > future'
    // copy every file that the diff below compares, not just listfile
    copyArtifacts filter: 'listfile, draft, expired, future', projectName: 'check-mysite', selector: lastWithArtifacts(), target: 'last', optional: true
    script {
      res = sh(script: 'diff listfile last/listfile -q && diff draft last/draft -q && diff expired last/expired -q && diff future last/future -q', returnStatus: true) 
    }
  }
}

Now we’re telling Jenkins “Run this shell command: hugo list all > listfile.” If you run that in a shell you control, you’ll get a listing of all the content in your website, stored in a file named listfile. This is an “artifact” or, in non-CI terms, a set of files that results from our “build”. This artifact isn’t the source code; it’s what we want to save from our job working. I’ve also included optional commands to save the output from other runs of hugo: if you have “publishDate” front matter on your posts, then you might push an article that is supposed to be available later than whenever you decide to get it into the main repository. In that case, you’d want to watch that file too! hugo list all captures everything including drafts, expired content, and future posts.

In a later step I will save that listfile file, so let’s assume it’s available. We want to know if there’s any new content for us to publish, so let’s grab the last artifact and compare. I use the copyArtifacts step (from the Copy Artifact plugin) to get the last artifact from this job. The target parameter lets us put it into a sub-folder of our workspace called last. It’s important to mark the copyArtifacts step as “optional” because otherwise it fails if there’s no last artifact… which might happen if the last build crashes badly, or if this is the first time the job is run. Now that this step is optional, even if there’s not an artifact the job won’t fail.

Finally, we need some extra functions from Groovy to store the results in the variable we set up called res, so the next step is a script. Inside the script block we set res to the exit status of the chained diff commands, starting with diff listfile last/listfile -q. If you don’t know what diff does, run man diff; in short, it compares two files for differences. If they’re the same you get a 0 status code; if not, you get a 1 plus some text. Because returnStatus: true hands that number back to Groovy, and Groovy treats 0 as false and any other number as true, res will be truthy if there’s something new since the last time we ran the check(s) and falsy if we haven’t published anything in 2 weeks - erm, I mean since last check. Oh, and if there isn’t a last artifact to compare against, diff can’t find the file and exits with status 2, which is also truthy, so the first run requests a build.
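If you’d like to see those exit codes for yourself without touching a real workspace, here is a small sketch that runs in a throwaway directory. The file contents are made up, and the same, changed, and missing variable names are only there for illustration.

```shell
#!/bin/sh
# Work in a throwaway directory; the file contents are made up.
work=$(mktemp -d)
mkdir "$work/last"
printf 'post-a\npost-b\n' > "$work/listfile"
printf 'post-a\npost-b\n' > "$work/last/listfile"

# Identical files: diff exits 0.
same=0
diff -q "$work/listfile" "$work/last/listfile" >/dev/null 2>&1 || same=$?

# A new post appears in the current listing: diff exits 1.
printf 'post-c\n' >> "$work/listfile"
changed=0
diff -q "$work/listfile" "$work/last/listfile" >/dev/null 2>&1 || changed=$?

# No previous artifact at all: diff reports trouble with a status above 1.
rm "$work/last/listfile"
missing=0
diff -q "$work/listfile" "$work/last/listfile" >/dev/null 2>&1 || missing=$?

echo "same=$same changed=$changed missing=$missing"
rm -r "$work"
```

Only the first case gives a 0, so it’s the only one that leaves res falsy; both “something changed” and “nothing to compare against” end up requesting a build.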

Remember: if you’re not using the “draft,” “future” or “expired” features in hugo, then you only need the first hugo list command and the first diff.

Now let’s put in the last part of this script, even though we haven’t built the rest of this pipeline out:

stage("request redeploy") {
  steps{
    script {
      if (res) {
        build job: 'build-mysite', parameters:[string(name: 'branch', value:'main')]
        build job: 'deploy-mysite'
      } else {
        echo 'No build requested'
      }
    }
  }
}

This is pretty simple: if there’s new stuff to build (res is truthy) then call the build-mysite job, then deploy-mysite. If there’s nothing to do, we just say so. The nice part about the build step here is that if the build fails we don’t deploy, and the check also fails: red across the board, neat! Don’t break the build!

Before we make those new jobs, we need to save the artifact we created in the check step. Add a post step after the stages group, like so:

post {
  always {
    archiveArtifacts artifacts: 'listfile, draft, expired, future', followSymlinks: false
  }
}

Decoding this: whenever this job is run, after all steps (no matter what, even if it fails), save listfile, draft, expired, and future as artifacts of this build. Now that copyArtifacts step will have something to grab.

Build: Prepare, then Build

Now we’ve got a job that knows when we need to publish, we need a job to build Hugo. Luckily, Hugo and Jenkins make this pretty easy. We’ll need a new pipeline job like before, and we’ll need a new file like BuildSite.groovy with similar agent and options setup, but this time we keep the builds: we can parameterize deployment to roll back if we catastrophically fail, which is nice. To keep more builds, along with the artifacts archived for each one, set your options as below:

buildDiscarder(
  logRotator(
    artifactDaysToKeepStr: "",
    artifactNumToKeepStr: "",
    daysToKeepStr: "",
    numToKeepStr: "5")
)

Now, you keep the last 5 builds, and theoretically you can roll back to them. I won’t cover that in this article since it’s getting a little packed already. Let’s dive in to the steps for building our Hugo website.

First, let’s prepare the workspace (again). It’s exactly the same as in the last pipeline, so you know what to do! Next is a new stage, build. Let’s check it out:

stage("build") {
  steps {
    nodejs(nodeJSInstallationName: '12.18.3') {
      sh "npm install"
      sh "npx webpack --mode=production"
      sh "hugo --verbose"
    }
    archiveArtifacts artifacts: 'public/**', followSymlinks: false, onlyIfSuccessful: true
  }
}

So, if you’re running webpack (like me), then you’ve got a few things to build. If you’re only running Hugo, then you’ll only need the sh "hugo --verbose" line, and you wouldn’t need the nodejs wrapper. Other than that, this should look pretty simple. If you’ve ever deployed your own site you’ll know that you just run hugo to build your website. So that’s what we’re doing here, and you’ll get a public folder in your workspace, which we pick to be an artifact for our deploy script. Again, our archiveArtifacts command is nothing special: the public/** pattern simply matches everything under the public folder.

Deploy to Firebase: Retrieve, Deploy

This is the final step in our CI journey: we want to get our build into production, and I don’t want to have to push the button because I’m lazy (and you should be too). Plus, it’s nice if your stuff gets automatically published for you. Luckily, this is the shortest script, but we do need to make another secret. Read up here to get started. I’ve installed the Firebase CLI on the build server and created a new credential. This credential works just like the SSH key we used for GitHub, and you’ll reference the secret by its ID like below.

pipeline {
  agent any

  options {
    timeout(time: 15, unit: "MINUTES")
    disableConcurrentBuilds()
  }
  stages{
    stage("retrieve") {
      steps {
        // 'branch=main' matches the parameter value the check job passes to build-mysite
        copyArtifacts projectName: 'build-mysite', parameters: 'branch=main', selector: upstream(fallbackToLastSuccessful: true)
      }
    }
    stage("deploy") {
      environment {
        FIREBASE_TOKEN = credentials('543445-4423243325-244443-42151131')
      }
      steps {
        nodejs(nodeJSInstallationName: '12.18.3') {
          sh 'firebase deploy --token "$FIREBASE_TOKEN"'
        }
      }
    }
  }
}

By this point you should be pretty good at reading Groovy scripts: we are copying the artifact from the build (only the public/ folder, based on how we configured the build script) and feeding that into firebase like we normally would if we published from our local machine, assuming your firebase.json configuration is set up the same as mine.

Now, if you’ve progressed along the whole article you should be able to test out each part of the pipeline: if you’ve never run the “check” function then it’ll kick off a build, and all going well, it’ll deploy to Firebase, and push the new version to your website!

Continuing to Continuously Integrate

This is only the beginning of the game: we’re not even scratching the surface. If we review what Continuous Integration entails, usually it includes build, test, and merge: right now we’ve only covered build — in fact, I’m already assuming that you’re already merging our website to the main branch to signal that we’re ready to publish. If you’re a one-man shop, like me, then you don’t want to over-complicate your pipeline. On the other hand, if you’re running a larger shop with a bunch of people pushing to the same repository, then you’ll start to see the benefits. If you’re reading this and you’re in the position to pick the orchestration for a team like that, thank you, and also don’t use this as a template.

This is a toy CI / CD implementation, but it’s going to do the job for me. And the way we’ve built this up we can now count on our build server to pick up our Hugo site and even keep up-to-date with our Publish dates too. The best part is I don’t have to babysit it.

Yay laziness!

Addendum

Alright, so remember when I said “The only major updates to this are a few apt packaging updates”? Now is the time we finally address the most intricate parts of my setup. I mentioned that at the time of this article I’m running the current LTS version of Ubuntu, 20.04.1 (Focal Fossa), my local install of Hugo is 0.76.3, and the focal version is 0.68. Not a great sign, but I’ll be honest that I didn’t check if there are any site-breaking bugs, because my semantic versioning brain was screaming that a lot of minor revisions had happened and if I didn’t fix that discrepancy, then I’d probably be writing an article about how the Hugo versioning system is awful. In lieu of writing that article, I’m writing this addendum to tell you how to pin releases from future versions in APT.

I have another disclaimer: I’m not a Linux admin, or even a Linux pro. 99% of my computing time has been on Windows XP and newer. Pinning apt packages isn’t really safe, but I found that doing this exercise was educational.

I saw that I could grab the “groovy” version of the package to get hugo 0.74. Groovy is Ubuntu 20.10, and isn’t LTS, but that semver really made me feel better about having different versions. The problem is, I don’t really want to install Ubuntu Groovy Gorilla, or worse a different OS like Debian entirely, but I do want that package. One solution is package pinning.

Essentially, we’ll tell apt to prefer the “groovy” version of the Hugo package over the normal “focal” version but only for hugo (and dependencies). Sometimes, you can get away with downloading the package directly from the package site and installing the .deb package directly on the box. In this case, that won’t work because of a few dependencies (libsass1 if I remember correctly). Those dependent packages will need to be downloaded from the newer packages too, so we might as well let apt do what it’s designed to do, and have it handle that.

I learned how to do this from the Ubuntu community [here](https://help.ubuntu.com/community/PinningHowto); even there it’s only tacitly acknowledged as doable. Really, it’s not a great or permanent solution. That said, let’s do it.

The article I mentioned was written when xenial was cool (2018-02-12), but it’s mostly still correct. To mess with any of the files it’s talking about you’ll need to be the super user, so sudo vim /etc/apt/apt.conf.d/01-vendor-ubuntu. Remember, I am using focal for my default, so mine would say something like this:

APT::Default-Release "focal";

This tells apt to treat “focal” as my default release, so I don’t end up with all of my packages coming from “groovy.” Next, I have to make “groovy” packages available to apt in my /etc/apt/sources.list:

deb http://us.archive.ubuntu.com/ubuntu groovy main restricted universe multiverse

This tells apt that it’s ok to source its packages from here. By the way, I’m closest to the us.archive.ubuntu.com distribution network: YMMV.

Finally, to tell apt that Hugo should be the “groovy” version, I made a new preference file in /etc/apt/preferences.d/99-hugo.pref:

Package: hugo
Pin: release n=focal
Pin-Priority: -10

Package: hugo
Pin: release n=groovy
Pin-Priority: 900

And now, when I install Hugo using sudo apt-get install hugo (after a sudo apt-get update to pick up the new source), I’ve got the 0.74 version, so I’m good to go.
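If you want to double-check what the pin is doing before (or after) installing, apt-cache policy shows the candidate version and the priority apt assigned to each source. The sketch below is guarded so it is harmless on a machine without apt; the pin_check and candidate variable names are my own, and the awk line assumes the usual “Candidate:” line in the policy output.

```shell
#!/bin/sh
# Show which hugo version apt would install and where it comes from.
# Guarded so the script does nothing risky on machines without apt.
if command -v apt-cache >/dev/null 2>&1; then
  policy=$(apt-cache policy hugo)
  printf '%s\n' "$policy"
  # Assumes the policy output has a "Candidate: <version>" line.
  candidate=$(printf '%s\n' "$policy" | awk '/Candidate:/ {print $2}')
  echo "apt would install hugo ${candidate:-unknown}"
  pin_check=done
else
  echo "apt-cache not found; run this on the Ubuntu build server"
  pin_check=skipped
fi
```

If the candidate still shows the focal version, the usual suspects are a missing apt-get update or a typo in the preferences file name or release name.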