The Impact of Automated Versioning Tools on the Development Process
How the choice of tooling can affect how you and your team develop, and how it can be used to adopt trunk-based development
I was reading The DevOps Handbook (still not done with it, btw, so I wouldn't be surprised if there was an answer to the central question I pose in this post later in the book), and I came across a chapter innocuously titled "Enable and Practice Continuous Integration".
Yes, I know those words, but what does that title even mean (just as nobody apparently knows what 'agile' means)? In this specific case, they were referring to the role that Continuous Integration (CI) plays in making trunk-based development a reality.
Trunk-based development is a style of development where you don't "stray" too far from your master branch; the idea is to not have giant feature branches (which is the defining characteristic of feature branch development, the "de facto" style of development) that stray further and further away from the master branch.
Ideally, in trunk-based development, you make small changes and merge them often, which 1. speeds up iteration cycles, 2. reduces the cost of failure, and 3. lowers the cost of integration (a common problem with feature branch development is that you'll often have dozens or even hundreds of branches all doing separate things, which are really hard to "integrate" back into the master branch while keeping up with all the changes that have happened during isolated feature development - just imagine resolving merge conflicts after 6 months of development on a feature branch!).
It makes sense, and my personal experience also suggests that feature branch development is not "scalable"; however, I take issue with the book's advice to "just slap on CI to encourage trunk-based development" (paraphrased). It's just not enough to actually steer us away from our natural tilt towards feature branch development; having CI is only one part of the solution.
Why Is CI Not Enough?
Let's first think about all of the reasons we normally tend to gravitate towards feature branch development (which, again, is the "default" for so many of us).
For one, it is a logical grouping of changes. Keep all the changes in one branch, and it makes sense that every commit will simply build on the changes for the feature and only for the feature.
For another, it is an isolation of the feature. When you're developing a feature, you really want to focus on it and it alone; the costs of context switching (say, from integrating upstream changes to your feature branch regularly) are high, and said changes may very well interfere with your feature.
And finally, building upon the "logical grouping" bit, it is a self-contained unit of changes, which makes versioning and releasing the changes in the context of the application very easy. You add a feature? That's a minor bump. It contains breaking changes? That's a major bump. And all of the information about the feature can be compacted down into one "release"/changelog entry easily, thanks to tooling around automated releases (more on this later).
Really, feature branch development is so "natural" to us because it is the path of least resistance (at least, before we have to start integrating): not only is breaking a "feature" down into multiple "sets" of changes (that make no sense on their own) unintuitive, it also adds cognitive overhead. The same goes for keeping a feature branch up to date with changes from master - it, again, costs us cognitive overhead in the form of context switching. And the same goes for versioning and releasing.
There are parts where changes in the process/culture can help, such as explicitly breaking down any given feature into bite-sized pieces (e.g. PRs that accumulate commits for no more than a day) before you even begin work, so that each PR is linked to that piece rather than the feature itself. This would allow for iterations that are fast enough that context switching from "integration" of master changes doesn't impose much cognitive overhead. Plus, you would still be able to track and "group" these subsets, as they would all be linked as sub-issues of the feature issue.
But that still leaves releasing. Many of us build products that people have to install/deploy themselves (e.g. libraries, applications), where the stream of releases can't necessarily be mapped directly to every single user's deployments/installations.
For instance, if you work in a team where you have a library that is only used internally, your monorepo tooling can make sure that any consumer of the library is automatically using the latest version (via symlinking). If your team develops an application, and there's only one "instance" of the application, then your team can control the deployment of it, and have it release directly from master or some other deploy branch you merge master into.
But that's not feasible if you have other people using your library, if your application is of the type that needs to be deployed by other teams/users to be useful, or if you want to "productize" your library/application, either commercially, internally, or as an open source project.
That is where versioning comes into play, where the versions serve both as a "snapshot" of a set of changes pushed to master, and as a "contract" of what that set of changes entails, so that people can confidently take versioned releases and use them in their codebase/deployments, at a different cadence from your release schedule.
And this... is a hard problem to solve, especially for trunk-based development, where a "feature" is sliced and diced and spread all across the commit history. And it's not something that can be solved just by changing the process/culture (I mean, you could if you were to do all of this manually, but again, that adds to the cognitive load, which discourages people from doing releases in the first place), and the tooling around it naturally makes us lean towards feature branch development.
Thus, having CI is simply not enough.
How Tooling Affects Our Development Process
So first off, what do we need from our tooling to help support versioning and releases? What problems is it solving, and why even have automated versioning/releases in the first place?
To answer those questions, let's look at what a manual release process looks like:
- You make a series of commits, some related to a feature, some not. Some break an established "contract" between the creators and the users of the software, some have zero impact on the actual software itself.
- Every time you decide to do a release, you must communicate 1. what has changed since the last release, and 2. what the impact of those changes is.
- You'd also like to version them properly (i.e. semver), publish the releases as an artifact (whether it be pushing the library or sharing the Docker image) somewhere, curate a changelog that users can easily see and search through, etc.
- And if it's a monorepo, you now have to repeat this N times.
Obviously, this is a lot, and the context tends to get lost over time (especially so if you're doing it manually, so that you have to go back in commit history to see what the changes were about), not to mention the fact that as multiple people get involved in the making of a release, the release process becomes harder and harder to get "right" manually.
So, we typically bring in tools to automate parts (or all) of the above process:
- Tools exist to encode information about every bit of change (e.g. a commit), as we are making that change - whether it contains a breaking change, which parts it affects (if you're in a monorepo), and the nature of the change (bugfix/documentation/feature/etc).
- Then, the tools can look at all of the changes that have happened since the last release and aggregate said information to produce new information, such as whether this set of changes requires a major/minor/patch bump in version; then they can tag and release the software appropriately.
- They can also use supplemental information provided at the time of making the changes to aggregate all description of changes into a nice changelog or a GitHub release.
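To make the aggregation step a bit more concrete, here is a minimal sketch in Python. It is not how any particular tool implements it; the Change record, the "feat"/"fix"/"docs" labels, and all of the function names are made up for illustration. The only rule it encodes is that the "highest" change since the last release decides the version bump.

```python
from dataclasses import dataclass

# Semver bump levels, lowest to highest.
BUMP_ORDER = ["patch", "minor", "major"]


@dataclass
class Change:
    kind: str            # hypothetical labels: "fix", "feat", "docs", ...
    description: str
    breaking: bool = False


def bump_for(change):
    """Bump level a single change asks for (None means no release needed)."""
    if change.breaking:
        return "major"
    if change.kind == "feat":
        return "minor"
    if change.kind == "fix":
        return "patch"
    return None


def aggregate_bump(changes):
    """The highest bump requested by any change wins."""
    levels = [b for b in map(bump_for, changes) if b is not None]
    return max(levels, key=BUMP_ORDER.index) if levels else None


def next_version(current, bump):
    """Apply a bump to a 'MAJOR.MINOR.PATCH' string."""
    major, minor, patch = (int(part) for part in current.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current


def render_changelog(version, changes):
    """Turn the change descriptions into a simple changelog entry."""
    lines = [f"## {version}"]
    lines += [f"- {c.kind}: {c.description}" for c in changes]
    return "\n".join(lines)


changes = [
    Change("fix", "handle empty config file"),
    Change("docs", "document the retry option"),
    Change("feat", "add a --dry-run flag"),
]
bump = aggregate_bump(changes)
version = next_version("1.4.2", bump)
print(bump, version)
print(render_changelog(version, changes))
```

Note how the single feature in the list is enough to turn the whole release into a minor one; that detail matters for what follows.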
Very nice (now let's see Paul Allen's automated release tools), but how do these tools affect the development styles/processes?
Let's look at two representative tools:
- lerna (the "conventional commit" approach), and
- changesets (the "breadcrumbs" approach).
In the "conventional commit" approach:
- You structure your commit messages in a specific format, "tagging" them with a commit type, description, and optionally scope (this is mostly only applicable to monorepos). Then, when you go to make a release, the tooling looks at the list of commits that have been made since the last release, aggregates the commit types to produce an appropriate semver version bump, generates the changelog based on the commit descriptions, and releases the software.
- Due to the git-centric approach, releases are often done automatically as part of a git workflow, most often on PR merge (which goes really well with feature branch development).
- In fact, since the commit with the "highest" change type is used to determine the release type (e.g. a minor version bump from a feature "beats out" a patch version bump from a bugfix), developers are encouraged to "contain" those commits in a separate branch for the feature (to avoid prematurely making a feature release, and to "bake in" all of the "lesser" changes related to the feature alongside the highest commit), leading to feature branch development.
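Here is a rough sketch of why that happens, assuming commit headers of the form type(scope)!: description. The regex, the type-to-bump mapping, and the function names are my own simplifications for illustration, not lerna's actual implementation.

```python
import re

# Commit header in the conventional commit style: type(scope)!: description
HEADER = re.compile(
    r"^(?P<type>\w+)(?:\((?P<scope>[^)]+)\))?(?P<bang>!)?: (?P<desc>.+)$"
)
RANK = {"patch": 1, "minor": 2, "major": 3}


def parse_commit(message):
    """Return (type, scope, description, breaking), or None if it doesn't conform."""
    lines = message.splitlines()
    if not lines:
        return None
    match = HEADER.match(lines[0])
    if match is None:
        return None
    # Simplification: a "!" in the header or a BREAKING CHANGE footer marks a break.
    breaking = match["bang"] is not None or "BREAKING CHANGE:" in message
    return match["type"], match["scope"], match["desc"], breaking


def release_type(messages):
    """Bump implied by the commits since the last release; the highest one wins."""
    best = None
    for message in messages:
        parsed = parse_commit(message)
        if parsed is None:
            continue  # non-conforming commits don't influence the release
        commit_type, _scope, _desc, breaking = parsed
        bump = "major" if breaking else {"feat": "minor", "fix": "patch"}.get(commit_type)
        if bump and (best is None or RANK[bump] > RANK[best]):
            best = bump
    return best


# Trunk-style: the first slice of a feature lands on master on its own.
on_master = [
    "fix(parser): handle trailing commas",
    "feat(export): add CSV writer skeleton",  # the feature is not usable yet
]
print(release_type(on_master))
```

The moment the first "feat" slice lands on master, the next automated release becomes a minor one, even though the feature is not usable yet. Keeping the slices on a branch until the end avoids that, which is exactly the pull towards feature branches.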
In the "breadcrumbs" approach:
- You commit small files, called changesets, as you are making the changes. The files contain change type, description, and optionally scope (this is mostly only applicable to monorepos). Then, when you go to make a release, the tooling looks at all of the changesets that are currently present in the repo, "consumes" them (i.e. removes all changesets from the repo), aggregates the change types to produce an appropriate semver version bump, generates the changelog based on the changeset descriptions, and releases the software.
- The difference from the conventional commit approach is that changes are inherently decoupled from commits and are instead coupled with the repo directly. This has some implications:
  - Releases are often done manually (though in a systematic manner - e.g. the automatic aggregation of change types), as the changes are less coupled with commits, and thus, PRs.
  - Streams of changes for a feature can be more easily broken up across PRs, as you control when the release goes out.
  - However, as the number of features being developed concurrently increases, and as the release process consumes all changesets currently in the repo, you don't have as much control (when you break up the changes across PRs) over when parts of the feature are released (a release for something completely separate could consume all of the parts that you have committed so far).
  - This often leads to "hoarding" of changes until you're done developing the features. And as features often contain multiple significant changes that will prematurely affect the release types, and as such changes are often hard to push till the end, developers are naturally inclined to stash them in a separate branch, where the tooling can't touch the changesets. Which brings us back to feature branch development.
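The "a release consumes everything" problem is easy to simulate. The sketch below keeps the pending changesets in an in-memory list rather than as files in the repo, and the Changeset and Repo classes (including the feature label) are hypothetical stand-ins rather than the changesets tool's real data model.

```python
from dataclasses import dataclass, field

RANK = {"patch": 0, "minor": 1, "major": 2}


@dataclass
class Changeset:
    feature: str   # hypothetical label, only here to show who wrote what
    bump: str      # "patch", "minor" or "major"
    summary: str


@dataclass
class Repo:
    version: tuple = (1, 0, 0)
    pending: list = field(default_factory=list)    # stands in for changeset files on master
    changelog: list = field(default_factory=list)

    def add(self, changeset):
        self.pending.append(changeset)

    def release(self):
        """Consume every pending changeset, whoever wrote it, and bump once."""
        if not self.pending:
            return self.version
        top = max(self.pending, key=lambda c: RANK[c.bump]).bump
        major, minor, patch = self.version
        if top == "major":
            self.version = (major + 1, 0, 0)
        elif top == "minor":
            self.version = (major, minor + 1, 0)
        else:
            self.version = (major, minor, patch + 1)
        self.changelog.append((self.version, [c.summary for c in self.pending]))
        self.pending = []  # the changesets are removed from the repo
        return self.version


repo = Repo()
repo.add(Changeset("csv-export", "minor", "add CSV writer (part 1 of 3)"))
repo.add(Changeset("login-fix", "patch", "fix redirect loop on login"))

# Someone cuts a release just to ship the login fix...
print(repo.release())
print(repo.changelog[-1][1])
```

The release that was only meant to ship the login fix also swallows part 1 of the CSV export and goes out as a minor version, so the author of the export has every reason to keep parts 2 and 3 (and next time, part 1) off master until the feature is done.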
As you can see, in both cases, scaling up the development model (and the resulting loss of control over coordinating the changes that make up a feature) inevitably leads to "hoarding" of changes in one form or another. Do all roads really lead to feature branch development?
How Do We Encourage Trunk-Based Development?
There seems to be an inevitable clash between trunk-based development, which aims to merge changes into master as often as possible - regardless of the overall feature - and version/release tooling, which aims to "batch" a set of changes into a versioned release.
So where did it all go wrong?
From my experience, it's from trying to cram together two different types of changes - application changes, and feature changes - into a single tool.
At the application level, semver versioning makes sense - it communicates 1. what's roughly in a release (a bugfix, documentation updates, etc), and 2. what to expect when upgrading (will it break the contract established between the creators of the software and its consumers?).
But features (and the "streams" of changes that make up a feature) flow separately from that, running across application versions.
We'd like to be able to see a feature in a holistic manner, and see what said feature involves. Note that this doesn't necessarily mean that the whole of the feature has to go in a single semver minor release, but that due to the way changelogs are produced (i.e. they are "batched out" every version), you quickly lose the picture of the whole feature. And so, people "hoard" changes until they can be merged in as a single versioned release.
Thus, separating out those two streams of "changes" ("contracts" and "features") should remove the incentive to "batch" changes related to a feature.
One possible way of doing so (which is not the complete solution for reasons I'll elaborate in a moment) is via tagging. If you tag the changes separately from the versioning as you're making the changes, you can simply "bunch up" the changes by looking at the changes that have the tag.
We already do this with Jira-style tagging in commit messages (e.g. PROJ-1234) - which allows us to see all the commits that are part of an issue. However, they don't really communicate anything about the state of such features as they are being implemented, nor are they visible to the end user (unless you're sharing Jira boards, which will not always be the case).
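As a small illustration of how far tags alone get you, the sketch below groups commit messages by Jira-style issue key. The regex and the sample messages are assumptions made up for the example, not something a particular tool prescribes.

```python
import re
from collections import defaultdict

# Jira-style key: uppercase project code, a dash, then a number (e.g. PROJ-1234).
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def group_by_issue(commit_messages):
    """Bucket commit messages by every issue key they mention."""
    groups = defaultdict(list)
    for message in commit_messages:
        keys = ISSUE_KEY.findall(message) or ["(untagged)"]
        for key in keys:
            groups[key].append(message)
    return dict(groups)


history = [
    "PROJ-1234 add CSV writer skeleton",
    "fix typo in README",
    "PROJ-1234 wire CSV export into the CLI",
    "PROJ-1301 bump request timeout",
]
for key, messages in group_by_issue(history).items():
    print(key, len(messages))
```

This tells you which commits belong to PROJ-1234, but nothing about whether the feature is finished, and nothing that ends up in front of the end user.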
You can manually write release notes to indicate that a feature is "done", but manual processes mean there's a hill you need to actively climb before you can overcome the "activation energy" needed for that manual action, which means people are discouraged from doing it.
So that's not it.
But it might be possible to combine different tools (as a dirty, dirty hack) to have separate streams of changes for the application and features.
The commit-based tooling is very good at automatically keeping track of application-level changes and communicating them to the users, and the changeset-based tooling is good at keeping track of changes separately from the commits.
So theoretically, it might be possible to combine those two tools, using conventional commit format to communicate changes to the application (that should be automatically consumed), and separate out feature-level changes to changesets (that will be manually consumed).
Remember - the reason we had the hoarding behaviour with the changesets approach was that any release consumed all changesets, thereby tightly coupling the application level changes with the feature level changes. Having commit-based tooling handle the former automatically should free the changesets tooling to handle the other "stream" of changes (i.e. create manual versioned releases from the feature, once it is complete; this means there's no need to hoard semver-minor changes till the end), decoupling them.
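To show the shape of the idea (and only the shape), here is a toy model of the two streams. Everything in it is hypothetical: the TwoStreamRepo class, its method names, and the rule that feature slices are merged under a commit type that doesn't trigger a release are assumptions of this sketch, not features of lerna or changesets.

```python
from dataclasses import dataclass, field

RANK = {"patch": 0, "minor": 1, "major": 2}


def bumped(version, bump):
    """Apply a semver bump to a (major, minor, patch) tuple."""
    major, minor, patch = version
    return {
        "major": (major + 1, 0, 0),
        "minor": (major, minor + 1, 0),
        "patch": (major, minor, patch + 1),
    }[bump]


@dataclass
class TwoStreamRepo:
    version: tuple = (1, 0, 0)
    unreleased_commits: list = field(default_factory=list)  # application stream
    feature_notes: list = field(default_factory=list)       # feature stream (changeset-like)

    def merge(self, commit_type, summary, feature_note=None):
        """Every merge records a commit; feature slices also leave a note behind."""
        self.unreleased_commits.append((commit_type, summary))
        if feature_note is not None:
            self.feature_notes.append(feature_note)

    def auto_release(self):
        """Automatic stream: looks at commits only, never touches feature notes."""
        bumps = [{"fix": "patch", "feat": "minor"}.get(t) for t, _ in self.unreleased_commits]
        bumps = [b for b in bumps if b]
        self.unreleased_commits = []
        if bumps:
            self.version = bumped(self.version, max(bumps, key=RANK.get))
        return self.version

    def feature_release(self, bump="minor"):
        """Manual stream: run by a human once the feature is complete."""
        notes, self.feature_notes = self.feature_notes, []
        self.version = bumped(self.version, bump)
        return self.version, notes


repo = TwoStreamRepo()
repo.merge("refactor", "extract row serializer", feature_note="CSV export: serializer in place")
repo.merge("fix", "fix redirect loop on login")
print(repo.auto_release())      # only the fix counts towards this release

repo.merge("refactor", "wire CSV export into the CLI", feature_note="CSV export: available from the CLI")
print(repo.auto_release())      # nothing releasable, version unchanged
print(repo.feature_release())   # the whole feature is announced at once
```

In this toy model the feature slices reach master right away without forcing a premature minor release, and the notes are only consumed when someone decides the feature is done. It does not distinguish between several features in flight at the same time, so it says nothing about whether the approach survives that case.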
Theoretically, I don't see why this couldn't work, but I have to reiterate that this is 1. a hack, 2. something I just pulled out of my ass that theoretically solves the problem (of decoupling the two different streams of changes). As far as I know, no one does this in practice, and so we would have no idea whether this would actually work in practice unless some psycho actually tries it in a codebase with multiple developers developing multiple features in a single repo concurrently.
If there's anything to take away from this, it's that versioning and releases are hard, and that trunk-based development, while ideal, clearly hasn't been used enough in the real world to warrant the development of tooling surrounding it (specifically, the kind to encourage trunk-based development).
While I proposed one theoretical possibility for version/release tooling that encourages trunk-based development, I suspect that most people will simply wait until some company does trunk-based development and runs into enough of the "hard problems" to the point where they develop (and hopefully publicly release) version/release tooling to help with such a development approach.
In the meantime, for the rest of us mortals who naturally gravitate towards feature branch development (due to the lack of tooling for the other approach), the only tips are to pay down the cost of "integration" (i.e. context switching) by rebasing onto master often as you keep the feature branch alive, parallel to the master branch, and to break down a "feature" as much as you can (so as to align the application and feature level changes closer together).
It seems that whether you do trunk-based development or feature branch development, you'll need to pay the piper for the privilege...