Thursday, October 3, 2019

DevOps in detail, Trunk-based Development

In this post I'm taking a short diversion to discuss an aspect that is not directly related to the implementation of a DevOps solution, but is very relevant to the success of the DevOps strategy such a solution supports.

Gitflow vs. Trunk-based Development

As a DevOps leader in your organization, you have visibility into most SW projects being developed around you. Take a look at the development teams behind those projects and think about the processes they follow to move the code base forward.

If you work for a large corporation, most likely the teams are using Gitflow. It's been a de-facto standard for some years now and has displaced the traditional feature branch-based development processes used elsewhere during the pre-cloud era.

The problem with feature branch-based development was that it just couldn't keep pace with the rate of change required in the cloud era. The standardization of platforms (Linux, Android, iOS) and advances in SW packaging (snaps, VMs, containers, APKs) enable fast deployment of new SW versions to users, but a feature branch-based development process cannot exploit that ability. When you're releasing twice a week, the number of open feature branches and associated merges the dev teams need to handle becomes unmanageable. A process allowing the master branch's head to move forward faster was necessary.

Hence Gitflow was invented. In Gitflow, development and release work are pipelined so that work on one does not block progress on the other. The frequency of releases is limited only by the time required to go through the release process. New features can be developed while existing ones are polished for release. The master branch is always tidy and shiny and points to the latest release of the SW. Everybody is happy and life smiles at you.

This figure illustrates a typical Gitflow process; you can browse here for a short summary.


Cool, Gitflow helps increase the release cadence. However, it still does not allow you to achieve Continuous Delivery, nor does it tackle the main issue that arises when developers start using feature branches: merge hell.

Merge hell and 'distance' between developers

Traditionally, feature branches have been used to craft new features onto an existing SW baseline without compromising the quality and stability of the main branch. Bug fixes are also introduced through short-lived feature branches, in the absence of a better mechanism. Regardless of whether the process is feature-based or Gitflow-based, the main issue with feature branches is that when a team or developer starts working on one, it's like an army dispatching a platoon to take an enemy outpost. If the outpost is near and easy to take, the platoon will soon complete its goal and rejoin the main army, which immediately reaps the bounty captured in the operation. However, if the outpost is in a distant land and/or is a tough target, the platoon will stay quite some time away from their comrades. Many things might happen during that time: other platoons could be dispatched to take conflicting targets, bounties might have lost relevance by the time they're finally captured, valuable soldiers lost along the way might delay the achievement, and perhaps worst of all: when the platoon gets back with the bounty, the army might be many miles away from where they left it.

Even if the target remains strategically sound, the bounty is still valuable and the platoon takes no losses in the operation: if the army is not there when they get back, the whole operation might be a blunder. Imagine that happening to a dozen platoons dispatched every week or two. There's possibly no way an army can gather all those dispersed platoons. And that is indeed the main issue with feature branches: when a feature is done and the branch is to be merged into the main code base, that code base may have moved substantially, forcing the dev team to make a big effort to adapt changes that were crafted on a code base hundreds of lines away from the current one. And that happens to each and every feature team. This phenomenon is known as merge hell, and regardless of how good the team, how valuable the feature or how complex the code, there's no way the team can get away without it.



How do we prevent merge hell? If we were in the army the answer would be "right, the outpost is a thousand miles away and is well guarded, so there's no easy way around this, period". Fortunately we're not in the army. The root of the merge hell problem lies in what can be referred to as the 'distance' between developers. The longer different developers work separately on the same code base, the more their versions of that code base diverge. Let's call that divergence a 'distance'. The longer the 'distance', the harder it will be to walk it back to a common point. If we could minimize that distance and keep it at a reasonable size for the number of teams working on the common code base, we would end merge hell once and for all. We need to pick close and weak outposts so our platoons can leave early in the morning and be back with any captured bounties before dusk, move the army between dusk and dawn, and start all over again the next day.

Trunk-based development

Trunk-based development (TBD) is a systematic approach to avoiding merge hell and achieving Continuous Deployment. To decrease developer distance, all developers sync on a single code base, 'the trunk'. Updates to that code base are submitted in small chunks, ideally sized at one day's worth of work, or even smaller. Everybody is aware of and participates in those updates on a daily basis. That way, all developers share a single, common view of the code base, like a shared mind (sort of).



TBD can be achieved by following a few simple rules:

1) no branches: at every point in time, all developers see the same code base (the trunk)
2) single source-of-truth: the trunk contains everything (this implies what's not in the trunk does not exist)
3) short-lived changes: any update to the trunk should be crafted and submitted in one day, exceptionally two (e.g. if someone falls sick before being able to submit)
4) continuous integration: each and every update to the trunk is integrated ASAP and proper feedback is provided to the update author(s)
5) broken master goes first: if feedback indicates the master branch is broken, fixing it is the single highest priority in every developer's task list
6) code review goes second: outstanding code reviews are the second highest priority in every developer's task list

Following those rules, the distance between developers is minimized. All developers are aware of what updates are integrated into the trunk every day. They proactively keep their copies of the trunk updated, eagerly checking for outstanding updates to review and browsing comments on reviewed updates. Eventually, once your deployment process is streamlined, you can reach the nirvana of Continuous Deployment, having each and every update deployed to production promptly and safely. At that point, your job is done. There's little else you can do from the DevOps perspective to improve the business, so enjoy a well-deserved rest while you keep the DevOps engine humming.
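
One practical question the rules above raise is how to merge work that isn't user-ready yet while keeping the trunk always releasable. A common companion technique (not part of the rules above, so take this purely as an illustration) is to hide unfinished functionality behind a feature flag. Below is a minimal sketch in Go; the /checkout endpoint and the environment-variable flag are made-up assumptions, and a real setup would more likely read flags from a configuration service.

  package main

  import (
      "fmt"
      "log"
      "net/http"
      "os"
  )

  // isEnabled reads a feature flag from the environment; a real deployment
  // would more likely query a configuration service.
  func isEnabled(flag string) bool {
      return os.Getenv(flag) == "on"
  }

  // checkoutHandler serves the hypothetical /checkout endpoint. The new flow
  // lives in the trunk from day one, but stays dark until the flag is flipped.
  func checkoutHandler(w http.ResponseWriter, r *http.Request) {
      if isEnabled("FEATURE_NEW_CHECKOUT") {
          fmt.Fprintln(w, "new checkout flow")
          return
      }
      fmt.Fprintln(w, "classic checkout flow")
  }

  func main() {
      http.HandleFunc("/checkout", checkoutHandler)
      log.Fatal(http.ListenAndServe(":8080", nil))
  }

This keeps every change small and mergeable the same day, while the decision to expose the feature becomes a runtime switch rather than a long-lived branch.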

Getting there and resistance to change

Okay, so you're convinced TBD is where you want to go in your DevOps strategy. Now all that's left is convincing everybody else that this is the way to go. And that's the toughest part (and the reason I wrote this post in the first place). If you check the TBD list above, there are a number of well-established behaviors developers need to change, and some new ones they need to acquire.

Starting with senior developers: people feel quite comfortable with the Gitflow process. It allows a team to keep feature branches open indefinitely, even several of them in parallel, until features are done. They don't need to check the trunk every day, nor are they obliged to check on their colleagues' work at all. They can blindly move forward with features, then blame merge issues when feature integration starts causing trouble ('it works in my branch').

Moving people out of their comfort zone is no easy feat. You'll need cooperation from Managers and Product Owners in order to shift teams to TBD. The following advice may come in handy when you start walking that shaky path:

- convince senior management TBD is the way to go. For this you can use business arguments. It is well documented that Continuous Deployment brings a number of benefits to a SW business, and CD cannot be achieved without TBD. Leverage articles from the Internet, e.g. this.

- tell Product Owners how CD will improve their products and squeeze more outcome from their budgets. Explain that CD is hard to achieve with Gitflow. Bring them on your side to help shift teams towards TBD.

- with the support of senior management, you can work with Managers to define goals that steer teams towards TBD. Craft concrete goals, e.g. average number of commits per day or average time a Pull Request/Merge Request remains open (a small sketch of the latter metric follows below). Managers are good with people, so ask for their help coaching developers on their path to TBD.
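
To make that last metric concrete, here is a minimal sketch in Go that computes the average time a set of Pull Requests stayed open. The pullRequest type and its fields are illustrative assumptions; in practice the timestamps would come from your Git hosting provider's API.

  package main

  import (
      "fmt"
      "time"
  )

  // pullRequest holds just the two timestamps needed for this metric.
  // Field names are illustrative; real data would come from your Git hosting API.
  type pullRequest struct {
      opened time.Time
      merged time.Time
  }

  // averageOpenTime returns the mean time the given pull requests stayed open.
  func averageOpenTime(prs []pullRequest) time.Duration {
      if len(prs) == 0 {
          return 0
      }
      var total time.Duration
      for _, pr := range prs {
          total += pr.merged.Sub(pr.opened)
      }
      return total / time.Duration(len(prs))
  }

  func main() {
      now := time.Now()
      prs := []pullRequest{
          {opened: now.Add(-26 * time.Hour), merged: now.Add(-2 * time.Hour)},
          {opened: now.Add(-8 * time.Hour), merged: now},
      }
      fmt.Println("average time open:", averageOpenTime(prs))
  }

Tracking how that average evolves week over week gives Managers a simple, objective signal of whether changes are getting smaller and merges faster.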

Conclusion

TBD is the new standard development process. It is a gateway to Continuous Deployment and brings many benefits to a SW development organization. But successfully adopting it requires discipline and motivation, and established habits won't change overnight. As a DevOps leader, you'll need patience, perseverance, and cooperation from other areas of the organization in order to successfully transition from Gitflow to this new process.

Friday, September 13, 2019

DevOps in detail, SW change life-cycle

In this second post in the DevOps series, we'll look carefully at the atomic unit of work in a DevOps system, the SW change.

A SW change is a set of additions, modifications and/or deletions on the code base of a single project. SW changes come in multiple sizes, but if your project adopts TBD (as it should), the smaller the SW change the better.

A day in the life of a SW change

From the developer desktop to the production infrastructure, a SW change makes a long trip traversing multiple stages of the DevOps system. But no matter how long that trip is, to benefit from Continuous Deployment it should be completed in less than one day.

Let's describe schematically how a SW change spends a busy day in the DevOps system:

A busy day for a SW change
  1. newborn: the SW change has just entered the DevOps system. Its gate into the system is the SW Version Control (SVC) service, e.g. git. Just arrived from its home town of Developer Desktop, it makes a humble entrance as an anonymous citizen known only to its father, Bob Developer. But don't underestimate this small newcomer, it might change the world!
  2. verified: as any newcomer to an organization, this new citizen must go through a safety check. Authorities need to make sure it's a sane individual, not carrying any harmful items or illnesses. Ideally, they would also try to assess what impact this newcomer might have on the organization welcoming it. Thus a number of checks are run on the SW change to verify its quality. The outcome of the checks is stored in a report in the SVC service. A copy is sent to the change's father, Bob Developer.
  3. merged: if the report carries a 'REJECT' statement, Bob must take back his SW change and fix its weak points before the next attempt to enter the organization. If the report carries an 'ACCEPT' statement though, the person responsible for approving newcomers is notified, and after a quick visual inspection that person (usually known as "the committer" in SW jargon) cheerfully grants the SW change access to the receiving organization as an approved citizen, merging it into the master branch of the project.
  4. staged: no matter how thorough the initial scan is, and no matter how smart the committer may be, there's no way of knowing in advance how the new citizen will behave and perform in its new home. Thus, before raising its status to first-class citizen, the Authorities put the SW change under surveillance in a simulated environment for some time. The simulated environment should be as similar as possible to the real one, and the SW change should receive the same stimuli it would in the real world. The name of this experiment is staging, and its goal is to reach a high level of confidence in the expected outcome of releasing the individual under surveillance as a free, first-class citizen into its receiving organization. During the surveillance, lots of data and information about the SW change and the status of the simulated environment are gathered and attached to detailed reports, which are stored for reference. If the SW change passes this experiment it is tagged as a candidate for first-class promotion. Otherwise it is rejected and sent back to its creator together with the detailed reports generated, so any problems detected can be fixed before trying to enter the organization once more. Finally, the outcome of the staging experiment is posted to the SVC service for reference.
  5. released: finally, after having passed all qualifications, the SW change is ready to become a productive first-class citizen in its receiving organization. The Authorities queue the SW change up for entrance into the real world, where it is received with joy by its peer citizens. What lies ahead for this SW change and the rest of the organization nobody knows yet, but at least the organization's Authorities can rest assured they did everything in their power to maintain a healthy, productive, useful organization making the world a better place.
The SW change should traverse all those states as quickly as possible, and in any case in less than one working day. The owners of the DevOps system play the role of the Authorities, and must strike a balance between safety and time, guaranteeing that as many checks as possible are performed on every SW change within the available time before it is released into production, where the impact of an undetected problem is much wider (potentially unlimited!).
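
Stripping away the metaphor, the life-cycle above is essentially a small state machine. The sketch below in Go is only an illustration of that idea: the state names come from the list above, while the transition logic is a simplified assumption.

  package main

  import "fmt"

  // changeState enumerates the stages a SW change goes through in the DevOps system.
  type changeState int

  const (
      newborn changeState = iota
      verified
      merged
      staged
      released
      rejected
  )

  var stateNames = []string{"newborn", "verified", "merged", "staged", "released", "rejected"}

  // next advances a SW change to its following stage based on the outcome of the
  // checks run at the current stage. Transitions are simplified for illustration.
  func next(s changeState, checksPassed bool) changeState {
      if !checksPassed {
          return rejected // back to the author, with the reports attached
      }
      switch s {
      case newborn:
          return verified
      case verified:
          return merged
      case merged:
          return staged
      case staged:
          return released
      default:
          return s // released and rejected are terminal states
      }
  }

  func main() {
      state := newborn
      for _, checksPassed := range []bool{true, true, true, true} {
          state = next(state, checksPassed)
          fmt.Println("the SW change is now:", stateNames[state])
      }
  }

In a real DevOps system each transition would be triggered by the corresponding tooling (the SVC service, the CI pipeline, the staging environment) rather than by a loop, but the shape of the flow is the same.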

Conclusion

In this post we have looked at how the ideal DevOps system should handle the atomic unit of work in the system: one SW change. It's useful to keep that model in mind in order to be aware of, and to estimate the impact of, your deviations from the ideal system.

For example, your safety checks may take more than one working day to complete, in which case you're likely unable to introduce SW changes into the production environment one at a time. That in turn means you need to gather as much data as possible from that environment in order to easily pinpoint the likely cause of a problem detected after a batch of SW changes has been released into production. Fixing your safety checks will take you closer to the ideal model and hence make you better able to reap the benefits of the DevOps practice.


Friday, June 14, 2019

DevOps in detail, Introduction


This is the first in a series of posts explaining in detail what DevOps is and providing tips for its implementation in a SW development organization.

These posts won't deal with the organizational aspects of DevOps, e.g. how to change your company's culture to embrace DevOps or what adaptations your company structure needs to effectively leverage the benefits of DevOps. You can find multiple books and other sources analyzing those subjects.

Why DevOps?

Before embarking on the DevOps journey you probably want to know what the purpose of such a journey is, right?

Look at your SW business and craft a wish list of improvements you'd like to achieve. I'm pretty sure that many (perhaps all) of the following wishes will be on your list:
  • low risk releases
  • faster TTM
  • higher quality
  • lower costs
  • better products
  • happier teams
The DevOps paradigm, if applied correctly and thoroughly, can bring your business all those benefits. Have a look at the Continuous Delivery web site for more detailed reasoning about how that is possible. In these posts we'll focus on the technical details of a DevOps machinery for automated SW production.

Make it easy on yourself

Whatever your organization's current status in adopting DevOps, there are two aspects of SW products that can make the transition to a pure DevOps environment much easier. These aspects are:
  1. Frequent releases. This aspect is more related to how your organization manages products than it is to the products' technical details. Think about some of the products your organization makes; how often do they publish a release? Once a month? Once a week? Every day? The more frequent your releases are, the easier it will be for your organization to fully enable Continuous Delivery, where basically every approved commit is released into production.
  2. Micro-service architecture. This aspect is concerned with how the different parts of a SW product are structured, coded, and deployed. Again, think of products you make or work with. Can you safely and easily replace a running version with a new/old one? Can you replace some part(s) of that product, leaving the rest untouched? If you can, it's very likely the product in question is built as a set of micro-services (small, simple constituent parts with little coupling to each other).
If you're on the infrequent releases side (say, one release a month), think twice before you start adopting DevOps principles. The no-return point of publishing a release imposes a strict (and expensive) discipline of thorough testing and verification across multiple stages until you're confident the product is production-quality, and you can't squeeze that process down indefinitely; you'd spend a huge amount of money trying to do so. Instead, remove barriers and speed up your process first. Some things you might try are:
  • Design-For-Failure (DFF): introduce this discipline to your development teams. Assume the SW will fail from the very first moment. It will fail continuously and in the most exuberant ways. Have your teams internalize that assumption and work according to it. Check page 11 in AWS Best Practices for further information;
  • Simplify start-up & shut-down: remove as many steps as you can from your SW start-up and shut-down stages. Move the ones you can't remove somewhere else, or schedule them for later. Your SW must be able to come up and be removed in a snap (a minimal sketch of this follows after this list);
  • Test often, test early: speed up your tests and move them as close to the developer desktop as you can. Do your performance tests run fast? Then run them on every commit. Do you have mocks for most components of the system? Run integration tests instead of component tests. Can you capture real inputs to deployed systems? Apply them to systems in development to verify how they would perform in the real world.
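
As a concrete illustration of the start-up & shut-down point above, here is a minimal sketch in Go of a service that starts serving immediately and shuts down cleanly within a tight deadline when asked to stop. The port and the five-second deadline are arbitrary assumptions.

  package main

  import (
      "context"
      "log"
      "net/http"
      "os"
      "os/signal"
      "syscall"
      "time"
  )

  func main() {
      srv := &http.Server{Addr: ":8080"}

      // Start serving immediately; defer any expensive warm-up work.
      go func() {
          if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
              log.Fatalf("listen: %v", err)
          }
      }()

      // Wait for a termination signal, then shut down within a tight deadline
      // so the service can be replaced "in a snap" during a deployment.
      stop := make(chan os.Signal, 1)
      signal.Notify(stop, syscall.SIGINT, syscall.SIGTERM)
      <-stop

      ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
      defer cancel()
      if err := srv.Shutdown(ctx); err != nil {
          log.Printf("forced shutdown: %v", err)
      }
  }

A process that behaves like this can be replaced during a deployment without elaborate start-up or tear-down ceremonies, which is exactly what frequent releases require.
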
Similarly, being on the monolith (as opposed to micro-services) side jeopardizes your organization's ability to run DevOps by the book. Systems made of a moderate number of complex parts tightly coupled to each other, all running on localized computing resources in order to achieve the highest throughput-per-square-meter, give rise to many-to-many heterogeneous interactions between mutually dependent parts. Combined with late integration testing, this leads to butterfly effects, never-ending verification and fault slip-through. Instead, try to move gradually to a micro-services architecture for your product(s). Check this InfoQ post for more details. Things you might try are:
  • Start stripping away non-critical parts of your system. Instead of breaking up your system's core, start with the parts that represent a lower risk to the product's success. With the knowledge and experience gained in doing so, you'll be better prepared to undertake the split of the more valuable parts;
  • Stop adding new features to existing parts. For every new piece of functionality you want to add to the existing system, evaluate the possibility of crafting it as a separate, loosely-coupled process, using network interfaces instead of IPC or dynamic linking (see the sketch after this list);
  • Fight technical debt. Wrong decisions taken due to time constraints will slow you down in shifting to a micro-services architecture. Follow the good practice of devoting a sprint now and then to removing technical debt. Look carefully into technical debt warnings from your development teams and try to give them the time and tools to prevent it.
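
To illustrate the "network interfaces instead of IPC or dynamic linking" idea above, here is a sketch of a new piece of functionality built as its own small process exposing an HTTP/JSON interface. The /quote endpoint, the port and the pricing logic are made-up placeholders, not a prescription.

  package main

  // A hypothetical "price quote" feature built as its own small process instead
  // of being linked into the monolith. Other parts of the system call it over HTTP.

  import (
      "encoding/json"
      "log"
      "net/http"
  )

  type quote struct {
      Item  string  `json:"item"`
      Price float64 `json:"price"`
  }

  // quoteHandler answers GET /quote?item=... with a JSON quote. The pricing
  // logic is a stub; the point is the network boundary, not the math.
  func quoteHandler(w http.ResponseWriter, r *http.Request) {
      item := r.URL.Query().Get("item")
      q := quote{Item: item, Price: 9.99}
      w.Header().Set("Content-Type", "application/json")
      json.NewEncoder(w).Encode(q)
  }

  func main() {
      http.HandleFunc("/quote", quoteHandler)
      log.Fatal(http.ListenAndServe(":9090", nil))
  }

The monolith (or any other service) can now call this endpoint over the network, and the new functionality can be deployed, scaled and rolled back independently of the rest of the system.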

Conclusion

In the next post, we'll describe the main premises a DevOps environment should follow. Make sure your organization and products are in the right shape to adopt those premises before sinking hard-earned money into implementing them in your organization.
