Ah! I always struggle to explain these concepts and the superiority of the Policy-Driven approach, but I think I’ve found a good example! My Air Conditioning systems…
I’m buying a car, and found one with an Automatic Climate Control feature: you set the temperature you want on the central console, and the system ‘makes it so’. This is policy-driven automation.
I’ve also moved into a new flat recently, and the air-conditioning system is not centrally managed: each unit has a remote control (pictured) with a small LCD screen indicating the ‘current known settings’ in a sort of dashboard.
But that remote uses infrared (you need line-of-sight, or pressing buttons won’t work), and one of the remotes is used to control two independent units. These details impact the workflow.
To turn the AC on, I have to aim at the unit and press the big yellow button on my remote. The unit emits a sound to confirm the order was received (action). I can also confirm visually that the flaps are open and it blows cold air: the unit’s state is compliant with my intent.
The order sent depends on what’s displayed on the remote. If the remote says ‘OFF’ (the top-left ON indicator being absent), it will send the command to ‘turn on’ the AC. If the remote says ‘ON’, it sends the ‘turn off’ signal.
Now, mainly when we use one remote for two units, after turning the first one on, the remote says ‘ON’ while the second unit is still OFF. If I aim at the second unit and press that yellow button again, the unit beeps (signal received), but doesn’t turn on… The remote now shows ‘OFF’, and pressing again finally turns on the unit.
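For those who prefer code, here is a minimal Python sketch of the behaviour just described. The names (Unit, ToggleRemote, press_power) are made up for illustration, and it assumes the remote sends an explicit ‘turn on’ or ‘turn off’ command based only on what its own display shows.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """An AC unit that obeys whatever explicit command it receives."""
    name: str
    on: bool = False

    def receive(self, command: str) -> None:
        # The unit beeps for any signal, then applies the command as given.
        self.on = (command == "turn_on")


@dataclass
class ToggleRemote:
    """A remote that only knows its own display, not the unit's real state."""
    shows_on: bool = False

    def press_power(self, unit: Unit) -> str:
        # The command is derived from the display, never from the unit.
        command = "turn_off" if self.shows_on else "turn_on"
        self.shows_on = not self.shows_on
        unit.receive(command)
        return command


remote = ToggleRemote()
bedroom = Unit("bedroom")
living_room = Unit("living room")

presses = [
    remote.press_power(bedroom),      # display said OFF, so it sends turn_on
    remote.press_power(living_room),  # display said ON, so it sends turn_off
    remote.press_power(living_room),  # display said OFF again, sends turn_on
]
```

The second press is the wasted one: the living room unit was already off, received ‘turn off’, beeped and stayed off, simply because the remote’s display had drifted from that unit’s real state.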
No rocket science here, but when I look at the remote (some kind of Dashboard) telling me the last actions performed, I have no trust in the actual state of the unit. To be fair, and as a side note, in the car’s climate control system, I’d probably not completely trust the temperature displayed, and could verify with a thermometer (see how I sneakily explained Operation Validation!).
The other problem is that by looking at the remote and the unit, I cannot see what the intent of the last change was.
Say my wife used the remote last before handing it to me because she was busy: I still have to ask her what her intent was to make sure the system is properly configured.
Me: “What did you want to do?”
Wife: “Turn on this unit”
Me: “Why?”
Wife: “Because the room is too hot”
Me: “Ok, let me work it out”
click a first time so that the remote shows the ‘OFF’ state, then click again to turn on the system…
You see, operating the remotes needs “extra brain-cycles”: understanding the way they work, validating the previous state of the unit, working out how to orchestrate actions to get to the desired state… It might take only two minutes to learn, and a couple of seconds to redo the actions (transforms) each time you want a change, but think about the five AC units in my home, doing this at least 4 times a day, plus the other settings (temperature, quiet mode). It adds up, which is definitely not a big deal for my home, but highlights a scaling challenge.
Now imagine, for the sake of argument, that my home goes from 5 units to 10, 4 remotes to 8, and 2 users to 4…
We would need to spend a lot more time communicating about the intents, remembering them for each unit, and processing the required orchestration.
This effort adds cognitive load and taxes working memory, both to communicate the intent and to operate the system.
The car’s system is superior: at a glance I can see which policy is currently set (the intent is visible), and if it’s not what I set earlier, I can ask my wife why she changed it. The conversation no longer revolves around understanding the desired state, the current state and how to get from the latter to the former, but around WHY she did not agree with the policy I originally set. If I need to change it while driving, I want a system that uses as little brain power as possible.
Me: “Why did you change the temperature?”
Wife: “Because the kid is freezing”
Me: “Oh, Good call! Sorry!”
In practice, it does not happen like this. Following the intent-based leadership concepts we read about in “Turn the Ship Around” by David Marquet, we notify each other of the intent to change the policy, for instance:
Wife: “I’m increasing the temperature a bit because the kid’s freezing”
Me: “Sure”
Changes the policy on the car’s dashboard
And this scenario explains some of the benefits that pull requests in a Git workflow can bring to Policy-Driven Infrastructure (along with many others).
If the Intent of your infrastructure is made visible by defining the desired state declaratively, most likely leveraging a Configuration Management tool, you are on the right track for the practices known as Infrastructure As Code, as described in the book of the same name (an unfortunate name; I prefer Policy-Driven Infrastructure).
Because that declarative code lives in a document (text in a file, ultimately), it is best managed using a version control system, and a distributed one such as Git enables proven, well-known and well-documented collaboration workflows. Finding the workflow that works for you will enable continuous improvement, quick iterations, better communication and collaboration. As you make your infrastructure intent and changes visible and collaborative, you may find you can manage your work better by “Making work visible”, as brilliantly explained by Dominica DeGrandis, and by mapping work items to changes.
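To make ‘defining the desired state declaratively’ a bit more concrete, here is a small Python sketch that sticks with the AC analogy. It is not how any particular Configuration Management tool works; the function names (plan, apply_changes) and the settings (power, target_c) are hypothetical. The idea is only that the policy is plain data, and the tooling works out which changes are needed by comparing it with the actual state.

```python
def plan(desired: dict, actual: dict) -> list:
    """List the (unit, setting, value) changes needed so actual matches desired."""
    changes = []
    for unit, settings in sorted(desired.items()):
        current = actual.get(unit, {})
        for key, value in sorted(settings.items()):
            if current.get(key) != value:
                changes.append((unit, key, value))
    return changes


def apply_changes(actual: dict, changes: list) -> None:
    """Apply the planned changes to the (simulated) actual state."""
    for unit, key, value in changes:
        actual.setdefault(unit, {})[key] = value


# The policy: what we want, readable at a glance and easy to keep in a file.
desired = {
    "bedroom": {"power": "on", "target_c": 24},
    "living_room": {"power": "on", "target_c": 24},
}

# The observed state: the living room unit is still off.
actual = {
    "bedroom": {"power": "on", "target_c": 24},
    "living_room": {"power": "off", "target_c": 24},
}

first_plan = plan(desired, actual)
apply_changes(actual, first_plan)
second_plan = plan(desired, actual)  # nothing left to do once compliant
```

Here the first plan contains a single change (turn the living room unit on) and the second plan is empty. Nobody has to remember what the remote last displayed: the intent sits in the desired data, and a change to that data is exactly the kind of thing that can be reviewed before it is applied.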
Does that explain the principle? Do you have an easier-to-understand analogy to illustrate the problem?