Effective Enterprise Software Development

Building the machine that builds the machine

These days, software seems to run the world.

Software is the stuff on your phone or laptop that lets you read this very article. The stuff that makes the electricity grid run. And the stuff that runs the stock market, matching countless orders at a blazing speed.

But it’s people who have to build the browser, and maintain the infrastructure making up the internet; people who control the grid, and manage the equipment that delivers power to your home; and people who figure out how to trade profitably at such speeds, without going bankrupt.

Ultimately, as digitized and mechanized as our world is, underneath it all are people who drive ‘the system’ - or, more specifically, groups of people. And enterprise software is the ‘grease’ that lets these groups of people organize, function, and interact with each other, helping them run the world: it’s the ITSM platform helping the ops team coordinate the work needed to keep the internet running; the ERP software helping the power company track inventory and schedule maintenance; the OMS helping the trading teams execute orders; and the RMS helping the risk department monitor the risks from those trades.

And, as is to be expected, these systems are quite different from your regular run-of-the-mill software.

Sure, they tend to have that signature ugly-ass look (okay, maybe except for the ones I develop), but that’s just the surface. The function and role of these systems lead to distinct characteristics in how they work, how they get developed, and even how they are conceived.

For starters, they are inherently highly business-specific, essentially encoding what makes a business a business: its people, its processes, its business model, its equipment, etc. So not only do they need to be tailored to the specific business’s functions, but also to how that business (and by extension, its people) runs. And, of course, since they are made to address a specific business need, there’s a foremost ‘domain expert’ who knows that need better than anyone else in the company - and that person is (very often) not you.

Another characteristic that directly follows is that they are inherently large and highly complex; after all, businesses are highly complex entities that need to fulfill a large number of needs (and if they weren’t, well, there wouldn’t be a business, because someone else would’ve already done it). This means that there will be a practically endless backlog of things that these systems will be expected to do, especially as businesses evolve.

And the size and complexity mean that it will necessarily be too much for a single person to handle - there’s no “10x’ing” your way out of this. It will take multiple people coordinating and sharing the burden of complexity. And, naturally, the performance of the team will be “blended” across those involved; of course, those higher up in the org will be looking on with keen eyes, trying to pick out the individuals’ performance from the team’s.

Finally, at the other end of the software will also be a group of people (your “customers” are other organizations, after all). It may sound obvious, but its consequences are anything but: whatever organizational dynamics you face, they do, too. Their hierarchy, organizational structure, personal relationships, etc. influence what the org and its people do to survive, which in turn affects what they want from the software, the how, and the when (not to mention the additional complications that arise when you have multiple customers for whom the software is being developed at the same time, or when your customers are your own team, etc).

All of... this... leads to a really complicated landscape of requirements, interests, limitations, and relationships to navigate. I mean, how do you go about optimizing all of this, solving for ‘X’, when you don’t even know what ‘X’ is? What do you even optimize for, when ‘X’ itself is made up of countless different things? How do you go about developing with all this in mind - balancing the interests of those involved, making the necessary tradeoffs (or figuring out which tradeoffs are even right to begin with) - all the while making sure that it serves the business needs, on time, and on budget?

Well, that’s where it gets complicated, and it’s where the development process itself becomes a ‘thing’ to be solved, and not just the business problem that the software itself is meant to solve. After all, you can’t just throw a bunch of people together, tell them to ‘just do it’ and hope it works out.

One Must Imagine Sisyphus Happy

To see why that wouldn’t work, you only need to ask: what are you even supposed to do? What are you even solving for?

Whatever ‘the business’ wants? Whatever your bosses want?

But what does the business want? It’s tempting to say ‘more money’, but money is not just money. Does the business want more growth? Margins? Cashflow? Market share? Stability? All of the above?

And what of your boss? Your boss’s boss? The CEO? The board? What does ‘the business’ want them to do? Is that what the board tells the CEO, the CEO your boss’s boss, and so on? Can ‘the business’ really specify what it wants its workers to do, all the way down to the trenches? Is there even an answer?

But let’s say you do have an answer. Let’s say you know exactly what the business wants of the software you and your org are to build, as a means to a specific end.

Now, consider the perspective of an individual worker who is part of the development machine - whether they are a developer, a designer, a technician, a manager, etc.

What do they want?

Well, everybody wants different things. Everybody ‘optimizes’ for their own needs and wants - based on their own values, their own circumstances, and their own positions.

Maybe they just bought a new house, started a family, and want to keep the job for as long as possible, while minimizing the chances of getting fired or laid off (Person A - “Safety”). Maybe they derive meaning from being an important part of the team, the department, or the business, and want to climb the corporate ladder as quickly as possible (Person B - “Promotion”). Maybe they are in need of money, in an uncertain situation in life, and want to make sure that they have enough transferable skills in case they were to get kicked to the curb (Person C - “Skill Up”).

Unfortunately, optimizing for these personal goals often comes at the expense of what’s best for the business (and depending on your professional experience, you may be very familiar with some of these outcomes).

In the case of A, they may try to achieve job security by making themselves irreplaceable, making them - and only them - out to be the ‘key person’ or ‘owner’ of a critical system: intentionally obfuscating code, not writing any documentation, and fighting any efforts to get others involved.

Person B may shoot for promotions by involving themselves only in “high-visibility” projects that will look impressive to upper management or peers (regardless of whether those projects are what the business needs), all the while neglecting crucial but low-visibility work and fostering an uncooperative environment.

For C, they may pick the tools and architectures for the software not based on what’s best for the project, but on what will look good on their resume - leading to subpar performance, stability, and cost efficiency, and piling up technical debt that has to be repaid by whoever is left on the team after they inevitably bounce.

But wouldn’t they just need to do their job well to achieve whatever they want out of their employment, regardless of what that is?

Well, for starters, how do you even determine whether they have done a good job, let alone define what a ‘good job’ is? After all, the whole premise of enterprise software development is that 1. this whole thing - the organizations, the system, the software - is complex beyond the grasp of a single person, and that 2. the complexity, and the ability to handle it, are ‘blended’ across individuals.

Is there a ‘magic formula’ you can use, or an all-seeing, all-knowing oracle that tells you exactly what the answer is? Of course not. Businesses are made up of people like you and me, and when it comes down to it, how well you did ‘your job’ will be evaluated by a person higher up in the corporate hierarchy.

A person who has their own desires.

A person who has their own bosses, and their evaluations to optimize for.

So even if what the business wants overall and what it wants at your level are clearly defined, and even if the ‘evaluator’ has perfect information and is able to perfectly judge all of the information at hand, you can’t take it for granted that your evaluation function is actually optimal for the business (and boy, that’s a lot of ifs).

And what does it even mean to optimize for those evaluation functions? How does the worker even prove to their evaluator that they have done a ‘good job’?

Well, if the evaluation function is some metric (the number of commits, the lines of code, the number of hours worked, the number of people you manage, etc), the answer is simple: game the metrics. So not only will people spend time and energy optimizing for these metrics (rather than improving the systems), but the metrics themselves will also quickly lose their utility as a yardstick (also known as Goodhart’s Law).
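To see Goodhart’s Law in action, here’s a toy sketch in Python (all numbers completely made up for illustration): once the proxy - commit count - becomes the target, it inflates while the value it was supposed to measure decouples from it entirely.

```python
import random

# Toy model of Goodhart's Law (all numbers are made up):
# once a proxy metric - commit count - becomes the target,
# it stops measuring the value it was a proxy for.

def honest_week():
    commits = random.randint(1, 5)          # a few meaningful commits
    return commits, commits * 10            # value roughly tracks commits

def gamed_week():
    commits = random.randint(20, 40)        # churn: splitting, renaming, reformatting
    return commits, random.randint(0, 10)   # little underlying value delivered

for name, week in [("honest", honest_week), ("gamed", gamed_week)]:
    weeks = [week() for _ in range(52)]
    total_commits = sum(c for c, _ in weeks)
    total_value = sum(v for _, v in weeks)
    print(f"{name}: {total_commits} commits, {total_value} 'value' delivered")
```

The ‘gamed’ strategy wins on the yardstick every single time - which is exactly why the yardstick stops working.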

Okay, and what if the evaluation function doesn’t have explicit measures?

Well, the evaluator will need to come up with a way - either on their own, or following some ‘process’ (more on this later) - and use it to determine a person’s performance.

Maybe they determine the evaluation function based on how essential and indispensable the person has been to the team. Maybe they want to see what an impressive job the person did, based on how much visible contribution they’ve made or how good they feel about the person’s work. Maybe they look for technical competence, based on how innovative and skillful the work seems to be. Maybe the C-Suite won’t shut up about “AI AI AI”, and they like that the person has done something ‘with AI’ that can be passed along to the top.

But wait - all these things are mere proxies for determining how well that person did what the business wanted, which means that the degree to which a person meets those proxies is what’s being used to determine whether they’ve done a ‘good job’!

In other words, even without explicit measurements, there is still a yardstick with implicit measurements - measurements that can, and will be gamed (Goodhart’s Law strikes again)!

In fact, it was precisely the efforts to optimize for these implicit measurements that led to the undesirable outcomes for persons A, B, and C earlier!

And all of... this... is assuming good intent from all of the parties involved; without that assumption, the whole org quickly turns into a hornet’s nest of suck-ups, backstabbing, political favors, and sabotage that would make even Putin blush.

So, as we can see, there’s a two-sided game-theory scenario with numerous participants playing out not just at every level of the corporate hierarchy, but also at every organizational level within the business (“our” department, “our” division, “our” team, etc), with both sides optimizing based on what they think their evaluator wants, what they believe their evaluator thinks their own evaluator wants, and so on.

And this affects literally everything within the enterprise software lifecycle - from a project being smothered in its crib because the evaluator couldn’t “see” the progress without something visualizable, to development being stalled by an inter-departmental Mexican standoff, to communications breaking down because no one wanted to be “the bearer of bad news”, to an application rotting from the inside out because no one wanted to fix it (“it’s not my job”).

This is why we can’t have nice things.

Engineering Alignment

So, uhh... how do you fix this?

Well, you can try to treat the symptoms on a ‘case-by-case’ basis. You can create a shiny new UI, involve a superior to break up the deadlock, create channels for communication, fix or rewrite the rotting application, write tests and create CI pipelines to enforce quality, etc.

However, because the underlying disease persists, all of the ‘fixes’ get mean-reverted away, while new problems continuously pop up. People will keep breathing down your neck, the organizations will revert to their usual behavior, the application will rot again as people take ‘shortcuts’, and information will keep being withheld - creating new problems all the while.

Of course, this is not a surprise. After all, as I often like to say: “You have a ‘people’ problem, not a ‘tech’ problem.”

So you need to go for a cure, to excise the disease once and for all. To do that, you need to align the optimization and evaluation functions at every level.

But these functions are a product of what people and their evaluators value, and you can’t control what these people value. You can’t change their risk tolerance, and you certainly can’t change their life circumstances.

Without changing these personal factors, you can’t change these people’s functions, and without changing these functions, you can’t align the people.

While it would be possible to change the organizational factors in play that influence how these functions get set in the first place (such as the reporting structure, compensation mechanisms, or business-level objectives), you would need to wield a significant amount of authority in order to do so.

And, statistically speaking, most of you don’t have such powers.

So... are we fucked?

Well, to answer that, we need to first look at how we buy milk.

Mooooooooooooooo

In finance, there’s a... “widely discussed” theory called the Efficient Market Hypothesis. In essence, it posits that because market prices encapsulate all of the information held by rational market participants trying to make money off of them, the market is perfectly efficient.

Let’s use milk as an example, with people buying and selling milk in one spot (yes, technically there’s no market for trading milk, only milk futures - contracts to buy/sell industrial quantities of milk at a fixed price in the future, settled in cash rather than actual milk when the time comes - but the metaphor doesn’t work as well when we’re trading these abstractions over milk electronically on the Chicago Mercantile Exchange’s servers).

According to The Hypothesis, if someone wants to buy milk at a higher price than what the supply and demand would indicate, it should incentivize someone else to rip them off by selling milk at that higher price, increasing supply and bringing prices back down. Similarly, if a buyer knows something that nobody else on the market knows (say, if all the cows in the world have unionized and are planning on establishing the OPEC of milk), they can take advantage of that informational edge by buying up all the milk they can, increasing the price, and making the market reflect that information.
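In toy form, that correction mechanism looks something like this - a crude Python sketch with made-up numbers (the 0.5 ‘gap-closing’ rate per day is arbitrary), not a real market model:

```python
# Crude sketch of the correction mechanism (all numbers made up):
# any gap between the going price and 'fair value' invites trades that close it.
fair_value = 3.00   # what supply and demand 'should' imply per carton
price = 4.00        # someone is bidding way too high

for day in range(1, 8):
    # Sellers pile in above fair value, buyers step back,
    # and each round of trading closes part of the gap.
    price -= 0.5 * (price - fair_value)
    print(f"day {day}: ${price:.2f}")
```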

So, with perfect information, perfect rationality, and perfect decisions, if trading and borrowing costs were zero, we should see zero difference between what someone is willing to buy milk for, and what someone is willing to sell it for. And in theory, no one should be able to milk the milk market by making money off of trading milk.

Lol. Lmao, even.

If you are even tangentially aware of, say, the stock market, then you know this is complete nonsense. This is not how things work in reality, because we are not rational - not even when money is involved (or, depending on how you see it, especially when money is involved).

Not only that, we don’t have perfect information, and even if we did, we don’t make perfect decisions all the time. Hell, we probably make subpar decisions most of the time.

In fact, a whole field of economics - behavioral economics - has popped up to study and deal with such irrationalities in our decision-making process (and apologies in advance to readers in places where they sell milk in bags, not cartons; once again, the hypotheticals just wouldn’t work as well with bags of milk).

For example, when you go buy milk (at the grocery store this time, not the financial market), how do you decide? Maybe you pick the milk on the shelf that’s the closest to your eye level. Maybe you buy the one that’s on sale, even though it’s got way more than what you need. Maybe you buy the one that is almost sold out, instead of the fully stocked one. Maybe you buy the one that has some extra vitamins or whatever (and let’s be honest - you probably don’t need the extra stuff). Maybe you buy the one that stands out amongst others (in terms of colors, packaging, branding, etc).

Even in such a process, there’s a multitude of decisions involved (even "just buy the usual" is still a conscious decision); and we make these decisions not based on logic, but on feelings and subconscious influences, driven by a whole host of cognitive biases and mental shortcuts.

And while behavioral economists don’t actually walk around and watch you buy milk at the grocery store (I’m sure some do), they do study and apply this ‘gap’ between the rational and the irrational that is so pervasive throughout our economic lives.

Back to Your Regularly Scheduled Programming

So, why did I talk about all this milk stuff?

For one, because it’s funny.

But also, if we can’t even buy milk logically as economic actors, how could we possibly be expected to make purely logical decisions as corporate actors?

We think that whatever we do is the ‘best’ course of action towards achieving our goals, when in reality it’s mostly just an unsubstantiated guess. We think we’re on the right track, when in reality we don’t even know how to measure progress. We think that something should be fine, when in reality, the vast majority of us are absolutely horrible at judging and internalizing risk.

It’s not just how we optimize for outcomes; it’s how we evaluate others, as well.

We think our evaluations are ‘fair’ and ‘objective’, when in reality we don’t even question the emotions that make us feel that way. We think that following an approach (especially if there are numbers involved) is ‘logical’, when in reality we haven’t even thought about how to tell if one approach is better than another. We think that the conclusion we’ve arrived at is ‘right’, when in reality we haven’t even thought about how to tell whether it’s right.

We ‘think’ we know all these things, when we haven’t even questioned where that confidence came from. We base our decisions not on facts, but on our observations of facts. We judge others’ ideas not by their actual correctness, but by who is presenting them - relying on traits like tenure, authority, or how well we know them!

Of course, this isn’t necessarily a bad thing (at least, not all of the time) - these ‘gaps’ allow us to function every day without having to stop and think about literally everything, our brains automatically smoothing over what we don’t know (to the point where we don’t even realize there’s a gap) and letting us go about our lives despite the lack of information.

But what it does mean is that what we think the optimization and evaluation functions are is not exactly what they are in reality. And, just like how behavioral economists use our irrational economic behaviors to design better economic systems, we can use this gap to help shape how people perceive their goals, risks, and evaluations, so that their actions better align - even if their underlying functions don’t.

A Case Study

Let’s use an example project to see how we could identify and fix such misalignment problems by utilizing this 'gap'.

Once upon a time, there was a high-volume, high-velocity data source within the ‘jurisdiction’ of a team, whose low-level data could be very valuable across various business functions, depending on how it was delivered or packaged. A ‘sponsor’ wanted to make use of that data, and asked the team to build the project.

The team got to building the data pipeline, but as time marched on, the sponsor lost interest, in large part due to not being able to (physically) see the value of the captured data. To save the project, I was brought in to build a quick proof-of-concept (PoC), and what I whipped up was just enough for him to see and comb through the data with his very own hands.

Now, to start productionizing it, a decision was made to onboard our first user and expand from there, but misaligned priorities soon threatened the broader business goals. That single user's requests (down to the colors of buttons) began to dominate the development process, preventing buildout of core functionality and creating friction with users who were onboarded later.

And with conflicting desires and workflows, the users had a hard time coming to an agreement about what to develop and how to shape the services and the interface to better fit their needs. This was further complicated by the fact that everybody wanted their requests handled first, and the process threatened to be derailed by disagreements over who should be prioritized.

Finally, as the data volume grew and more analytical use cases were thrown in, the performance and stability of the pipeline degraded, impacting user workflows. And despite the clear need for fixes, the (highly senior) engineer controlling the pipeline resisted (even when the solutions were shown to be effective), preventing efforts to even measure the problem and shifting blame onto the more visible parts of the system instead.

Why did this happen? Why couldn’t we prioritize and build the platform that would meet the business needs? Why couldn’t we build something that worked for everybody, instead of something that worked for nobody? Why couldn’t we just fix the latency, performance, and outage issues? Well, we need to look at the optimization and evaluation functions involved here.

To do so, let’s go back to the very beginning. The business had identified a source of data. It knew that it had potential value. It knew that it could be used across various business needs. It knew of at least one generic use case (debugging low-level strategies). What it didn’t know were the specifics for each of those needs, and how the data would be used.

And so, when the first user with a specific use case (and, more importantly, workflow) for the data showed up, he would serve as a demonstration of a real business use case; thus, it made sense to facilitate the project for him so that he could get up and running ASAP.

But consider the optimization function for the team lead. What was he to optimize for?

He needed to optimize for the evaluation function of the ‘sponsor’ (who had already almost axed the project). So, his optimization function involved showing the need for the project to be kept alive. But how would he determine that need, when it was fairly high-level and outside the team’s usual business specialization? By relying on the user’s perceived importance of said need.

With that, the function to optimize for became “maximize the user’s attachment to the project” - down to the development process itself - leading the project to become pigeonholed, diverting resources away from the broader development effort, and making it just that much more difficult to make the data useful to others as well.

And when the broad business needs and the treasure trove of value were combined with the strictly hierarchical and segmented organizational structure, it led to a struggle for control, with everyone out for themselves. And with the company’s strong “top-down” power dynamics, the team couldn’t risk crossing the chain of command and determining the priorities itself; instead, it could only keep the discourse contained so that no one would ‘look bad’ from the outside.

Finally, because of the engineer’s life circumstances (he had just bought a new house and was settling in with his family), and the deeply engrained culture of not ‘losing face’, his optimal strategy involved resisting anything that might allow the team lead to discern his performance from that of the team (such as the bad database design, and the lack of understanding of how data is physically written, stored, and read in a distributed environment, that led to user-facing disruptions).

So, what can we do to align everyone and fix all of this?

Armed with the knowledge of the cognitive gap and the misalignment problem, what we can do is to ‘solve backwards’ from the problem: we know the dynamics and organizational forces that led to these optimization and evaluation functions, but what’s actually in them? What affects how people perceive them? And how can that gap be shaped so that the perceived functions align with each other?

Let’s start with the first user. His optimization function was based on the perceived (vs real) importance of his specific need, but that perception was not based on the overall business value. Why? Because the broader business need was not "his".

But what was “his”, and what was his “need”? For the former, it was the stuff that he was involved in and important to; for the latter, while it certainly included the need to address his specific business use case, it also included making sure that what was “his” remained “his” - that he be able to make decisions around what is “his” (which is why he felt it important to dictate the colors of the buttons, for example).

So, we need to convey the importance of “his” involvement in the creation of the overall business value, and reinforce his perceived importance of his involvement.

For that, we emphasize his personal role in the broader picture; specifically, the way in which his specific use case is important not just in and of itself, but also in the role it can play in unlocking adjacent and more general versions of the use case that “branch off” from his. This also allows us to focus on building workflows, not features, for which his domain expertise is critical.

Furthermore, we direct his perceived importance of his involvement towards more productive avenues - such as by having personal catch-ups before the regularly scheduled meetings so he can convey his (personal and workflow-related) needs, and by presenting multiple versions of demos based on them during the regular product meetings.

This way, he feels his importance in directing potential workflows, and in ensuring that the product is best able to support the workflows (e.g. which version is best, what works and what doesn’t about the workflows shown on demos, etc), not in making sure that his ideas for implementation specifics don’t get drowned out by the others’ during the regular meetings.

It’s the same process for the customer orgs with conflicting priorities. Their optimization functions were to optimize for their own business needs, but the way they perceived progress on that front was by looking at whether their needs were prioritized.

Of course, it is a reasonable enough estimation (especially given the power dynamics of the company, and the need to constantly be on the lookout), but that heuristic clearly missed the fact that tussling over priorities - and preventing us from building out the foundational aspects of the application - would lead to an overall slower path for them.

So, their perception of the optimization process needs to include the bigger picture, which means we need to boost their perceived priority (even when going with an approach that fulfills everyone’s most common needs first), and make sure that they don’t feel sidelined.

How do we boost their perceived priority? By showing observable progress towards their own needs at every opportunity, even when the “common approach” is being taken.

Specifically, if we can break down all of the orgs’ needs into specific functionalities and see which efforts can be deduplicated, we can get a ‘tree-like’ path towards everyone’s goals, task by task, where the trunk involves the common work needed to get to the bits specific to their individual requests.

By sticking to the ‘trunk’ of the tree while occasionally ‘branching out’, we can demonstrate that each task is a step towards their priorities; that even as we’re building the foundational features, we’re actually moving towards their goals.
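As a sketch of the idea in Python (the team names and tasks are hypothetical), the ‘trunk’ falls out naturally if you break each org’s needs into tasks, count how many orgs each task unblocks, and work the most-shared tasks first:

```python
from collections import Counter

# Hypothetical per-org wish lists, broken down into concrete tasks.
team_needs = {
    "trading":  ["ingest", "schema", "query_api", "latency_dashboard"],
    "risk":     ["ingest", "schema", "query_api", "exposure_report"],
    "research": ["ingest", "schema", "bulk_export"],
}

# Count how many orgs each task unblocks; the most-shared tasks form the 'trunk'.
demand = Counter(task for needs in team_needs.values() for task in needs)

# Work the trunk first, then branch out into org-specific requests.
for task in sorted(demand, key=lambda t: -demand[t]):
    teams = [team for team, needs in team_needs.items() if task in needs]
    print(f"{task}: a step towards {', '.join(teams)}")
```

Every task on the plan can now be presented as visible progress towards somebody’s goal - usually several somebodies at once.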

We can further the visible effect by shipping bite-sized releases at high frequency (CI/CD and trunk-based development help), communicating the changes as they come up, and even scheduling regular demos, so that they can see the progress towards ‘their’ goal.

We can lean into this even further by making the structure of the software stack, and of the overall product, align with the organizational forces at play (deliberately wielding Conway’s Law). In this case, we can adopt a flexible interface and loosely coupled logical boundaries, which helps with the balancing act.

In particular, the loose architecture allows the functionalities that make up a given team’s requirements to be broken down and added, removed, or changed piecewise. This lowers the switching costs, and allows us to juggle the requests more easily, while being able to pivot at a moment’s notice without having to rip anything out (which is particularly important at this stage of development).

In addition, the flexible interface (which comes at the cost of ease of use for general workflows) facilitates the ‘tree-shaped’ development process mentioned above. By having a capable but generic ‘stock’ product that can be tweaked to accommodate any specific workflow, development efforts at the ‘trunk’ show up in the users’ own ‘versions’ of the product (and the customization efforts mean that the interface has minimal friction for the users’ specific workflows).
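Concretely, the loose coupling can be as simple as composing the product out of small, swappable stages. Here’s a minimal sketch (the Pipeline class and the team-specific stage are hypothetical stand-ins, not the actual system):

```python
from typing import Callable, Iterable

Record = dict  # minimal stand-in for whatever flows through the product

class Pipeline:
    """The generic 'stock' product: an ordered list of loosely coupled stages."""

    def __init__(self) -> None:
        self._stages: list[Callable[[Record], Record]] = []

    def add_stage(self, stage: Callable[[Record], Record]) -> "Pipeline":
        # Stages can be added, removed, or swapped piecewise -
        # nothing has to be ripped out in order to pivot.
        self._stages.append(stage)
        return self

    def run(self, records: Iterable[Record]) -> list[Record]:
        out = []
        for rec in records:
            for stage in self._stages:
                rec = stage(rec)
            out.append(rec)
        return out

# A team-specific 'branch': tweak the stock product without forking it.
def flag_large_trades(rec: Record) -> Record:
    rec["flagged"] = rec.get("notional", 0) > 1_000_000
    return rec

pipeline = Pipeline().add_stage(flag_large_trades)
print(pipeline.run([{"notional": 2_500_000}, {"notional": 50_000}]))
```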

Now, how do we help the customer orgs feel that no one is leapfrogging them in the proverbial pecking order? We can’t control the need for tradeoffs (political forces, fundamental resource limitations, etc), but we can focus on the ‘no one’ part of the previous sentence - by removing any individual or organization as the cause of any particular decision.

For starters, we need to get everyone at the same table and on the same page. Specifically, we need to use this opportunity to show everyone what everybody else wants, and the fundamental limitations that all of this brings - complexity, timeline, budget, etc (you can’t speed up childbirth by having nine women labor at the same time). This demonstrates that the need for tradeoffs is due to factors outside of anybody’s control.

We also need to do this on a regular enough basis - and prevent ‘side channels’ - so that no one party feels that someone else might be ‘cutting in line’; this way, everybody feels that they’re getting a fair chance to present their case (and ultimately make tradeoffs) when everyone else is also present, going through the same motions and making their own tradeoffs.

To further depersonalize the tradeoff decisions, we can bring in ‘impersonal’ decision makers - data. Adding some quick product analytics can help reduce the users' collective behaviors into dispassionate numbers, which makes it far easier for them to stomach the decisions (figuring out how to approach analytics, what to track, and what conclusions to draw from the data can be a whole topic on its own; but start off by tracking the who).
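Even something bare-bones works as a start. Here’s a minimal sketch (the event log, field names, and helpers are all hypothetical) - note that every event records the who:

```python
import json
import time
from collections import Counter

EVENT_LOG = "events.jsonl"  # hypothetical append-only event log

def track(user: str, org: str, action: str) -> None:
    """Record who did what - the 'who' is the first thing worth tracking."""
    event = {"ts": time.time(), "user": user, "org": org, "action": action}
    with open(EVENT_LOG, "a") as f:
        f.write(json.dumps(event) + "\n")

def usage_by_org() -> Counter:
    """Reduce individual behaviors into dispassionate per-org numbers."""
    counts: Counter = Counter()
    with open(EVENT_LOG) as f:
        for line in f:
            event = json.loads(line)
            counts[(event["org"], event["action"])] += 1
    return counts

# e.g. track("alice", "risk", "open_exposure_report") on every click, then
# let usage_by_org() - not any one person - argue for the tradeoffs.
```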

And finally, the engineer. His optimization function was to minimize his personal risk, and his ‘risk management strategy’ involved shifting risk away from him and his subgroup, building a wall around his subgroup, and preventing the evaluator from being able to ‘pick out’ his personal performance.

Of course, this missed out on the fact that such a strategy prevented critical fixes, degrading the user experience and raising everyone’s risk (especially given that the ‘sponsor’ had already displayed willingness to axe the whole project on a whim).

So, we need to change the way he perceives risk (to include the bigger picture and team-wide risk), and convince him to go with a different risk management strategy (such that even if he is optimizing for his personal risk, it doesn’t lead to all of these problems).

How do we do that? By demonstrating the vulnerabilities of the castle (if the walls don’t protect you, putting up the walls just becomes a pointless endeavor), providing him valuable deals (if the drawbridge is pulled up, no merchants can come in), and helping him turn the risks into personal opportunities (so that it doesn’t make sense to deny or deflect potential problems anymore).

Seeing is believing. Seeing firsthand the kinds of performance and stability issues end-users were facing - along with end-to-end tracing data (originally instrumented as part of an effort to “prove” that the frontend was the problem, ironically) showing that almost no time was being spent at the service and interface levels - would help convince him of the shortcomings of his approach. (In practice, this meant getting him to sit down with the users who were experiencing these problems, with both of us present.)
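For reference, that kind of layer-by-layer tracing doesn’t take much to wire up. Here’s a minimal sketch using OpenTelemetry’s Python SDK, with hypothetical span names and sleeps standing in for real work (nearly all of the time sitting in the data layer):

```python
import time

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Print finished spans to stdout; a real setup would export to a collector.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("pipeline-demo")

# One nested span per layer; the durations show where the time actually goes.
with tracer.start_as_current_span("user_request"):
    with tracer.start_as_current_span("interface"):
        time.sleep(0.01)   # frontend: barely registers
    with tracer.start_as_current_span("service"):
        time.sleep(0.02)   # service layer: also thin
    with tracer.start_as_current_span("data_layer"):
        time.sleep(1.0)    # the actual bottleneck
```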

We can lower the walls further by requesting to attend the subgroup’s meetings, listening, and being part of it, and also by helping him communicate the hard-to-see improvements at the data layer to the end users by tying these changes to some quantifiable improvements of specific workflows. This helps dissolve the tribalistic mindset that was so entrenched within the subgroup, and lowers the drawbridge for information to flow.

This, in turn, allows us to kill two birds with one stone by helping him make decisions based on the information he’s been withholding (which could be used to improve workflows for some business-critical use cases), and present these changes as his personal triumphs and achievements (which helps shift his perspective on risk).

In particular, we can teach him the prerequisite knowledge (database and data structure fundamentals, distributed systems, common problems with high-volume ingest systems, etc), how to make use of the resources at hand (documentation, ingest metrics to watch out for, queries to gather information about the system and typical usage patterns, etc), and basically anything else he needs to know in order to come to the conclusions himself (the root causes, the fixes, how to even know if a fix is working, etc).

This incentivizes him to make the necessary improvements, as he can combine his decisions with the information he’s been keeping to himself (not shared even with the other members of the subgroup) and the newfound knowledge of how to measure improvements, in order to present them as his success.

This... is a lot.

But hey, I never said that it was easy.

Just like how knowing that the stock (or milk) market is inefficient doesn’t make it easy to make money off of it, knowing that people’s optimization and evaluation functions differ from how they perceive and act on them doesn’t make the process of alignment any easier.

It just means that it is possible, and as we’ve seen, with much effort and perseverance, we can do it.

In fact, for this project, through these actions and more, we were able to steer the product and the development process in a more productive direction; keep the stakeholders satisfied; improve end-to-end performance and reliability; onboard new data sources and business cases; scale up for the ever-increasing volume and variety of data and queries; find and expand the business use case through users’ experimentation with their workflows and their word-of-mouth recommendations; and become a trusted partner for the users themselves.

It’s been a great demonstration of what is possible, and how you can drive positive, lasting changes at a deeper level with better alignment, leading to drastically improved outcomes for the business.

But not for me.

Because none of this mattered.

Because all that mattered at this company was keeping up appearances; playing politics; rubbing shoulders with the ‘right’ people; and building up fiefdoms.

It Doesn't Have To Be This Way

History books are littered with the failures of Soviet-style central planning.

Even with all their power and intelligence, even with all their talented people, willing to do whatever it takes, these systems crumbled under their own weight, struggling even to keep the grocery stores stocked with food.

Even the unchecked authority and resources couldn’t help the central planners beat the informational disadvantages and the layers of game theory all the way down. No matter what they tried, they couldn’t get the people to care; or rather, people did care, but about surviving the political apparatus, not the shared prosperity that they were promised.

And unfortunately, we see echoes of their failures in today’s organizations - especially when it comes to enterprise software development - where progress is crippled by misaligned incentives at every turn.

Just like the Soviet systems, without the incentive to care about these things, organizations end up setting an implicit set of incentives - one encoded by the political order. And just like how metrics get gamed, people end up optimizing for these implicit incentives, continuously sabotaging efforts to achieve organization-level goals.

This is why culture is top-down, and change is bottom-up.

This is why you need a top-down effort to foster such an environment, where these things matter: an environment with the org-wide incentive to care about the overall value we create; the culture to get ourselves to care, and to question whether we’re on the right track; and the trust necessary not to devolve into a free-for-all.

And with such an environment, the people in your org who do care and think about the alignment (and its implications on the software they develop) will naturally drive bottom-up changes; slowly, but surely, reforming the organization into a state of total alignment.

This is not a foreign concept. This is not a theoretical capability for change. We’ve seen it realized so successfully across so many engineering orgs that we now take it for granted that we should follow its playbook: DevOps.

DevOps was a paradigm shift: by involving both the developers and the operations team early in the process, feedback loops shortened, reliability improved, and innovation thrived, allowing us to ship more with less.

And it succeeded because it tore down the Berlin Wall of mistrust between dev and ops, replaced the top-down bureaucratic controls with a culture of collaboration, and incentivized the engineers to optimize for the system as a whole, not their individual silos.

It succeeded because it proved that alignment is not just possible, but necessary for success.

With the right culture, the right leadership, and the right people - who understand alignment and the bigger picture, care about driving meaningful change, and have the conviction to pull it off - your organization can reach the same kind of transformational success.

One where you aren’t constantly fighting fires, and where software, cultural, and organizational rot are a thing of the past.

One where the system self-corrects, problems are resolved before they fester, and people default to doing the right thing.

A place where things just work.