The truth about your “source of truth”

Too often when we design our microservices architecture*, we find ourselves wrapped around the axle, trying to define a single source of truth (or SoT) for our domain entities. Common wisdom has it that every domain entity in our enterprise should live in exactly one, centralized location. If you want to fetch an instance of that entity, you go to that location.

* note: I loathe the term “microservices architecture”, but that’s for another discussion

As with much common wisdom, this isn’t necessarily so. It’s a good idea to keep track of where our data resides, sure. But bending over backwards to define a single SoT for every piece of data is the wrong approach. At best, it’s often simply not necessary. Worse, it can cause more problems than it purports to solve. In fact, the very notion runs counter to the event-based systems that power many enterprises today. And as we’ll discuss, a true source of truth for our data is, by and large, mythical.

What’s wrong with a single source of truth?

Before we get into the problems of single SoTs throughout our microservices architecture, let’s revisit why we build microservices in the first place. We’ll start with one of the most fundamental relevant patterns: the Bounded Context. Much has already been written about the pattern. But the general idea is that each business domain should be thought of as its own self-contained system, with judiciously-designed inputs and outputs. And each is owned by a single, cross-functional team. The team builds, deploys, and maintains the services and applications that it needs to get its job done—with minimal dependencies on any other teams.

And this leads us to perhaps the most fundamental benefit of a microservices architecture. While we can rattle off the technical benefits of microservices, the true benefits are organizational. Teams have control over, and responsibility for, their own projects. They can make as many changes to their code as they want, with little fear of breaking other teams. They can release as frequently as they need to, without coordinating with any other team. And they own the entire lifecycle of their code, from design and development, to deployment, to production monitoring.

Enforcing a single source of truth for our domain entities sabotages those benefits.

Instead, single sources of truth re-establish dependencies across teams and across bounded contexts. They force teams to break out of their bounded context to perform work. If a team requires a change to an entity, it is now at the mercy of some other team’s schedule to get the change made. And consider production issues. Teams will start finding themselves awoken at night, paged because some other team’s service is misbehaving.

Single sources of truth also equate to single points of failure. Requiring all services to fetch User data from a single location means that our entire enterprise may grind to a halt if the User service goes down. SoTs also introduce performance and scalability bottlenecks. As the number of applications that need to access that User data grows, the load on the User service grows right along with it.

But perhaps most importantly, adherence to single sources of truth severely hampers our ability to move forward with a workable architecture. This becomes apparent as teams wrap themselves further and further around the axle, arguing about the SoT for such-and-such entity. Have you ever found yourself pulling multiple different teams together to debate the design of your microservice’s API? Trying to meet every team’s requirements? Negotiating compromises when those requirements contradict each other? Yeah, me too. That’s a clear smell that we’ve been doing something wrong.

Why again did we bother with a microservices architecture?

And, what are we to do now?

Relax. And stop worrying about our “source of truth”. Because in all likelihood, the “single source of truth” we’re seeking doesn’t even exist.

The quest for the holy source of truth

At least, it probably doesn’t exist among our microservices. Let’s build a simple inventory management system. Our system will contain a service that tracks physical inventory items, stored in a physical warehouse. Every time a new item is delivered to the warehouse, its barcode is scanned, and the service updated to reflect the new amount. Thus, the inventory item service will be our reliable, single source of truth for our enterprise’s product inventory… right?

Except, how accurate will the database be after warehouse workers walk out with their pockets stuffed full of items at night? Or when a leaky pipe in a warehouse corner causes packages to slowly disintegrate? Or when a pallet of new items is simply missed by the scanner?

Of course, many organizations have domains that don’t model their physical entities. So instead, let’s try another example. Say we’ve built a company that aggregates deals on hotel rooms. We partner with hotel chains, ingesting underbooked rooms from the chains, and presenting the best deals to interested would-be travelers. To do this, we have a HotelPartners bounded context that ingests room data from our partner hotels and stores it in a database.

So… is that database a SoT for that data? Not really. We sourced that data from our partners’ databases, which in turn are representations of the availability of their physical hotel rooms. Any “source of truth” for this data certainly doesn’t reside within our organization.

The point is, obsessing over the “one true source of truth” for our entities can be a fool’s errand. Often, there isn’t any such thing—at least, not in our enterprise system.

Embrace the fact that “sources of truth” are relative.

Instead of searching for the elusive absolute, single source of truth for anything, we should instead embrace the fact that “sources of truth” are relative.

Recognizing this fact is remarkably liberating. It frees us from the burden of trying to ensure that each entity in our domain has exactly one home, to which every service in our architecture needs reliable access.

Think in terms of Canonical Views and Scopes

Instead of a single SoT for our data, we can think in terms of the canonical view of data within a given scope. Within the scope of any system that stores data, there will be a data store that represents the system’s most up-to-date view of that data. That is the system’s canonical view. There may be additional data stores in the system that also provide views of that data. Maybe the data is cached for quicker read access, or enhanced with data from another canonical source. But those additional data stores are always subservient to the scope’s canonical data source.

As an analogy, think of materialized views in a database system. The original table(s) from which the materialized views are derived represent the canonical view of the data, within the scope of the database schema.

Organizational and industry scopes

Let’s revisit the hotel room aggregator from a few paragraphs back. As engineers in this semi-fictitious organization, we fetch information about hotel rooms from our third party partner sources. The data enters our system, is transformed into our domain entities, and stored, all within our HotelPartners bounded context. Other product-focused bounded contexts in our organization then use that data for their own purposes.

Figure 1 – Our organization’s scope

So in the scope of our organization, this HotelPartners bounded context contains the canonical source of hotel room data.

Now let’s zoom out and look at the industry as a whole. We’ve sourced this data from hotel chains. So in that sense, those chains’ databases become the canonical source for the data in the scope of the industry as a whole.

Figure 2 – Zoomed out to the industry’s scope

We can also zoom in and look at our organization’s specific bounded contexts. Much like our entire organization sources data externally and stores its own canonical representation, so can our bounded contexts. Specifically, they can source the data from the HotelPartners bounded context and store their own local copy.

Figure 3 – Zoomed in, to demonstrate that each vertical in our organization represents a scope, each with its own canonical source

Our product teams now have their own local canonical source of hotel room data. They are each free to enhance the data as they need. For example, the Search bounded context might also source data from a Reviews bounded context, in order to mix customer reviews with hotel information.

They can also store it in whatever format they see fit. They are not reliant on the HotelPartners team to make changes for them. Nor are they reliant on that team to maintain a particular SLA for their RPC services. The product teams are also free to create other secondary data stores for the data. For example, the Search team might set up a secondary data store, say an Elasticsearch index, to support searching data across various axes. This secondary data store would still be sourced from Search’s canonical data store.

Meanwhile, the HotelPartners team is not burdened with creating and maintaining a “one-size-fits-all” data model, in which they try to make every product team happy in terms of the data fields that are stored.

Events shall set you free

If you’re not familiar with event-based systems, you might be wondering how a product-oriented bounded context (e.g. Search) is supposed to derive its data from its enclosing scope’s canonical source (i.e. Hotel Partners). Wouldn’t it still need to make calls into Hotel Partners’ API services?

As it turns out, it doesn’t. Instead, the canonical source publishes its changes as events. Generally, we use an event log like Kafka for this purpose, but the details don’t matter here. What does matter is that the product-oriented bounded contexts are able to subscribe to those events, ingest them, and store the results in their own data stores.

Figures 1 through 3, then, are a bit too simplistic in depicting how the Booking and Search bounded contexts derive their data. Figure 4 provides a more accurate look.

Figure 4 – Bounded contexts using events to derive data from our organization’s canonical source(s)

The Hotel Partners bounded context ingests data from the external partners and saves it into its Rooms database. Once it saves the data, it creates events that describe the saved data, and publishes those events to an event log. The Booking and Search bounded contexts subscribe to that event log and, as the events come in, consume them and populate their own data stores.
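To make that flow concrete, here’s a minimal sketch of the publish/consume cycle. An in-memory list stands in for the event log (in practice this would be Kafka or similar), and the topic name, entity fields, and function names are all invented for illustration:

```python
import json

# In-memory stand-in for an event log such as Kafka. The "rooms" topic
# and the room fields below are illustrative assumptions.
event_log = []

def publish(topic, event):
    """Append an event to the log; a real system would use a Kafka producer."""
    event_log.append((topic, json.dumps(event)))

def save_room_and_publish(rooms_db, room):
    """Hotel Partners: write to the canonical store, then describe the change."""
    rooms_db[room["room_id"]] = room                    # canonical write
    publish("rooms", {"type": "RoomUpdated", **room})   # notify downstream contexts

def consume(topic, store):
    """Booking/Search: replay the topic into a context-local data store."""
    for t, payload in event_log:
        if t == topic:
            event = json.loads(payload)
            store[event["room_id"]] = event

rooms_db, booking_store, search_store = {}, {}, {}
save_room_and_publish(rooms_db, {"room_id": "r1", "hotel": "H1", "rate": 129})
consume("rooms", booking_store)
consume("rooms", search_store)
```

Note that neither Booking nor Search ever calls Hotel Partners directly; each simply replays the log into its own store.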

There are a few other items of note. First, the Search bounded context uses the same event-based mechanism to propagate changes from its own canonical source (titled “Room Details” in the diagram) to its secondary search index. Also note that the Search team is subscribing to another event log, titled “Event Log: Reviews” in the diagram, in order to ingest hotel reviews. Although not depicted in the diagram, we can assume that hotel reviews are published to the event log by a different bounded context (perhaps that bounded context allows users to enter reviews, or it ingests reviews from third-party sources).

We commonly refer to such systems as event-based systems. They are distinct from the more traditional request-response systems that are powered by synchronous API calls, and that tend to drive the desire for single sources of truth. Event-based systems also inherently imply eventual consistency. What this means is that at any given point, data across our organization may be in different states. Absent any new changes, the state will converge. Since change is generally a constant, this means that for some periods of time (generally measured in milliseconds) a given entity might be more up to date in one context (say, Hotel Partners) than it is in another context (e.g. Search).

The good thing is that in our well-designed systems, with proper bounded contexts, eventual consistency is perfectly acceptable. Back to our example, the Hotel Partners bounded context will first ingest data. The data then flows—at roughly but not exactly the same time—to both the Booking and Search bounded contexts. At this moment, a given entity may be out of sync between each bounded context. However, since each bounded context represents a separate business domain—with its own applications and functionality—brief inconsistencies become mostly unnoticeable.
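We can sketch that eventual consistency in a few lines. Here, two consumers read the same toy log at different speeds: the laggard briefly shows a stale rate, then converges once it catches up. The offsets, room IDs, and rates are all made up for illustration:

```python
# Successive rate updates for room "r1" on a toy event log.
log = [("r1", 100), ("r1", 110), ("r1", 95)]

class Consumer:
    """A consumer with its own offset and its own local view of the data."""
    def __init__(self):
        self.offset = 0
        self.view = {}

    def poll(self, max_events):
        for room_id, rate in log[self.offset:self.offset + max_events]:
            self.view[room_id] = rate
        self.offset = min(self.offset + max_events, len(log))

booking, search = Consumer(), Consumer()
booking.poll(3)               # Booking has processed every event
search.poll(1)                # Search lags: it has only seen the first event
stale = search.view["r1"]     # temporarily inconsistent with Booking
search.poll(10)               # absent new changes, Search catches up...
converged = search.view["r1"] # ...and the two views agree again
```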

If you must obsess, then obsess about your data’s Originating Source

Whenever we attempt to define a “single source of truth”—that is, a single location from which to internally fetch an entity—I suspect that our brains are subconsciously seeking a single “originating source”—that is, a single location from which to externally ingest an entity.

In other words, any given entity should enter our organization via a single location. It follows naturally that the data will flow through our organization from that entry point. Moreover, it will flow in a single direction.

Let’s take Figure 3 from the previous section (with the understanding that we’re using an event log like Kafka to propagate data). There, we see that room availability data is ingested into our organization from external sources by the Hotel Partners bounded context (perhaps via batch file ingestion). The data is distributed to our org’s various bounded contexts (e.g. Search, Booking, etc).

Now let’s say that Search has added a new web application, allowing registered individuals to add room inventory into our system. Suddenly, we have two locations in our system in which we ingest the same data.

Figure 5 – It gets confusing when the same data comes from multiple sources

Why is this a problem? The complexity of managing the flow of data in our system has dramatically increased. If we have data coming in through both Hotel Partners and Search, both bounded contexts will need to publish their incoming data as messages. And of course, both will need to consume each others’ messages and make appropriate changes. For example, Hotel Partners will need to consume messages from Search, and update its database. Should it then publish that change as a message, which Search would subsequently consume? If we’re not careful, we’ll create an infinite loop of messages. What about Booking, which now needs to consume messages from both Hotel Partners and Search? Is it now responsible for sussing out from which service the data originated?
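One common mitigation for the loop problem is to tag every event with its originating context and have each context skip events it produced itself. The sketch below shows the idea; the event shape and context names are assumptions, and note that this only patches the loop — the cleaner answer, discussed next, is to avoid multiple origins entirely:

```python
def make_event(origin, room_id, data):
    """Tag each event with the bounded context that originated it."""
    return {"origin": origin, "room_id": room_id, "data": data}

def handle(context_name, event, store):
    """Apply an event unless we originated it ourselves."""
    if event["origin"] == context_name:
        return False  # skip our own events; otherwise we'd loop forever
    store[event["room_id"]] = event["data"]
    return True

hp_store = {}
e1 = make_event("Search", "r1", {"rate": 120})
e2 = make_event("HotelPartners", "r1", {"rate": 120})
applied_foreign = handle("HotelPartners", e1, hp_store)  # applied
applied_own = handle("HotelPartners", e2, hp_store)      # skipped
```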

Next, consider conflict resolution. If someone uses Search’s web application to push data that conflicts with other data that we’ve ingested from industry sources, who decides how to resolve those conflicts?

Similar to conflict resolution, we have the issue of deduping. If we receive data from multiple sources, odds are that we’ll routinely ingest duplicate data. If this data enters our system in multiple places, where would this deduping take place? We’ll discuss deduping a bit more in the next section.

So we should be cognizant of where our data originates. And if at all possible, we should limit any given entity to a single originating source. If we truly need to allow users to add room availability data to our system, we should allow that only within the same scope in which we ingest bulk purchase data (that is, the Hotel Partners scope). That way, our data flows in a single direction, and propagation becomes much easier to reason about and much less error prone.

Figure 6 – Much better!

And now for some FAQs

As I discuss the concept of relative SoTs and context-based canonical sources of data, a few questions tend to arise. Let’s discuss them here.

What about Master Data Management?

I’ve been asked whether this idea of canonical data sources runs counter to the industry practice of Master Data Management, or MDM. Put simply, the use of MDM helps organizations ensure that they do not have duplicate representations of the same piece of information floating around various internal groups. Importantly here, MDM implies that an organization must have a single canonical source of every entity in the company’s business domain.

Despite first appearances, MDM doesn’t run counter to the idea of relative canonical sources. As discussed above, we will still have a canonical source for this data, within the scope of the organization as a whole. Here, for example, we would dedupe records and assign them a unique ID. In turn, the records stored within the other bounded contexts can (and should) retain the entities’ canonical IDs.
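A lightweight MDM registry along these lines might look like the sketch below: records are deduped on a few identifying fields, and each distinct entity gets a stable canonical ID. The field names and ID scheme are invented for illustration:

```python
import hashlib

# identity key -> canonical ID; deliberately stores little beyond the ID
mdm_registry = {}

def canonical_id(record):
    """Dedupe on basic identifying fields and return a stable canonical ID."""
    key = (record["hotel_chain"].lower(), record["room_number"])
    if key not in mdm_registry:
        digest = hashlib.sha1(repr(key).encode()).hexdigest()[:12]
        mdm_registry[key] = f"room-{digest}"
    return mdm_registry[key]

# The same physical room arrives twice, with differing casing and rates...
a = canonical_id({"hotel_chain": "Grandview", "room_number": "204", "rate": 150})
b = canonical_id({"hotel_chain": "grandview", "room_number": "204", "rate": 135})
# ...and resolves to a single canonical ID and a single registry entry.
```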

Meanwhile, the MDM data store can and should remain relatively lightweight. In fact, it’s perfectly feasible for such data stores to house little beyond entity IDs, and perhaps some other basic identifying information.

What about single versions of truth?

Sometimes we find ourselves with different applications that perform their calculations on their own view of common data. In such cases, wouldn’t we be leaving open the possibility of differences between the calculation algorithms? We might wind up displaying different results for the same data across different applications.

This is related to the concept of a single version of truth (SVoT). This concept states, roughly, that if multiple systems have their own view of the same data, there must be one agreed-upon interpretation of that data. Often SVoTs are referenced in the context of business analytics and decision-making, but the concept is also applicable when discussing distributed systems as we have been here.

The truth is that while we don’t often need to worry about creating a single source of truth for our data, we sometimes need to define a single SoT for our algorithms.

For example, our hotel-room-aggregation organization might provide ranked recommendations for would-be travelers. If the recommendations can appear in multiple places, then we’d want to show the same recommendations for a given user, no matter where they appear, or from where the raw data was sourced. From that standpoint, while the data can live in multiple locations, we need to ensure that a single algorithm is used to perform the calculations.

How do we ensure a single SoT for a given algorithm or calculation? We have a few options.

Distribute it in a library

Our first thought might be to write the algorithm, package it up into a library, and make it available for import by all of the services that need it. That way, all of the services have a consistent, local mechanism with which to calculate their local data.
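As a sketch, such a library might expose a single scoring function that every service imports. The scoring formula and field names here are entirely invented for illustration:

```python
def recommendation_score(room):
    """Score a room deal; higher is better. Weights are illustrative."""
    discount = (room["list_rate"] - room["deal_rate"]) / room["list_rate"]
    return round(discount * 0.7 + (room["review_avg"] / 5.0) * 0.3, 4)

def rank_rooms(rooms):
    """Rank rooms best-deal-first, using the one shared algorithm."""
    return sorted(rooms, key=recommendation_score, reverse=True)

rooms = [
    {"id": "r1", "list_rate": 200, "deal_rate": 120, "review_avg": 4.0},
    {"id": "r2", "list_rate": 200, "deal_rate": 180, "review_avg": 5.0},
]
best = rank_rooms(rooms)[0]["id"]
```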

Figure 7 – Here, each bounded context that needs to perform a particular calculation has imported a common library

While this approach may sound simple and appealing, reliance on libraries in a microservices architecture has some major drawbacks:

  • Coordinating changes is difficult. Let’s say the algorithm changes. How do we distribute that change to all of the dependent services? We need to build and deploy a new version of the library. Then, we need—in a coordinated fashion—to redeploy all of those services. This breaks a fundamental tenet of microservices: independently-deployable services.
  • We’ve tied ourselves to a single language. We may have decided to allow our teams to use different programming languages or platforms in our enterprise. However, a single library can be written in only one language. If we reimplement the algorithm in different languages, then we no longer have a single SoT for that algorithm.

Deploy a microservice to perform the calculation on the fly

We can deploy a microservice to perform the calculation. In this case, we’d require applications from various bounded contexts to synchronously call into this service to get the calculated data.

Of course, with this approach, the raw data might as well reside solely within a data store managed by this microservice. So we could write this service as a consumer of the messages produced by Hotel Partners.

Figure 8 – Here, we’ve adopted the typical approach of requiring each bounded context to synchronously call a microservice that will perform the calculation for them.

While this approach may be preferable to a library, it gets us back to the original issue that we’ve looked at in this article. We have now enforced a single source of truth for our calculated recommendations.

Deploy a microservice to publish the results of the calculation

So then, why can’t our new microservice simply perform calculations as hotel room availability changes, and publish those calculations for different bounded contexts to consume?

That is, after all, how event-based systems work.

With this approach, our new microservice becomes a consumer of the data produced by Hotel Partners, just as our other bounded contexts’ microservices are.

Figure 9 – Here, we’ve adopted the same event-based pattern that we’d discussed earlier. Our Calc service consumes data, performs calculations on it, and publishes the results for other bounded contexts to consume.
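The pattern in Figure 9 can be sketched as follows. In-memory lists stand in for the inbound and outbound topics, and the field names and scoring rule are assumptions; a real service would use Kafka consumers and producers:

```python
# Inbound events from Hotel Partners (stand-in for a subscribed topic).
room_events = [{"room_id": "r1", "list_rate": 200, "deal_rate": 120}]
# Outbound calculated results (stand-in for a published topic).
recommendation_events = []

def calc_service_poll():
    """Consume room events, compute a score, publish the result as an event."""
    for event in room_events:
        discount = (event["list_rate"] - event["deal_rate"]) / event["list_rate"]
        recommendation_events.append(
            {"room_id": event["room_id"], "score": round(discount, 2)}
        )

def search_consume(store):
    """Downstream (e.g. Search) ingests published results into its own store."""
    for event in recommendation_events:
        store[event["room_id"]] = event["score"]

search_scores = {}
calc_service_poll()
search_consume(search_scores)
```

The calculation happens exactly once, in one place, yet no bounded context has to make a synchronous call to get the results.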

Are we saying that a single source of truth is always a bad idea?

We’ve seen that the enforcement of a single source of truth is often unnecessary, detrimental, and in many cases, a fallacy that we’re trying to make reality. Still, we may encounter some data for which we want to enforce a single SoT: data for which eventual consistency is not acceptable, served by services that are infrequently accessed, are not in any critical paths, and have simple, infrequently-changing APIs.

Authentication is a common case. Typically, we want a single location to manage a user’s login credentials, roles, permissions, and tokens. Generally these activities happen (relatively) infrequently, but we want to be sure any updates are reflected immediately. We therefore might opt to define a single source of truth for our authentication data. Even in this case, however, we would want to be sure that we keep the authentication model as light as possible. User details such as name and contact information, for example, would go elsewhere.

But this is not a universal, golden rule to which we must adhere. Despite common thinking, single sources of truth for our data are not an automatic requirement. Moreover, they are often a hindrance in terms of both productivity as well as application performance and scalability.

Designing our systems instead with canonical sources and scopes in mind will help us avoid bottlenecks, allow us to design more flexible, scalable event-based systems, and allow our teams to focus on getting stuff done.

The Role of a Designer in an Engineering Organization

What can HGTV teach engineers about engineering teams?

A while back, I had an interesting conversation with a product designer. She was, at the time, relatively new to our company and wanted to get my opinion on a few things. We met for coffee, and talked a bit about our thoughts on the company in general. Then she posed a question to me: “What do you think the role of a designer is in an engineering organization like this?”

I paused for a moment, trying to come up with a useful answer. I’ve partnered with a number of designers throughout my career. And while there have certainly been similarities between these working relationships, each one has been fairly distinct. Apparently I was taking a while to mull over the question, so she prompted me again:

“I talked to another engineer, who used the metaphor of building a house. He described engineers as being the architects who design the house, and the construction team that builds the house. So then, design’s role is to paint the house afterwards.”

“No,” I immediately told her, “that’s not right at all.”

The look of relief on her face told me that I’d given her the response she was looking for. Still, I wanted to answer her original question. And, not being one to mix metaphors, I wanted to stick with the home-construction theme.

Maybe TV can teach us something

So of course I turned to Love It or List It, a television show that airs on HGTV. Love It or List It has a simple premise: each episode features two (usually married) homeowners. Both are dissatisfied with their current home, but they disagree on how to solve the problem. One invariably wants to sell their home and move into another; the other insists on remodeling their current home. And thus, a sort of competition arises between the two “stars” of the show: real estate agent David, and designer Hillary. David tries to find the couple a brand new home to move into (and thus, “list” their current home), while Hillary remodels the couple’s existing home–or as much of it as she has budget for–to try to convince the couple to stay (aka “love it”).

It is Hillary that we are, of course, interested in here. From the outset, she works with the homeowners to identify their current situation: their problems with the current home, the requirements each would have in order to stay, their budget, etc. From there, she begins to generate ideas on how to renovate the home.

Importantly, she always collaborates closely with contractor Eric. While one might expect a contractor to simply carry out the designer’s vision, that’s not how Hillary and Eric typically interact. Instead, they work together to refine the designs and come up with the ultimate plan. For example, Hillary might float the idea of building a laundry room in a certain unused corner, only to be told by Eric that the house’s existing plumbing would make that unfeasible… but hey! If we removed this other wall, then that would open up the perfect spot for a washer and dryer. And so on.

Once the designs are finalized and approved, we see glimpses of Eric and his team performing the actual demolition and construction. But collaboration between the two rarely ends there, as unexpected problems are encountered and they work together to resolve them. Every episode ends with a stunning redesign, one of the homeowners invariably exclaiming “I can’t believe this is the same house!” and–at least so it seems to me–the homeowners typically electing to once again love their home.

Design is more than making things pretty

Not really the role of a designer

Of course, Love It or List It isn’t a perfect metaphor for engineering organizations. For example, we might assert that Hillary’s role encompasses product management as well as design. But it’s a far cry better than the notion that designers are there to simply make a nearly-finished product look pretty.

Let’s take a step back and examine why I chose the Love It or List It metaphor over the slap-a-coat-of-paint-on-it metaphor:

  • First, designers don’t wait for engineers–or any other functional role–to decide what to build and how to build it. Instead, they help to drive the product definition before any line of code is written.
  • This means, of course, that designers don’t just come in at the end of a project. Instead, they collaborate with the entire team from the outset of the project.
  • Finally, design is about far more than just making things look good. While creating visual appeal is generally a big part of design (and for sure, Hillary’s remodels look stunning!), designers are there to contribute their understanding of user behaviors, and to help the product team push the boundaries of how to solve users’ needs.

The designer and I riffed on this for a bit. By the end of our coffee break, we were both pretty happy with this metaphor. Weeks later, she and I found ourselves on the same team. We pushed this sort of cross-functional collaboration from the beginning. Our entire team–design, front/back-end and mobile engineering, product management, research, QA–would routinely meet to iterate over mockups, building on each others’ ideas until we’d found the best possible solution.

The end result? Solutions that were innovative, feasible, and I daresay in many cases, stunning. And for what it’s worth, team members who chose to love their jobs, rather than shopping for new ones.