In two previous articles, we focused on considerations for data-driven design, and specifically on how business data represents latent business value. The key message was that a structured data architecture is the technical foundation for extracting that inherent value. Those articles, however, introduced the relevant topics at a conceptual level. Today, I would like to demonstrate the concepts through a more tangible and relatable example: a story about how one might architect an application with a data-first mindset.
Before jumping straight to the story, though, let’s recap why data is more important today than it has been in the past. Historically, many businesses collected and stored data primarily for reasons of governance, such as auditing and compliance. As such, data collection and storage were viewed as a sort of “tax” on business operations, with little perceived direct operational business value.
What has changed is a fuller appreciation that the collected data can be mined to optimize business processes and improve customer experience. For example, according to a recent survey of digital retail businesses, these two goals (upgrading business processes and the customer experience) were the primary drivers of digital transformation for 57% of all surveyed businesses. The critical observation, the essence of “why it matters,” is that data-driven workflows affect the business both in externally facing domains, such as customer experience, and in internally facing domains, such as core business processes. This is why a thoughtful and deliberate data strategy is fundamental to the quality and cost-effectiveness of the most important business workflows. Further, when workflows are instrumented to transmit their data exhaust to a data collection and analysis infrastructure, the workflows themselves can be continuously analyzed and improved, yielding business workflows that are continuously adapted and optimized.
As a side note, the most serious anxiety these businesses reported around digital transformation was ensuring the cybersecurity of those digital processes. As it turns out, this is another area where the same data telemetry and analysis approach has a key role to play, though I will save that for another article.
Moving on to our thought experiment, I have chosen a story that many of us can probably relate to in today’s coronavirus-adapted lifestyle: an application that provides an online service for restaurant food ordering and delivery. Meals are ordered online from a restaurant the customer specifies, and the customer can choose either to pick up the order directly or to have the service deliver it.
In this story, we will play the role of the Application Owner. In that role, we need to address many different concerns, which we will divide into two buckets—first, required operational activities, and, second, forward-looking strategic concerns.
The first set, required operational activities, includes concerns such as:
The second set of concerns is less day-to-day operational but no less important. These issues, if thought about up front, will enable the business to be agile, adaptive, and continuously improving. Examples of these sorts of concerns are:
This is, of course, a subset of the full richness of concerns we would have, but even this smaller set suffices to enable a good discussion that highlights the importance of structured data architecture in support of an extensible data processing pipeline.
In our imagined role of an Application Owner, as we consider our overall data strategy, we can start by enumerating our business workflows, identifying the data processing needs of each. An example is the workflow that locates nearby open restaurants, and then presents menu selections and item prices for each—it would need to filter restaurants by location and business hours, and then look up menu selections for a specifically selected restaurant, perhaps also filtered by the availability of delivery drivers. And we could do this for each workflow—payment processing, matching drivers with deliveries, and so on.
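As a rough illustration of the restaurant-lookup workflow just described, the filtering step might look like the sketch below. The `Restaurant` fields, the `nearby_open` function, and the 5 km default radius are all hypothetical choices made for this example, and opening hours are assumed not to cross midnight.

```python
from dataclasses import dataclass
from datetime import time
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Restaurant:
    # Hypothetical record shape, invented for this sketch.
    name: str
    lat: float    # decimal degrees
    lon: float    # decimal degrees
    opens: time   # local opening time
    closes: time  # local closing time (assumed to be later the same day)


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km: mean Earth radius


def nearby_open(restaurants, lat, lon, now, max_km=5.0):
    """Keep restaurants that are open at `now` and within `max_km` of the customer."""
    return [
        r for r in restaurants
        if r.opens <= now < r.closes
        and distance_km(lat, lon, r.lat, r.lon) <= max_km
    ]
```

Even this small filter touches two of the data elements the article goes on to discuss, location and time, which is why their representation needs to be agreed on before the pipeline is built out.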
Or, equally reasonably, we might instead start our considerations with the basic data “atoms”—the data building blocks that are needed. We would identify and enumerate the important data atoms, paying particular attention to having uniform representation and consistent semantics of those data atoms along with any metadata vocabulary needed in support of our business workflows. Examples of data atoms in our sample application would include: location data for restaurants, customers, and drivers; food items needed for menus and invoices; time, used for filtering and tracking quality of delivery; and payment information, associated with customer and driver payments workflows.
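To make the idea of a uniformly represented data atom more concrete, here is a minimal sketch of two of the atoms mentioned above, food items and payment amounts. The class names, the choice to store money as integer minor units, and the `invoice_total` helper are assumptions made for illustration, not a prescribed design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    # Assumed convention: integer minor units (e.g. cents) plus a currency code,
    # so every workflow adds and compares amounts the same way.
    minor_units: int
    currency: str


@dataclass(frozen=True)
class FoodItem:
    # One definition shared by the menu and invoice workflows.
    sku: str
    name: str
    price: Money


def invoice_total(items):
    """Sum item prices; raises ValueError for an empty or mixed-currency list."""
    currencies = {item.price.currency for item in items}
    if len(currencies) != 1:
        raise ValueError("invoice needs items in exactly one currency")
    total = sum(item.price.minor_units for item in items)
    return Money(total, currencies.pop())
```

Because the menu and the invoice both use the same `FoodItem` and `Money` definitions, a price shown to the customer and a price charged to the customer cannot silently drift into different formats.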
Which of these two potential jumping-off points to use for our data strategy, the workflows’ data processing pipeline or the data “atoms,” is a chicken-or-egg question. Both perspectives are useful and, more importantly, interdependent. We cannot reason about the data processing pipeline without thinking about the underlying data atoms, nor can we develop the data architecture without considering the needs of the processing pipeline. That said, in general I would recommend making a first pass across the workflows to enumerate the data atoms, and then working through the structured design of the data architecture before doing the detailed design of the data pipeline. This is because the workflows are more dynamic: workflows get added and modified as the business evolves, while the underlying data has more history and inertia, and therefore the data architecture benefits more from forethought.
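The first pass across the workflows can be as simple as a table of which workflow touches which atom. The sketch below is one way to record that; the workflow names and their atom assignments are illustrative guesses based on the examples in this article, not a complete inventory.

```python
from collections import defaultdict

# Illustrative mapping only: workflow name -> data atoms it reads or writes.
WORKFLOW_ATOMS = {
    "find_restaurants": {"location", "time", "food_item"},
    "payment_processing": {"payment", "food_item"},
    "match_driver_to_delivery": {"location", "time"},
}


def atoms_by_usage(workflow_atoms):
    """Invert the mapping: atom -> set of workflows that depend on it."""
    usage = defaultdict(set)
    for workflow, atoms in workflow_atoms.items():
        for atom in atoms:
            usage[atom].add(workflow)
    return dict(usage)


def shared_atoms(workflow_atoms):
    """Atoms used by more than one workflow, sorted by name."""
    usage = atoms_by_usage(workflow_atoms)
    return sorted(atom for atom, workflows in usage.items() if len(workflows) > 1)
```

Atoms that show up under several workflows are the ones where an inconsistent representation would hurt most, so they are natural candidates for the earliest and most careful data architecture work.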
Going back to our example, let us assume that, as an Application Owner, we have a fairly developed view of the key business-critical workflows and the data atoms that are needed to support them. Earlier in this discussion, we identified a few of the foundational data elements needed for our workflows: location, time, food items, and payment information. And, to recap from earlier articles, the data architecture should enable 3 key objectives: uniformity of syntax, consistency of semantics, and a metadata vocabulary for reasoning about and governing the data. So now, we can apply these principles to discuss data architecture considerations for the specific data atoms enumerated in today’s example.
Zooming in on the location data atom, consider each of the 3 key data architecture objectives.
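One possible way to picture the three objectives for the location atom is the sketch below. Everything in it is an assumption made for illustration: signed decimal degrees as the single syntax, a `LocationRole` enum to pin down semantics, and a small `LocationMeta` record as the metadata vocabulary.

```python
from dataclasses import dataclass
from enum import Enum


class LocationRole(Enum):
    # Semantics: what the point actually refers to (hypothetical roles).
    RESTAURANT_PICKUP = "restaurant_pickup"
    CUSTOMER_DROPOFF = "customer_dropoff"
    DRIVER_POSITION = "driver_position"


@dataclass(frozen=True)
class LocationMeta:
    # Metadata vocabulary: hypothetical annotations for governance and reasoning.
    source: str          # e.g. "gps" or "geocoded_address"
    accuracy_m: float    # estimated accuracy in meters
    contains_pii: bool   # True when the point can identify a person


@dataclass(frozen=True)
class Location:
    # Syntax: every location is stored as signed decimal degrees.
    lat: float
    lon: float
    role: LocationRole
    meta: LocationMeta

    def __post_init__(self):
        if not -90.0 <= self.lat <= 90.0:
            raise ValueError(f"latitude out of range: {self.lat}")
        if not -180.0 <= self.lon <= 180.0:
            raise ValueError(f"longitude out of range: {self.lon}")


def from_dms(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value
```

A source that reports coordinates in degrees, minutes, and seconds would be converted once at the boundary with `from_dms`, so that every downstream workflow sees the same syntax regardless of where the data came from.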
We have only talked about the requirements for location, and could now walk through a similar exercise for each of the other data fields. However, rather than enumerating all the areas of concern for each of the data atoms, I will instead highlight a few notable observations:
While this example has still only skimmed the surface, it has highlighted many of the concerns associated with real-world scenarios. In the real world, however, the process of thinking through the data strategy would not end here but would be iterative and ongoing. As the data elements are fed into the data processing pipelines that embody business workflows, iterative adjustments and enhancements would be made to the data architecture. As existing workflows are enhanced and new workflows added, we would discover additional data architecture requirements for existing data atoms along with new data atoms.
Although this example has been streamlined for brevity, it still demonstrates the key principles around mapping workflows to their data architecture implications. The process always begins by considering the business needs—the customer experience and the business processes required to meet those needs. The business processes, in turn, define the elements of a business vocabulary. At the next level of specificity, the processes map to a data pipeline, which leverages a data vocabulary.
The key principle is that a robust data architecture, one that defines syntax, semantics, and annotations, builds a foundation that allows the data pipeline to be efficiently enhanced and new workflows to be readily added. At the business layer of abstraction, this means that existing business processes can easily be modified, and new or emerging business processes can be quickly brought online. Conversely, failure to make considered decisions in any of these areas (data vocabulary, data architecture framework, or linking these back to the business needs) will ultimately lead to a brittle system, one that cannot adapt to new business requirements.