Principles of a well-designed data architecture
Data is a valuable resource, and all of the disciplines that go into managing data as a resource make up Data Management. This includes the practice of adopting principles, rules, strategies, and methodologies that ensure optimal utilization of an organization’s data. Often this includes data capture and ingestion, data storage, data security, quality management, availability and more. All of this goes into delivering the right data to the right consumers at the right time.
Most organizations have a data strategy that ends in a multi-cloud environment hosting a fancy data lake. This is not a strategy. Components of data strategy include data integration, data quality, how metadata is managed, data modeling, organizational roles and responsibilities, performance and measurement, security and privacy, database selection (storage), business intelligence tools, and, ultimately, the business value of data and return on investment.
There are five core areas where most organizations struggle when developing the data architecture principles needed to build a modern architecture that meets the data management and analytics needs of today’s complex, high-stakes business.
1. Multi-cloud environments are the norm
In a previous article, we discussed the popularity of multi-cloud environments (Understanding the benefits and limitations of multi-cloud). Using multiple clouds lets an organization select the cloud provider that best aligns with both its IT strategy and, more importantly, its business goals. This tailors the approach to the organization’s needs.
Organizations are increasingly deploying cloud services in an effort to trade capital expense for variable operational expense. Their reasons range from massive economies of scale to increased speed and agility and accelerated innovation across the organization. The implications for data management are significant.
Multi-cloud is great for forward-thinking organizations that want to work with best-of-breed cloud applications and services, and for those that consider innovation critical to their business success. However, many find that deploying services across multiple cloud platforms makes their approach to data management and governance significantly more complex, and security along with it.
Cybersecurity poses significant challenges even for organizations that maintain their data in on-premises data centers. Housing critical business data across multiple data centers and cloud environments makes protecting it significantly more challenging. Thus, it is critical to take a security-by-design approach to data management.
2. Data governance is not compliance
Data governance should be a consideration in the design of any modern data architecture. Unfortunately, what has been called the first principle of good customer data management often comes after the fact. Data governance dictates what data you will collect and how it will be collected. It is all about the quality and reliability of the data, which are established through the rules, policies, and procedures that ensure data accuracy, reliability, compliance, and security, and that provide a centralized, single source for enterprise data (a single version of the truth).
Data compliance is similar to data governance in that it too includes processes covering data protection, security, storage, and other activities. Its purpose, however, is to establish policies, procedures, and protocols that safeguard data from unauthorized access, malware, and other cybersecurity threats. Additionally, data compliance ensures that your use of data follows the law, such as HIPAA and CCPA in the U.S. or GDPR in Europe. Standards bodies such as ISO and NIST complicate this further, and an organization’s internal rules add compliance guidelines of their own, all of which makes staying in compliance complex in practice.
A solid data governance program requires that a set of policies and processes be defined in advance, followed in action, and auditable in retrospect. These should include not only data architecture best practices but also business practices and controls.
All that said, it is possible to be compliant with regard to data without necessarily having good data governance in place. While unlikely, you may find that you pass an audit by sheer luck rather than good judgment, or you explain away missed controls, kicking them down the road to the next audit. But that is not a sustainable approach. Data governance and compliance go together, with governance as the more fundamental data architecture component.
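The "defined in advance, followed in action, and auditable in retrospect" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Policy`, `AuditLog`, and `enforce` names, the two example rules, and the sample record are all hypothetical and do not come from any particular governance product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class Policy:
    """A rule defined in advance (hypothetical structure)."""
    name: str
    check: Callable[[dict], bool]  # returns True when the record complies


@dataclass
class AuditLog:
    """Keeps every check result so decisions can be reviewed later."""
    entries: List[dict] = field(default_factory=list)

    def record(self, policy: str, record_id: str, passed: bool) -> None:
        self.entries.append({
            "policy": policy,
            "record_id": record_id,
            "passed": passed,
            "checked_at": datetime.now(timezone.utc).isoformat(),
        })


def enforce(policies: List[Policy], record: dict, log: AuditLog) -> bool:
    """Apply every policy to the record and log each outcome."""
    accepted = True
    for policy in policies:
        passed = policy.check(record)
        log.record(policy.name, record["id"], passed)
        accepted = accepted and passed
    return accepted


# Hypothetical rules: an email must be present and consent must be recorded.
policies = [
    Policy("email_present", lambda r: bool(r.get("email"))),
    Policy("consent_recorded", lambda r: r.get("consent") is True),
]

log = AuditLog()
customer = {"id": "c-1", "email": "a@example.com", "consent": False}
accepted = enforce(policies, customer, log)  # rejected: consent is missing
```

The point of the design is that the rules exist before any data arrives, every record passes through them, and the log preserves what was checked and when, which is what an auditor would ask for.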
3. Storage as a commodity
Key considerations in architecting performant systems and business continuity strategies include storage format, backup strategies, data replication, and disaster recovery. Today, with the cloud comes inexpensive storage, and even cheaper nearline storage.
Storage was becoming a commodity not so long ago, and it is there today. Consider the laptops available today from Apple. The MacBook Air, the thin, light laptop, can be configured with up to 2 TB of storage, and a MacBook Pro can be configured with up to 8 TB. That is a lot of storage. It is safe to say that storage, whether in the cloud, on-premises, or on your desktop, is now a commodity.
The reduced cost of storage allows us to rethink how we architect data solutions. Archiving of supplementary data sets, for example, was not just an IT storage concern but a legal consideration: laws pertaining to data retention drove the conversation and requirements. We can now take advantage of several benefits of the commoditization of storage. These include decoupling data (storage) from the server (compute) infrastructure, which allows us to provision storage as needed to perform transformation calculations or processes. They also include providing all consumers with a single view of the data (a single source of truth) and giving end users better clarity by removing data redundancy (reducing table sprawl).
The truth is, data architecture best practices still consider data storage despite the low cost. We still want to store data close to where it will be processed because we still care about developing performant systems. Storage considerations span the cloud and on-premises. And though we want all of the data in raw form in our data lake, it may still make sense to offload large volumes of historical data. We need to be smart about data regardless of the cost, because larger data volumes take longer to recover should a disaster occur.
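A back-of-the-envelope calculation shows why volume still matters for recovery. The sketch below assumes a simple model in which restore time is volume divided by sustained throughput; the `estimated_restore_hours` helper and the 500 TB, 50 TB, and 1 GB/s figures are hypothetical numbers chosen for illustration, not measurements.

```python
def estimated_restore_hours(volume_tb: float, throughput_gb_per_s: float) -> float:
    """Rough time to restore a data set at a sustained throughput (simplified model)."""
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    volume_gb = volume_tb * 1000  # decimal units: 1 TB = 1000 GB
    seconds = volume_gb / throughput_gb_per_s
    return seconds / 3600


# Hypothetical scenario: the full lake versus only the actively used data.
full_lake_hours = estimated_restore_hours(volume_tb=500, throughput_gb_per_s=1.0)
hot_data_hours = estimated_restore_hours(volume_tb=50, throughput_gb_per_s=1.0)
```

Under this model recovery time scales linearly with volume, so offloading historical data that does not need to be restored first shortens the time to get critical systems back, even when the storage itself is cheap.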
4. Analytics as a catalyst to digital strategy
Data and analytics are both critical to the modern data architecture, as they can improve decision making across a spectrum of decisions. Additionally, analytics can unearth questions, innovative solutions, and opportunities that business leaders had not yet considered. Business Intelligence (BI) systems leveraging data in today’s Machine Learning pipelines are a catalyst for digital strategy and transformation, enabling faster, more accurate, and more relevant decisions in complex, high-stakes business contexts.
From an architecture perspective, it has generally been more effective to deploy BI tools close to the data source. Part of the reason is to reduce network latency. A related way to reduce computational effort is to reduce the data set to only the required elements: one of the first things BI tools will do is reduce the data to only the fields we require.
Most BI tools moved to the cloud more slowly than other services, primarily because until recently we stored and transformed the data on-premises, and it made sense to host the BI tools on-premises as well. Performance is generally better when there is less data movement, and the architecture is less complex when all of the components are either in the cloud or on-premises. The reasoning is simple: there are fewer environments to manage, and data governance is easier with fewer user access controls required.
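The field-reduction step described above can be sketched in Python. This is a minimal illustration, not how any particular BI tool is implemented; the `project_fields` helper and the sample order records are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List


def project_fields(rows: Iterable[Dict], fields: List[str]) -> Iterator[Dict]:
    """Yield each row reduced to the requested fields only (hypothetical helper)."""
    for row in rows:
        # Keep only the columns the report needs; skip fields absent from a row.
        yield {name: row[name] for name in fields if name in row}


# Hypothetical source records with more columns than the report requires.
orders = [
    {"order_id": 1, "region": "West", "amount": 120.0, "notes": "gift wrap", "sku": "A-1"},
    {"order_id": 2, "region": "East", "amount": 80.5, "notes": "", "sku": "B-7"},
]

# A revenue-by-region report only needs two of the five fields.
slim_orders = list(project_fields(orders, ["region", "amount"]))
```

Dropping unused fields as early as possible means less data crosses the network and less is held in memory downstream, which is the same reason for placing the tool close to the source.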
5. Data not analyzed is a wasted asset
You have implemented a data lake, and you have petabytes of structured and unstructured data. Data is a business asset; we hear this from all of the analysts. We hear from Gartner that “data as an asset on your balance sheet has to happen”. But if the data is not analyzed, then it is a different sort of asset on your balance sheet, one that is a drain. Value is realized when we start to use the data in meaningful ways and especially in new ways.
Monetizing data is not a simple thing for most organizations. It is important to have your data house in order before considering monetizing your high-value consumer data. BI tools can help to unlock new business insights and thus new value that can lead to monetization.
BI can be used to help executives plan, create business strategies, and track performance metrics. It is absolutely worth modernizing your data architecture to support the analytics process.
Consider what you might do to streamline your data and analytics pipeline. Should you keep everything on-premises or are there advantages to moving your data to the cloud? What are the data governance and compliance implications associated with storing your data in a state like California or in a different country? There are many factors that might play into the architectural decisions that work best for your organization.