The pandemic accelerated digital
Every company, in every industry, has data, needs clean data, and needs to act on data quickly and effectively. However, many organizations find themselves struggling with data, primarily because other initiatives that could quickly unlock new revenue streams tended to be prioritized above data infrastructure projects. Data science and analytics initiatives, with their promise of unlocking insights, could also reasonably be prioritized over pure data infrastructure initiatives.
Implementing a new data strategy with a revamped end-to-end data and analytics architecture is no easy feat, no matter how you tackle it (preferably incrementally and iteratively rather than with a big bang approach).
Enter the Modern Data Stack
The ‘Modern Data Stack’ has become more prominent over the last few years, with a proliferation of cloud technologies and vendors fueled by high demand and VC funding. While there is nothing necessarily new about what the modern data stack sets out to accomplish (empowering your organization with the effective use of data), the ease of use, speed, and potential capabilities are new.
Some of the more popular vendors in the modern data stack include:
• Fivetran and Stitch for data ingestion & ETL
• Snowflake, BigQuery, and Redshift for warehousing
• Power BI, Tableau, and Looker for analytics & BI
• Airflow for orchestration
• dbt for transformation
• Soda for observability
These are just a few. Databricks, Snowflake, Alteryx, and others fit into more than one category.
Fundamental Data Management Stack
In light of the new vendors, tools, and approaches to data delivery, operations, and management, it is important not to forget the fundamentals, both as technologists and as data professionals.
Here are just a few frameworks, processes, and books that cover the foundations. They help me from time to time and are worth revisiting.
BRM (Business Relationship Management)
Every technologist does some form of business relationship management. They work with stakeholders to understand their problems and then deliver solutions. Data professionals are no different. They work with stakeholders to provide the right platforms and data and provide insights to stakeholders based on that data.
Since stakeholders, developers, and analysts are all people, strong relationships need to be cultivated. Without establishing meaningful relationships, you will struggle with finding the root problems that need to be solved. The more meaningful problems you can solve for your stakeholders, the more value you can add. The more effectively you can communicate, set, and meet expectations, the better your relationships will be, the more successful you will be in your role, and the more successful your initiatives will be.
The BRM Institute’s House of BRM framework is an incredibly helpful guide to what it takes to build strong relationships within your organization, become a trusted technology partner, and advance your organization. The more closely technology and business strategy converge, the better.
TOGAF (The Open Group Architecture Framework)
Data truly is an enterprise asset and should be managed as holistically as possible across the enterprise. Your data architecture’s capabilities and its costs need to tie back to enterprise goals and objectives.
TOGAF is an enterprise architecture framework with roots in the late 1980s that has evolved into an extensive approach to enterprise architecture. While TOGAF can feel ‘heavy’, and many feel it is outdated, its pillars and key concepts remain essential.
No matter the technology or resources deployed, it is critical that they align with and are appropriate for the target state business architecture (i.e., the organizational structure, goals, objectives, business functions, services, processes, roles, and capabilities).
TOGAF’s Architecture Development Method (ADM) is an iterative, cyclical approach that includes setting the vision, defining the current state for each domain (starting with business and then information systems, which includes data architecture), defining the future target state, and then identifying the gaps and building the roadmap. ADM also includes maintaining requirements in an artifact repository (originally tons of static diagrams, which most modern data tools can now produce automatically).
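The current state, target state, gaps, and roadmap steps lend themselves to a small illustration. The following is only a rough sketch, not part of TOGAF itself: the domain names follow the order described above, while every capability name and both helper functions (find_gaps, build_roadmap) are hypothetical and invented for this example.

```python
# Hypothetical capability inventories per architecture domain.
# All capability names are invented for illustration.
current_state = {
    "business": {"order management", "customer support"},
    "data": {"nightly batch loads", "manual data dictionary"},
}
target_state = {
    "business": {"order management", "customer support", "self-service analytics"},
    "data": {"nightly batch loads", "automated lineage", "data quality checks"},
}


def find_gaps(current, target):
    """Compare current and target state for every domain found in either."""
    gaps = {}
    for domain in sorted(set(current) | set(target)):
        have = current.get(domain, set())
        want = target.get(domain, set())
        gaps[domain] = {
            "add": sorted(want - have),     # in the target, missing today
            "retire": sorted(have - want),  # present today, absent from the target
        }
    return gaps


def build_roadmap(gaps, domain_order=("business", "data")):
    """Flatten the gaps into ordered work items, business domain first."""
    roadmap = []
    for domain in domain_order:
        for action in ("add", "retire"):
            for capability in gaps.get(domain, {}).get(action, []):
                roadmap.append((domain, action, capability))
    return roadmap


gaps = find_gaps(current_state, target_state)
roadmap = build_roadmap(gaps)
for domain, action, capability in roadmap:
    print(f"{domain}: {action} {capability}")
```

Keeping the inventories as plain data means they can be regenerated from tooling on each iteration of the cycle rather than maintained by hand as static diagrams.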
Big “A” architects hiding away in an ivory tower and then publishing static ADM artifacts in a big bang approach, without delivering value-adding solutions along the way, is a bad idea.
Fortunately, many tools in the modern data stack can be spun up pretty quickly and relatively cheaply (at least in the beginning), and are easy to use and incredibly powerful. So it is an understandable choice to select toolsets that can solve perhaps a local problem, and cycle through some of the ADM steps quickly (or skip some entirely).
But data is an enterprise asset, so for solutions that can and should impact the enterprise, it is important to have applicable processes, standards, principles, and policies in place. TOGAF can help as a starting point. LeanIX is also a fantastic approach worth revisiting.
DAMA DMBOK (Data Management Association, Data Management Body of Knowledge)
Even though the DAMA DMBOK was last updated in 2017, and a lot has changed since then, it is still an excellent and relevant framework. Data is an enterprise asset, so data governance, data quality, and data privacy always have been and always will be important.
Especially in a distributed ecosystem, there is a greater quantity and diversity of data producers and data consumers, so managing ownership of the data supply chain can be challenging. Those in a data engineering role then try to turn application exhaust into reusable, scalable data products. In an attempt to combat this quickly, organizations add new, more powerful data technologies and vendors with the intent to simplify data engineering and/or add more capabilities to it. However, this can add even more complexity!
Having a response for each segment in the DAMA wheel, along with enterprise architecture, will set you on the right path forward. Fortunately, many tools in the modern data stack can provide good answers, especially for metadata management, observability, security, and access controls. Leverage the DAMA fundamentals to identify, understand, and communicate how each tool helps or challenges your efforts to manage your data as an enterprise asset.
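One way to check whether you have a response for each segment is a simple coverage map. The sketch below is illustrative only: the eleven knowledge areas are those of the DMBOK wheel, but the tool labels and the areas assigned to them are hypothetical placeholders, and coverage_by_area is a helper name made up for this example. Replace the assessment with your own.

```python
# The eleven knowledge areas of the DAMA wheel (DMBOK, second edition).
DAMA_AREAS = [
    "Data Governance",
    "Data Architecture",
    "Data Modeling and Design",
    "Data Storage and Operations",
    "Data Security",
    "Data Integration and Interoperability",
    "Document and Content Management",
    "Reference and Master Data",
    "Data Warehousing and Business Intelligence",
    "Metadata",
    "Data Quality",
]

# Hypothetical assessment of which areas each tool in a stack helps with.
# The tool labels and their assignments are placeholders, not vendor claims.
tool_coverage = {
    "ingestion tool": {"Data Integration and Interoperability"},
    "warehouse": {
        "Data Storage and Operations",
        "Data Security",
        "Data Warehousing and Business Intelligence",
    },
    "observability tool": {"Data Quality", "Metadata"},
}


def coverage_by_area(tools):
    """Invert the tool -> areas mapping into area -> sorted list of tools."""
    report = {area: [] for area in DAMA_AREAS}
    for tool, areas in tools.items():
        unknown = areas - set(DAMA_AREAS)
        if unknown:
            raise ValueError(f"{tool} lists unknown areas: {sorted(unknown)}")
        for area in areas:
            report[area].append(tool)
    return {area: sorted(names) for area, names in report.items()}


report = coverage_by_area(tool_coverage)
uncovered = [area for area, names in report.items() if not names]

print("Areas with no tool response yet:")
for area in uncovered:
    print(f"  {area}")
```

An uncovered area is not automatically a problem, since a process or policy may be the right response rather than a tool, but listing the gaps makes the conversation with stakeholders concrete.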
Supplement existing frameworks
Many fundamentals of business relationship management, enterprise architecture, and data management are timeless. Aspects of each can be applied no matter the situation or department. However, there are many aspects of more modern, domain-driven, and distributed architectures that the legacy frameworks do not address exactly, and these should be supplemented with more timely, relevant, and technical deep dives.
Below are a few recent books that have helped supplement my understanding of the fundamentals:
Fundamentals of Data Engineering by Reis & Housley is an excellent overview of the stages of data engineering in an easy-to-read format. It is very well written and will prepare aspiring data engineers, and remind seasoned data engineers, how to handle legacy AND modern data technologies.
Data Management at Scale by Strengholt provides more detail, context, and reference architecture around the patterns, principles, and ‘gotchas’ of managing the full data supply chain. From RDS architecture and how to link data to ownership via golden data sets, to interoperability and eventual consistency, CQRS, and service contracts, this book bridges the gap from legacy frameworks on data management.
Software Architecture: The Hard Parts by Ford, Richards, Sadalage & Dehghani. While there is only one chapter on managing analytical data, the book provides a fantastic framework for evaluating the trade-offs of different architectural approaches (primarily concerning distributed application architecture). Not only is it a great idea for data engineers to understand and empathize with their upstream friends, but the book also provides excellent guidance and structure that data engineers can and should apply.
Gavin Hupp’s views and opinions are his own, gathered from technology, data, and product leadership experience across multiple industries, working for some of America’s most well-known brands and startups, and collaborating with peers, vendors, and solution providers. He is an official member of the Forbes Technology Council and Vation Innovation Council.