This article is a much-modified and expanded version of a topic I published for First San Francisco Partners (https://www.firstsanfranciscopartners.com/blog/data-debt-data-management-metric/). Since that article appeared, I have been asked numerous times to expand on the concept.
The pursuit of a “business case” for data governance or data management has been elusive. Leadership cannot seem to connect data use and management to cash flow. (That connection is not that hard to make, but that is another topic.) Data people (architects, analysts, scientists, et al.) have so far failed to convey to leadership that monetizing data means embracing the disciplines of data management and governance. What has been needed for a long time is a better “lingua franca.”
That is why people lean in when you talk to them about data debt. It resonates with anyone who is thinking more or differently about data than they used to. If you talk about data debt in the right context, it becomes a slam dunk motivator to get serious about data management and governance.
“Data debt” is a term based on the concept of “technology debt,” which comes from the Agile software development world. Technology debt refers to the cost of deferring a software feature, or of choosing an easy or quick solution instead of a more thoughtful one that would take longer (or be more difficult) to achieve. The same concept should be applied to decisions about creating, managing, and using data. Rather than basing data-related decisions only on traditional process-oriented criteria, like functionality and deadlines, we also consider the future impact of current decisions on the cost and risk of using and managing data.
“Data debt” can be discussed and addressed in two contexts. First, it is a message. It serves as a concise, relevant metaphor that lets data governance, business, and IT areas communicate and prioritize decisions around data-intensive efforts. Basically, it is a “pay me now” or “pay me later” conversation without real numbers: “Should we fund ‘Project ABC’ or not … and why?” “Are we going to increase our data debt by deferring a decision about ‘XYZ’ and having to pay for it later to make this data more leverageable?”
Here’s an example: During a planning meeting for a large application that is getting some much-needed modifications, it becomes clear that the current design calls for a new item master file (i.e., a record that lists key information about an item). The entire team knows this will be a duplicate item master that shares similar data with another important business application. There is also broad agreement that synchronizing the various item files will cause issues, and that these will lead to errors in reports and data analysis.
In the best-case (using data-debt) scenario, the team has a process to elevate concerns so that someone knows about this duplication of master data. If they are lucky, the program manager will authorize resources to address the data issue, or there will be an agreement to address the accumulated debt in the future. Sadly, the typical reaction is “The deadline is important, so we will try to fix it later.” Whether or not you explicitly acknowledge the debt, your organization has signed on to higher costs of ownership for item data and has incurred possible risks. In this scenario, you have signed the promissory note.
But if there were a formal data debt policy, or even if data debt were discussed conceptually, there would likely be no issue. The conversation would go something like this: “Does it increase data debt?” “Yes.” “Then we either will not do it, or we will figure out how to pay for it.”
To be clear, this is NOT a conversation in which a data person declares that leadership is inept. That line of “water-cooler” talk among data types needs to stop. Your leaders are pretty smart, but this is new territory for most of them. You, the data person, need to provide clarity and education. The dialog needs to go like this:
Business leader – “We need to understand our operational position better. It takes too long for us to gather the data to make an informed decision.”
Data delivery – “We need to add operational data to the data lake; then you will get your data earlier. It will take us 3 months to curate, clean, and then load the data.”
Business leader – “I can’t wait 3 months. Throw the data into a database so I can get these reports out. My own people will deal with any issues.”
Data delivery – “I understand, but if we take that course we need to quantify the risk and either acknowledge it or figure out how to reduce it over time.”
Most of us know that this situation usually ends with the data thrown into the ad hoc database. Some pressure is put on the boss, the boss tells you that service levels are more important, and you build a database knowingly riddled with inaccuracies and inconsistencies. Sometimes the business area manages to get useful data. But eventually, some agent exposes the cost of mitigating the risk that has been created. A data-debt discussion gets to the heart of this issue. It gets management to consider the risks, and perhaps adjust their priorities.
The second context is data debt as a metric. Right now, this means an unofficial measurement depicting what an organization “borrows” when it chooses to not pay for something that is (or will be) needed. The debt could typically be avoided by funding and executing basic data governance and management activities.
Consider this example: Data scientists are starting to use a data item in transaction data that has not been used before. It isn’t recorded in a corporate glossary, and no one can nail down a single view of the business meaning of the new data item. Each time someone wants to use that data, they spend time looking it up, and their labor is a cost. Since you know you will incur that cost every time in the future, you’ve created a debt. The data debt metric is the anticipated amount of time that will be wasted multiplied by a rate for the cost of that time. If you had taken the time to document the new field or table, you would have saved your organization countless wasted hours.
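As a rough sketch of the metric described above (anticipated wasted time multiplied by a rate for that time), the snippet below shows one way to put a number on an undocumented data item. The function name, the lookup counts, and the $75 hourly rate are illustrative assumptions, not figures from any real organization.

```python
def data_debt(lookups_per_year: int, hours_per_lookup: float,
              hourly_rate: float, years: int = 1) -> float:
    """Estimate data debt as anticipated wasted hours times a labor rate."""
    # Total hours people are expected to spend hunting for the meaning of the data.
    wasted_hours = lookups_per_year * hours_per_lookup * years
    return wasted_hours * hourly_rate


# Hypothetical inputs: 200 lookups a year, 30 minutes each, $75 per hour.
estimate = data_debt(lookups_per_year=200, hours_per_lookup=0.5, hourly_rate=75.0)
print(f"Estimated data debt for one year: ${estimate:,.2f}")
```

The inputs are guesses by design. The point of the notional metric is to start the conversation, and the estimate can be refined as real lookup times are observed.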
Data management people have known for years that enormous costs are incurred the longer you delay even the simplest and most basic levels of data management. Data debt now provides an actual number and rationale for that discussion.
In our example conversation above, the difference between the cost of doing the work correctly (the 3-month effort) and the cost of doing it hastily represents the debt. If, calculated using internal labor rates, the difference is $40,000, then the data debt is $40,000. The metric comes into use when you sit down and plan next year’s work: you need to formally decide how to reduce that $40,000 obligation to the future.
Even if this notional metric is not an official KPI, it creates good communication. Is your organization “putting off” data management because “there isn’t enough time” or “it will slow down development”? This notional metric will support an intelligent conversation about where to spend time, as opposed to the current conversations in which application areas claim that all forms of data management are interference. Do you have issues sustaining initiatives like data quality or data governance? Again, putting some numbers on the table can focus the discussion. Lastly, are you having problems getting data quality to improve? Data debt takes the costs of data quality and puts them in a perspective that leadership can’t really ignore.
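To make the $40,000 example concrete, here is a minimal sketch of the calculation and of a simple paydown plan. The two labor costs, the annual paydown amount, and the function names are hypothetical; they were chosen only so that the gap matches the $40,000 in the text. No interest is modeled.

```python
def debt_from_shortcut(cost_done_correctly: float, cost_done_hastily: float) -> float:
    """Debt incurred is the gap between the careful effort and the shortcut."""
    return max(cost_done_correctly - cost_done_hastily, 0.0)


def remaining_debt(debt: float, annual_paydown: float, years: int) -> float:
    """Balance left after paying down a fixed amount each year (no interest modeled)."""
    return max(debt - annual_paydown * years, 0.0)


# Hypothetical labor costs: $60,000 for the 3-month effort, $20,000 for the quick load.
debt = debt_from_shortcut(cost_done_correctly=60_000.0, cost_done_hastily=20_000.0)
print(f"Data debt incurred: ${debt:,.0f}")

# Hypothetical plan: budget $15,000 a year of remediation work.
for year in range(1, 4):
    balance = remaining_debt(debt, annual_paydown=15_000.0, years=year)
    print(f"Balance after year {year}: ${balance:,.0f}")
```

A schedule like this is what turns the metric into a planning item: the balance is either budgeted down or consciously carried forward.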
Data Debt in Action
As you can see from these examples, data debt is incurred when data is managed informally or casually. But bear in mind that we may sometimes need to consciously accrue and acknowledge the debt. Even then, when we use data debt in an organization, as a metric or as a message, it can be quite effective.
Consider these applications for using data debt in your organization:
- As a governor for analytics projects
- As a way to value data assets (or liabilities)
- As a means to sustain enterprise information management initiatives
Here’s an example of data debt in action using simple numbers:
Assume (for discussion) your IT budget is $100. You track any data debt you’re racking up. At some point in the future, you discover that your average spend on dealing with data debt items is $10 of your $100 budget (or 10% of your IT spend). Now extrapolate this to a large company that spends $1 billion a year on IT. The annual “debt servicing” is $100 million!
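The arithmetic above can be sketched in a few lines. The function names are illustrative assumptions, and the numbers are the ones from the example: $10 of a $100 budget, scaled to $1 billion of IT spend.

```python
def debt_servicing_share(debt_spend: float, it_budget: float) -> float:
    """Share of the IT budget that goes to dealing with data debt items."""
    if it_budget <= 0:
        raise ValueError("it_budget must be positive")
    return debt_spend / it_budget


def annual_debt_servicing(it_budget: float, share: float) -> float:
    """Annual spend on data debt for a given IT budget and servicing share."""
    return it_budget * share


# The $10-of-$100 example from the text.
share = debt_servicing_share(debt_spend=10, it_budget=100)
# The same share applied to a company spending $1 billion a year on IT.
large_company = annual_debt_servicing(it_budget=1_000_000_000, share=share)

print(f"Share of IT spend: {share:.0%}")
print(f"Annual debt servicing: ${large_company:,.0f}")
```

This reproduces the 10% share and the $100 million figure. The extrapolation assumes the small-budget share carries over unchanged to the larger company, which is a simplification made for discussion.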
Data Debt Quadrant
Let’s leverage another aspect of the technical debt concept (specifically, its quadrant framework; see the figure below) to demonstrate how organizations accumulate data debt:
*Software development thought leader Martin Fowler’s technical debt quadrant concept inspired this representation of data debt characteristics.
Data illiterate – For expediency’s sake and without realizing the full extent of the cost, we do something that will be expensive to re-do later (e.g., standalone and redundant master data). We recklessly make a decision about data and do so without acknowledging the impact of data debt. Moving away from the “ignorance” level of managing data debt will require some education and a good bit of sponsorship.
Resistance debt – We know full well that this is not the best way, but politics, resistance to change, or other attitudes instill a “ready, fire, aim” approach. We know the cost, do it anyway, and make no allowance for remediation of the debt. We call this resistance debt. It is the most common form experienced: the deliberate accumulation of debt by choosing departmental objectives over enterprise spending on data management.
Realization debt – We learn our lessons from one or more projects and end up knowing the cost of our mistakes. We recognize the debt after the fact. While this still indicates an immature organization, realizing the data debt puts the organization on a path to start or reinforce data governance.
Acknowledged debt – This is when you consciously choose to accrue data debt. In effect, you borrow money to get something done and are willing to absorb the future cost and risk. We know the cost of accruing data debt, but it’s our best choice right now, and we formulate a plan to lower the debt later. Remember, data debt is not a magic bullet that forces you to always do data management; it is also a means to make prudent business decisions.
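One way to read the four categories is against the two axes of Fowler’s technical debt quadrant: deliberate versus inadvertent, and reckless versus prudent. The sketch below encodes that reading. The mapping of each data debt label to a quadrant cell is an interpretation of the descriptions above, not something the original figure is confirmed to show, and the class and function names are hypothetical.

```python
from enum import Enum


class DebtQuadrant(Enum):
    DATA_ILLITERATE = "Data illiterate"      # inadvertent and reckless
    RESISTANCE = "Resistance debt"           # deliberate and reckless
    REALIZATION = "Realization debt"         # inadvertent and prudent
    ACKNOWLEDGED = "Acknowledged debt"       # deliberate and prudent


def classify(deliberate: bool, prudent: bool) -> DebtQuadrant:
    """Map a decision to a quadrant (assumed mapping, see the note above).

    deliberate: the cost was known when the decision was made.
    prudent: the debt is recognized and there is an intent to deal with it.
    """
    if deliberate:
        return DebtQuadrant.ACKNOWLEDGED if prudent else DebtQuadrant.RESISTANCE
    return DebtQuadrant.REALIZATION if prudent else DebtQuadrant.DATA_ILLITERATE


# A team that knows the cost of a duplicate item master but plans no remediation.
print(classify(deliberate=True, prudent=False).value)
```

Tagging each logged debt item this way gives a quick picture of whether an organization’s debt is mostly unintentional or mostly a matter of choice.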
Managing Data Debt
Here are scenarios of how companies might manage debt using the data debt quadrant.
Data illiterate – It is probably obvious, but this is where organizations are unaware of the impact of data decisions. A department or development team sees the need for a file or data store and creates it without any consideration of the ramifications for data assets. Common problems (and future costs) from this scenario are:
- Someone else uses the poorly designed data source – this increases risk of errors or compliance risks
- The originating party actually believes their data even though it is in error – illiterate departments, dazzled by tools or expertise with Excel, think that their source and data movement are fine. Even later, when they find errors, they tweak the data to the point where it is useful to only one area, and at a much higher cost to maintain.
- Additional definitions and semantics enter the organization’s vocabulary – Local knowledge becomes intractable. Then when a future effort tries to integrate data, the local dialect around data becomes a barrier.
The lack of awareness of data as a type of enterprise asset sets the stage for continuing confusion and increasing costs.
Resistance debt – Once there is a realization that “data work” should have some sort of discipline, organizations try to implement standards, governance, data quality, or some other sort of oversight. At this point, the perception that any new rules or standards will take too much extra time becomes a common refrain. Applications development is notorious for this (even though there is NO DATA WHATSOEVER to support this claim).
- Accept the debt without any allowance – A data scientist doesn’t want to wait for consumer data to be cleaned up, and neither does the marketing person he’s working with. They recognize there is a margin of error with this approach, and that they are assuming some risk and a potentially reduced value of the marketing campaign. But they feel compelled to move forward. If an organization is experiencing this scenario, some sort of training is required to support a process that analyzes the difference between proceeding as planned and taking the time to clean up the data sources.
- Ignore new policy – Often a new data governance program will meet resistance. The first evidence is when an area asks for an exception to a brand-new policy, or flat-out ignores it.
Realization debt – Organizations realize that their data debt is pretty steep. Then steps are taken to reduce debt. At minimum, steps are outlined to not increase the debt anymore.
- Communicate the issue to management – A department wants to add a new BI/analytics application but is told it must defer the decision until next year. The department decides to include both the application cost and the cost of not doing anything as two separate line items in next year’s budget. It does this to signal to leadership that the company is spending excessive money dealing with the existing BI application.
- Modify planning processes – A new application is proposed with great fanfare about functionality, but with no mention of data sourcing and management. The PMO stops the project until the project team proves it will not increase data debt.
Acknowledged (rational) debt – A mature organization uses data debt as a metric and decision-making tool.
- Manage ERP expense – A company decides to proceed with installing a release of its ERP software, but acknowledges that more will need to be spent on integrating the new package to meet existing interoperability requirements.
- Regulatory burdens – A small company is confronted with a regulatory data burden, such as the California Consumer Privacy Act (CCPA). Since the company is small but wants to comply, it decides to postpone any investment in expensive data lineage tools and do data lineage manually. This means it is willing to assume some risk, for a period of time, around being compliant with the CCPA. The company estimates the data debt and sets up an allowance to pay it off (and reconcile the noted risk) next year.
What We Can Do About Data Debt
Whether deliberate or inadvertent, reckless or prudent, the mismanagement of data creates debt for an organization. Like all debts, it must be paid eventually, either slowly over time (and with interest) or in a big chunk that pays off the debt. It is not sufficient to say “we never really recognized the debt, so we don’t need to deal with it.” The fact is that all organizations have data debt, and the outstanding balance continues to grow.
Your data management program needs to address data debt: how to pay for it, how much of it is tolerable as a matter of policy, and how to educate the organization about this powerful tool.
The concept of data debt addresses a long-standing issue in data governance and management: the barrier between the data side and the non-data side. It can be used to convey a message or to create a metric that supports decision-making. All organizations are strongly encouraged to add data debt to their vocabulary.