[Image: data inventory (source: Talend)]

Automating a data inventory for better data management

I’m often surprised by how many organizations simply do not have a good handle on their data assets. Several organizations I’ve worked with do not have a data inventory, and I believe the problem plagues far more organizations than we think. Many organizations collect far more data than they need; one early assumption was “just collect it; we’ll figure out what to do with it later.” Well, it’s later now. Collecting data isn’t even that hard. There’s such an abundance of it that we now wrestle with different kinds of questions: “What the hell do I do with all of it?”, “How do I get value out of it?”, and “Where is it?” Thankfully, there are answers to these questions, and one of them is the practice of data audits and data inventories.

A data inventory is a complete listing of the data assets an organization has. Having a good handle on that universe of data is an early step toward better data management and better data governance. For a data inventory, I’m focused first on the strategic, high-level perspective, where we don’t even need to know every attribute within each data set. Instead, we can start by identifying the different domains of our organization’s data. Once developed, a data inventory should be kept up to date, because it is only as good as the information it contains.

Examples of data domains include human resources, accounting, finance, customers, sales, opportunities, products, and services. If you’re in the education space like I am, you probably also have student data, faculty/teacher data, assessment data, and much more. If your organization can get a good handle on the universe of domains, you are well on your way to better data management.

Old Ways vs Modern Ways of Creating Data Inventories

In the past, we conducted data inventories and data audits largely by hand: interviewing end-users, exploring systems, and documenting the various domains and data entities. This was incredibly tedious work. Today we have more modern options. Automated systems and AI can perform this function for us and capture the domains and entities that exist across an organization far more efficiently. For example, Informatica’s Enterprise Data Catalog (EDC) has an automated agent that captures metadata and lets end-users tag the data within the catalog. Instead of a data architect spending weeks on this, the system can paint a picture for us within hours.
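To make the idea of automated metadata capture concrete, here is a minimal sketch in Python. It is not how Informatica EDC works and uses none of its APIs. It assumes a SQLite database as a stand-in source system, reads table and column names from SQLite’s own catalog (`sqlite_master` and `PRAGMA table_info`), and tags each table with a domain using hypothetical keyword rules (`DOMAIN_RULES`). The names `InventoryEntry` and `harvest_metadata` are illustrative only.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical keyword rules mapping table names to business domains.
# A real catalog tool would use its own classifiers; these are illustrative only.
DOMAIN_RULES = {
    "student": "students",
    "faculty": "faculty",
    "assessment": "assessments",
    "invoice": "finance",
    "employee": "human resources",
}


@dataclass
class InventoryEntry:
    source: str          # which system the table was found in
    table: str           # table name as reported by the database catalog
    columns: list[str]   # column names, in table order
    domain: str = "untagged"


def harvest_metadata(conn: sqlite3.Connection, source: str) -> list[InventoryEntry]:
    """Crawl a SQLite database's catalog and return one inventory entry per table."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        cols = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        entry = InventoryEntry(source=source, table=table, columns=cols)
        # Apply the first matching keyword rule; otherwise leave the table untagged.
        for keyword, domain in DOMAIN_RULES.items():
            if keyword in table.lower():
                entry.domain = domain
                break
        entries.append(entry)
    return entries


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE student_enrollment (student_id INTEGER, term TEXT);
        CREATE TABLE assessment_scores (student_id INTEGER, score REAL);
        CREATE TABLE misc_lookup (code TEXT, label TEXT);
    """)
    for e in harvest_metadata(conn, source="sis_db"):
        print(e.domain, e.table, e.columns)
```

Against this small in-memory database, the sketch tags `student_enrollment` as students and `assessment_scores` as assessments, and leaves `misc_lookup` untagged. Untagged entries like that one are the kind an end-user would then tag by hand in a catalog.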

Some of these automated solutions also include data lineage features. Lineage tells us where information lives, where sensitive information lives, how it’s used, how and how often it changes, and how it is shared. Using automated methods to capture metadata and show lineage is a great step forward for the architects responsible for this work. So, instead of manually compiling lists of entities within each data domain, automated tools are the way to go. Another benefit of automated systems that people rarely discuss is that they are less likely to make mistakes or miss data, which also improves the quality of the metadata.
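Lineage is essentially a directed graph: each edge says that data flows from one data set into another. The following minimal Python sketch assumes hypothetical data set names, a hand-written edge list (`LINEAGE`), and a hypothetical set of sensitive data sets (`SENSITIVE`); a real lineage tool would harvest these edges automatically. It walks the graph breadth-first to answer one of the questions above: everywhere a sensitive data set’s contents end up.

```python
from collections import deque

# Hypothetical lineage edges: (upstream data set, downstream data set, how/how often it moves).
LINEAGE = [
    ("sis.students", "warehouse.dim_student", "nightly ETL"),
    ("lms.assessments", "warehouse.fact_scores", "hourly sync"),
    ("warehouse.dim_student", "reports.at_risk_students", "view"),
    ("warehouse.fact_scores", "reports.at_risk_students", "view"),
]

# Data sets tagged as containing sensitive information (hypothetical tags).
SENSITIVE = {"sis.students"}


def downstream(dataset: str, edges=LINEAGE) -> set[str]:
    """Return every data set derived, directly or indirectly, from `dataset`."""
    # Build an adjacency list from upstream to downstream data sets.
    children: dict[str, list[str]] = {}
    for src, dst, _how in edges:
        children.setdefault(src, []).append(dst)
    # Breadth-first traversal, visiting each data set at most once.
    seen: set[str] = set()
    queue = deque([dataset])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def sensitive_exposure(edges=LINEAGE, sensitive=SENSITIVE) -> dict[str, set[str]]:
    """Map each sensitive data set to every data set its contents flow into."""
    return {s: downstream(s, edges) for s in sensitive}


if __name__ == "__main__":
    for source, targets in sensitive_exposure().items():
        print(source, "->", sorted(targets))
```

Here `downstream("sis.students")` returns `{"warehouse.dim_student", "reports.at_risk_students"}`, showing that student records reach a report two hops away. The third field of each edge records how and how often the data moves, which a fuller version could report alongside each path.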
