DATA MODEL DRIVEN DATA GOVERNANCE

A Data Model is a visual representation of the data in an organization or a part of it. ER diagrams, object-oriented models, data flow models and physical data models are some common types of Data Models.


Metadata is data about data. It can be viewed as a data labelling system. Examples include the date, source and owner of a piece of data; the geographical coordinates of a photograph; the business users’ name for a computed metric; the change management log against a single column in an RDBMS table; or statistical information such as the minimum and maximum values of a column.

Data Governance is the set of processes, structures, and policies that ensure that an organization’s data, and the information security risks attached to it, are managed effectively. It makes sure that data is used appropriately, complies with regulations, and aligns with organizational goals. Data asset ownership, data quality management, regulatory compliance, data lifecycle management, change management, data cataloguing and metadata management all come under the purview of Data Governance.

Data Governance leverages metadata while framing policies and rules around data usage. For example, access is usually restricted for sensitive information like Personally Identifiable Information (PII). Defining roles, user groups and access rules is only one part of the task. Implementation is another. Attaching metadata like “privacy level” while defining the Data Model can be of great support in automating Governance enforcement. Other examples include data lineage traceability to external data sources (e.g. the primary key in the external data source) and business entity names and their variants (e.g. customer, consumer, client).
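As a rough illustration of the "privacy level" idea, the sketch below attaches a privacy level to each column definition and derives the readable columns for a role from it. Everything named here (PrivacyLevel, ColumnModel, ROLE_CLEARANCE, the sample table and the role names) is hypothetical and only meant to show the shape of the approach, not a specific tool's API.

```python
from dataclasses import dataclass
from enum import IntEnum


class PrivacyLevel(IntEnum):
    """Hypothetical privacy scale; higher means more sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    PII = 2


@dataclass(frozen=True)
class ColumnModel:
    name: str
    data_type: str
    privacy_level: PrivacyLevel  # governance metadata kept in the model


# Assumed mapping from role to the highest privacy level it may read.
ROLE_CLEARANCE = {
    "analyst": PrivacyLevel.INTERNAL,
    "data_steward": PrivacyLevel.PII,
}

# Illustrative table definition with governance metadata on every column.
CUSTOMER_TABLE = [
    ColumnModel("customer_id", "INTEGER", PrivacyLevel.INTERNAL),
    ColumnModel("email", "VARCHAR(255)", PrivacyLevel.PII),
    ColumnModel("country", "VARCHAR(2)", PrivacyLevel.PUBLIC),
]


def readable_columns(columns, role):
    """Return the names of columns the role is cleared to read.

    Unknown roles fall back to PUBLIC clearance.
    """
    clearance = ROLE_CLEARANCE.get(role, PrivacyLevel.PUBLIC)
    return [c.name for c in columns if c.privacy_level <= clearance]
```

With this arrangement the access rule lives in one place and the model supplies the facts it needs, so adding a new PII column only requires tagging it in the model.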

Data Model Driven Data Governance (DMDG)

The Data Model Driven Data Governance approach adds Governance related metadata to the Data Model itself. By defining data standards and policies, DMDG can help to ensure that data is accurate, complete, and consistent. In addition to technical requirements like data type and size, the data model should carry the metadata that is required to steer Data Governance policies. This approach supports automation of data governance. Improved data quality, better data usability, and stronger security and compliance are the main advantages.

You can define the structure, semantics and relationships of data assets. For example, adding the department name to a data object captures organizational structure. Storing the corresponding business entity name against a table or an attribute helps in adding semantics. Adding the primary key of the external source table to the migration target table enables data lineage traceability, which is one kind of relationship.
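The three kinds of metadata mentioned here (structure, semantics, relationships) could be recorded on a table definition roughly as follows. This is a minimal sketch under assumed names: TableModel, its fields, the CATALOG entries and both helper functions are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TableModel:
    name: str
    owning_department: str                            # structure
    business_entity: str                              # semantics
    entity_aliases: set = field(default_factory=set)  # semantic variants
    source_system: str = ""                           # lineage: where rows came from
    source_key_column: str = ""                       # lineage: column holding the source primary key


# Illustrative catalog; the table and system names are made up.
CATALOG = [
    TableModel("crm_customer", "Sales", "customer",
               {"consumer", "client"}, "legacy_crm", "src_customer_pk"),
    TableModel("fin_invoice", "Finance", "invoice"),
]


def tables_for_term(catalog, term):
    """Find tables whose business entity name or alias matches a business term."""
    term = term.lower()
    return [t.name for t in catalog
            if term == t.business_entity or term in t.entity_aliases]


def lineage_reference(table, row):
    """Return (source system, source primary key) for a row, or None if untracked."""
    if not table.source_key_column:
        return None
    return (table.source_system, row[table.source_key_column])
```

A search for "client" then resolves to the same table as "customer", and any migrated row can be traced back to its record in the source system.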

Evolution or Revolution?

Traditionally, data models have captured technical metadata like the data type, length, default values and check constraints (Y/N/Null). Comments are supported by almost all popular databases and designer tools to add business meaning to the data objects. Consistent naming standards and the usage of domains (e.g. Name is a String of 20 characters) are best practices that have stood the test of time. Adding business level metadata beyond one-line comments makes logical sense and simplifies data governance. In short, using additional metadata to implement data governance is an evolution rather than a revolution. Rule implementation can be tied to metadata. For example, automated data archival workflows can filter based on the retention period metadata.
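The archival example can be sketched as a filter driven by retention metadata. The table names, the retention values and the created_on field are assumptions made for illustration; a real workflow would read them from the data model or catalog.

```python
from datetime import date, timedelta

# Hypothetical retention metadata (in days) attached to table definitions.
TABLE_RETENTION_DAYS = {
    "web_sessions": 90,
    "orders": 3650,
}


def rows_due_for_archival(table_name, rows, today):
    """Select rows older than the table's retention period.

    Tables without retention metadata are left untouched.
    """
    retention = TABLE_RETENTION_DAYS.get(table_name)
    if retention is None:
        return []
    cutoff = today - timedelta(days=retention)
    return [r for r in rows if r["created_on"] < cutoff]
```

The rule itself never mentions a specific table. Changing a retention period is a metadata change, not a code change.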

Common Governance rules addressed using DMDG

  1. Data policies
    • Procedures and processes for data management
    • Data access, quality, security and life cycle management
  2. Data standards
    • Rules and guidelines regarding data collection and usage
  3. Data dictionaries and catalogues
    • All the required metadata shall be stored along with data object definitions like table definitions
  4. Access roles
    • In addition to the traditional roles like End user, Administrator, Developer and Tester, new roles come into the picture to enforce data governance
    • Data owners (e.g. departments), Data stewards (corresponding MDM team) and Security owner (corresponding security team)
  5. Automated Governance Workflows
    • Automation scripts can be added to carry out governance related tasks, for example data life cycle management and auto-assigning operations tickets for master data
    • Trigger based alerts, or pre-defined workflows that kick off when resource quotas reach a warning threshold, can be useful in effective cost management
  6. Reporting and compliance
    • Access, resource usage, master data changes are some of the commonly monitored governance reports. These can be generated based on organizational structure as defined by the metadata.
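Two of the items in the list above, quota threshold alerts and reports grouped by organizational structure, can be sketched in a few lines. The ownership mapping, the log entry format and the 80% warning ratio are all assumptions made for this illustration.

```python
from collections import Counter

# Hypothetical ownership metadata taken from the data model.
TABLE_OWNER = {
    "crm_customer": "Sales",
    "crm_lead": "Sales",
    "fin_invoice": "Finance",
}


def access_counts_by_department(access_log, table_owner):
    """Group access log entries by the department that owns each table."""
    counts = Counter()
    for entry in access_log:
        counts[table_owner.get(entry["table"], "UNASSIGNED")] += 1
    return dict(counts)


def quota_alerts(usage_gb, quota_gb, warn_ratio=0.8):
    """Return the names whose usage has reached the warning threshold."""
    return sorted(
        name for name, used in usage_gb.items()
        if name in quota_gb and used >= warn_ratio * quota_gb[name]
    )
```

Entries that land under "UNASSIGNED" are themselves a useful governance signal, since they point at data objects that still lack an owner in the model.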