Data is said to be of high quality when it meets the needs and expectations of the consumers of the data. In the retail industry, the main purpose of data quality is ensuring an accurate profile of a customer’s purchases and behaviors. Quality data in the retail industry should be able to paint a picture of the customer and help guide retailers on how to meet the needs of the customer. Quality data should also represent the real world accurately. For example, quality retail data on me should reveal that I am a female who mostly purchases female products but once in a while purchases male products for a friend or relative.
The dimensions of data quality can be used as a guide to ensure that quality data is being provided. One source lists as many as sixteen dimensions of data quality (Pipino et al., 2002); I have condensed these into six key characteristics or dimensions:
Accuracy refers to data being precise and free of error. Accurate data is correct, reliable, and represents real-world values.
Accessibility refers to the required data being available.
Consistency means that similar data is shown in the same format in order to reduce confusion.
Completeness refers to data being whole; there is no missing part. All the information the team needs to do their job must be present.
Timeliness refers to data being up to date and useful for the designated task. Data must evolve with time.
Security refers to data being restricted to keep those without clearance from accessing it. Only give users the access they need to do their jobs.
MIT’s Total Data Quality Management (TDQM) is a framework that was developed in the 1990s to ensure data quality. Its cycle has four phases: Define, Measure, Analyze, and Improve (Madnick et al., 2009). Data is treated as a product, and data quality is judged by fitness for use from the consumer’s point of view. In the Define phase, the dimensions of data quality are identified and grouped into four categories: accessibility, representational, contextual, and intrinsic (Madnick et al., 2009). The Measure phase involves selecting metrics for the dimensions identified in Define and scoring the data on each one using the simple ratio, the min or max operation, or a weighted average. The Analyze phase interprets the results of the Measure phase and seeks the root causes of the problems found in each dimension. The Improve phase generates candidate solutions, selects the best one, develops an action plan, and monitors progress. The goal is to refine the processes so that quality data is collected in the first place.
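The three operations named in the Measure phase can be sketched in Python as follows. This is a minimal illustration, assuming the functional forms described by Pipino et al. (2002): the simple ratio is one minus the share of undesirable outcomes, the min operation takes the weakest of several indicators as a conservative score, and the weighted average requires weights that sum to 1. The function names and the sample figures are hypothetical and chosen only for illustration.

```python
from typing import Mapping, Sequence


def simple_ratio(undesirable: int, total: int) -> float:
    """Score in [0, 1]: 1 minus the share of undesirable outcomes."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= undesirable <= total:
        raise ValueError("undesirable must be between 0 and total")
    return 1.0 - undesirable / total


def min_operation(indicators: Sequence[float]) -> float:
    """Conservative aggregate: the dimension is only as good as its weakest indicator."""
    if not indicators:
        raise ValueError("at least one indicator is required")
    return min(indicators)


def weighted_average(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Combine indicator scores using weights that must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(scores[name] * weights[name] for name in weights)


# Hypothetical example: 5 of 100 product records lack a description.
completeness = simple_ratio(undesirable=5, total=100)

# Hypothetical indicator scores for one dimension, aggregated two ways.
indicator_scores = {"description": completeness, "image": 0.7, "price": 0.8}
conservative = min_operation(list(indicator_scores.values()))
blended = weighted_average(
    indicator_scores,
    {"description": 0.5, "image": 0.3, "price": 0.2},
)
print(completeness, conservative, blended)
```

The min operation suits cases where one weak indicator should pull down the whole dimension, while the weighted average suits cases where the organization can say how much each indicator matters.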
Data quality can affect many areas of a retail business. Consider a customer who is shopping online and clicks on a particular item. If the item is missing a picture or product information, such as ingredients for food or materials for clothing, the data fails the completeness dimension because not all the information needed to make a decision is present. Data quality also affects the organization when data is not unique: duplicate records cause the company to spend more than necessary by sending marketing material to the same person multiple times. This duplication is also likely to annoy customers and make them reluctant to share more data with the retailer.
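Both problems described above, incomplete product listings and duplicate customer records, can be detected with simple checks. The sketch below is one possible approach, not a prescribed method. It assumes that records are plain dictionaries, that a listing needs the hypothetical fields `name`, `image_url`, and `description`, and that two customer records are duplicates when their email addresses match after trimming whitespace and lowercasing.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Sequence

# Hypothetical set of fields a product listing needs to be considered complete.
REQUIRED_PRODUCT_FIELDS = ("name", "image_url", "description")


def missing_fields(
    record: Mapping[str, object],
    required: Sequence[str] = REQUIRED_PRODUCT_FIELDS,
) -> List[str]:
    """Return the required fields that are absent, None, or blank."""
    return [f for f in required if not str(record.get(f) or "").strip()]


def duplicate_emails(customers: Iterable[Mapping[str, str]]) -> List[str]:
    """Return normalized email addresses that appear more than once."""
    counts = Counter(c["email"].strip().lower() for c in customers)
    return sorted(email for email, n in counts.items() if n > 1)


# Made-up sample data for illustration only.
products = [
    {"name": "Granola", "image_url": "granola.png", "description": "Oats, honey"},
    {"name": "Scarf", "image_url": "", "description": None},
]
customers = [
    {"email": "Ann@example.com"},
    {"email": "ann@example.com "},
    {"email": "bo@example.com"},
]

for product in products:
    print(product["name"], "is missing:", missing_fields(product))
print("duplicate contacts:", duplicate_emails(customers))
```

Matching on a normalized email address is a deliberately simple rule. Real customer data usually needs fuzzier matching on names and addresses, which is beyond this sketch.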
Some best practices for ensuring data quality are (Profisee, 2018):
1. Get buy-in from top management
2. Manage data quality activities as part of a data governance framework
3. Integrate the data roles with business and IT staff
4. Use a business glossary for metadata management
5. Have a data quality issue log
6. Link data quality key performance metrics to business KPIs
7. Spread awareness of the importance of data quality
8. Automate the process
9. Improve critical data first
10. Manage data throughout its lifecycle
11. Prevent data errors
Many brands have shifted from marketing purely to create brand awareness to purposeful, data-driven marketing. Data quality is important in retail because it lets retailers understand customers better and provide useful products in a timely manner, which increases revenue. Many retailers have instituted loyalty programs to help them collect customer data accurately. Data quality also reduces risks and costs. Gone are the days when retailers had to speculate about what customers might like. By effectively tracking customer spending and preferences, retailers can use data to predict what customers want more accurately and reduce the cost associated with wrongly guessing their preferences.
Additionally, bad data can lead to bad analysis, which can be very costly. Ensuring data quality is therefore the foundation for good analysis and for avoiding costly mistakes. Quality data is also necessary for automating data analysis, which can reduce the costs associated with manually checking and correcting data. Lastly, data quality is necessary in order to comply with regulations like GDPR, which enforces privacy and data protections. Security is among the dimensions of data quality: if there is a security breach, an organization’s reputation will be damaged and customers will lose trust in it.
Leo L. Pipino, Yang W. Lee, and Richard Y. Wang. 2002. Data quality assessment. Commun. ACM 45, 4 (April 2002), 211–218. DOI:https://doi.org/10.1145/505248.506010
Profisee. 2018. Data Quality — What, Why, How, 10 Best Practices & More! <https://profisee.com/data-quality-what-why-how-who/>
Stuart E. Madnick, Richard Y. Wang, Yang W. Lee, and Hongwei Zhu. 2009. Overview and Framework for Data and Information Quality Research. J. Data and Information Quality 1, 1, Article 2 (June 2009), 22 pages. DOI:https://doi.org/10.1145/1515693.1516680