All the Ways Bad Data Is Holding You Back

Digital transformation initiatives are challenging to complete in many organizations. When data issues block the path forward, progress stops.

Many companies are thriving in spite of their data issues. However, continuing pressure for cost reductions and competition for capital are raising the priority of resolving those issues. They have escalated into significant impediments that increase costs and risks while reducing net income.

Data issues are easy to identify, and solving them will put your business back on track to realize the many benefits of digital transformation.

The data issues that engineering leadership can eliminate, or at least reduce, include the following.

Inconsistent data

Inconsistencies can occur in reference, master and transaction data. For example:

  1. Multiple identifiers for key data such as vendor or product across IT systems.
  2. Component specifications are not applied consistently across project data.
  3. Variations on $1000 exist, such as 1,000, 1000 USD, USD 1000, 1000.00 or one thousand dollars.
  4. Variations are found in unit-of-measure abbreviations, such as kg, Kg, kilogr and KG.
  5. Numbers are not consistently padded with leading zeros.
  6. Text is right-justified in some places and left-justified in others.
  7. Multiple date formats are used, such as March 25, 2023; 2023-03-25 or 03/25/23.
  8. The letter O is used instead of zero.
  9. Multiple state abbreviations, such as CA, Cal, or Calif, are used.
  10. Incorrect conversions between EBCDIC and ASCII are evident.
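
Several of the inconsistencies above can be reduced by normalizing values as they are loaded. The following is a minimal Python sketch, not a definitive implementation. The function names, the UNIT_ALIASES table and the list of accepted date layouts are assumptions made for illustration, and spelled-out amounts such as "one thousand dollars" are not handled.

```python
import re
from datetime import date, datetime
from decimal import Decimal

# Hypothetical alias table; a real one would be built from the spellings
# actually found in the source systems.
UNIT_ALIASES = {"kg": "kg", "kilogr": "kg", "kgs": "kg"}

# Date layouts assumed for this sketch. Two-digit years and day/month order
# are genuinely ambiguous, so the accepted layouts should be agreed per source.
DATE_FORMATS = ("%Y-%m-%d", "%B %d, %Y", "%m/%d/%y")


def normalize_unit(raw: str) -> str:
    """Map a unit spelling such as 'Kg' or 'kilogr' to one canonical form."""
    key = raw.strip().lower()
    if key not in UNIT_ALIASES:
        raise ValueError(f"unknown unit: {raw!r}")
    return UNIT_ALIASES[key]


def normalize_date(raw: str) -> date:
    """Try each accepted layout in turn and return a date object."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def normalize_usd(raw: str) -> Decimal:
    """Strip currency markers, thousands separators and spaces."""
    cleaned = re.sub(r"usd|[$,\s]", "", raw, flags=re.IGNORECASE)
    return Decimal(cleaned)
```

Rejecting unknown values with an error, rather than guessing, keeps the cleanup visible: each rejected value is a candidate for an exception report.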

Inaccurate data

Data inaccuracies can be traced back to several factors, including human errors, data drift and data decay. Examples of inaccurate data include:

  1. Designs with mixed units of measure.
  2. Drawings for buildings using erroneous or multiple elevation values.
  3. Nulls appear in fields where they are supposed to be impossible.
  4. Incorrect formulas are used for calculated values.
  5. Latitude or longitude values are invalid or are associated with the wrong datum or with no datum at all.
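
Some of these inaccuracies can be caught with simple range and presence checks. The sketch below, in Python, covers the coordinate example only. The function name and the KNOWN_DATUMS set are assumptions for illustration; a check like this can detect a missing or unrecognized datum but cannot tell whether a recognized datum is the correct one.

```python
from typing import List, Optional

# Illustrative set of datum names; the accepted list is an assumption.
KNOWN_DATUMS = {"WGS84", "NAD83", "NAD27"}


def coordinate_problems(
    lat: Optional[float], lon: Optional[float], datum: Optional[str]
) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if lat is None:
        problems.append("latitude missing")
    elif not -90.0 <= lat <= 90.0:  # also rejects NaN
        problems.append("latitude out of range")
    if lon is None:
        problems.append("longitude missing")
    elif not -180.0 <= lon <= 180.0:  # also rejects NaN
        problems.append("longitude out of range")
    if not datum:
        problems.append("datum missing")
    elif datum.upper() not in KNOWN_DATUMS:
        problems.append("datum not recognized")
    return problems
```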

Ambiguous data

Ambiguous data occurs when the end-user isn’t sure what the available data means. Examples of ambiguous data include:

  1. Missing pieces of drawing revision history.
  2. Spelling errors.
  3. Data format issues such as dates with multiple possible meanings.
  4. No units of measure.
  5. Time with no time zone or AM/PM indication.
  6. Misleading column headings and names.
  7. Use of various languages.

Semantic differences

Semantic differences refer to subtle differences in meanings of data elements that are named similarly and appear to have the same set of valid values. Examples of semantic data differences include:

  1. Material codes that imply different alloys or finishes in other IT systems.
  2. Work-in-progress status codes are identical but are invoked differently in various business processes.
  3. Vendor codes may or may not include subsidiaries that buy and sell to each other.
  4. Rules for setting start and end dates vary from region to region.

Incompatible data structures

Incompatible data structures refer to instances where different systems, from which data is to be integrated, represent the same data differently. Such situations make joining the data more complex. Examples of incompatible data structures when comparing systems include:

  1. Compound keys are built from different combinations of values, such as part number, source vendor and paint color code.
  2. The use of delimiter characters like commas, dashes or slashes varies.
  3. Other embedded or appended data, such as a year or warehouse identifier, is inconsistent.
  4. One system includes contractors as employees, while another does not.

Incomplete data

Incomplete data exists when data elements contain partial or no information. Examples of incomplete data include:

  1. Partial descriptions of mechanical components on drawings.
  2. Missing parts of product masters.
  3. Non-existent address or contact information on customer masters.

Lack of data integrity

Data integrity issues arise when related data values contradict one another. Examples of data integrity issues include:

  1. Type of steel not available for the pipe diameter specified.
  2. Wall panel dimensions are larger than the opening.
  3. Address and zip code don’t match.
  4. Country and state combination is not possible.
  5. Some of the sales transactions occurred before the vendor approval date.
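
Integrity problems like these are cross-field or cross-record rules, so they can be tested by comparing related values. The Python sketch below illustrates two of the examples above. The Transaction class, the function names and the tiny VALID_STATES table are hypothetical; real reference data would come from master data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Transaction:
    txn_id: str
    vendor_id: str
    txn_date: date


# Illustrative reference data only; deliberately incomplete.
VALID_STATES = {"US": {"CA", "NY", "TX"}, "CA": {"AB", "BC", "ON"}}


def state_matches_country(country: str, state: str) -> bool:
    """True if the state or province code is listed for the country."""
    return state in VALID_STATES.get(country, set())


def transactions_before_approval(transactions, approval_dates):
    """Return IDs of transactions dated before the vendor's approval date.

    A vendor with no approval date on file is reported as well.
    """
    flagged = []
    for txn in transactions:
        approved_on = approval_dates.get(txn.vendor_id)
        if approved_on is None or txn.txn_date < approved_on:
            flagged.append(txn.txn_id)
    return flagged
```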

Human errors

Human errors refer to data entry errors. Examples of human errors include:

  1. Lack of system familiarity leading to input errors.
  2. Simple keying errors.
  3. Picking the incorrect value from a dropdown list.
  4. Accidentally tabbing past input fields.
  5. Creating a misleading report by incorrectly selecting a date range.

Duplicate data

Duplicate data occurs when staff or software makes data processing mistakes. Examples of duplicate data include:

  1. Duplicate ladders on the side of a pressure vessel diagram.
  2. Duplicate customer master data with different identifiers.
  3. Duplicate but different contact details.
  4. Duplicate sales or returns transaction data.

When used as AI training data, duplicate data can produce skewed ML models.
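
Duplicate master records with different identifiers can often be surfaced by grouping on a normalized matching key. The following Python sketch assumes that company name plus postal code is a usable key, which is an illustrative choice rather than a recommendation; the field names and function names are also hypothetical.

```python
from collections import defaultdict


def match_key(name: str, postal_code: str):
    """Build a comparison key that ignores case, spacing and punctuation."""
    name_key = "".join(ch for ch in name.lower() if ch.isalnum())
    postal_key = postal_code.replace(" ", "").upper()
    return (name_key, postal_key)


def find_duplicate_groups(customers):
    """Group customer IDs that share a matching key; return groups of 2+."""
    groups = defaultdict(list)
    for customer in customers:
        key = match_key(customer["name"], customer["postal_code"])
        groups[key].append(customer["id"])
    return [ids for ids in groups.values() if len(ids) > 1]
```

Exact-key matching of this kind misses misspelled names, so the groups it returns are best treated as candidates for review by a data steward rather than as records to merge automatically.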

Too much data

Many organizations accumulate too much data while paradoxically using too little of that data to support their decision-making processes. Examples of too much data include:

  1. IIoT/SCADA data that has not been reduced to contain only records of changes.
  2. Excessive history or granularity that adds no value to the analytics.
  3. Subscribing to multiple data services for highly similar data.

Too much data degrades performance and can cause events to be double-counted, leading to misleading analytics.
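
Reducing IIoT/SCADA data to records of changes can be sketched as a simple filter that keeps a reading only when it differs from the last reading kept. The Python example below is a minimal illustration; the function name, the (timestamp, value) tuple layout and the optional deadband parameter are assumptions.

```python
def changes_only(readings, deadband=0.0):
    """Keep the first reading, then only readings that differ from the
    last kept value by more than the deadband.

    readings: iterable of (timestamp, value) pairs in time order.
    """
    kept = []
    last_value = None
    for timestamp, value in readings:
        if last_value is None or abs(value - last_value) > deadband:
            kept.append((timestamp, value))
            last_value = value
    return kept


# Usage: six readings collapse to three change records.
readings = [(0, 20.0), (1, 20.0), (2, 20.0), (3, 21.5), (4, 21.5), (5, 20.0)]
reduced = changes_only(readings)
```

A larger deadband suppresses small fluctuations as well as exact repeats, at the cost of losing detail that some analyses may need.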

Haphazard data management

Some companies work with haphazard data management practices. Examples include:

  1. Data has gaps and inconsistencies that require cleanup before real analytical work can begin.
  2. Data quality varies from vendor to vendor, such as component and material vendors.
  3. Data quality varies from application to application and from region to region.
  4. Data model differences across software packages thwart data stewardship efforts.

Data management practices can be strengthened by:

  1. Supporting the work of data stewards.
  2. Describing expected data standards in vendor agreements.

Recognizing poor data quality

Poor data quality first manifests itself in these technical issues:

  1. Difficulty integrating data from multiple sources.
  2. Summation errors.
  3. Software crashes.
  4. System performance problems.

Then poor data quality leads to these business issues:

  1. A lack of confidence in reports and charts.
  2. Uninformed or misinformed decision-making that adds risk.
  3. Inaccurate problem analysis that adds cost.
  4. Poor customer relationships that reduce sales and market share.
  5. Disappointing product launches that slow growth.

Solutions to poor data quality

Solving these data quality problems requires the following actions, which engineering leadership can champion:

  1. Educate employees that accurate and complete data is a prerequisite to superior customer service and data analytics.
  2. Appoint a data steward for every IT system and provide that person with reports and charts that monitor data quality in their assigned application.
  3. Set data standards for the organization.
  4. Ensure data integration processes produce exception reports that reveal data problems that need correction.
  5. Design software to validate data more thoroughly.
  6. Test software thoroughly.
  7. Enact data governance processes.
  8. Create data quality measures and share status and trends using dashboards.
  9. Follow reasonable data modelling practices.
  10. Introduce processes that filter out repeated, unchanged IIoT/SCADA readings.
  11. Create summary data marts.
  12. Offer a data catalog that helps end-users better understand which databases, data marts and tables are most suitable for their queries.
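
As one illustration of a data quality measure from the list above, completeness can be expressed as the share of records in which each required field is filled in. The Python sketch below is hypothetical: the function names, the field names and the rule that whitespace-only text counts as blank are all assumptions.

```python
def is_blank(value) -> bool:
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def completeness(records, required_fields):
    """Return {field: fraction of records with a non-blank value}."""
    total = len(records)
    if total == 0:
        return {}
    return {
        field: sum(not is_blank(rec.get(field)) for rec in records) / total
        for field in required_fields
    }


# Usage with four illustrative customer records.
records = [
    {"name": "Acme", "address": "1 Main St", "phone": "555-0100"},
    {"name": "Birch", "address": "", "phone": None},
    {"name": "Cedar", "address": "9 Elm Rd"},
    {"name": "  ", "address": "4 Oak Ave", "phone": "555-0199"},
]
scores = completeness(records, ["name", "address", "phone"])
```

Tracking such a measure per application over time gives a data steward a trend to monitor and share on a dashboard.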

One or more of these actions can correct the specific data issues described above to advance digital transformation and ensure that only superior data powers data analytics.

Overcoming data issues will advance digital transformation and reduce operating costs. Engineers can lead in addressing data issues through many actions. The most important is for engineers to communicate that data is a valuable asset that needs to be cared for, just like other asset categories such as buildings, equipment and intellectual property.