Data Cleaning IS Data Analysis

On March 16, 2017, in HRExaminer, by John Sumser


“As a result, they are producing tools that will generate reams of data that is easy to collect, easy to summarize and easy to clean. And, in the process, they will squeeze the life out of their investigations.” – John Sumser

I nominate the title of this note as the least buzzworthy headline of the month. It offers no silver bullet or easy button. It implies that mind-numbing, detailed work is strategically critical.

Somehow, in the race to be relevant to the top tier of the organization, we’ve lost sight of where the answers lie. Arm-waving prognostication of simple answers to complex questions feels like the holy land. We want our seat at the table, with cushions, bonuses and no critical thought.

The bad news is that real data, entered by human beings, is messy, messy, messy. People data, the foundation of workforce analytics, is clumsy, error-laden and biased. The questions we use to collect and understand it are worse.

These days, I am part of four different, committee-led, data collection juggernauts. They are remarkable for their sameness. Each hopes to simplify the data collection process by limiting the scope of its questions. Each wishes to minimize the amount of ‘data cleaning’ at the end of the process. Each, very unconsciously, wants the data to support the underlying goals of its initiative.

As a result, they are producing tools that will generate reams of data that is easy to collect, easy to summarize and easy to clean. And, in the process, they will squeeze the life out of their investigations. They will preclude the discovery of alternative views. They will confirm the views that they began the process with.

Data may not be like other natural resources. In general, plants, minerals and liquids gain in value the more they are processed. While we are beginning to doubt that idea in the food supply, it remains true of most resources that have extraction and refining processes.

That’s how we tend to think about data. It’s a refining process. Each step forward produces increasing value. The remnants are ‘waste.’

That may not be true with data.

In the process of cleaning the data, great discoveries are made about the nature of the investigation. We learn the imperfections of our questions and the things that the data sources want to tell us. Hands-on experience cleaning the data produces great wisdom. The material that we shed in the cleaning process creates the first layer of real insight.

Tidy processes are inexpensive and very efficient. The things that we want to learn may not be.

 