2020-09-15 HR Examiner article data makes its own gravy photo img AdobeStock 340020987 red part 2

“The majority of information consumed by intelligent tools are various forms of text. Over time, that will shift into a heavy prevalence of machine generated or monitored data like movement patterns, keystrokes, social interaction, vocal intonation, environmental measurements, and communication patterns.” - John Sumser


HR Data Makes Its Own Gravy

Types and Attributes of Data in AI

Part 2


Did you read part one in this series? HR Data Makes Its Own Gravy (Part 1).


The categorization of HR data types and attributes in AI and intelligent tools is still evolving. Examining the characteristics of the data will help us understand them better and give us a common frame of reference for refining and expanding the categories as we learn. These are early days.



John Sumser is the Principal Analyst for HRExaminer.

Before we jump into the categories, there’s an elephant-sized caveat in the room that we need to address. Sometime around March of this year, the coronavirus pandemic hit a metaphorical reset button and forever changed the historical data that machine learning and intelligent tools rely on for their predictive accuracy. For today’s discussion, we’ll treat the pandemic’s impact on machine learning and predictive accuracy as a separate topic.


Most of the information consumed by intelligent tools is text in one form or another. Over time, that will shift toward a heavy prevalence of machine-generated or monitored data like movement patterns, keystrokes, social interaction, vocal intonation, environmental measurements, and communication patterns.


The following categories are intended to illuminate the breadth of the issues in data without claiming to be comprehensive. Increasingly, each kind of data will be assembled into data sets that are inputs to models and algorithms. Let’s dig in.


Personal Identifying Information (PII)


You should consider PII a radioactive data set. As various governments around the world attempt to pin down the definition, the practical meaning grows. In a world of global digital business, the only sensible way to manage PII is by fully complying with everything you can. It’s not really possible to know if you are doing business under one set of geographical laws or regulations when you’re online.


According to Sierra-Cedar’s 2019 Systems Survey, 41% of HR Departments are responsible for PII in their company. The problem of knowing where PII is located is tougher than the question of how to keep it secure and well maintained. In any organization, PII lives in managers’ in-baskets, succession planning documents, internal mobility plans, and recruiting workflows.


Worse still, PII that migrates beyond central control is hard, if not virtually impossible, to keep up to date.


Here’s a scenario:

Imagine that a supervisor is making a personnel decision and pulls some PII into her email. The next time she needs it, is she more likely to search her inbox or to go back to the system she uses infrequently to get updated info? It’s a sure bet that she’ll grab it from her email archives. Finding, maintaining, and managing PII is an overlooked component of the intelligent tools era.


Text / Language


On one level, the organization’s value is more or less the contents of all of its documents and written communications. Virtually all of this asset is digitized. Many tools in the current class are concerned with processing, categorizing, and understanding text. From resume matching systems to bias reduction tools, from knowledge management assemblers to sentiment analysis, from conversational interfaces to taxonomies (and dynamic ontologies), the tools manipulate, parse, index, dissect, and illuminate text.

It’s worth noting that text and language analysis is a sharply rising stock at present due to the loss of historical data caused by the coronavirus pandemic and the resulting downgrading of machine learning’s predictive accuracy.


Rate of Change


The rate at which data changes is a critical element. Every bit of data has some sort of use-by date. Some data moves quickly, like the ambient temperature of a workspace. Some seems almost permanent, like the birthplace of an employee. Categorizing the rate at which data changes is a central part of data governance and gives a clear picture of the shelf life of models and algorithms.
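
One way to make the use-by date concrete is to attach an assumed shelf life to each category of data and flag values that have outlived it. The Python sketch below is only an illustration: the category names and shelf-life values are hypothetical, and a real governance policy would set them per organization.

```python
from datetime import datetime, timedelta

# Hypothetical shelf lives per data category (illustrative values only).
SHELF_LIFE = {
    "workspace_temperature": timedelta(minutes=5),
    "home_address": timedelta(days=365),
    "birthplace": None,  # treated as effectively permanent
}


def is_stale(category, last_updated, now):
    """Return True when a value has outlived its category's shelf life."""
    shelf_life = SHELF_LIFE[category]
    if shelf_life is None:
        return False
    return now - last_updated > shelf_life


now = datetime(2020, 9, 15, 12, 0)
temperature_is_stale = is_stale(
    "workspace_temperature", datetime(2020, 9, 15, 11, 0), now
)
birthplace_is_stale = is_stale("birthplace", datetime(1990, 1, 1), now)
```

An hour-old temperature reading is flagged as stale, while a decades-old birthplace is not. A model built on a mix of categories inherits the shortest shelf life among its inputs.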


Data Flows


As data moves between and through workflows, it gets transformed. The meaning of the data changes as its state shifts. It comes from some provider, is transformed one step at a time through a workflow, gets distributed for further refinement and decision making, and then heads into its next workflow. These very processes also create data about themselves as the data gets transformed.
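
To illustrate how a workflow creates data about itself, here is a small Python sketch, not a description of any particular product. It passes a record through two made-up transformation steps and keeps a log of what each step changed. The function names, field names, and tenure bands are all assumptions for the example.

```python
from datetime import datetime, timezone


def run_workflow(record, steps):
    """Apply each step in order and log metadata about every transformation."""
    audit_log = []
    for step in steps:
        before = dict(record)
        record = step(record)
        audit_log.append({
            "step": step.__name__,
            "at": datetime.now(timezone.utc).isoformat(),
            "fields_changed": sorted(
                k for k in record if record.get(k) != before.get(k)
            ),
        })
    return record, audit_log


def normalize_name(rec):
    # Hypothetical cleanup step: trim whitespace and fix capitalization.
    return {**rec, "name": rec["name"].strip().title()}


def add_tenure_band(rec):
    # Hypothetical enrichment step: derive a band from years of service.
    band = "0-2 yrs" if rec["years"] < 2 else "2+ yrs"
    return {**rec, "tenure_band": band}


result, log = run_workflow(
    {"name": "  ada LOVELACE ", "years": 3},
    [normalize_name, add_tenure_band],
)
```

The `log` list is the gravy: it did not exist before the workflow ran, and it records which step touched which field and when.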


Machine Data


As offices and human networks become a part of the internet of things, the amount of data generated by equipment and monitoring devices grows exponentially. This data, which can largely be understood as surveillance, changes very rapidly and often has a limited useful life.


Where an employee’s birthplace may never change (error correction being the exception), location monitoring data changes at the speed of the monitoring device. There’s a great deal of variability in the rate of change in machine data. But it is typically much faster than the pace of change for text.


Where text-based information changes only as fast as it can be maintained, machine data changes at digital pace. More and faster data means more opportunity for distilled insight.


Survey Data


Survey data and other forms of workforce measurement can also change rapidly. They are a working precursor for machine measurement. While survey data evolves quickly, the way it is collected can add error and bias to the output. There are offerings today that perform the survey function but bypass the data collection process. Keen Corp, one of our Watchlist Companies, uses text communications data flows to measure the tension in workgroups as an alternative to surveying.


Network Analysis


This is an example of a technology on the verge of being useful. Network Analysis is sometimes called Organizational Network Analysis. It is the mapping of interactions between network members (employees). It is done with digital communications (TrustSphere, Polinode), physical behavior (Time Clocks and Badging Systems), or a combination of both (Humanyze).


The data itself is readily available and growing in scope and density. The current problem involves figuring out what it means and how to act on it. Network Analysis provides insight into organizational patterns that we don’t have names for yet. It’s data that can be extracted by understanding patterns in existing data or by supplementing with additional (usually physical) measures.
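
As a rough sketch of what mapping interactions can look like at its simplest, the Python below turns a made-up message log into weighted connections and counts each person's distinct contacts. The names and messages are invented, and this is not how any of the vendors named above work. Real network analysis goes well beyond counting.

```python
from collections import Counter

# Hypothetical interaction log: one (sender, recipient) pair per message.
messages = [
    ("ana", "ben"), ("ben", "ana"), ("ana", "cy"),
    ("cy", "ana"), ("dee", "ana"), ("ben", "cy"),
]

# Count messages per undirected pair; these counts are the edge weights.
edges = Counter(frozenset(pair) for pair in messages)

# Number of distinct contacts per person (degree in the network).
contacts = Counter()
for pair in edges:
    for person in pair:
        contacts[person] += 1

most_connected = contacts.most_common(1)[0][0]
```

Here `most_connected` comes out as "ana", who exchanges messages with all three colleagues. Whether that makes her a hub, a bottleneck, or simply the person who books the meetings is exactly the interpretation problem described above.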


Transactional / Behavioral


Payroll and benefits are the largest body of transactional data held by the HR Department. Some intelligent tools (PhenomPeople) allow HR to look at the behavior of employees and potential employees as they interact with the company website. Variations in transactional data give deep insights into questions like where people are and what they are doing.


Transactional data also includes elements that can be a subset of network data. Speed of email responses to specific people can be counted as evidence of the status structure in an organization. An understanding of the status structure may give better insight into, and better modeling of, a decision-making process.
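
A minimal Python sketch of the response-speed idea follows, using an invented reply log. It computes the median time people take to answer each sender and orders senders from fastest-answered to slowest. The names and numbers are hypothetical, and treating the ordering as a status ranking is an assumption that would need validation against what is actually known about the organization.

```python
from collections import defaultdict
from statistics import median

# Hypothetical log: (responder, original_sender, minutes_until_reply)
replies = [
    ("ana", "director", 4),
    ("ben", "director", 6),
    ("ana", "ben", 95),
    ("director", "ana", 240),
    ("ben", "ana", 60),
]


def median_wait_by_sender(reply_log):
    """Median minutes people take to answer each sender's messages."""
    waits = defaultdict(list)
    for _responder, sender, minutes in reply_log:
        waits[sender].append(minutes)
    return {sender: median(times) for sender, times in waits.items()}


wait_times = median_wait_by_sender(replies)
# Shortest wait first: a rough, unvalidated ordering of who gets answered fastest.
ranking = sorted(wait_times, key=wait_times.get)
```

In this toy data the director's messages are answered in a median of five minutes, far faster than anyone else's, which is the kind of signal the paragraph above describes.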


Connecting Data


This is the data that comes from connecting the bits. Much of the analytical process that drives intelligent tools involves merging data from multiple sources into something richer and more complete, at least in the specific application. That integrated dataset can be the foundation of decision-making tools that straddle workflows and departments or give deeper and more potent real-time insights.
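
The merging step can be sketched in a few lines of Python. The three source extracts, their field names, and the employee IDs below are all hypothetical. The example assumes every system shares a clean employee ID, which in practice is often the hardest part.

```python
# Hypothetical extracts from three separate systems, keyed by employee ID.
payroll = {"e1": {"salary_band": "B"}, "e2": {"salary_band": "C"}}
survey = {"e1": {"engagement": 4.2}}
badging = {"e1": {"days_on_site": 3}, "e2": {"days_on_site": 5}}


def connect(**sources):
    """Merge per-employee fields from each source and note where each came from."""
    merged = {}
    for source_name, rows in sources.items():
        for emp_id, fields in rows.items():
            record = merged.setdefault(emp_id, {"_sources": []})
            record.update(fields)
            record["_sources"].append(source_name)
    return merged


combined = connect(payroll=payroll, survey=survey, badging=badging)
```

The `_sources` list on each record makes the gaps visible: employee `e2` has no survey response, so any tool built on the combined data knows that record is less complete.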


Data Cleaning and Maintenance


There’s a data science joke about machine learning: 80% of machine learning is cleaning data. The other 20% is complaining about cleaning data. The same can be said for much of people analytics and intelligent tools.


Data has a shelf life. While all of the elements age at a somewhat variable rate, they all age. One of the toughest jobs in organizational data management is keeping the core data up to date. While it’s very early, there are companies in the Recruiting sector that are making progress with the idea of automatically refreshing data on an as-needed basis.


The nuance these companies discovered is that not all data need to be perfect all of the time. The trick is predicting which subsets of the data need refreshing. Swoop, Crowded, and RChilli each have useful tools in this area.


There’s another layer. Algorithms and models depend on the underlying data for their meaning and health. They go out of date in a way that’s directly related to the underlying data. It’s the cleaning and maintenance of the models that is the current unsolved problem.


Putting It All Together


That’s the landscape of ideas that are at the heart of intelligent tools in HR Tech at the moment. In many ways, intelligent tools are attributes of the underlying data. There is a reciprocal relationship between the tools and the data.

