photo of Rob May of Talla on HRExaminer.com in July 2016

“Let’s say you couldn’t remember the capital of Belgium. You could ask the computer what it is by saying ‘I want Paris, but subtract France and add Belgium.’ The computer would return Brussels. Now imagine you could do that with employees. What if you could say ‘I want someone like my best engineer, but with more experience in management?’” – Rob May

By Rob May, CEO and co-founder of Talla

What Are Word Vectors And Why Do They Matter?

 
In late 2013, Tomas Mikolov, who was then working at Google, released a research paper on “word2vec.” In the two and a half years since, the impact of word2vec has rocketed through the A.I. world. In the next two years, it will start to affect some of what you, as an HR executive, do in your day-to-day job. So I want to take some time to answer three questions. What is word2vec? What are word vectors? And most importantly, how will they impact human resources at most companies?

Word vectors will make computing on natural language just as easy as computing on numbers, so you can do all kinds of different comparisons on natural language data.  For instance, what if, instead of doing a keyword search on resumes to find a new engineer, you just fed the resume of your best engineer into a system and out popped the most similar candidate on the market?  That can be done using word vectors.

There are three key concepts you need to understand to make sense of this new technology. The first is the idea of a representation. A painting of a car is a representation of that car, for example. It isn’t the real car, but you can tell a lot about the car from the painting. In artificial intelligence, a lot of the processing work is done on some sort of representation. Word vectors are a representation of language. You need to convert words to a new representation because computers work on numbers and have no way to process the words themselves.

The second concept is the “syntax” vs. “semantics” distinction. Syntax is the grammatical structure of language, while semantics is the meaning of the individual words in a sentence. Historically, many types of word representations have been tried. Some captured the syntactic structure of language (e.g. a verb applies to a specific noun), while others captured semantic properties of words (e.g. “boat” and “ship” mean roughly the same thing).

The third concept is that of a vector. You may remember vectors from an advanced math class. They have a length and a direction. If you have a vector of three numbers, for example (3, 3, 4), you can graph it as an arrow in three dimensions.
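To make “length and direction” concrete, here is a minimal sketch in plain Python that computes both for the example vector (3, 3, 4). The function names are made up for this illustration.

```python
import math

def length(v):
    """Euclidean length (magnitude) of a vector."""
    return math.sqrt(sum(x * x for x in v))

def direction(v):
    """The same vector scaled to length 1, which keeps only its direction."""
    n = length(v)
    return [x / n for x in v]

v = [3, 3, 4]
print(round(length(v), 3))                  # 5.831, since sqrt(9 + 9 + 16) = sqrt(34)
print([round(x, 3) for x in direction(v)])  # [0.514, 0.514, 0.686]
```

Two vectors that point in nearly the same direction are considered similar, no matter how long each one is. That is the idea behind most word vector comparisons.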

Now that we have our three key concepts, let’s put them together in a word vector. A word vector is a representation of a word in vector format that captures both syntactic and semantic relationships between words. It is learned just by ingesting a large corpus of words in everyday usage. Each word becomes a vector, and the great thing about vectors is that you can do math with them. For example, in the Mikolov paper referenced in my intro, the machine was given the equation:

King – Man + Woman = ?

And the machine answered “Queen.” Yet the machine knows nothing about gender. The reason this works is that the vectors connecting related words tend to be parallel. So the vector offset between “king” and “man” is roughly the same as the one between “queen” and “woman,” and adding that offset to “woman” lands closer to “queen” than to any other word.
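The King – Man + Woman arithmetic can be sketched with a few hand-made vectors. Everything in this example is an assumption for illustration: real word2vec vectors are learned from text, have hundreds of dimensions, and do not come with labeled dimensions like the three used here. The query words themselves are left out of the candidate answers, which is how this kind of analogy test is normally run.

```python
import math

# Hand-made toy vectors: [royalty, masculine, feminine]. Illustrative only.
vectors = {
    "king":   [0.9, 0.9, 0.1],
    "queen":  [0.9, 0.1, 0.9],
    "man":    [0.1, 0.9, 0.1],
    "woman":  [0.1, 0.1, 0.9],
    "prince": [0.8, 0.9, 0.1],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means the two vectors point the same way."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def analogy(start, subtract, add, vectors):
    """Return the word closest to start - subtract + add, excluding the inputs."""
    target = [s - m + a for s, m, a in
              zip(vectors[start], vectors[subtract], vectors[add])]
    candidates = [w for w in vectors if w not in (start, subtract, add)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

print(analogy("king", "man", "woman", vectors))  # queen
```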

To give another example, let’s say you couldn’t remember the capital of Belgium.  You could ask the computer what it is by saying “I want Paris, but subtract France and add Belgium.”  The computer would return Brussels.  Now imagine you could do that with employees.  What if you could say “I want someone like my best engineer, but with more experience in management?”
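The employee version of that query could look something like the sketch below. It assumes each person has already been turned into a vector, which is the hard part and is skipped here. The three dimensions, the candidate names, and the size of the management offset are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def rank(query, candidates):
    """Candidate names sorted from most to least similar to the query."""
    return sorted(candidates, key=lambda name: cosine(query, candidates[name]),
                  reverse=True)

# Hypothetical profile vectors: [engineering, management, sales]
best_engineer = [0.9, 0.1, 0.0]
management_offset = [0.0, 0.5, 0.0]  # assumed "more management" direction
candidates = {
    "candidate_a": [0.9, 0.1, 0.0],  # strong engineer, little management
    "candidate_b": [0.8, 0.6, 0.1],  # engineer with management experience
    "candidate_c": [0.2, 0.9, 0.3],  # manager with little engineering
}

# "Someone like my best engineer"
print(rank(best_engineer, candidates))
# ['candidate_a', 'candidate_b', 'candidate_c']

# "...but with more experience in management"
shifted = [b + m for b, m in zip(best_engineer, management_offset)]
print(rank(shifted, candidates))
# ['candidate_b', 'candidate_a', 'candidate_c']
```

Adding the offset moves the query toward management, so the engineer with management experience overtakes the pure engineer.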

The reason word vectors will revolutionize HR is that so much HR data is in natural language form. Imagine taking all those resumes, cover letters, and performance reviews and turning them into word vectors so you could process them. Imagine taking all your training materials and being able to make a vector match between an employee’s vectorized work history and the courses he or she should take next to reach a certain goal. Imagine converting all the commonly asked HR questions into word vectors so that a computer could easily identify and answer them, saving you lots of time and effort.

At Talla, we are one of the many startups working on solving these types of problems using word vectors. As an experiment, we recently applied the vector concept to resumes very successfully. The practical value of this approach is that you don’t have to do any specific keyword screening on a candidate, because the vector approach finds the best composite match.
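One simple way to get a “composite match” without keyword screening is to average the word vectors in each resume and compare the averages. The sketch below shows that idea only. It is not a description of Talla’s system, and the tiny vocabulary, its two dimensions, and the resume snippets are all invented for the example.

```python
import math

# Invented toy word vectors: [technical, commercial]. Illustrative only.
word_vectors = {
    "python": [0.9, 0.1], "backend": [0.8, 0.2], "engineer": [0.9, 0.2],
    "java": [0.8, 0.2], "server": [0.85, 0.1], "developer": [0.9, 0.2],
    "sales": [0.1, 0.9], "marketing": [0.2, 0.9],
}

def embed(text):
    """Average the vectors of the known words in a text; unknown words are skipped."""
    known = [word_vectors[t] for t in text.lower().split() if t in word_vectors]
    if not known:
        raise ValueError("no known words in text")
    return [sum(dim) / len(known) for dim in zip(*known)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

best_resume = "Python backend engineer"
applicants = {
    "java_dev": "Java server developer",
    "sales_lead": "Sales and marketing lead",
}

target = embed(best_resume)
best_match = max(applicants, key=lambda name: cosine(target, embed(applicants[name])))
print(best_match)  # java_dev
```

Note that the winning resume shares no keywords with the best engineer’s resume. The match comes entirely from the words having similar vectors.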

But wait, nothing comes for free, right? Right. There is a downside to word vector use in HR. Like all A.I. systems, the output is only as good as the input. The input here is natural language descriptions of people captured through various HR workflows. If employees don’t take that capture seriously and use frivolous, incomplete, or buzzword-laden language, then the results of any word vector analysis built on that data will be weak.

Despite the potential for some problems, word vectors are one of the most powerful inventions of the recent A.I. resurgence. Over the next few years I expect to see dozens of tools on the market, targeting HR departments, that make productive use of word vectors and of all the natural language data sets describing employees, knowledge, and projects inside a given company. These are latent assets that will suddenly become much more valuable and productive.

While there will be many other A.I. tools coming to market, with many different approaches among them, my bet is on word vectors as one of the most popular and most powerful. As these technologies start to permeate HR departments everywhere, the companies with the best people operations, and in particular the best data sets about those people, will find themselves at a significant advantage in a vector-driven world.


