kMeans clustering with HR-related data

HR is one of the new kids on the analytics block. Courses that focus on People Analytics are popping up everywhere, and the field is becoming quite popular. HR departments can get started in analytics by using simple data visualization techniques. However, the real power of analytics comes to HR when more advanced techniques, such as classification or clustering methods, are used. Adoption of these methods in HR doesn’t appear to be high yet, probably because many HR professionals don’t have this skill set or are still building it. To gain value from predictive analytics, HR departments may have to outsource the work or bring in consultants.

In this article, I demonstrate how some of the more advanced analytical methods can be used with HR data. Clustering methods can be thought of as one of the tools in the predictive analytics category. By placing employees into clusters (or groups, if you will) based on similar characteristics, we can make educated guesses as to which cluster a new employee might belong to. To demonstrate this, I created an example that applies the kMeans clustering algorithm to an HR data set. I chose kMeans because it is one of the most popular algorithms for clustering tasks. The example uses a data set from Kaggle (available here) that Dr. Carla Patalano and I created for an HR analytics case study.
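
To make the idea concrete, here is a minimal sketch of kMeans in plain Python. It is not the code linked in this article, and the two features (years of service and a performance score) and all of the values are made up for illustration. In practice you would more likely reach for a library implementation such as scikit-learn's KMeans, and you would scale the features first so that no single column dominates the distance.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))

def kmeans(points, k, n_iter=100, seed=0):
    """Basic kMeans (Lloyd's algorithm). Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, labels

# Hypothetical employees: (years_of_service, performance_score).
employees = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0),
             (8.0, 4.5), (9.0, 4.0), (8.5, 5.0)]

centroids, labels = kmeans(employees, k=2)

# A new (hypothetical) employee is assigned to the nearest cluster center.
new_employee = (7.5, 4.2)
new_label = nearest(new_employee, centroids)

print("Cluster labels:", labels)
print("New employee falls in cluster:", new_label)
```

The cluster numbers themselves are arbitrary. What matters is which employees end up grouped together and which group a new employee lands closest to.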

Some of the People Analytics questions we can tackle with a kMeans predictive approach include:

  • Which groups of employees share similar characteristics in terms of diversity? Do these groups show up only in certain departments?
  • Are there groupings of employees that differentiate performance? Which employees are our best performers?
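
Once every employee has a cluster label, the department question comes down to a simple cross-tabulation. The sketch below assumes the labels already exist. The department names and label values are hypothetical and are not taken from the Kaggle data set.

```python
from collections import Counter

# Hypothetical (department, cluster label) pairs, one per employee.
assignments = [
    ("Production", 0), ("Production", 0), ("Production", 1),
    ("IT", 1), ("IT", 1),
    ("Sales", 0),
]

# Count how many employees fall into each (department, cluster) cell.
counts = Counter(assignments)
departments = sorted({dept for dept, _ in assignments})
clusters = sorted({label for _, label in assignments})

# One row per department, one column per cluster.
table = {dept: [counts[(dept, label)] for label in clusters]
         for dept in departments}

print("Department".ljust(12), *[f"cluster {c}" for c in clusters])
for dept in departments:
    print(dept.ljust(12), *[str(n).rjust(9) for n in table[dept]])
```

A cluster whose counts are concentrated in one row suggests that the group is specific to that department, which is worth a closer look before drawing any conclusions.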

You can look at the details along with all the code here. Let me know if you’re interested in learning more!