Title: | Data Sets for Keith McNulty's Handbook of Regression Modeling in People Analytics |
Description: | Data sets for statistical inference modeling related to People Analytics. Contains various data sets from the book 'Handbook of Regression Modeling in People Analytics' by Keith McNulty (2020). |
Authors: | Keith McNulty [aut, cre] |
Maintainer: | Keith McNulty <[email protected]> |
License: | MIT + file LICENSE |
Version: | |
Built: | 2025-03-06 02:46:03 UTC |
Source: | https://github.com/keithmcnulty/peopleanalyticsdata |
Fictional data on the demographics and donation behavior of donors to a wildlife charity
A dataframe with 354 rows and 8 variables:
The total number of times the individual donated previous to the month being studied
The total amount of money donated by the individual previous to the month being studied
The number of months between the first donation and the month being studied
Whether or not the individual donated in the month being studied
The number of months between the most recent previous donation and the month being studied
The gender of the individual
Whether the person resides in an Urban or Rural Domestic location or Overseas
The age of the individual
Fictional data on employee performance evaluation metrics for a group of salespeople.
A dataframe with 366 rows and 5 variables:
The annual sales of the individual in millions of dollars
The number of new customers acquired by the individual
The region the individual works in - North, South, East or West
The gender of the individual
The performance rating of the individual - 1 = Low, 2 = Middle, 3 = High
Fictional data on the results of an engagement survey among company employees on a four-point Likert scale indicating increasingly positive sentiment
A dataframe with 2833 rows and 14 variables:
The employee rating on their overall happiness
The employee rating on three questions related to employment benefits
The employee rating on three questions related to general work environment
The employee rating on three questions related to perceptions of management
The employee rating on four questions related to perceptions of career prospects
Data on graduate salaries in the United States
A dataframe with 173 rows and 5 variables:
The specific subject major
The broad subject discipline
The number of graduates of working age in the US
The proportion of graduates currently unemployed
The current median salary of those employed in US dollars
Fictional data on the choice of health insurance product by employees of a large company
A dataframe with 1453 rows and 6 variables:
The choice of product of the individual - A, B or C
The age of the individual when they made the choice
The number of people living with the individual in the same household at the time of the choice
The individual's position level in the company at the time they made the choice, where 1 is the lowest and 5 is the highest
The gender of the individual as stated when they made the choice
The number of days the individual was absent from work in the year prior to the choice
Fictional data on the retention of employees in various fields of employment over a 12 month period
A dataframe with 3770 rows and 7 variables:
The gender of the individual studied
The field of employment of the individual at the beginning of the study
The level of the position of the individual in their organization at the beginning of the study - Low, Medium or High
The sentiment score reported by the individual on a scale of 1 to 10 at the beginning of the study, with 1 indicating extremely negative sentiment and 10 indicating extremely positive sentiment
A score of 1 to 10 reported by the individual at the beginning of the study regarding their intention to leave their job in the next 12 months, where 1 indicates an extremely low intention and 10 indicates an extremely high intention
A binary variable indicating whether or not the individual had left their job as of the last follow-up
The month of the last follow-up
Fictional data on feedback from participants in a set of learning programs
A dataframe with 4974 rows and 8 variables:
The unique ID code of the participant
A binary value indicating whether the participant would recommend the program to others
A rating from the participant on the relevance of the program to their work, where 1 is Very Low and 5 is Very High
A rating on how enjoyable and fun the participant found the program, where 1 is Very Low and 5 is Very High
A rating from the participant on the clarity of the content and teaching in the program, where 1 is Very Low and 5 is Very High
A rating from the participant on the quality of the homework or project work in the program, where 1 is Very Low and 5 is Very High
A rating from the participant on the quality of the overall class who attended the program, where 1 is Very Low and 5 is Very High
A rating from the participant on the quality of the program faculty and instructors, where 1 is Very Low and 5 is Very High
Fictional data on the performance and other characteristics of a group of managers in a large company
A dataframe with 571 rows and 13 variables:
The unique ID number for each manager
The performance group of each manager in a recent performance review: Bottom performer, Middle performer, Top performer
Total length of time employed by the company in years
Whether the individual was hired directly as a manager (Y) or promoted to manager (N)
Score on a test given to all managers
The number of employees in the group the manager is responsible for
Whether or not the individual has been the subject of a complaint by a member of their group
Whether the individual works mobile (Y) or in the office (N)
The number of customer accounts the manager is responsible for
Whether or not the manager has entered unusually high hours into their timesheet in the past year
The number of transfer requests coming from the manager’s group while they have been a manager
Whether the manager works part time (Y) or full time (N)
The current office of the manager
Fictional data from a survey conducted by a political party on a Likert scale of 1 to 4 indicating increasingly positive sentiment
A dataframe with 2108 rows and 23 variables:
The respondent's overall intention to vote for the party in the next election
The respondent's sentiment on three questions related to the policies of the party
The respondent's sentiment on three questions regarding prior voting habits in relation to the party
The respondent's sentiment on four questions related to their interest in local issues
The respondent's sentiment on two questions related to their interest in environmental issues
The respondent's sentiment on two questions related to their interest in international issues
The respondent's sentiment on three questions related to their perceptions of the personalities of local and national party leaders
The respondent's sentiment on three questions related to their interest in national issues
The respondent's sentiment on two questions related to their interest in economic issues
Fictional data on promotions in a retail company
A dataframe with 1134 rows and 5 variables:
A binary value indicating membership of a diversity group at the company
A binary value indicating whether or not the individual worked part-time for at least 6 months
A binary value indicating whether the individual joined in a position working in the retail stores
A binary value indicating whether or not the individual was promoted
The year of the last record of the individual, where the date they joined was year 0. If the individual was promoted, this will be the year of the promotion.
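The last two variables work as a pair: the year is the promotion year when the individual was promoted, and otherwise only the year of the last record. A minimal sketch of that reading follows. The field names `promoted` and `year` are hypothetical, since the actual column names are not given here.

```python
from typing import NamedTuple


class PromotionRecord(NamedTuple):
    # Hypothetical field names; the real column names are not stated above.
    promoted: int  # 1 if the individual was promoted, 0 otherwise
    year: int      # year of the last record, counting the joining date as year 0


def interpret(record: PromotionRecord) -> str:
    """Describe what the year value means for a single record."""
    if record.promoted == 1:
        # For promoted individuals, the year is the year of the promotion.
        return f"promoted in year {record.year}"
    # Otherwise the year only marks the last record of the individual.
    return f"not promoted as of year {record.year} (last record)"
```

For example, `interpret(PromotionRecord(promoted=0, year=5))` describes an individual whose last record is in year 5 and who had not been promoted by then.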
Fictional data on applicants to a graduate recruiting program in a financial services company
A dataframe with 966 rows and 8 variables:
The gender of the applicant
The SAT score of the applicant
The GPA of the applicant
The result of an aptitude test given to the applicant
Applicant rating given by two line manager interviewers, on a Likert Scale of 1 to 5 indicating increasing positivity
Applicant rating given by a human resources interviewer, on a Likert Scale of 1 to 5 indicating increasing positivity
A binary value indicating whether the decision was Hire (1) or No Hire (0)
Fictional data on promotion and performance for salespeople in a technology company
A dataframe with 351 rows and 4 variables:
A binary value indicating 1 if the individual was promoted and 0 if not
The sales (in thousands of dollars) attributed to the individual in the period of the promotion
The average satisfaction rating from a survey of the individual’s customers during the promotion period
The most recent performance rating prior to promotion from 1 (lowest) to 4 (highest)
Fictional data on disciplinary measures by referees in soccer games
A dataframe with 2291 rows and 7 variables:
A record of the maximum discipline taken by the referee against the player in the game. “None” means no discipline was taken, “Yellow” means the player was issued a yellow card (warned), “Red” means the player was issued a red card and ordered off the field of play
The total number of yellow cards issued to the player in the previous 25 games they played prior to this game
The total number of red cards issued to the player in the previous 25 games they played prior to this game
The playing position of the player in the game: “D” is defence (including goalkeeper), “M” is midfield and “S” is striker/attacker
The result of the game for the team of the player - “W” is win, “L” is lose, “D” is a draw/tie
The country in which the game took place - England or Germany
The skill level of the competition in which the game took place, with 1 being higher and 2 being lower
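The discipline variable records the maximum action taken, which implies an ordering of its three levels. The sketch below assumes the order None < Yellow < Red, which matches the description but is not stated explicitly. The helper name `max_discipline` is hypothetical.

```python
# Levels listed from least to most severe (assumed ordering).
DISCIPLINE_LEVELS = ("None", "Yellow", "Red")
SEVERITY = {level: rank for rank, level in enumerate(DISCIPLINE_LEVELS)}


def max_discipline(actions):
    """Return the most severe action in a game, or "None" if there were no actions."""
    return max(actions, key=SEVERITY.__getitem__, default="None")
```

Treating the variable as ordered rather than as three unrelated categories keeps comparisons such as "at least a yellow card" meaningful.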
Fictional data on a sociological survey related to income levels in various regions of the world.
A dataframe with 2618 rows and 9 variables:
The annual income of the individual in PPP adjusted US dollars
The average number of hours per week worked by the individual
The total number of months spent by the individual in formal primary, secondary and tertiary education
The region of the world where the individual lives
Whether the individual works in a skilled or unskilled profession
The gender of the individual
The size of the individual’s family of dependents
The distance between the individual’s residence and workplace in kilometers
The number of languages spoken fluently by the individual
Simplified version of the Columbia University speed dating experiment data set
A dataframe with 8378 rows and 11 variables:
An id number for the individual
The gender of the individual, with 0 as Female and 1 as Male
Indicates if the meeting resulted in a match
Indicates if both the individual and the partner were of the same race
The race of the individual, with race coded as follows: Black/African American=1, European/Caucasian-American=2, Latino/Hispanic American=3, Asian/Pacific Islander/Asian-American=4, Native American=5, Other=6
The reason why the individual is participating in the event, coded as follows: Seemed like a fun night out=1, To meet new people=2, To get a date=3, Looking for a serious relationship=4, To say I did it=5, Other=6
A binary rating from the individual as to whether they would like to see their partner again (1 is Yes and 0 is No)
The individual’s rating out of ten on the attractiveness of the partner
The individual’s rating out of ten on the intelligence level of the partner
The individual’s rating out of ten on whether they believe the partner will want to see them again
The absolute difference in the ages of the individual and the partner
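Several variables in this data set are stored as integer codes. The lookup tables below are transcribed from the descriptions above. The dictionary names and the `decode` helper are illustrative and are not part of the package.

```python
# Code-to-label lookups copied from the variable descriptions.
GENDER_LABELS = {0: "Female", 1: "Male"}

RACE_LABELS = {
    1: "Black/African American",
    2: "European/Caucasian-American",
    3: "Latino/Hispanic American",
    4: "Asian/Pacific Islander/Asian-American",
    5: "Native American",
    6: "Other",
}

GOAL_LABELS = {
    1: "Seemed like a fun night out",
    2: "To meet new people",
    3: "To get a date",
    4: "Looking for a serious relationship",
    5: "To say I did it",
    6: "Other",
}


def decode(code, labels):
    """Return the label for an integer code, or None if the code is missing or unknown."""
    return labels.get(code)
```

Returning `None` for unknown codes is a design choice of this sketch, so that missing values do not raise an error during decoding.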
Fictional data on examination scores of undergraduates on a four year biology degree program.
A dataframe with 975 rows and 4 variables:
Score in the first year examination on a scale of 0-100
Score in the second year examination on a scale of 0-200
Score in the third year examination on a scale of 0-200
Score in the final year examination on a scale of 0-300
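Because the four examinations use different scales, scores are not directly comparable across years. One simple option, sketched here, is to express each score as a percentage of its maximum. The maxima come from the descriptions above. The dictionary keys and the function name are hypothetical.

```python
# Maximum attainable score per examination, taken from the variable descriptions.
MAX_SCORE = {"year 1": 100, "year 2": 200, "year 3": 200, "final year": 300}


def to_percent(score, maximum):
    """Rescale a raw score on a 0-to-maximum scale to a 0-100 percentage."""
    if not 0 <= score <= maximum:
        raise ValueError(f"score {score} is outside the range 0-{maximum}")
    return score / maximum * 100
```

For instance, a raw score of 150 in the final year examination corresponds to `to_percent(150, MAX_SCORE["final year"])`, which is 50.0.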