In every organization, the Human Resources (HR) team spends a great deal of time on resume screening. For a long time, recruiters have screened hundreds of resumes manually: they go through every candidate's resume and evaluate it based on the candidate's skill set, education, work experience, and so on. Evaluating and selecting the right candidates against the company's requirements in this way takes recruiters a long time.
To save time, recruiters typically take one of two shortcuts: they screen only a subset of the resumes they receive, or they skim each resume without reading every section.
In both cases, recruiters fail to find the right candidates reliably. In the first case, they may miss skilled candidates among the resumes they never reviewed. In the second case, they may overlook essential fields in each resume.
To solve this problem, we need to automate resume screening so that recruiters can focus on the most important sections in less time. How can we automate the process? Using Natural Language Processing (NLP) techniques, we can build a model or system that does it. To build an accurate and effective model, we need a properly annotated Resume Named Entity Recognition (NER) dataset, and Predictly provides one.
Input type: Text
Number of data points (resumes): 2,500
Number of entities (labels): 13
Labels: Name, Phone Number, Email, Skills, School Name, College Name, Degree, Major, Location, Year of Passing, Grade, Organization Name, Years of Experience.
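To make the label list concrete, here is a minimal sketch of how one annotated resume could be represented in Python. The character-offset span format, the `AnnotatedResume` and `Entity` names, the sample text, and the choice to tag each skill as its own span are illustrative assumptions, not Predictly's actual export format.

```python
from dataclasses import dataclass, field

# The 13 entity labels of the dataset.
LABELS = {
    "Name", "Phone Number", "Email", "Skills", "School Name", "College Name",
    "Degree", "Major", "Location", "Year of Passing", "Grade",
    "Organization Name", "Years of Experience",
}


@dataclass
class Entity:
    start: int  # character offset where the entity begins (inclusive)
    end: int    # character offset where the entity ends (exclusive)
    label: str


@dataclass
class AnnotatedResume:
    text: str
    entities: list[Entity] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError for unknown labels, out-of-range spans, or overlapping spans."""
        previous_end = 0
        for ent in sorted(self.entities, key=lambda e: e.start):
            if ent.label not in LABELS:
                raise ValueError(f"unknown label: {ent.label!r}")
            if not 0 <= ent.start < ent.end <= len(self.text):
                raise ValueError(f"span out of range: {ent}")
            if ent.start < previous_end:
                raise ValueError(f"overlapping span: {ent}")
            previous_end = ent.end

    def spans(self) -> list[tuple[str, str]]:
        """Return (surface text, label) pairs in document order."""
        ordered = sorted(self.entities, key=lambda e: e.start)
        return [(self.text[e.start:e.end], e.label) for e in ordered]


# Hypothetical example record.
example = AnnotatedResume(
    text="Jane Doe\njane.doe@example.com\nSkills: Python, SQL",
    entities=[
        Entity(0, 8, "Name"),
        Entity(9, 29, "Email"),
        Entity(38, 44, "Skills"),
        Entity(46, 49, "Skills"),
    ],
)
example.validate()
```

Storing character offsets rather than pre-tokenized tags keeps the annotation independent of whichever tokenizer a model later uses.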
Before we create a Resume NER dataset, we need to know what kind of data and labels (classes) we need. First, we need a large collection of resumes. Nowadays, candidates upload their resumes to organization and company websites, where they are stored in the organizations' databases, and many job portals make resumes accessible online. We can collect most of the required resumes from various companies' job portals using web scraping techniques. After collecting the resumes, we need to extract the text from each one effectively.
Methods: Web Scraping, Data Collection, Data Extraction, Data Storage, Data Management, Data Preprocessing.
Technologies/Libraries used: Python, pandas, Selenium, BeautifulSoup, Requests, JSON, CSV, PyPDF2, python-docx.
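A core scraping step is finding the resume files linked from a portal page. The sketch below uses Python's standard-library `html.parser` as a dependency-free stand-in for BeautifulSoup; in a real crawler the HTML would typically come from `requests.get(url).text`. The portal URL and page structure are hypothetical, and the sketch assumes resumes are exposed as plain `<a href>` links to .pdf or .docx files.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

RESUME_EXTENSIONS = (".pdf", ".docx")


class ResumeLinkParser(HTMLParser):
    """Collect absolute URLs of <a href> links that point at .pdf or .docx files."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore the query string when checking the file extension.
        if href and href.lower().split("?", 1)[0].endswith(RESUME_EXTENSIONS):
            absolute = urljoin(self.base_url, href)  # resolve relative links
            if absolute not in self.links:           # skip duplicate links
                self.links.append(absolute)


def extract_resume_links(html: str, base_url: str) -> list[str]:
    """Return resume file URLs found in one page of HTML, in page order."""
    parser = ResumeLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Portals that render their listings with JavaScript would first need a browser driver such as Selenium to produce the final HTML.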
Skills, education requirements, and job requirements differ from one domain or organization to another. For example, software skill sets are different from healthcare skills. So we need to scrape resumes based on the target organization's domain and job requirements.
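One simple way to keep the collection focused on a target domain is a keyword filter applied to each scraped resume. This is a rough sketch under stated assumptions: the keyword sets in `DOMAIN_KEYWORDS` and the `min_hits` threshold are hypothetical and would need to come from the actual job requirements.

```python
import re

# Hypothetical keyword sets; in practice these would be derived from job descriptions.
DOMAIN_KEYWORDS = {
    "software": {"python", "java", "sql", "docker", "git"},
    "healthcare": {"nursing", "patient care", "clinical", "ehr", "pharmacology"},
}


def keyword_hits(text: str, keywords: set[str]) -> set[str]:
    """Return the keywords that occur in text as whole words (case-insensitive)."""
    lowered = text.lower()
    return {kw for kw in keywords if re.search(rf"\b{re.escape(kw)}\b", lowered)}


def matches_domain(text: str, domain: str, min_hits: int = 2) -> bool:
    """Keep a resume only if it mentions at least min_hits keywords of the domain."""
    return len(keyword_hits(text, DOMAIN_KEYWORDS[domain])) >= min_hits
```

A keyword filter is crude, but it is cheap enough to run on every scraped page before the heavier extraction and annotation steps.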
Also, many job portals and organizations do not share their resumes with others, so we need to research which sites are the best sources for the resumes we require.
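Part of that research can be automated: before scraping a portal, we can check whether its robots.txt allows our crawler to fetch the pages we want. This sketch uses the standard library's `urllib.robotparser`; the robots.txt content, user-agent string, and URLs are made up for illustration. robots.txt only describes crawling rules; it does not say whether the resumes themselves may be reused.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt: str, user_agent: str, urls: list[str]) -> dict[str, bool]:
    """Report, for each URL, whether the given robots.txt lets user_agent fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from an in-memory robots.txt
    return {url: parser.can_fetch(user_agent, url) for url in urls}
```

Against a live site, you would instead call `set_url()` with the site's robots.txt URL followed by `read()`, then query `can_fetch()` the same way.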
Another important task is extracting data from the resumes. Resumes come in different formats, such as .pdf and .docx, so the extraction technique has to match each file's format. We can perform this task in Python using the PyPDF2 and python-docx libraries.
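Here is a minimal sketch of format-based extraction, assuming PyPDF2 3.x (`PdfReader`) and python-docx (imported as `docx`) are installed. The libraries are imported inside the functions so the dispatcher still loads when one of them is missing; the `.txt` fallback and the function names are our own additions for illustration.

```python
from pathlib import Path


def extract_pdf_text(path: Path) -> str:
    from PyPDF2 import PdfReader  # imported lazily so this module loads without PyPDF2
    reader = PdfReader(str(path))
    # extract_text() yields no text for scanned (image-only) pages; those need OCR.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_docx_text(path: Path) -> str:
    import docx  # provided by the python-docx package
    document = docx.Document(str(path))
    return "\n".join(paragraph.text for paragraph in document.paragraphs)


def extract_txt_text(path: Path) -> str:
    return path.read_text(encoding="utf-8")


# Map each supported file extension to its extractor.
EXTRACTORS = {".pdf": extract_pdf_text, ".docx": extract_docx_text, ".txt": extract_txt_text}


def extract_text(path: str) -> str:
    """Pick an extractor from the file extension and return the resume's plain text."""
    file_path = Path(path)
    extractor = EXTRACTORS.get(file_path.suffix.lower())
    if extractor is None:
        raise ValueError(f"unsupported resume format: {file_path.suffix!r}")
    return extractor(file_path)
```

Note that `document.paragraphs` does not include text inside tables, which many resume templates use; those cells are available through `document.tables`.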
After extracting the data, we need to store it for further processing and apply a few preprocessing techniques to improve its quality.
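As an example of light preprocessing and storage, the sketch below normalizes Unicode, removes control characters, collapses stray whitespace, and writes one JSON record per line (JSON Lines). The exact cleaning rules and the record fields are assumptions; real pipelines tune them to what the extraction step actually produces.

```python
import json
import re
import unicodedata


def clean_resume_text(text: str) -> str:
    """Normalize Unicode, drop control characters, and collapse extra whitespace."""
    text = unicodedata.normalize("NFKC", text)  # e.g. non-breaking space -> space
    # Drop control/format characters (Unicode category C*) except newlines and tabs.
    text = "".join(ch for ch in text if ch in "\n\t" or unicodedata.category(ch)[0] != "C")
    text = re.sub(r"[ \t]+", " ", text)    # collapse runs of spaces and tabs
    text = re.sub(r" *\n *", "\n", text)   # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text) # keep at most one blank line
    return text.strip()


def save_jsonl(records: list[dict], path: str) -> None:
    """Write one JSON object per line, keeping non-ASCII characters readable."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

A JSON Lines file can later be loaded into pandas with `pd.read_json(path, lines=True)`.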
Methods: Data Labeling, Data Visualization, Model Development, Machine Learning/Deep Learning, Model Evaluation, Word Embeddings (GloVe, fastText, etc.), Active Learning.
Technologies/Libraries used: Python, CSV, JSON, Regex, PyTorch, NumPy, TensorBoard, fastai, scikit-learn, Matplotlib, Seaborn, and the Predictly Text Annotation Platform.
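Most NER training code expects token-level tags rather than character spans. Here is a minimal sketch that converts span annotations into BIO tags, assuming a simple regex tokenizer (words and single punctuation marks) and end-exclusive offsets. Tokens that only partly overlap a span are tagged `O`, and spaces in label names are replaced with underscores so the tags stay whitespace-free; both are illustrative choices.

```python
import re

# Words (\w+) or single non-space punctuation characters.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def spans_to_bio(text: str, entities: list[tuple[int, int, str]]) -> tuple[list[str], list[str]]:
    """Convert (start, end, label) character spans (end exclusive) into per-token BIO tags."""
    tokens, tags = [], []
    previous_span = None  # index of the span that contained the previous token
    for match in TOKEN_RE.finditer(text):
        tok_start, tok_end = match.span()
        tag, current_span = "O", None
        for index, (start, end, label) in enumerate(entities):
            if start <= tok_start and tok_end <= end:  # token lies fully inside this span
                # First token of a span gets B-, later tokens of the same span get I-.
                prefix = "I-" if index == previous_span else "B-"
                tag, current_span = prefix + label.replace(" ", "_"), index
                break
        tokens.append(match.group())
        tags.append(tag)
        previous_span = current_span
    return tokens, tags
```

Tracking the span index, rather than comparing labels, keeps two adjacent entities with the same label (for example, two skills) as separate `B-` entities.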
Here you will learn how Predictly performs the different tasks involved in creating an effective Resume NER dataset.
Using this Resume NER dataset, we can train a model to predict named entities such as personal details (name, email, phone number), education details, and skills from every uploaded resume.
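Some of these entities follow predictable patterns, so a regex baseline is a useful point of comparison for a trained model on Email and Phone Number. The patterns below are simplified assumptions (for example, a phone match must contain 10 to 15 digits, the upper bound taken from the E.164 numbering plan) and will miss or mis-tag some real-world formats; this is not Predictly's method.

```python
import re

# Simplified email pattern: local part, "@", domain, and a top-level domain of 2+ letters.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Phone candidate: optional "+", optional "(", then digits, spaces, dots, dashes, or parentheses.
PHONE_RE = re.compile(r"\+?\(?\d[\d ().-]{7,}\d")


def regex_entities(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, label) spans for emails and phone numbers, sorted by position."""
    entities = [(m.start(), m.end(), "Email") for m in EMAIL_RE.finditer(text)]
    for m in PHONE_RE.finditer(text):
        digits = sum(ch.isdigit() for ch in m.group())
        # Rejects short digit runs such as year ranges ("2015 - 2019" has 8 digits).
        if 10 <= digits <= 15:
            entities.append((m.start(), m.end(), "Phone Number"))
    return sorted(entities)
```

Entities such as Skills or Organization Name have no fixed pattern, which is where a model trained on the annotated dataset is needed.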
Here is what a typical resume screening system looks like.
The system can also filter the uploaded resumes by a set of required skills and return each qualified resume together with its matching skills and a download link.
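This skill filter can be sketched as follows, assuming each resume has already been parsed into its Skills entities and stored with a download link. The `ParsedResume` class, the `min_matches` parameter, the case-insensitive exact matching, and the ranking by number of matched skills are hypothetical design choices, not a description of Predictly's system.

```python
from dataclasses import dataclass


@dataclass
class ParsedResume:
    candidate: str
    skills: list[str]   # Skills entities predicted for this resume
    download_url: str   # hypothetical link to the original file


def filter_by_skills(resumes: list[ParsedResume], required: list[str], min_matches: int = 1) -> list[dict]:
    """Return resumes matching at least min_matches required skills, best matches first."""
    wanted = {skill.casefold() for skill in required}
    results = []
    for resume in resumes:
        matched = sorted({s for s in resume.skills if s.casefold() in wanted}, key=str.casefold)
        if len(matched) >= min_matches:
            results.append((resume, matched))
    # Stable sort: resumes with equal match counts keep their upload order.
    results.sort(key=lambda item: len(item[1]), reverse=True)
    return [
        {"candidate": r.candidate, "matched_skills": m, "download_url": r.download_url}
        for r, m in results
    ]
```

Exact matching is deliberately strict; mapping synonyms (for example, "JS" and "JavaScript") to one canonical skill before filtering would be a natural next step.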