If you are preparing to get a new job
So I made a résumé generator service for developers who find it harder to introduce themselves in writing than in code. It was made possible by Teachable NLP, which fine-tunes GPT-2 on a text file of résumés. It is super easy if you follow the steps below.
My résumé was written in only a few minutes.
- Web Frontend with HTML, CSS
- Creating Web RESTful API
- Taking part in preprocessing steps of machine learning mainly missing value treatment, outlier detection, encoding, scaling, feature selection.
- Testing machine learn algorithms in python. optimizing of existing algorithms.
Isn’t it interesting? Let me show you how to make the résumé generator!
I got an up-to-date résumé dataset from Kaggle and used it to train GPT-2 in Teachable NLP. The file is a .csv containing a table with 2 columns, ‘Category’ and ‘Resume’.
I used the Python package Pandas and did the preprocessing in a Jupyter notebook.
First of all, I loaded the table into a DataFrame and checked the basics. There is enough résumé data for developers, and there are no null values in the table. If you find null values, please remove them. Fortunately, in my data, both columns are non-null.
```python
import pandas as pd
import numpy as np

# Read File
data = pd.read_csv('/opt/notebooks/UpdatedResumeDataSet.csv')

# Check categories
print(data['Category'].unique())
"""
['Data Science' 'HR' 'Advocate' 'Arts' 'Web Designing'
 'Mechanical Engineer' 'Sales' 'Health and fitness' 'Civil Engineer'
 'Java Developer' 'Business Analyst' 'SAP Developer' 'Automation Testing'
 'Electrical Engineering' 'Operations Manager' 'Python Developer'
 'DevOps Engineer' 'Network Security Engineer' 'PMO' 'Database' 'Hadoop'
 'ETL Developer' 'DotNet Developer' 'Blockchain' 'Testing']
"""

# Check the numbers of data
print(data['Category'].value_counts())
"""
Java Developer      84
Testing             70
DevOps Engineer     55
Python Developer    48
Web Designing       45
HR                  44
...
"""

# Check null value
data['Resume'].isna().sum()
"""
0
"""
```
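My data had no nulls, but if yours does, something like the following should work. This is only a minimal sketch: the toy table below is made up for illustration and stands in for the real Kaggle file.

```python
import pandas as pd

# Toy data standing in for the Kaggle table; these rows are invented for illustration.
toy = pd.DataFrame({
    "Category": ["Java Developer", "HR", "Testing"],
    "Resume": ["Skills: Java, Spring", None, "Skills: Selenium"],
})

# Count missing resumes before cleaning
print(toy["Resume"].isna().sum())

# Drop rows whose 'Resume' is null and renumber the index
cleaned = toy.dropna(subset=["Resume"]).reset_index(drop=True)
print(len(cleaned))
```

Passing `subset=["Resume"]` drops a row only when the résumé text itself is missing, not when some other column is.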
Then, by following the two steps below, you can get data tailored to developers.
A) Remove Unnecessary Words
B) Extract Resume Specialized For Developer
In the cleaning stage, numbers, stopwords (word tokens that carry little meaning), and extremely short words are usually removed. However, I skipped those steps. When I removed them and trained GPT-2 on the resulting file, all of the résumé formatting was lost and readability became poor. For example, the sentence HTML Experience - Less than 3 months becomes html experience less than months after cleaning, which sounds a little weird. Also, developers use a lot of abbreviations (e.g. nltk, api), so simply filtering words by length was not a good fit for this data.
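To see why I skipped these steps, here is a minimal sketch of that kind of aggressive cleaning. The helper names `aggressive_clean` and `drop_short_words`, the `min_len` threshold, and the second sample sentence are made up for illustration; they are not part of my actual preprocessing.

```python
import re
import string

def aggressive_clean(text: str) -> str:
    """Lowercase, then strip digits and punctuation (the steps I decided to skip)."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)  # remove numbers
    # replace every punctuation character with a space
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    text = text.translate(table)
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace

def drop_short_words(text: str, min_len: int = 4) -> str:
    """Drop words shorter than min_len characters (hypothetical threshold)."""
    return " ".join(word for word in text.split() if len(word) >= min_len)

print(aggressive_clean("HTML Experience - Less than 3 months"))
# html experience less than months

print(drop_short_words("built api using nltk and sql"))
# built using nltk
```

The first call loses the duration, and the second silently drops abbreviations such as api and sql, which are exactly the words a developer résumé needs.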
For example, take the first résumé in the DataFrame. I had to remove the * that marks unordered list items, as well as the characters that cause encoding errors.
I considered removing parentheses and commas, but I didn’t, because I thought the meaning of the library, package, and framework names would be lost without them. So I kept them.
Rather, I thought they would let users know the format of a résumé.
The preprocessing is implemented in the clean_text function:
```python
import re

def clean_text(text):
    text = text.lower()
    # remove any numeric characters (disabled: numbers are kept)
    # text = ''.join([word for word in text if not word.isdigit()])
    # remove * (asterisk)
    text = re.sub(r'\*', '', text)
    # replace each non-ASCII character with a space
    text = re.sub(r'[^\x00-\x7f]', ' ', text)
    # collapse runs of whitespace into a single space
    text = re.sub(r'\s+', ' ', text)
    return text

data['cleaned_text'] = data['Resume'].apply(clean_text)
```
You can clean the text with regex (regular expressions). It looks complicated, but let me explain it simply.
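The three substitutions can be tried one at a time on a small string. This is just a sketch: the `sample` text is made up, and the variable names `step1` to `step3` are only for illustration.

```python
import re

# Made-up sample: an asterisk bullet, a non-ASCII bullet character, and messy whitespace
sample = "* Skills: Python\u2022 Java   \r\nSQL"

# 1) r'\*' matches a literal asterisk (the backslash escapes the regex wildcard)
step1 = re.sub(r"\*", "", sample)

# 2) r'[^\x00-\x7f]' matches any single character outside the ASCII range
step2 = re.sub(r"[^\x00-\x7f]", " ", step1)

# 3) r'\s+' matches one or more whitespace characters (spaces, tabs, newlines)
step3 = re.sub(r"\s+", " ", step2)

print(repr(step3))
# ' Skills: Python Java SQL'
```

Note that a leading space is left behind where the asterisk used to be. Add `.strip()` at the end if that matters for your data.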
I added the preprocessed data to the DataFrame as a new column, cleaned_text, using the apply function.
The Category column includes several non-developer jobs such as HR, Arts, and Mechanical Engineer. I kept only the rows whose Category is a developer role, and then saved their Resume text to a text file.
```python
# Categories treated as developer roles
developer_categories = [
    'Java Developer', 'Testing', 'DevOps Engineer', 'Python Developer',
    'Hadoop', 'ETL Developer', 'Blockchain', 'Data Science',
    'Database', 'DotNet Developer', 'Network Security Engineer', 'SAP Developer',
]
cleaned_data = data[data['Category'].isin(developer_categories)]

# Make the resume as one text (each resume followed by a space)
result = "".join(text + " " for text in cleaned_data['cleaned_text'])

# Save the text to a file
with open("/opt/notebooks/developer.txt", "w") as f:
    f.write(result)
```
Teachable NLP is a program that fine-tunes GPT-2 on a text (.txt) file without requiring you to write any NLP code. Upload the preprocessed text file, start training, and you get a fine-tuned GPT-2 model. I was worried that the dataset was not large enough, so I chose the medium model size and set the number of epochs to 3. In TabTab, you can test the model and generate a résumé.
Write your own perfect résumé by choosing the most appropriate expression out of the 5 candidate sentences. Then show me your résumé in the Forum.