This is a simple ATS(applicant tracking system) checking script for job description keyword. Using package nltk with natural English stopwords inside the package.
Action Verb : Verbs regard as required action verb for job
Noun Word : Nouns regard as main checker for HR
Adj Word : Adj / Adv regard as descriptive word
Noun Phrase : NN + NN + ... + NN or VB + NN
Noun Phrase is always the key qualifications that is requred for HRs, and ATS uses these words
as the keywords to pick up resume from resume pool. Like SEO, that means, the more these keywords
resume has, the more probabilities to be selected by ATS.
When using functions, need to save job description as text file(TXT) on local PCs or URLs.
The result is not so accurate, since :
1. some words have different parts of speech, like team, is a noun and also a verb.
So it will show up in both classification
2. there are some diffculties to figure out if a word is adj or Past participle, like written.
in sentence like "I have written...", it's a p.p., while in sentence like "we need written skills",
it's adj.
3. for some unknown reasons, nltk regards word like "manage","identify" as an adj instead of verb.
I will do further study for this.
Environment :
OS : Windows 10.0.16299 X64
Language : Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct 15 2017, 03:27:45) [MSC v.1900 64 bit (AMD64)]
GUI : Spider 3.2.4 / IPython 6.1.0
The scripts and graphs are under MIT License
As a demo here, I use news from cbc :
https://www.cbc.ca/news/canada/toronto/ford-mulroney-toront-gun-violence-funding-1.4778899
and try to extract useful verbs, nouns, adj/adv and noun phrases
@author: Ha
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import string
import nltk
from nltk import word_tokenize, pos_tag, ne_chunk
from nltk import RegexpParser
from nltk import Tree
def delete_non_callable(x) :
for ch in ["?","/","(",")",",",";","–",":","-"] :
if ch in x :
x = x.replace(ch," ")
return x
# Action Verbs
def action_verbs(path) :
with open(path) as f :
description = f.readlines()
stopwords = nltk.corpus.stopwords.words('english')
keywords = []
for item in description :
item = delete_non_callable(item)
for word in item.split() :
if word.lower() not in stopwords :
keywords.append(word)
# Pos tag words
word_tag = nltk.pos_tag(keywords)
# consider action verbs
ch = ["VB","VBG","VBD","VBN"]
verbs = [i for i,j in nltk.pos_tag(keywords) if j in ch]
Verb_count = pd.DataFrame({
"counts" : pd.Series(verbs).value_counts()})
return Verb_count
Verb_count = action_verbs(path = "../1.txt")
# nouns
def noun_word(path) :
with open(path) as f :
description = f.readlines()
stopwords = nltk.corpus.stopwords.words('english')
keywords = []
for item in description :
item = delete_non_callable(item)
for word in item.split() :
if word.lower() not in stopwords :
keywords.append(word)
# Pos tag words
word_tag = nltk.pos_tag(keywords)
# consider nouns
ch = ["NN","NNP","NNS"]
nouns = [i for i,j in nltk.pos_tag(keywords) if j in ch]
Noun_count = pd.DataFrame({
"counts" : pd.Series(nouns).value_counts()})
return Noun_count
Noun_count = noun_word(path = "../1.txt")
# Adj & Adv
def adj_word(path) :
with open(path) as f :
description = f.readlines()
stopwords = nltk.corpus.stopwords.words('english')
keywords = []
for item in description :
item = delete_non_callable(item)
for word in item.split() :
if word.lower() not in stopwords :
keywords.append(word)
# Pos tag words
word_tag = nltk.pos_tag(keywords)
# consider adj / adv
ch = ["JJ","JJR","JJS","RB","RBR","RBS","VBN"]
adj = [i for i,j in nltk.pos_tag(keywords) if j in ch]
Adj_count = pd.DataFrame({
"counts" : pd.Series(adj).value_counts()})
return Adj_count
adj = adj_word(path = "../1.txt")
# noun phase(NP) cheker (Qualification)
def noun_phrase(path) :
# Defining NP grammer (NN + NN + .... + NN) or VN(VB + NN)
NP = "NP: {(<V\w+>|<NN\w?>)+.*<NN\w?>}"
# grammar from http://www.aclweb.org/anthology/C10-1065
grammar = r"""
NBAR:
{<NN.*|JJ>*<NN.*>} # Nouns and Adjectives, terminated with Nouns
NP:
{<NBAR>}
{<NBAR><IN><NBAR>} # Above, connected with in/of/etc...
"""
chunker = RegexpParser(NP)
# Define function
def np_chunks(text, chunk_func=ne_chunk):
chunked = chunk_func(pos_tag(word_tokenize(text)))
continuous_chunk = []
current_chunk = []
for subtree in chunked:
if type(subtree) == Tree:
current_chunk.append(" ".join([token for token, pos in subtree.leaves()]))
elif current_chunk:
named_entity = " ".join(current_chunk)
if named_entity not in continuous_chunk:
continuous_chunk.append(named_entity)
current_chunk = []
else:
continue
return continuous_chunk
with open(path) as f :
np = []
description = f.readlines()
for sentence in description :
splited_description = sentence.split(". ")
#stopwords = nltk.corpus.stopwords.words('english')
for item in splited_description :
item = delete_non_callable(item)
np.extend(np_chunks(item,chunk_func = chunker.parse))
#break
np_word = pd.DataFrame({
"count" : pd.Series(np)
})
return np_word
np_word = noun_phrase(path = "../1.txt")
print("Verb",Verb_count.head(15))
print("-"*50)
print("Noun",Noun_count.head(15))
print("-"*50)
print("Adj&Adv",adj.head(15))
print("-"*50)
print("Noun Phrase\n",np_word)
Verb counts
said 5
getting 2
charged 2
going 2
asked 2
assigned 1
targeted 1
gratifying 1
asking 1
revoke 1
related 1
love 1
supposed 1
pushed 1
according 1
--------------------------------------------------
Noun counts
gun 9
Tory 8
Ford 8
Toronto 5
bail 5
police 5
violence 4
firearms 3
hearings 3
government 3
combat 3
offenders 3
part 3
--------------------------------------------------
Adj&Adv counts
also 7
provincial 4
bail 4
federal 3
new 3
likely 2
legal 2
criminal 2
related 2
bad 2
already 2
responsible 2
charged 2
legal 2
solely 2
--------------------------------------------------
Noun Phrase
count
0 Toronto Premier Doug Ford
1 Ontario Premier Doug Ford
2 funding Thursday
3 Toronto money
4 SWAT teams
5 firearms offences
6 Crown attorneys Ford
7 gun criminals are denied bail
8 Additional bail compliance officers
9 areas welcome police
10 Tory urges Trudeau
11 prevent repeat gun offenders
12 policing gang activity
13 time for action
14 Attorney General Caroline Mulroney
15 Michael Tibollo
16 community safety
17 Toronto Police Chief Mark Saunders
18 police force
19 love boots
20 love having police
21 police chief
22 years Ford
23 Toronto mayor
24 Mayor John Tory
25 increase cash flows
26 help combat gun
27 Prime Minister Justin Trudeau Tory
28 revoke bail opportunities
29 repeat gun offenders
30 statement Tory
31 gun offenders
32 move today
33 city staff
34 re-elected October
35 defence lawyers
36 SWAT teams
37 bail hearings
38 one type
39 said Toronto
40 defence lawyer Shane Martinez
41 Supreme Court
42 bail hearings
43 addition Crown lawyers
44 detention order
45 firearms cases
46 Justices Peace
47 be granted release Martinez
48 part Tory
49 hand guns
50 Ford said Thursday
51 handgun owners
52 month Tory
53 violence reduction plan
54 police officers
These are what I know from the AtS selection before reading news.
a. There are 5 said here, shows that this news came from some interviews or conference meeting.
b. It is announced by Tory Ford, about gun problems or violence happened in Toronto, invoved in government and police.
c. Related to federal and provincial government
d. Keywords :
SWAT teams (maybe a team to deal with gun violence)
Additional bail compliance officers
prevent repeat gun offenders
policing gang activity
community safety
increase cash flows
handgun owners
violence reduction plan