  Example on a Toronto News Story with an NLP Algorithm

Posted by Haby on July 1, 2018

This is a simple ATS (applicant tracking system) style script that checks a job description for keywords. It uses the nltk package together with the English stopword list that ships with it.

Action Verb : verbs, regarded as the actions required for the job
Noun Word : nouns, regarded as the main items HR checks for
Adj Word : adjectives / adverbs, regarded as descriptive words
Noun Phrase : NN + NN + ... + NN or VB + NN

    Noun phrases are usually the key qualifications that HR requires, and an ATS uses them
    as the keywords for picking resumes out of the resume pool. As with SEO, the more of these
    keywords a resume contains, the more likely it is to be selected by the ATS.
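
To make the SEO analogy concrete, here is a minimal sketch of a keyword-coverage score: the share of job-description phrases that also appear in a resume. The function name `keyword_coverage` and the sample strings are hypothetical and are not part of the script in this post; a real ATS is likely to score resumes differently.

```python
def keyword_coverage(keywords, resume_text):
    """Return (share of keywords found in resume_text, list of matched keywords).

    Matching is a case-insensitive substring test, which is an assumption
    made for this sketch only.
    """
    text = resume_text.lower()
    matched = [kw for kw in keywords if kw.lower() in text]
    share = len(matched) / len(keywords) if keywords else 0.0
    return share, matched


# Hypothetical noun phrases from a job description and a resume snippet.
jd_phrases = ["data analysis", "project management", "machine learning"]
resume = "Five years of data analysis and Machine Learning experience."

share, matched = keyword_coverage(jd_phrases, resume)
print(round(share, 2), matched)
```

A substring test is the simplest choice here; it would also match a keyword inside a longer word, so a stricter version might compare whole tokens instead.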

To use the functions, save the job description as a text file (TXT) on a local PC or at a URL.
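
The functions in this post only call open(path), so a job description stored at a URL has to be fetched first. Below is a minimal sketch of a loader that accepts either form. `load_description` is a hypothetical helper, and the UTF-8 default is an assumption about how the text file was saved.

```python
from urllib.request import urlopen


def load_description(source, encoding="utf-8"):
    """Return the lines of a job description from a local path or an http(s) URL."""
    if source.lower().startswith(("http://", "https://")):
        # Remote source: download the bytes and decode them.
        with urlopen(source) as response:
            text = response.read().decode(encoding)
    else:
        # Local source: read the text file directly.
        with open(source, encoding=encoding) as f:
            text = f.read()
    return text.splitlines()
```

The returned list of lines could stand in for the result of f.readlines() in the functions, with the difference that splitlines() drops the trailing newline characters.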

The results are not very accurate, because :

1. Some words have more than one part of speech. For example, team is both a noun and a verb,
   so it shows up in both classifications.
2. It is difficult to tell whether a word such as written is an adjective or a past participle.
   In a sentence like "I have written...", it is a past participle, while in a sentence like
   "we need written skills", it is an adjective.
3. For reasons I have not yet identified, nltk tags words like "manage" and "identify" as adjectives
   instead of verbs. I will look into this further.
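
Part of the overlap in point 1 comes from the tag lists themselves: the script puts VBN in both the verb list and the adjective list, so every past participle is counted in both tables. The sketch below shows the effect with hand-tagged tokens. The tags are written by hand for illustration and are not real nltk output, and `count_by_tags` is a hypothetical helper.

```python
from collections import Counter

# Tag lists as used in the script's action_verbs and adj_word functions.
VERB_TAGS = {"VB", "VBG", "VBD", "VBN"}
ADJ_TAGS = {"JJ", "JJR", "JJS", "RB", "RBR", "RBS", "VBN"}


def count_by_tags(tagged_words, tags):
    """Count the words whose POS tag is in `tags`."""
    return Counter(word for word, tag in tagged_words if tag in tags)


# Hand-tagged tokens (an assumption for illustration, not real tagger output).
tagged = [
    ("written", "VBN"),
    ("skills", "NNS"),
    ("manage", "VB"),
    ("strong", "JJ"),
    ("written", "VBN"),
]

verbs = count_by_tags(tagged, VERB_TAGS)
adjectives = count_by_tags(tagged, ADJ_TAGS)

# Words that end up in both tables.
overlap = sorted(set(verbs) & set(adjectives))
print(verbs, adjectives, overlap)
```

Dropping VBN from one of the two lists would remove this particular overlap, at the cost of deciding up front whether participles count as verbs or as adjectives.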

Environment :

  OS : Windows 10.0.16299 X64
  Language : Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct 15 2017, 03:27:45) [MSC v.1900 64 bit (AMD64)]
  GUI : Spyder 3.2.4 / IPython 6.1.0

As a demo, I use a news story from CBC and try to extract useful verbs, nouns, adjectives/adverbs and noun phrases.

import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import string
import nltk
from nltk import word_tokenize, pos_tag, ne_chunk
from nltk import RegexpParser
from nltk import Tree
def delete_non_callable(x) :
    for ch in ["?","/","(",")",",",";","–",":","-"] :
        if ch in x :
            x = x.replace(ch," ")
    return x
# Action Verbs
def action_verbs(path) :

    with open(path) as f :
        description = f.readlines()
        stopwords = nltk.corpus.stopwords.words('english')
        keywords = []
        for item in description :
            item = delete_non_callable(item)
            for word in item.split() :
                # keep only words that are not English stopwords
                if word.lower() not in stopwords :
                    keywords.append(word)
    # Pos tag words
    word_tag = nltk.pos_tag(keywords)
    # consider action verbs
    ch = ["VB","VBG","VBD","VBN"]
    verbs = [i for i,j in word_tag if j in ch]
    Verb_count = pd.DataFrame({
            "counts" : pd.Series(verbs).value_counts()})    
    return Verb_count

Verb_count = action_verbs(path = "../1.txt")
# nouns
def noun_word(path) :

    with open(path) as f :
        description = f.readlines()
        stopwords = nltk.corpus.stopwords.words('english')
        keywords = []
        for item in description :
            item = delete_non_callable(item)
            for word in item.split() :
                # keep only words that are not English stopwords
                if word.lower() not in stopwords :
                    keywords.append(word)
    # Pos tag words
    word_tag = nltk.pos_tag(keywords)
    # consider nouns
    ch = ["NN","NNP","NNS"]
    nouns = [i for i,j in word_tag if j in ch]
    Noun_count = pd.DataFrame({
            "counts" : pd.Series(nouns).value_counts()})    

    return Noun_count

Noun_count = noun_word(path = "../1.txt")
# Adj & Adv
def adj_word(path) :

    with open(path) as f :
        description = f.readlines()
        stopwords = nltk.corpus.stopwords.words('english')
        keywords = []
        for item in description :
            item = delete_non_callable(item)
            for word in item.split() :
                # keep only words that are not English stopwords
                if word.lower() not in stopwords :
                    keywords.append(word)
    # Pos tag words
    word_tag = nltk.pos_tag(keywords)
    # consider adj / adv
    ch = ["JJ","JJR","JJS","RB","RBR","RBS","VBN"]
    adj = [i for i,j in word_tag if j in ch]
    Adj_count = pd.DataFrame({
            "counts" : pd.Series(adj).value_counts()})    

    return Adj_count

adj = adj_word(path = "../1.txt")
# noun phrase (NP) checker (Qualification)

def noun_phrase(path) :
    # Defining NP grammar (NN + NN + .... + NN) or VN (VB + NN)
    # raw string, so that \w reaches the chunk parser unchanged
    NP = r"NP: {(<V\w+>|<NN\w?>)+.*<NN\w?>}"

    # grammar from
    # (alternative grammar, kept for reference; the chunker below uses NP)
    grammar = r"""
        NBAR:
            {<NN.*|JJ>*<NN.*>}  # Nouns and Adjectives, terminated with Nouns

        NP:
            {<NBAR><IN><NBAR>}  # Above, connected with in/of/etc...
    """
    chunker = RegexpParser(NP)

    # Define function
    def np_chunks(text, chunk_func=ne_chunk):
        chunked = chunk_func(pos_tag(word_tokenize(text)))
        continuous_chunk = []
        current_chunk = []

        for subtree in chunked:
            if type(subtree) == Tree:
                # collect the words of each chunk subtree
                current_chunk.append(" ".join([token for token, pos in subtree.leaves()]))
            elif current_chunk:
                # a non-chunk token ends the run: store the phrase once
                named_entity = " ".join(current_chunk)
                if named_entity not in continuous_chunk:
                    continuous_chunk.append(named_entity)
                    current_chunk = []

        return continuous_chunk

    with open(path) as f :
        # named phrases rather than np, so that numpy's np alias is not shadowed
        phrases = []
        description = f.readlines()
        for sentence in description :
            splited_description = sentence.split(". ")
            #stopwords = nltk.corpus.stopwords.words('english')
            for item in splited_description :
                item = delete_non_callable(item)
                phrases.extend(np_chunks(item,chunk_func = chunker.parse))
    np_word = pd.DataFrame({
        "count" : pd.Series(phrases)})
    return np_word

np_word = noun_phrase(path = "../1.txt")
print("Noun Phrase\n",np_word)
Verb             counts
said             5
getting          2
charged          2
going            2
asked            2
assigned         1
targeted         1
gratifying       1
asking           1
revoke           1
related          1
love             1
supposed         1
pushed           1
according        1
Noun             counts
gun              9
Tory             8
Ford             8
Toronto          5
bail             5
police           5
violence         4
firearms         3
hearings         3
government       3
combat           3
offenders        3
part             3
Adj&Adv              counts
also              7
provincial        4
bail              4
federal           3
new               3
likely            2
legal             2
criminal          2
related           2
bad               2
already           2
responsible       2
charged           2
legal             2
solely            2
Noun Phrase
0             Toronto Premier Doug Ford
1             Ontario Premier Doug Ford
2                      funding Thursday
3                         Toronto money
4                            SWAT teams
5                     firearms offences
6                  Crown attorneys Ford
7         gun criminals are denied bail
8   Additional bail compliance officers
9                  areas welcome police
10                   Tory urges Trudeau
11         prevent repeat gun offenders
12               policing gang activity
13                      time for action
14   Attorney General Caroline Mulroney
15                      Michael Tibollo
16                     community safety
17   Toronto Police Chief Mark Saunders
18                         police force
19                           love boots
20                   love having police
21                         police chief
22                           years Ford
23                        Toronto mayor
24                      Mayor John Tory
25                  increase cash flows
26                      help combat gun
27   Prime Minister Justin Trudeau Tory
28            revoke bail opportunities
29                 repeat gun offenders
30                       statement Tory
31                        gun offenders
32                           move today
33                           city staff
34                   re-elected October
35                      defence lawyers
36                           SWAT teams
37                        bail hearings
38                             one type
39                         said Toronto
40        defence lawyer Shane Martinez
41                        Supreme Court
42                        bail hearings
43               addition Crown lawyers
44                      detention order
45                       firearms cases
46                       Justices Peace
47          be granted release Martinez
48                            part Tory
49                            hand guns
50                   Ford said Thursday
51                       handgun owners
52                           month Tory
53              violence reduction plan
54                      police officers
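
The NP tag pattern from noun_phrase can be tried on its own with hand-tagged tokens, which avoids needing the tokenizer and tagger models. This is a minimal sketch: the sample tokens and their tags are written by hand as an assumption and are not the tagger's real output for the article.

```python
from nltk import RegexpParser, Tree

# Same tag pattern as in noun_phrase: a run of verbs/nouns ending in a noun.
chunker = RegexpParser(r"NP: {(<V\w+>|<NN\w?>)+.*<NN\w?>}")

# Hand-tagged tokens (illustrative assumption).
tagged = [
    ("police", "NNS"),
    ("will", "MD"),
    ("prevent", "VB"),
    ("repeat", "NN"),
    ("gun", "NN"),
    ("offenders", "NNS"),
]

tree = chunker.parse(tagged)

# Collect the text of every NP subtree.
phrases = [
    " ".join(token for token, tag in subtree.leaves())
    for subtree in tree
    if isinstance(subtree, Tree)
]
print(phrases)
```

Because the pattern needs at least two matching tags in a row, a single noun such as "police" is left out, while a verb followed by nouns is kept as one phrase.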

These are the things I can tell from the ATS-style selection before reading the news.

a. "said" appears 5 times, which suggests that this news came from interviews or a press conference.
b. It is an announcement by Tory and Ford about gun problems or violence in Toronto, involving the government and the police.
c. It relates to the federal and provincial governments.
d. Keywords :

    SWAT teams (maybe a team to deal with gun violence)
    Additional bail compliance officers
    prevent repeat gun offenders
    policing gang activity
    community safety
    increase cash flows
    handgun owners
    violence reduction plan