
Computer and Information Systems Technology

Natural Language Processing

September 10, 2018

Project Term: Spring/Summer 2018

This Natural Language Processing (NLP) project was a research survey of existing NLP resources and algorithms that could be applied to a commercial application being developed by one of our ACE Project Space partners.

This project was an interesting foray into Machine Learning that the faculty at the ACE Project Space felt was well suited to a Business Technology Management (BTM) student in the space. We were very fortunate that one of our students was pursuing diplomas in both Business Information Technology (BIT) and BTM. Having strong skills in programming and analysis, he was really able to make the most of this project. The general goal of the project was to explore how NLP can be used to determine user intentions based on their use of language.

Starting out on the project, our student didn’t know what NLP was and had no prior knowledge of Python. Luckily, our research coordinator, Elsanussi Mneina, was well versed in Machine Learning, specializing in linguistics and NLP. After an introduction between our coordinator and our student, our student felt supported on this new and uncertain project. Elsanussi was instrumental in pointing our student towards resources, such as the NLTK library, and shared some of his own experiences related to Data Science and NLP.

In the early weeks, our student was able to teach himself Python, relying on his programming skills from BIT and applying the principles of object-oriented programming learned in the BIT and BTM programs. While picking up Python, he worked with BeautifulSoup, the Twitter API, and JSON to learn how to fetch and read data from various sources.
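As a rough illustration of this kind of data gathering, the sketch below pulls text out of a JSON payload and out of an HTML page. It is not the student's code: the sample data, the `text` field name, and the helper names are all hypothetical. To keep the sketch runnable without any installs, Python's built-in `html.parser` stands in for BeautifulSoup.

```python
import json
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text inside <p> tags of an HTML document."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_paragraph = False

    def handle_data(self, data):
        # Text nested in inline tags such as <b> is still inside the <p>.
        if self._in_paragraph:
            self.paragraphs[-1] += data


def texts_from_json(payload):
    """Return the "text" field of every record in a JSON array string."""
    return [record["text"] for record in json.loads(payload)]


def texts_from_html(markup):
    """Return the stripped text of every non-empty <p> element."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return [p.strip() for p in parser.paragraphs if p.strip()]


# Hypothetical sample data standing in for an API response and a web page.
SAMPLE_JSON = (
    '[{"id": 1, "text": "I love this phone"},'
    ' {"id": 2, "text": "The battery is terrible"}]'
)
SAMPLE_HTML = (
    "<html><body><h1>Reviews</h1>"
    "<p>Great <b>screen</b>.</p><p>Too expensive.</p>"
    "</body></html>"
)

json_texts = texts_from_json(SAMPLE_JSON)
html_texts = texts_from_html(SAMPLE_HTML)
print(json_texts)
print(html_texts)
```

Either source ends up as a plain list of strings, which is a convenient shape for the text-processing steps that follow.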

Our student identified and explored Sentiment Analysis as a way of classifying text, rating a word, sentence, or paragraph as negative, neutral, positive, and so on. Using NLTK, he explored tokenizing text, identifying stop words (common words that add little meaning to a sentence), and determining which parts of speech held the most importance in a sentence. He also explored and tested several classifiers for their suitability to the project scope, including NLTK SentiWordNet, NLTK VADER Sentiment Intensity Analyzer, NLTK Sentiment Analyzer, and TextBlob Sentiment Analysis. His goal was to determine whether any of the classifiers were customizable, whether they could be used to classify text, and whether they could be trained.
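The steps described above (tokenize, drop stop words, score, label) can be sketched in a few lines of plain Python. This is a toy illustration only, not the student's code and not how VADER or TextBlob work internally. The word lists, weights, and function names are all hypothetical. In practice NLTK provides real tokenizers, a stop word corpus, and trained or lexicon-based analyzers.

```python
import re

# Hypothetical, hand-made word lists for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "was", "of", "to", "i"}
LEXICON = {
    "love": 2.0, "great": 1.5, "good": 1.0,
    "bad": -1.0, "terrible": -2.0, "hate": -2.0,
}
NEGATIONS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Drop common words that carry little meaning on their own."""
    return [token for token in tokens if token not in STOP_WORDS]


def score(tokens):
    """Sum lexicon weights; a negation flips only the very next token."""
    total = 0.0
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
            continue
        value = LEXICON.get(token, 0.0)
        total += -value if negate else value
        negate = False
    return total


def classify(text):
    """Label text as positive, negative, or neutral from its score."""
    total = score(remove_stop_words(tokenize(text)))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


for sentence in [
    "I love this phone",
    "The battery is terrible",
    "The package arrived on Monday",
    "This is not good",
]:
    print(sentence, "->", classify(sentence))
```

Note that the negation words are deliberately kept out of the stop word list. Removing "not" before scoring would turn "not good" into a positive sentence, which is one reason a stop word list should be checked against the task before it is applied.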

Technologies used:
NLTK, TextBlob, BeautifulSoup4, Pandas, Jupyter, JSON, Unicode, Notepad++, Python 3.5 (32-bit)