Advanced Practical Python #4: Sentiment Shakespearean Analysis
Part 1. Introduction to Natural Language Processing
To Py or Not to Py, that is the question!
There are tons of tools for running semantic analysis out there… so why choose Python? After all, a Natural Language Processor by any other name would smell just as sweet.
For one, Python makes it easy to do the following:
Tokenizing – splitting sentences and words from the body of text
Tagging parts of speech
Performing real-time semantic analysis on social media
Let’s get started by installing the Python Natural Language Toolkit and TextBlob:
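If you have pip available, running pip install nltk textblob from a terminal will fetch both packages.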
Now open up Python and type the following in the Python Shell:
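The lines in question are NLTK's built-in downloader call:

import nltk

nltk.download()  # with no arguments, this opens the NLTK Downloader GUI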
Now a GUI should pop up! Let’s download the extra components for the Natural Language Toolkit.
Before we go further, we should define some of the “lexicon” used when you clicked download:
Corpus is defined as any body of text; corpora is the plural of corpus.
Ex. A corpus of Scientific American articles can be used in semantic analysis.
Lexicon is the definition of words and their meanings, contingent on the context.
Ex. A zoologist may define "python" differently than a computer scientist does.
Token is used whenever an entity is broken up by a set of rules. When we turn something into tokens, we tokenize it.
Ex. We can treat the words in a sentence as tokens, and we can also tokenize the sentences in a paragraph (see the short example after these definitions).
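As a quick illustration of tokens (a minimal sketch, assuming TextBlob is already installed), TextBlob can tokenize the same text both ways:

from textblob import TextBlob

blob = TextBlob("To be, or not to be. That is the question.")
print(blob.words)      # word tokens, with punctuation stripped
print(blob.sentences)  # sentence tokens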
On this basis, let's start our Shakespearean Semantic Analysis!
Part 2. Of tags and words
To begin, head over to MIT's Hamlet website and copy some lines:

text = """HAMLET
To be, or not to be: that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles,
And by opposing end them? To die: to sleep;"""
Now let's tag it:

from textblob import TextBlob

blob = TextBlob(text)
result = blob.tags
print(result)
When you see the result, you will notice that each word has been tagged with its part of speech: blob.tags returns a list of (word, tag) tuples, using Penn Treebank tags such as NN for a noun and VB for a verb.
If we want to see how many times each noun phrase was used, we can use the blob.np_counts property:
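Reusing the blob object from the tagging step, that is a one-line change (np_counts is a property, so no parentheses are needed):

result = blob.np_counts
print(result)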
That returns a mapping from each noun phrase to the number of times it appears.
Hey wait, that looks like a dictionary!
You can also create a dictionary based on the word counts:
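Again, only the result line changes, this time using the word_counts property:

result = blob.word_counts
print(result)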
Doing so returns another frequency distribution, this time of the individual words.
Part 3. Sentiment Analysis
We will now do sentiment analysis and see whether Hamlet's soliloquy was positive or negative!
All you need to do is change our result variable to be:
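Continuing with the same blob object, a minimal sketch of that change:

result = blob.sentiment
print(result)  # a Sentiment(polarity=..., subjectivity=...) namedtuple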
And voilà! A sentiment analysis was done based on NLTK.
What the results mean is the following: polarity ranges from -1.0 (most negative) to 1.0 (most positive), and subjectivity ranges from 0.0 (very objective) to 1.0 (very subjective).
Part 4. Loop through sentences
Let's modify our code to run sentiment analysis on each "sentence" (although for Hamlet this is a bit tricky, since the verse has very little sentence-ending punctuation).
All we need to do is change our result to look for sentences:
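As before, this is a one-line change on top of the existing script:

result = blob.sentences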
And then we will use a Python loop to perform sentiment analysis on each sentence:
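A minimal sketch of that loop, printing each sentence next to its sentiment score:

for sentence in result:
    print(sentence)
    print(sentence.sentiment)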