NLP Via R, R# (Revolution R) and WordNet

You can download the accompanying R script here.

As you can imagine, the Herald and Athena products rely on NLP mining. I have several models, built on various technologies, that contribute to a home-grown Machine Learning implementation. These models have to accomplish four main tasks:

  1. Determine news articles’ value for screen-space
  2. Figure out what’s going to be the next big news topic
  3. Be able to plug-and-play in my ML pipeline to provide for self-training and automated execution
  4. Have the ability to run several models simultaneously and deterministically pick the model that has the most success

These requirements call for complex routines. R is a natural choice because it is free (a big help), has a large community, is designed for data science applications, and is fairly easy to learn. In this post, I will illustrate what I had to do to get R working with WordNet on a Windows desktop (Windows 7, 8.*, and 10). The script available from the link above takes a file and processes it using the WordNet library in R. So, let’s get started:

  1. Download the latest version of Java (the link is for the 64-bit version)
  2. Download WordNet from Princeton
  3. Set up an environment variable called ‘WNHOME’ that points to the WordNet installation directory. I set up a User variable called ‘WNHOME’ pointing to ‘C:\Program Files (x86)\WordNet\2.1’
  4. Download R or Revolution R
  5. Install the wordnet package in R by running this command in the R console window: install.packages("wordnet") (use straight quotes; R rejects the curly quotes that word processors and blogs often substitute)
  6. Create a file foo.txt with an article you want to process. For demonstration purposes, my script uses a local file to process against WordNet. You can use a relational or NoSQL source as you please.
  7. Download the script and paste it into the R script editor (File -> New Script)
  8. Depending on where you saved your file, you may have to change the working directory on line 2 of the script (setwd(….))
  9. Now run the script (Edit -> Run All in the R editor window)
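
For readers who want a feel for what such a script does before downloading it, here is a minimal sketch of the steps above. It is not the accompanying script. The helper names tokenize and lookup_synonyms, the restriction to nouns, and the limit of ten tokens are assumptions made for illustration. It assumes foo.txt sits in the working directory and that WNHOME is set as in step 3; synonyms() and setDict() come from the wordnet package.

```r
# Illustrative sketch only; helper names and the NOUN choice are assumptions,
# not taken from the accompanying script.

# Step 3: read WNHOME so we can warn early if it is missing.
wn_home <- Sys.getenv("WNHOME")
if (!nzchar(wn_home)) {
  message("WNHOME is not set; the wordnet package may not find the dictionary.")
}

# Split raw text into unique lower-case word tokens (base R only).
tokenize <- function(text) {
  words <- unlist(strsplit(tolower(paste(text, collapse = " ")), "[^a-z]+"))
  unique(words[nzchar(words)])
}

# Look up WordNet synonyms for each token; returns a list named by token.
# Tokens that WordNet does not know yield an empty character vector.
lookup_synonyms <- function(tokens, pos = "NOUN") {
  stats::setNames(lapply(tokens, function(w) wordnet::synonyms(w, pos)), tokens)
}

# Steps 6 to 9: run only when the package and the input file are available.
# If needed, call setwd() first so that foo.txt can be found.
if (requireNamespace("wordnet", quietly = TRUE) && file.exists("foo.txt")) {
  library(wordnet)
  if (nzchar(wn_home)) {
    # Point the package at the dictionary folder inside the WordNet install.
    setDict(file.path(wn_home, "dict"))
  }
  tokens <- tokenize(readLines("foo.txt", warn = FALSE))
  print(lookup_synonyms(head(tokens, 10)))
}
```

The tokenizer is deliberately crude: it keeps only runs of the letters a to z, which is enough to hand plain English words to WordNet. A real pipeline would likely add stop-word removal and stemming before the lookup.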

Congratulations! You just processed your first NLP script via R!

There’s obviously a lot more to building an NLP model in an ML pipeline than this simple script; hopefully, this article helps an R novice get NLP processing up and running quickly.

Shri Bhupathi

Husband, father, and consultant who dabbles mostly in Microsoft technologies. Loves tennis – both watching and playing.
