You can download the accompanying R script here.
As you can imagine, the Herald and Athena products rely on NLP text mining. I have several models, built on various technologies, that feed a home-grown Machine Learning implementation. These models have to accomplish four main tasks:
- Determine news articles’ value for screen-space
- Figure out what’s going to be the next big news topic
- Be able to plug-and-play in my ML pipeline to provide for self-training and automated execution
- Have the ability to run several models simultaneously and deterministically pick the model that has the most success
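To make the last requirement concrete, here is a minimal sketch of picking a winner deterministically. It is not my production code: the model names and scores are made up, and pick_best_model is a hypothetical helper. The idea is simply that ties are broken by model name, so the result never depends on which model happened to finish first.

```r
# Hypothetical success rates, one per model (made-up numbers for illustration).
scores <- c(model_b = 0.81, model_a = 0.81, model_c = 0.74)

# Hypothetical helper: highest score wins; ties are broken by model name.
# method = "radix" sorts names in the C locale, so the order does not
# change with the machine's locale settings.
pick_best_model <- function(scores) {
  ranked <- order(-scores, names(scores), method = "radix")
  names(scores)[ranked[1]]
}

best <- pick_best_model(scores)
print(best)
```

Sorting on a second key is only one way to get repeatable results; a fixed priority list of models would work just as well.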
These requirements call for complex routines. R is a natural choice because it’s free (a big help), has a large community, is designed for data science applications, and is fairly easy to learn. In this post, I will illustrate what I had to do to get R working with WordNet on a Windows desktop (Windows 7, 8.*, and 10). You can also download the script from the link above; it takes a file and processes it using the WordNet library in R. So, let’s get started:
- Download the latest version of Java (the link is for the 64-bit version)
- Download WordNet from Princeton
- Set up an environment variable called ‘WNHOME’ that points to the WordNet installation directory. I set up a User variable called ‘WNHOME’ pointing to ‘C:\Program Files (x86)\WordNet\2.1’
- Download R or Revolution R
- Install the wordnet package in R by running this command in the R console window: install.packages("wordnet")
- Create a file foo.txt with an article you want to process. For demonstration purposes, my script uses a local file to process against WordNet. You can use a relational or NoSQL source as you please.
- Download the script and paste it into the R script editor (File -> New Script)
- Depending on where you saved your file, you may have to change the working directory on line 2 of the script (setwd(….))
- Now run the script (Edit -> Run All in the R editor window)
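If the script fails, it helps to check the setup first. The sketch below is not the downloadable script; it is a rough illustration of the steps above, and check_wordnet_setup and extract_words are hypothetical helpers I made up for it. It assumes foo.txt sits in the current working directory, and it only calls synonyms() from the wordnet package when WNHOME, the package, and the file are all in place.

```r
# Hypothetical helper (not part of the wordnet package): reports whether
# the pieces from the steps above are in place.
check_wordnet_setup <- function(wnhome = Sys.getenv("WNHOME")) {
  list(
    wnhome_set   = nzchar(wnhome),                        # variable defined?
    wnhome_found = nzchar(wnhome) && dir.exists(wnhome),  # directory exists?
    pkg_present  = requireNamespace("wordnet", quietly = TRUE)
  )
}

# Hypothetical helper: split raw text into unique lower-case words (base R only).
extract_words <- function(text) {
  tokens <- unlist(strsplit(tolower(paste(text, collapse = " ")), "[^a-z]+"))
  unique(tokens[nzchar(tokens)])
}

status <- check_wordnet_setup()
print(status)

# Only touch WordNet when everything checks out and the sample file exists.
if (all(unlist(status)) && file.exists("foo.txt")) {
  library(wordnet)
  words <- extract_words(readLines("foo.txt", warn = FALSE))
  # Look up noun synonyms for the first few words of the article.
  for (w in head(words, 5)) {
    print(synonyms(w, "NOUN"))
  }
}
```

A FALSE in the printed status tells you which step to revisit: the environment variable, the WordNet install directory, or the R package.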
Congratulations! You just processed your first NLP script via R!
There’s obviously a lot more to building an NLP model in an ML pipeline than this simple script; hopefully, this article helps an R novice get NLP processing up and running quickly.
Husband, father, and consultant who dabbles mostly in Microsoft technologies. Loves tennis, both watching and playing.