This is the first in a series of posts about a project I am working on that uses SyntaxNet, a model for Tensorflow, which is an Open Source library for Machine Learning. SyntaxNet is a Natural Language Processing toolkit that comes ready for many different tasks. Although SyntaxNet is a very complex set of tools built on Neural Networks, I will focus here on its parser, Parsey McParseface, which is said to be possibly the most accurate parser freely available today.

My goal for the series is to create a Python backend for a bot that can be asked about beer-related information in order to analyze beer prices in a city, find craft breweries, festivals, and many other beer-related things. It is quite a big task, so I will slowly be writing more and more posts to follow my project and talk about things that I find.


Intro



If you have not been through any of the Tensorflow tutorials, you definitely should. Although they require a little knowledge of Machine Learning, they explain everything that you need to actually build a network and take you through all the steps. They are pretty easy to set up and get running.

SyntaxNet is not for the faint of heart. It’s definitely not building a network from scratch, but I have now spent a good couple of hours battling to set it up and going through forum posts to find out why something is failing or how to actually use Parsey McParseface. Here, I will try to take you through the steps and point you to the posts where I found my answers.


Installation



Virtualenv setup

I am setting up a Virtual Environment to manage my installations for this project. If you are kind of new to Python, you must learn this. It seems kind of tricky at first, but it makes managing Python a million times easier, I promise.

To install Virtualenv:

$ pip install virtualenv
$ cd my_project_folder
$ virtualenv venv

The command virtualenv venv creates the virtual environment and names it venv. So if you don’t like that name and want to call it tensorflow or syntaxnet, you should do that here. I will be using venv.

If you are managing your Python versions yourself (between 2.7 and 3.5), a way to handle that quickly is

$ virtualenv -p /usr/bin/python2.7 venv

SyntaxNet requires Python 2.7!
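
If you are not sure which interpreter your environment picked up, a small check like the one below can tell you. This is only a sketch: the helper name is_python27 is my own (hypothetical), not something that ships with SyntaxNet or virtualenv.

```python
from __future__ import print_function
import sys


def is_python27(version_info=None):
    """Return True when the given (or running) interpreter is Python 2.7.

    Hypothetical helper for illustration; not part of SyntaxNet.
    """
    if version_info is None:
        version_info = sys.version_info
    # Only the major and minor numbers matter for this check.
    return tuple(version_info[:2]) == (2, 7)


if __name__ == "__main__":
    print("Running Python", sys.version.split()[0])
    if is_python27():
        print("This interpreter matches the 2.7 requirement.")
    else:
        print("This interpreter is not Python 2.7.")
```

Run it with the python that is on your path after you activate the environment, since that is the one that counts.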

I use Anaconda for scientific computing, so I had to change my default Python to 2.7. There are a couple of different ways to do this, which you can read about in this Stack Overflow post.

Activate your virtual environment

$ source venv/bin/activate
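
To double check that the activation worked, you can ask Python itself. The sketch below relies on two attributes: older virtualenv releases set sys.real_prefix, while newer ones (and the built-in venv module) make sys.base_prefix differ from sys.prefix. The function name in_virtualenv is my own, hypothetical helper.

```python
from __future__ import print_function
import sys


def in_virtualenv(sys_module=sys):
    """Return True when the interpreter appears to run inside a virtual environment.

    Hypothetical helper; sys_module is a parameter only so it can be tested.
    """
    # Classic virtualenv marks the environment with sys.real_prefix.
    if hasattr(sys_module, "real_prefix"):
        return True
    # venv and newer virtualenv keep the original location in sys.base_prefix.
    base_prefix = getattr(sys_module, "base_prefix", sys_module.prefix)
    return base_prefix != sys_module.prefix


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtualenv())
    print("Interpreter prefix:", sys.prefix)
```

If this reports False right after you ran the activate script, you are probably calling a different python than the one in venv/bin.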


Tensorflow

Now you have a clean environment, and we must install everything that we need. Let’s start with the Tensorflow virtualenv installation. You might run into some hiccups, but you can battle the beast.


SyntaxNet

Now you are ready for the SyntaxNet installation. If you don’t have Homebrew installed, you should install it; it will save your life. If you don’t want to, there are other ways to install the packages you need as well.

You must set your JAVA_HOME to JDK 8. Use this post if you have trouble setting the path to 1.8.

Now you can download Bazel with Homebrew:

$ brew install bazel

Do the same for Swig:

$ brew install swig

You should already have the protocol buffers if you installed Tensorflow, but follow the next step on the git page if you need to.

Install asciitree for displaying the demo:

$ pip install asciitree

You should also have numpy, but you could run another pip command just in case:

$ pip install numpy
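
Before starting the long build, it can save time to confirm that the pieces above are actually in place. The following is a rough pre-flight sketch, assuming that bazel and swig should be on your PATH, that asciitree and numpy should be importable, and that JAVA_HOME should be set. All function names here (find_on_path, can_import, preflight) are my own, hypothetical helpers. It only checks that things exist, not that they are the right versions.

```python
from __future__ import print_function
import importlib
import os


def find_on_path(program, path=None):
    """Return the full path of an executable found on PATH, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def can_import(module_name):
    """Return True when the module can be imported in this environment."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


def preflight(tools=("bazel", "swig"), modules=("asciitree", "numpy")):
    """Build a name -> bool report of what is present (presence only)."""
    report = {}
    for tool in tools:
        report[tool] = find_on_path(tool) is not None
    for module in modules:
        report[module] = can_import(module)
    # Only checks that the variable is set, not that it points at JDK 8.
    report["JAVA_HOME"] = bool(os.environ.get("JAVA_HOME"))
    return report


if __name__ == "__main__":
    for name, ok in sorted(preflight().items()):
        print("%-10s %s" % (name, "found" if ok else "MISSING"))
```

Run it inside the activated environment, otherwise the module checks will look at your system Python instead of venv.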

Now comes the beast. Once you have everything from above, you have to build SyntaxNet:

$ git clone --recursive https://github.com/tensorflow/models.git
$ cd models/syntaxnet/tensorflow
$ ./configure
$ cd ..
$ bazel test --linkopt=-headerpad_max_install_names \
    syntaxnet/... util/utf8/...

This will take a really long time, but it’s worth the wait. We are now ready to start building our system. Continue following my project!