Why it isn’t difficult to train a Neural Network with a dynamic structure anymore!
The open source community has finally addressed the demand for dynamic structures in neural networks. In the last 3 months, we saw 3 major library releases that support dynamic structures:
- TensorFlow Fold (Google)
- DyNet (CMU)
- PyTorch (Twitter, NVIDIA, SalesForce, ParisTech, CMU, Digital Reasoning, INRIA, ENS)
At Innoplexus, we compile information from structured, unstructured and semi-structured sources to assist our customers in making real-time decisions. To achieve this speed, we convert natural language text from unstructured sources into a properly structured format. Because speed is our main constraint, our NLP systems were built on recurrent models of language, for which tools are readily available and computation is easy to distribute over multiple machines.
Over time, we realized the limitations of recurrent approaches like LSTM and GRU, which try to fit recursive natural language into a sequential framework. This leads to a loss of syntactic information during processing. Unfortunately, implementing recursive neural networks from scratch can turn out to be a nightmare, since it involves writing complex backprop code that leaves very little room for error.
Most ML libraries, like TensorFlow, Torch or Theano, let you create a static network, whose structure cannot change as a function of the input. This turns out to be a significant limitation in Natural Language Processing, where syntactic information is encoded in a parse tree that varies with the input text. Many applications, like syntactic parsing, machine translation and sentiment analysis, require syntactic information along with semantic information. Because no framework supported this, developers used to end up implementing the training procedure in NumPy. This is tedious and error-prone, and it has to be done with great care.
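To make the point about input-dependent structure concrete, here is a minimal, framework-free sketch of the forward pass of a recursive network over a binary parse tree. It is an illustration only, not our production code: the embedding size `DIM`, the parameters `W` and `b`, the helper names and the example trees are all assumptions made up for this example, and backprop (the hard part mentioned above) is deliberately left out.

```python
import math
import random

DIM = 4  # hypothetical embedding size, chosen only for illustration
rng = random.Random(0)

# Hypothetical parameters: a composition matrix W (DIM x 2*DIM) and a bias b.
W = [[rng.uniform(-0.1, 0.1) for _ in range(2 * DIM)] for _ in range(DIM)]
b = [0.0] * DIM
embeddings = {}  # word -> vector, filled lazily


def embed(word):
    """Return a randomly initialised vector for a word."""
    if word not in embeddings:
        embeddings[word] = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
    return embeddings[word]


def compose(left, right):
    """Combine two child vectors into a parent: tanh(W [left; right] + b)."""
    children = left + right  # list concatenation, length 2 * DIM
    return [math.tanh(sum(w * x for w, x in zip(row, children)) + bias)
            for row, bias in zip(W, b)]


def encode(tree):
    """Recursively encode a binary parse tree given as nested tuples."""
    if isinstance(tree, str):  # leaf: a single word
        return embed(tree)
    left, right = tree
    return compose(encode(left), encode(right))


def count_compositions(tree):
    """Number of compose() calls needed, i.e. the size of the computation."""
    if isinstance(tree, str):
        return 0
    return 1 + sum(count_compositions(child) for child in tree)


# Two inputs with different parse trees produce two different computations.
short_tree = ("good", "movie")
long_tree = (("not", ("very", "good")), ("movie", "overall"))
short_vec = encode(short_tree)
long_vec = encode(long_tree)
```

The same `encode` function performs one composition for the first tree and four for the second, in a different arrangement each time. A static graph has to fix that arrangement before seeing the input, which is exactly what a parse tree does not allow.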
We faced a similar problem while implementing ‘Entity Extractor’ at Innoplexus. It uses semantically united recursive neural nets, which have a tree-like structure. Since no framework supported dynamic structures at the time, we ended up implementing it in TensorFlow. This put a heavy load on our computational graph, which made our training process slow and memory inefficient. Moreover, deciding the batch size after which to flush the graph became a critical question for the training process. Just when we were about to rewrite the entire training procedure in NumPy to speed things up, we came across DyNet.
DyNet (formerly known as cnn) is a neural network library developed by Carnegie Mellon University and many others. It is written in C++ (with bindings in Python) and is designed to be efficient when run on either CPU or GPU, and to work well with networks that have dynamic structures that change for every training instance.
We refactored our code in DyNet with only minor modifications to our TensorFlow code. DyNet isn’t as mature as TensorFlow in terms of available functions, so we ended up writing our own implementations of some TensorFlow counterparts. Alternatively, PyTorch is more mature and is supported by a wider community.
Google recently launched Fold, which supports a wider array of Python objects than TensorFlow. It provides support for structured data, such as nested lists, dictionaries, and protocol buffers. This overcomes the static graph limitation of TensorFlow. Its approach is entirely different from that of PyTorch and DyNet: it uses dynamic batching to parallelize the operations in the graphs of multiple instances. Look into it, it’s pretty cool.
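The idea behind dynamic batching can be sketched without any framework. The snippet below is not Fold’s API, just a rough illustration under simple assumptions: trees are nested tuples of words, and `schedule` is a hypothetical helper that groups the nodes of several trees by their height. Nodes at the same height never depend on each other, so each group could in principle be evaluated as one batched operation.

```python
from collections import defaultdict


def schedule(trees):
    """Group the nodes of all trees by height (leaves have height 0).

    A node only depends on nodes of strictly smaller height, so the
    groups can be evaluated in order, one batched operation per group.
    """
    levels = defaultdict(list)

    def visit(tree):
        # Return the height of this node and record it in its level.
        if isinstance(tree, str):
            h = 0
        else:
            h = 1 + max(visit(child) for child in tree)
        levels[h].append(tree)
        return h

    for tree in trees:
        visit(tree)
    return [levels[h] for h in sorted(levels)]


trees = [
    ("good", "movie"),
    (("not", ("very", "good")), ("movie", "overall")),
]
batches = schedule(trees)
batch_sizes = [len(batch) for batch in batches]
```

For these two trees, the 12 nodes collapse into 4 batched steps instead of 12 separate operations, even though the trees have different shapes.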
In the space of NLP where language can come in various expression lengths, dynamic computational graphs are essential. One can just imagine how grammar is parsed to realize the need for a stack and therefore dynamic memory and thus dynamic computation. This significant development is summarized aptly by Carlos E. Perez in his post.
With this development, it would not be unreasonable to expect that Deep Learning architectures will traverse the same evolutionary path as traditional computation. That is, from monolithic stand-alone programs to more modular programs. Introducing dynamic computational graphs is like introducing the concept of a procedure when all one previously had was “goto” statements. The concept of a procedure helps us write our programs in a composable manner. One can of course argue that DL architectures have no need for a stack; however, one only needs to see recent research on HyperNetworks and Stretcher networks. There are networks in research where context switching like a stack appears to be effective.
We are using these libraries to refactor our code and move from recurrent systems to recursive systems with minor modifications. This has given us a tremendous improvement in our existing models and has enabled us to solve problems that were previously out of reach. I hope this helps you make the same shift as we did!
Tanay Gahlot is a Computer Scientist who trains machines for a living. He is a Computer Science graduate from NIT Goa and an ex-Google Student Ambassador.