Grid Long Short-Term Memory by Nal Kalchbrenner, Ivo Danihelka, & Alex Graves. ICLR 2016.
I learned about this paper from a blog post by Christopher Boulez, which I found while trying to better understand how an LSTM might learn the parity of a bit string (generalized XOR).
The paper gives an excellent overview of how LSTM cells can be arranged in a multidimensional grid and applied to vector spaces. I'll return to that in a later post; here I want to focus on the appendix, which dives into the parity problem. The paper reports that a 1-LSTM network can "learn to compute parity for up to 250 input bits". To achieve this, "the k-bit string is given to the neural network as a whole through a single projection; considering one bit at a time and remembering the previous partial result in a recurrent or multi-step architecture". In the recurrent view, each step only needs the XOR of the running result with the next bit, which shifts the problem from learning k-bit parity to simply learning 2-bit parity (phew!).
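To make the reduction concrete, here is a minimal sketch (plain Python, no neural network involved) of the recurrent view: a running state is XORed with one bit at a time, so the only operation a recurrent model would ever need to learn is 2-bit parity. The function names are mine, not the paper's.

```python
from functools import reduce
from operator import xor

def parity(bits):
    """Parity of a bit string: 1 if the number of 1s is odd, else 0."""
    return reduce(xor, bits, 0)

def parity_recurrent(bits):
    """Recurrent view: at each step, only 2-bit parity (XOR) of the
    running result and the next input bit is computed."""
    state = 0
    for b in bits:
        state ^= b  # 2-bit parity of previous partial result and new bit
    return state
```

For example, `parity_recurrent([1, 0, 1, 1])` returns `1` (three ones, odd count). A single projection of all k bits at once gives the network no such step-by-step crutch, which is what makes the paper's setup interesting.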
A list of hyperparameters is given, which I'll use to implement it, so stay tuned!