I submitted PR #516 to fix a bug in the sample code. The code sets the output size of the last layer (prior to softmax) to one less than the number of classes, which breaks training.
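For illustration, here is a minimal pure-Python sketch of why an output layer that is one unit too narrow is a problem. It is not the sample code from the repo: `NUM_CLASSES`, `make_logits`, and `cross_entropy` are hypothetical names I made up for this example, and the exact way training fails in the real sample may differ (for instance, a framework may raise a shape error or silently produce bad losses instead).

```python
import math

NUM_CLASSES = 4  # hypothetical number of classes, for illustration only


def softmax(logits):
    # Numerically stable softmax over a plain list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    # Negative log-probability assigned to the true class index.
    probs = softmax(logits)
    return -math.log(probs[label])


def make_logits(width):
    # Stand-in for the last layer's output (prior to softmax): `width` raw scores.
    return [0.0] * width


last_label = NUM_CLASSES - 1  # highest valid class index

# Correct: one output per class, so every label has a matching logit.
loss_ok = cross_entropy(make_logits(NUM_CLASSES), last_label)

# Buggy: one output fewer than the number of classes, so the highest
# label has no logit to index into.
try:
    cross_entropy(make_logits(NUM_CLASSES - 1), last_label)
    buggy_failed = False
except IndexError:
    buggy_failed = True
```

With the correct width the loss is finite for every label; with the narrower layer the last class can never be predicted, and in this sketch looking it up fails outright.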