Some tools for sparse inputs
Most deep network inputs are dense data: pictures, sounds, videos and so on… This is no surprise, for convolution layers are excellent at finding spatial and/or temporal correlations. So why on earth should we use sparse inputs? First, if you binarize your data, you may end up with very large vectors; storing them as sparse vectors then saves a lot of memory. Secondly, inputs are sometimes intrinsically sparse!
The goal is to have the following code working with both dense vectors and sparse vectors:
assert( type(input) == "table" )
local output = network:forward(input)
local loss = lossFct:forward(output, target)
local dloss = lossFct:backward(output, target)
local dout = network:backward(input, dloss)
Well, Torch does not deal well with sparse data, and this code will crash.
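For reference, the sparse format expected by nn.SparseLinear is an n×2 tensor whose rows are {index, value} pairs for the non-zero entries. A small example of the two representations of the same vector:

```lua
require 'nn'

-- The same 5-dimensional vector, once dense and once in the
-- {index, value} sparse format expected by nn.SparseLinear.
local dense  = torch.Tensor{0, 0, 3.5, 0, 1.2}
local sparse = torch.Tensor{{3, 3.5}, {5, 1.2}}

local layer = nn.SparseLinear(5, 2)
local out   = layer:forward(sparse)  -- a 1D tensor of size 2, as with nn.Linear
```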
I have no GitHub repository for now since I have never uploaded my code before (well, I was working in finance… not really an open-source world), but I will upload the code to GitHub soon. In the meantime, do not be surprised if the code (and this post) gets modified!
1 – SparseLinearBatch
nn.SparseLinear cannot be used with mini-batches. The following implementation simulates a mini-batch version. This can be very useful if you want to use mini-batch normalization, or if SparseLinear would otherwise prevent you from training the other layers with mini-batches. There may be other ways to tackle the issue, but I have not found them. Do not expect any speed improvement: this layer does not parallelize the SparseLinear computation.
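The idea can be sketched as a wrapper module that loops over the samples of the batch and feeds them one by one to an inner nn.SparseLinear (the name SparseLinearBatch and the looping strategy below are illustrative, not necessarily the actual implementation):

```lua
require 'nn'

-- Sketch: emulate mini-batches for nn.SparseLinear by looping over
-- the samples and stacking the per-sample outputs into a 2D tensor.
local SparseLinearBatch, parent = torch.class('nn.SparseLinearBatch', 'nn.Module')

function SparseLinearBatch:__init(inputSize, outputSize)
   parent.__init(self)
   self.inner = nn.SparseLinear(inputSize, outputSize)
   -- expose the inner parameters so optimizers can find them
   self.weight, self.bias = self.inner.weight, self.inner.bias
   self.gradWeight, self.gradBias = self.inner.gradWeight, self.inner.gradBias
end

function SparseLinearBatch:updateOutput(input)
   -- input: a table of n_i×2 sparse tensors, one entry per sample
   local batchSize = #input
   self.output:resize(batchSize, self.weight:size(1))
   for i = 1, batchSize do
      self.output[i]:copy(self.inner:updateOutput(input[i]))
   end
   return self.output
end

function SparseLinearBatch:accGradParameters(input, gradOutput, scale)
   -- accumulate the weight/bias gradients sample by sample
   for i = 1, #input do
      self.inner:accGradParameters(input[i], gradOutput[i], scale)
   end
end
```

Since the loop is sequential, this explains why no speed-up should be expected over calling SparseLinear sample by sample.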
A GPU/mini-batch layer is under development here: https://github.com/soumith/cunnsparse
Well, most of the time, sparse layers are the input layers; thus, the default behavior is to not compute the gradInput, which saves some time and memory. One may notice that the backpropagation is still very slow because of the AccumulateSparse function. Good news: if you only use sparse layers as input layers, you can remove the AccGradParameters method, for it is useless in that case.
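Skipping gradInput for an input layer boils down to a no-op updateGradInput. A sketch on a hypothetical custom sparse module (SparseLinearBatch is an illustrative name, not an official API):

```lua
-- Sketch: since no layer sits before the input layer, its gradInput
-- is never consumed, so we can skip the expensive sparse accumulation.
function SparseLinearBatch:updateGradInput(input, gradOutput)
   self.gradInput = nil
   return self.gradInput
end
```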