After we decided to switch to PyTorch for new experiments, we stumbled about some minor problems. It’s no big deal and the workarounds are straightforward, but one should be aware of them to avoid frustration. Furthermore, it should be noted that the framework is flagged as “early beta” and this is part of the adventure mentioned on the website :-).
We extended an existing model by adding a skip-gram like loss to relate samples with tags both in a positive or negative way. For this, we are using the classical sigmoid + log loss:
sigmoid = torch.nn.functional.sigmoid
dot_p = torch.dot(anchor, tag_p)
loss_pos = -torch.log(sigmoid(dot_p)) #(1)
dot_n = torch.dot(anchor, tag_n)
loss_neg = -torch.log(1 - sigmoid(dot_n)) #(2)
The critical point is log(0), since log is undefined for this input, “inf” in PyTorch, and there are two ways how this can happen:
(1) sigmoid(x) = 0, which means x is a “large” negative value.
(2) sigmoid(x) = 1, which means x is a “large” positive value.
In both cases, -log(y) evaluates to zero and a hiccup occurs which leads to a numerical instability that makes further optimization steps useless.
One possible workaround is to bound the values of sigmoid to be slightly above zero and slightly below one, with eps ~1e-4:
value = torch.nn.functional.sigmoid(x)
value = torch.clamp(torch.clamp(value, min=eps), max=1-eps)
With this adjustment, sigmoid(dot_p) is always slightly positive and (1 – sigmoid(dot_n)) also never evaluates to zero.
It might be possible that pre-defined loss functions in PyTorch do not suffer this problem, but since we usually design our own loss function from scratch, numerical instabilities can happen if we combine certain functions. With the described kludge, we did not encounter problems any longer during the training of our model.
Again, we are pretty sure that those issues are addressed over time, but since PyTorch is already very powerful, elegant and fast, we do not want to wait until this happened. In other words, we really appreciate the hard work of the PyTorch team and since we made a choice to use a framework in an “early-release beta”, it’s only fair to be patient. Of course we are willing to help the project, for example by reporting bugs, but in this case someone else already did it (issue #1835).