Induction vs. Transduction

Generalization is a central idea in machine learning: once we have trained a model on the data at hand, we would like to make predictions for new data that was not available or seen during training. Stated differently, this is induction, because we infer global patterns from local observations. If we succeed in capturing the underlying correlations, the trained model can actually make (useful) predictions for new data.

Transduction does not have such a clear definition, and there are actually very few sources on it. Before we go into more detail, it should be noted that transduction does not allow us to make reliable predictions for new data. Put simply, we use a smaller (labeled) training set and a larger (unlabeled) test set to learn the structure of the data. Both are specific, fixed data sets and both are used during training. Therefore, a new test set might require re-training the model.

Let’s start with an example: we have a small training set with labels and a larger test set without labels. In the inductive case (I), we would train a classifier on the training data and predict the labels for the test set. However, since the training set is very small, the model is unlikely to capture the structure of the data, and thus its generalization accuracy is likely to be mediocre at best.
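To make the inductive baseline (I) concrete, here is a minimal sketch in Python; the classifier choice, toy data, and sizes are assumptions for illustration, not part of the original setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# (I) inductive baseline: fit on the small labeled training set only
X_train = np.random.randn(20, 5)        # 20 labeled samples, 5 features (toy data)
y_train = np.random.randint(0, 2, 20)   # binary labels
X_test = np.random.randn(500, 5)        # much larger unlabeled test set

clf = LogisticRegression().fit(X_train, y_train)
y_pred = clf.predict(X_test)            # generalization is limited by the tiny training set
```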

If we use transduction (T), the idea is to use the training + test data together to help the supervised model cluster the data and improve the predictions. How is that possible? The idea is related to semi-supervised learning, where unlabeled data is also used to guide the supervision. We assume that the test data forms natural clusters and that we can, somehow, build a similarity matrix W for all pairs of test samples. Then, we introduce a second objective to ensure that similar test samples i, j stay close in the embedding space:
L_un = sum over all pairs (i, j) of W_ij * ||f_i - f_j||^2, where f_i is the embedding of test sample i.
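As a small illustration of this unsupervised term, here is a sketch that assumes an RBF kernel for the pairwise similarities W and a given embedding matrix F; both choices are assumptions made only for the example:

```python
import numpy as np

def pairwise_similarity(X, gamma=1.0):
    # RBF similarity W_ij = exp(-gamma * ||x_i - x_j||^2) over all test pairs
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def unsupervised_loss(F, W):
    # L_un = sum_ij W_ij * ||f_i - f_j||^2  (keeps similar test samples close)
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(-1)
    return (W * d2).sum()

X_test = np.random.randn(100, 5)   # unlabeled test samples (toy data)
F = np.random.randn(100, 3)        # hypothetical embeddings f_i of the test samples
W = pairwise_similarity(X_test)
print(unsupervised_loss(F, W))
```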

This is combined with the supervised objective L_su, which requires the labeled instances to be correctly predicted and thereby ensures a margin between the embeddings of different labels.

Now, if we combine both objectives into L_su + alpha * L_un, and the data is consistent, the two signals amplify each other. In other words, if all labels are correctly predicted, and pairs with the same label are more similar than pairs with different labels, the result is a valid clustering of the data, which in turn helps to correctly predict the unlabeled data.
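Here is a minimal sketch of optimizing the combined objective L_su + alpha * L_un; the linear model, the toy data, the RBF similarities, and all hyper-parameters are assumptions chosen only to illustrate the idea:

```python
import torch
import torch.nn.functional as nnF

torch.manual_seed(0)
X_train = torch.randn(20, 5)              # small labeled training set (toy data)
y_train = torch.randint(0, 2, (20,))
X_test = torch.randn(200, 5)              # fixed unlabeled test set

# pairwise similarities W over the test set (RBF, as in the sketch above)
W = torch.exp(-torch.cdist(X_test, X_test) ** 2)

embed = torch.nn.Linear(5, 3)             # embedding f
head = torch.nn.Linear(3, 2)              # classifier on top of the embedding
opt = torch.optim.Adam(list(embed.parameters()) + list(head.parameters()), lr=1e-2)
alpha = 0.1                               # assumed weighting of the unsupervised term

for step in range(200):
    f_train, f_test = embed(X_train), embed(X_test)
    L_su = nnF.cross_entropy(head(f_train), y_train)                  # labeled instances
    L_un = (W * torch.cdist(f_test, f_test) ** 2).sum() / W.numel()   # similar pairs stay close
    loss = L_su + alpha * L_un
    opt.zero_grad()
    loss.backward()
    opt.step()

y_pred = head(embed(X_test)).argmax(dim=1)  # transductive predictions for the fixed test set
```

Note that the test set enters the training loop directly, which is exactly why the predictions are only valid for this fixed test set.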

This also makes clear why the test set is fixed: it is used to shape the embedding space. After training, classifying a new test set might work, but only if its structure strongly resembles the one used during training. Otherwise, the results might differ and could therefore be inconsistent.

Bottom line: why should we use transduction at all? First, the problem we solve is easier. Instead of learning to predict all instances that could ever exist, we restrict the problem to the instances that are actually available. Without a doubt this is also a limitation, since the instance space often grows and re-training is required from time to time. However, in contrast to, for instance, web shops, where new products arrive very often, new movies/series appear much less frequently, and there is also less variance with respect to the high-level topics of the items. Thus, transductive approaches can make the optimization easier and improve precision.
