Hui Tang, Lin Sun, Kui Jia ✉
Code [GitHub] | Paper [ECVA]
Diagram of our stochastic consensus (STOCO) method. To implement the proposed consistency criterion, we sample multiple classifiers from a learned Gaussian distribution. For the weakly-augmented version of each unlabeled sample, we compute the element-wise product of the category predictions from these stochastic classifiers and select the samples whose maximum value in the product exceeds a pre-defined threshold $\tau$. We then average the predictions of the multiple classifiers and generate pseudo labels from these averages via deep discriminative clustering. Finally, with the derived targets, the model is trained on the strongly-augmented version of the selected samples via a cross-entropy loss.
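As a rough illustration of the first step, the sketch below shows one way such a stochastic classifier could be parameterized in PyTorch: the weight of the final linear layer is modeled by a learnable mean and standard deviation, and each forward pass draws weight samples via the reparameterization trick. All names (StochasticClassifier, feat_dim, num_samples, etc.) are our own illustrative choices, not the released code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticClassifier(nn.Module):
    # Final linear layer whose weights follow a learnable Gaussian N(mu, sigma^2).
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.mu = nn.Parameter(0.01 * torch.randn(num_classes, feat_dim))
        # Parameterize sigma through softplus so that it stays positive.
        self.rho = nn.Parameter(-3.0 * torch.ones(num_classes, feat_dim))

    def forward(self, features, num_samples=5):
        # Draw `num_samples` classifiers with the reparameterization trick and
        # return their softmax predictions, shape (num_samples, batch, classes).
        sigma = F.softplus(self.rho)
        probs = []
        for _ in range(num_samples):
            weight = self.mu + sigma * torch.randn_like(sigma)
            probs.append(F.softmax(features @ weight.t(), dim=-1))
        return torch.stack(probs, dim=0)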
Semi-supervised learning (SSL) has made new progress recently with the emerging framework of self-training deep networks, where the criteria for selecting unlabeled samples with pseudo labels play a key role in the empirical success. In this work, we propose such a new criterion based on consistency among multiple stochastic classifiers, termed Stochastic Consensus (STOCO). Specifically, we model the parameters of the classifiers as a Gaussian distribution whose mean and standard deviation are jointly optimized during training. Given the scarcity of labels in SSL, modeling classifiers as a distribution itself provides additional regularization that mitigates overfitting to the labeled samples. Technically, we generate pseudo labels using a simple but flexible framework of deep discriminative clustering, which benefits from the overall structure of the data distribution. We also provide a theoretical analysis of our criterion by connecting it with the theory of learning from noisy data. Our proposed criterion can be readily applied to self-training based SSL frameworks. Choosing the representative FixMatch as the baseline, our method with multiple stochastic classifiers achieves the state of the art on popular SSL benchmarks, especially in label-scarce cases.
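A minimal sketch of the resulting selection and training step for one unlabeled batch, assuming the StochasticClassifier sketch above and a generic backbone; the element-wise product of the m predictions acts as the consensus score and tau is the confidence threshold from the diagram. For brevity, the pseudo label here is simply the argmax of the averaged prediction rather than the output of the full deep discriminative clustering step, and the mean weights are used for the strongly-augmented view; treat every name and value as an assumption, not the official implementation.

import torch
import torch.nn.functional as F

def stoco_unlabeled_step(backbone, classifier, x_weak, x_strong, tau, m=5):
    with torch.no_grad():
        probs = classifier(backbone(x_weak), num_samples=m)   # (m, B, C)
        consensus = probs.prod(dim=0)                          # element-wise product over classifiers
        mask = consensus.max(dim=-1).values > tau               # keep consistent, confident samples
        avg_probs = probs.mean(dim=0)                           # averaged prediction
        pseudo_labels = avg_probs.argmax(dim=-1)                # placeholder for the clustering step
    # Train on the strongly-augmented view of the selected samples,
    # here with the mean classifier weights as one simple choice.
    logits = backbone(x_strong) @ classifier.mu.t()
    loss = F.cross_entropy(logits, pseudo_labels, reduction="none")
    return (loss * mask.float()).mean()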
Our criterion is inspired by co-training and tri-training, which leverage the category predictions of one or two classifiers on unlabeled samples to enlarge the training set; the underlying design principle is majority voting, which shares a similar insight with popular ensemble learning techniques.
The deep discriminative clustering algorithm encourages cluster-size balance while respecting the underlying data distribution; we therefore use it to generate pseudo labels for unlabeled samples.
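As a hedged illustration of how balance-aware pseudo labels could be produced from the averaged predictions, the sketch below uses a DEC-style auxiliary target: each prediction is sharpened and divided by the current soft cluster size, which down-weights over-populated classes. The actual clustering objective used in STOCO may differ; this is only a sketch of the general idea.

import torch

def balanced_pseudo_targets(avg_probs):
    # avg_probs: (B, C) averaged softmax predictions on the selected samples.
    # Sharpen each row and divide by the soft cluster size so that large
    # clusters are down-weighted, encouraging cluster-size balance.
    cluster_size = avg_probs.sum(dim=0, keepdim=True)         # (1, C)
    target = (avg_probs ** 2) / cluster_size
    return target / target.sum(dim=1, keepdim=True)           # renormalize each row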
We provide a theoretical analysis of our method, showing that its classification error improves progressively, by connecting it with probably approximately correct (PAC) learning theory on noisy data. Theorem 1 yields three conditions under which the model improves progressively, and, as we show, our method satisfies all three.
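For context, analyses of this kind typically build on the classical PAC bound for learning with classification noise, restated here in our own notation (this is the standard result, not the paper's Theorem 1): a hypothesis minimizing disagreement on $m$ pseudo-labeled samples whose labels are wrong with rate $\eta < 1/2$ has error at most $\epsilon$ with probability at least $1-\delta$ provided
$$ m \;\ge\; \frac{2}{\epsilon^{2}\,(1-2\eta)^{2}}\,\ln\frac{2N}{\delta}, $$
where $N$ is the size of the hypothesis space. Read this way, self-training keeps improving as long as, from one round $t$ to the next, the selected pseudo-labeled set grows while its noise rate stays low enough that $m_t(1-2\eta_t)^{2}$ increases; the consensus-based selection and the clustering-based targets are designed to keep the noise rate small.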
We report error rates on a single 40-label split of CIFAR-10. STOCO (w/o CC and DDC) removes both the consistency criterion among stochastic classifiers and the pseudo-label generation via discriminative clustering, i.e., it reduces to FixMatch; STOCO (w/o CC) removes the consistency criterion only; STOCO (m = 5) uses 5 stochastic classifiers, i.e., our full method. From the table, we observe that our method (m = 5) degrades by 4.51% after removing the consistency criterion, and by a further 2.02% after additionally removing the deep discriminative clustering, verifying that both components are indispensable and that our design is reasonable. Since the only difference between STOCO (m = 1) and STOCO (w/o CC) is whether a stochastic classifier is used, the fact that the former slightly outperforms the latter shows the benefit of the stochastic classifier.
In the test-loss row, we find that FixMatch suffers a slight rise at the late stage of training whereas our STOCO does not, suggesting that our method indeed alleviates overfitting.
In all subfigures, the horizontal axis is the training epoch; blue and orange correspond to FixMatch and our method, respectively. Results are obtained on CIFAR-10 with 40 (column 1), 250 (column 2), and 4,000 (column 3) labels.
We observe that as learning proceeds, the average variance gradually decreases, meaning that the discrepancy between the stochastic classifiers shrinks. This also suggests convergence of the learned Gaussian distribution, which underpins stable training and the performance improvement. Average variance of the learned Gaussian distribution during model training.
In the extremely label-scarce setting (cf. panels a and c), our STOCO yields more similar marginal feature distributions between the training and test data.
t-SNE visualization of features learned by FixMatch (left two columns) and our STOCO (right two columns). In columns 1 and 3, red and blue denote the training and test samples, respectively; their counterparts with color-coded classes are shown in columns 2 and 4. Results are obtained on CIFAR-10 with 40 (a-d) and 250 (e-h) labels.
Experiments on four standard benchmark datasets demonstrate that our proposed STOCO outperforms existing methods and achieves the state of the art, especially in label-scarce cases. Error rates for CIFAR-10, CIFAR-100, SVHN, and STL-10.
@InProceedings{STOCO,
  author    = {Tang, Hui and Sun, Lin and Jia, Kui},
  title     = {Stochastic Consensus: Enhancing Semi-Supervised Learning with Consistency of Stochastic Classifiers},
  booktitle = {Computer Vision -- ECCV 2022},
  year      = {2022},
  pages     = {330--346},
}
Based on a template by Keyan Chen.