Stochastic Consensus: Enhancing Semi-Supervised Learning with Consistency of Stochastic Classifiers



Hui Tang1
Lin Sun2
Kui Jia1 ✉

South China University of Technology1
Magic Leap, Sunnyvale, CA, USA2
Corresponding author
Code [GitHub]
Paper [ECVA]
Cite [BibTeX]


Teaser

Diagram of our stochastic consensus (STOCO) method. To implement the proposed consistency criterion, we sample multiple classifiers from a learned Gaussian distribution. For the weakly-augmented version of each unlabeled sample, we compute the element-wise product of the category predictions from these stochastic classifiers and select the samples whose maximum product value exceeds a pre-defined threshold τ. We then average the predictions from the multiple classifiers and generate pseudo labels from these averages via deep discriminative clustering. Finally, with these derived targets, the model is trained on the strongly-augmented versions of the selected samples with a cross-entropy loss.



Abstract

Semi-supervised learning (SSL) has achieved new progress recently with the emerging framework of self-training deep networks, where the criteria for selection of unlabeled samples with pseudo labels play a key role in the empirical success. In this work, we propose such a new criterion based on consistency among multiple, stochastic classifiers, termed Stochastic Consensus (STOCO). Specifically, we model parameters of the classifiers as a Gaussian distribution whose mean and standard deviation are jointly optimized during training. Due to the scarcity of labels in SSL, modeling classifiers as a distribution itself provides additional regularization that mitigates overfitting to the labeled samples. We technically generate pseudo labels using a simple but flexible framework of deep discriminative clustering, which benefits from the overall structure of data distribution. We also provide theoretical analysis of our criterion by connecting with the theory of learning from noisy data. Our proposed criterion can be readily applied to self-training based SSL frameworks. By choosing the representative FixMatch as the baseline, our method with multiple stochastic classifiers achieves the state of the art on popular SSL benchmarks, especially in label-scarce cases.



Background & Motivation


Diagram of combining self-training and consistency regularization. Recent advances achieve semi-supervised learning (SSL) by combining multiple SSL techniques, e.g., self-training and consistency regularization. The selection criteria used in existing methods are usually based on confidence filtering of pseudo labels, where unlabeled samples with high-confidence predictions are kept and the rest are discarded. In this work, we show that the selection criterion can be further improved for better SSL.



Highlights

Consistency Criterion among Stochastic Classifiers

Our criterion is inspired by co-training and tri-training, which leverage the category predictions of one or two classifiers on unlabeled samples to enlarge the training set; a key design principle there is majority voting, which shares a similar insight with the popular technique of ensemble learning.

Specifically, we sample multiple classifiers from a learned Gaussian distribution; for the weakly-augmented version of any unlabeled sample, we calculate the element-wise product of the category predictions from these stochastic classifiers and select the samples whose maximum value in the product exceeds a pre-defined threshold τ, as sketched below.
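
The following PyTorch-style sketch illustrates one way this selection criterion could be implemented. The module StochasticClassifier, the reparameterized weight sampling, and the parameter names m and tau are our illustrative assumptions, not the authors' released code.

import torch
import torch.nn as nn
import torch.nn.functional as F


class StochasticClassifier(nn.Module):
    # Linear classifier whose weight matrix follows a learned Gaussian: the mean
    # and (log-)standard deviation are trained jointly, and each forward pass
    # draws one weight sample via the reparameterization trick.
    # Illustrative sketch only, not the authors' reference implementation.
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.mu = nn.Parameter(0.01 * torch.randn(num_classes, feat_dim))
        self.log_sigma = nn.Parameter(torch.full((num_classes, feat_dim), -4.0))

    def forward(self, features):
        weight = self.mu + self.log_sigma.exp() * torch.randn_like(self.mu)
        return F.linear(features, weight)   # logits from one sampled classifier


@torch.no_grad()
def consensus_select(features_weak, classifier, m=5, tau=0.95):
    # Consistency criterion: draw m stochastic classifiers, take the element-wise
    # product of their softmax predictions on the weak view, and keep samples
    # whose maximum product value exceeds the threshold tau.
    probs = torch.stack(
        [classifier(features_weak).softmax(dim=-1) for _ in range(m)]
    )                                          # [m, B, C]
    prod = probs.prod(dim=0)                   # [B, C]
    mask = prod.max(dim=-1).values > tau       # [B] boolean selection mask
    return mask, probs.mean(dim=0)             # mask and averaged predictions

Sampling fresh weights at every forward pass keeps the classifier stochastic while the shared mean and standard deviation are optimized end to end, matching the parameterization described in the abstract.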

Pseudo Label Generation via Discriminative Clustering

The deep discriminative clustering algorithm can encourage cluster size balance while respecting the underlying data distribution. Thus, we use this algorithm to generate pseudo labels for unlabeled samples.

Specifically, we take an average over the predictions from the multiple classifiers and generate pseudo labels from the resulting averages via deep discriminative clustering; then, with these derived targets, the model is trained on the strongly-augmented versions of the selected samples via a cross-entropy loss.
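
A hedged sketch of the corresponding training step on an unlabeled batch follows; the exact deep discriminative clustering objective is defined in the paper, so the simple cluster-balance term below (an entropy regularizer on the marginal prediction) is only a stand-in, and all function and variable names are illustrative.

import torch
import torch.nn.functional as F


def unlabeled_step(model, classifier, x_weak, x_strong, m=5, tau=0.95):
    # `model` maps images to features; `classifier` is the StochasticClassifier
    # from the previous sketch.
    # 1) Weak view: consensus selection and averaged predictions (no gradients).
    with torch.no_grad():
        feats_weak = model(x_weak)
        probs = torch.stack(
            [classifier(feats_weak).softmax(dim=-1) for _ in range(m)]
        )                                                    # [m, B, C]
        mask = probs.prod(dim=0).max(dim=-1).values > tau    # consistency criterion
        pseudo = probs.mean(dim=0).argmax(dim=-1)            # targets from the averages

    # 2) Strong view: cross-entropy against the derived targets on selected samples.
    logits_strong = classifier(model(x_strong))
    ce = F.cross_entropy(logits_strong, pseudo, reduction="none")
    loss_ce = (ce * mask.float()).mean()

    # 3) Stand-in for the discriminative-clustering balance term: push the
    #    marginal prediction towards uniform so that clusters do not collapse.
    marginal = logits_strong.softmax(dim=-1).mean(dim=0)
    loss_balance = (marginal * marginal.clamp_min(1e-8).log()).sum()

    return loss_ce + loss_balance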

Theoretical Analysis

We provide a theoretical analysis of our method, showing that its classification error is progressively improved, by connecting with probably approximately correct (PAC) learning theory on noisy data. Three conditions for progressively improving the model are derived from Theorem 1. Our method satisfies these three conditions, as shown in the learning analyses below.

Our theoretical analysis connects noisy label learning to SSL, and can serve as a general analytical method for pseudo-labeling based SSL frameworks including our STOCO.
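
Theorem 1 itself is stated in the paper; for intuition, analyses of this kind are typically built on the classical sample-size bound for PAC learning under classification noise (Angluin and Laird), recalled here from memory (the exact statement in the paper may differ):

m \;\ge\; \frac{2}{\epsilon^{2}\,(1-2\eta)^{2}}\,\ln\frac{2N}{\delta},
\qquad \eta < \tfrac{1}{2},

where m is the number of (pseudo-)labeled samples, η their label-noise rate, N the size of the hypothesis space, and ε, δ the usual PAC accuracy and confidence parameters. Informally, keeping such a bound satisfied while the target error ε shrinks requires the quantity m(1 − 2η)² to grow across training rounds, i.e., more unlabeled samples are selected while their pseudo-label noise rate drops; this is what the mask-rate and noise-rate curves in the learning analyses below monitor.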



Experiments

Ablation Studies

From the following table, we observe that our method (m = 5) degrades by 4.51% after removing the consistency criterion, and by a further 2.02% after additionally removing the deep discriminative clustering. This verifies that both components are indispensable and that our method is reasonably designed. Since the only difference between STOCO (m = 1) and STOCO (w/o CC) is whether a stochastic classifier is used, the fact that the former slightly outperforms the latter shows the benefit of the stochastic classifier.


We report error rates on a single 40-label split from CIFAR-10. STOCO (w/o CC and DDC) removes both the consistency criterion among stochastic classifiers and the pseudo label generation via discriminative clustering, i.e., it reduces to FixMatch. STOCO (w/o CC) removes the consistency criterion only. STOCO (m = 5) uses 5 stochastic classifiers, i.e., our full method.

Learning Analyses

In the test-loss row, we find that FixMatch suffers a slight rise at the late stage of training whereas our STOCO does not, suggesting that our method indeed alleviates overfitting.

As training proceeds, the mask rate of our STOCO increases while its noise rate and number of mislabeled samples decrease, indicating that the three conditions are satisfied; notably, our method yields far fewer mislabeled samples than FixMatch under all label settings, verifying that the strength of STOCO comes from better noise reduction. These observations corroborate our theoretical analysis.


For all subfigures, the horizontal axis represents the training epoch; blue and orange correspond to FixMatch and our method, respectively. The results are obtained on CIFAR-10 with 40 (column 1), 250 (column 2), and 4,000 (column 3) labels.
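
For concreteness, the monitored quantities can be computed per batch roughly as follows; the variable names are hypothetical, and the ground-truth labels of unlabeled data are used only for this analysis, never for training.

# `mask`   : boolean selection mask from the consistency criterion
# `pseudo` : pseudo labels of the unlabeled samples
# `y_true` : their ground-truth labels, available only for analysis
mask_rate = mask.float().mean()
wrong = (pseudo[mask] != y_true[mask])
noise_rate = wrong.float().mean()   # fraction of selected samples with wrong pseudo labels
mislabeled = wrong.sum()            # absolute number of mislabeled (selected) samples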

Classifier Variance

We observe that as learning proceeds, the average variance gradually decreases, meaning that the discrepancy among the stochastic classifiers shrinks. This observation also suggests convergence of the learned Gaussian distribution, which guarantees the stability of model training and performance improvement.


Average variance of the learned Gaussian distribution during model training.
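
Assuming the log_sigma parameterization from the classifier sketch above, this curve can be logged with a single line of PyTorch:

# Average variance of the learned Gaussian over all weight entries
# (assumes the `log_sigma` parameter from the StochasticClassifier sketch).
avg_variance = classifier.log_sigma.exp().pow(2).mean().item()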

Feature Visualization

In the extremely label-scarce setting (cf. Figs. a and c), our STOCO yields more similar marginal feature distributions between the training and test data.

In Fig. b for FixMatch, two different classes are wrongly merged into one cluster, e.g., the red airplane and the purple ship. A possible reason is that the shape of a ship and that of a plane with its wings removed are visually similar, as are the sky and sea backgrounds. In contrast, our STOCO separates these ambiguous classes in the feature space (cf. Fig. d), demonstrating that our method learns more discriminative features.


The t-SNE visualization of features learned by FixMatch (left two columns) and our STOCO (right two columns). In columns 1 and 3, red and blue denote the training and test samples, respectively; columns 2 and 4 show the same features color-coded by class. Results in these plots are obtained on CIFAR-10 with 40 (a-d) and 250 (e-h) labels.

Comparison with SOTA

Experiments on four typical benchmark datasets have demonstrated that our proposed STOCO outperforms existing methods and achieves the state of the art, especially in label-scarce cases.


Error rates for CIFAR-10, CIFAR-100, SVHN, and STL-10.



BibTeX

  	
@InProceedings{STOCO,
  author    = {Tang, Hui and Sun, Lin and Jia, Kui},
  title     = {Stochastic Consensus: Enhancing Semi-Supervised Learning with Consistency of Stochastic Classifiers},
  booktitle = {Computer Vision -- ECCV 2022},
  year      = {2022},
  pages     = {330--346},
}


Acknowledgements

Based on a template by Keyan Chen.