Semi-Supervised Learning

Machine Learning is broadly classified into three types: Supervised Learning, Unsupervised Learning and Reinforcement Learning.

The basic difference between Supervised Learning and Unsupervised Learning is that a Supervised Learning dataset has labelled outputs, while an Unsupervised Learning dataset does not. A major drawback of Supervised Learning is that the dataset has to be hand-labelled, typically by a Machine Learning Engineer or a Data Scientist. This process is very costly, especially when dealing with large volumes of data. The main disadvantage of Unsupervised Learning is that its range of applications is limited. Semi-Supervised Learning was introduced to address both of these disadvantages.

In Semi-Supervised Learning, the algorithm is trained on both types of data, i.e. labelled and unlabelled. The dataset typically combines a very small amount of labelled data with a very large amount of unlabelled data. The basic procedure is to cluster similar data points using an unsupervised learning algorithm and then use the labelled data to label the unlabelled points in each cluster. This approach is useful because labelled data is comparatively more expensive to acquire than unlabelled data.
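
The cluster-then-label procedure described above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function names `kmeans` and `cluster_then_label`, the toy points, and the choice of k-means as the clustering algorithm are all assumptions made for the example. The centroids are initialised from the first k points so that the run is deterministic.

```python
import math
from collections import Counter


def kmeans(points, k, iterations=20):
    """Tiny k-means. Assumption: the first k points are used as initial centroids."""
    centroids = [points[i] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment


def cluster_then_label(points, labels, k):
    """labels[i] is a class name, or None when point i is unlabelled."""
    assignment = kmeans(points, k)
    result = list(labels)
    for c in range(k):
        # Collect the known labels that fall inside cluster c.
        known = [labels[i] for i, a in enumerate(assignment)
                 if a == c and labels[i] is not None]
        if not known:
            continue  # no labelled point in this cluster, so leave it unlabelled
        majority = Counter(known).most_common(1)[0][0]
        # Give every unlabelled point in the cluster the majority label.
        for i, a in enumerate(assignment):
            if a == c and result[i] is None:
                result[i] = majority
    return result


# Hypothetical toy data: two labelled points followed by four unlabelled ones.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (0.9, 1.1), (5.1, 4.9), (4.8, 5.2)]
labels = ["a", "b", None, None, None, None]
predicted = cluster_then_label(points, labels, k=2)
print(predicted)
```

In this toy run, the four unlabelled points inherit the label of the single labelled point in their cluster. On real data the clusters rarely line up this cleanly with the classes, so the propagated labels should be treated as approximate.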

Assumptions

  • Continuity Assumption: Points that are close to each other tend to share the same label.
  • Cluster Assumption: The data can be divided into discrete clusters, and points in the same cluster are more likely to share an output label.
  • Manifold Assumption: The data lie approximately on a manifold of much lower dimension than the input space, which makes it possible to use distances and densities defined on that manifold.
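
The continuity assumption can be illustrated with a small nearest-neighbour label propagation sketch. It is a simplified illustration rather than a standard library algorithm: the function name `propagate_labels` and the one-dimensional toy data are assumptions made for the example. At each step, the unlabelled point closest to any already-labelled point copies that neighbour's label, so labels spread outwards through nearby points.

```python
import math


def propagate_labels(points, labels):
    """labels[i] is a class name, or None when point i is unlabelled."""
    result = list(labels)
    while None in result:
        labelled = [i for i, lab in enumerate(result) if lab is not None]
        if not labelled:
            break  # nothing to propagate from
        unlabelled = [i for i, lab in enumerate(result) if lab is None]
        # Find the closest (unlabelled, labelled) pair of points.
        u, nearest = min(
            ((u, l) for u in unlabelled for l in labelled),
            key=lambda pair: math.dist(points[pair[0]], points[pair[1]]),
        )
        # Continuity assumption: nearby points share a label.
        result[u] = result[nearest]
    return result


# Hypothetical toy data: two groups on a line, one labelled point in each.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (7.0,), (8.0,), (9.0,)]
labels = ["left", None, None, None, None, None, "right"]
propagated = propagate_labels(points, labels)
print(propagated)
```

Here the point at 3.0 ends up labelled "left" even though it is three units away from the only originally labelled "left" point, because the label reaches it through the chain of close neighbours at 1.0 and 2.0.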

Use Cases

  1. Speech Analysis: This is a classic example of Semi-Supervised Learning. Labelling audio data is a labour-intensive task that requires a lot of human effort. Semi-Supervised Learning reduces the amount of labelling required.
  2. Internet Content Classification: Labelling every webpage on the internet is impractical because it would need far too much human intervention. Semi-Supervised Learning can reduce this effort.
  3. Protein Sequence Classification: DNA strands are typically very large, so labelling them requires active human intervention. Semi-Supervised Learning has become increasingly prominent in this field.