I found this relevant resource: An Overview of Ensemble Methods for Binary Classifiers in Multi-class Problems: Experimental Study on One-vs-One and One-vs-All Schemes
The following problem pertains to text classification.
Problem: Let’s say I have a text dataset which I want to classify into classes A, B, C, and Others. Around 60% of the text is going to be Others (none of A, B, or C), and the remaining 40% is distributed among A, B, and C, which are not mutually exclusive. That is, a sentence can belong to both A and B, or A and C, and so on.
Further, the annotated dataset is in the form of three CSVs: A vs ~A, B vs ~B, and C vs ~C. Note that the number of sentences belonging to ~<Class_Name> in each of the CSVs is always large relative to the positive class.
At this point, the three binary classifiers are already built and are working acceptably for now (as per requirements).
How would you suggest combining the classifiers into one multi-class classifier? That is where the linked resource comes in. Multi-label classification would be the next step beyond that.
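To make the first option concrete, here is a minimal sketch of what I mean by combining the three binary classifiers. It assumes each classifier can output a probability for its positive class; the function name `combine_binary_predictions`, the 0.5 default threshold, and the fallback to "Others" when nothing fires are my own assumptions, not something taken from the linked resource.

```python
from typing import Dict, List, Optional


def combine_binary_predictions(
    probs: Dict[str, float],
    thresholds: Optional[Dict[str, float]] = None,
    default_threshold: float = 0.5,  # assumed cut-off, would need tuning
    fallback: str = "Others",
) -> List[str]:
    """Turn per-class positive probabilities into a list of labels.

    Every class whose probability reaches its threshold is kept, so a
    sentence can receive several labels (A and B, A and C, ...).
    If no classifier fires, the sentence falls back to "Others".
    """
    thresholds = thresholds or {}
    labels = [
        name
        for name, p in sorted(probs.items())
        if p >= thresholds.get(name, default_threshold)
    ]
    return labels or [fallback]


# Hypothetical scores from the three existing binary classifiers.
print(combine_binary_predictions({"A": 0.9, "B": 0.7, "C": 0.1}))  # ['A', 'B']
print(combine_binary_predictions({"A": 0.2, "B": 0.1, "C": 0.3}))  # ['Others']
```

Since the ~<Class_Name> side dominates each CSV, a per-class threshold (passed through `thresholds`) would probably work better than a single 0.5 cut-off, but that is a guess on my part.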
The second alternative would be to merge the three CSVs into one and build models on this aggregated data, which would ultimately lead to a multi-class classifier. But the multi-label problem still remains, and I have no idea how it should be approached.
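For the second option, this is roughly how I picture the merge, as a sketch only. It assumes each CSV has a `text` column and a `label` column holding 1 for the class and 0 for ~class; those column names and the helper `merge_binary_csvs` are hypothetical. It also assumes the same sentences appear in all three files; if a sentence is missing from one CSV, its label for that class is really unknown rather than negative.

```python
import csv
import io
from typing import Dict, Set, TextIO


def merge_binary_csvs(
    sources: Dict[str, TextIO],
    text_col: str = "text",    # assumed column name
    label_col: str = "label",  # assumed column name, "1" = positive
) -> Dict[str, Set[str]]:
    """Merge one-vs-rest CSVs into a single text -> set-of-labels mapping.

    An empty set means the sentence was negative in every CSV it
    appeared in, i.e. it would be treated as "Others".
    """
    merged: Dict[str, Set[str]] = {}
    for class_name, handle in sources.items():
        for row in csv.DictReader(handle):
            labels = merged.setdefault(row[text_col], set())
            if row[label_col].strip() == "1":
                labels.add(class_name)
    return merged


# Tiny in-memory stand-ins for the three annotated CSV files.
example_sources = {
    "A": io.StringIO("text,label\nfoo,1\nbar,0\n"),
    "B": io.StringIO("text,label\nfoo,1\nbar,0\n"),
    "C": io.StringIO("text,label\nfoo,0\nbar,0\n"),
}
example_merged = merge_binary_csvs(example_sources)
for text, labels in example_merged.items():
    print(text, sorted(labels) or ["Others"])
```

The merged mapping keeps one row per sentence with a set of labels, which is already a multi-label target rather than a single multi-class one.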
Has anyone encountered such a situation before? Appreciate the help 🙂