I have two types of labeled datasets:
1. Sentences with “positive” or “negative” labels.
2. Isolated words with “positive” or “negative” labels.
In a very simplistic “bag of words” setting, my intuition is that words and sentences should be treated differently: a word labeled positive in the word dataset should carry more weight than the same word learned from the sentences, since in a sentence the label applies to the whole sentence and not to any single word. (I am not sure whether this is right or whether I am missing something.)
How would I integrate both datasets into my model? Should I just add entries to my bag of words containing the words from the word dataset (type 2)? And how would I deal with the negation of those words (“great” versus “not great”, for instance)?
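To make the question concrete, here is a minimal sketch of the kind of integration I have in mind. Everything in it is an assumption for illustration: the function names, the `word_weight` value, the small `NEGATORS` set, and the heuristic of storing a negated word as a `NOT_` token with the opposite label. It is not meant as the correct approach, only as something to critique.

```python
from collections import Counter

# Assumed, deliberately tiny list of negation words.
NEGATORS = {"not", "no", "never"}


def tokenize_with_negation(text):
    """Lowercase, split on whitespace, and fold a negator into the next
    token by prefixing that token with NOT_ (e.g. "not great" -> NOT_great)."""
    tokens = []
    negate = False
    for raw in text.lower().split():
        word = raw.strip(".,!?")
        if not word:
            continue
        if word in NEGATORS:
            negate = True
            continue
        tokens.append("NOT_" + word if negate else word)
        negate = False
    return tokens


def build_weighted_counts(sentences, words, word_weight=3.0):
    """Build per-label token counts from both datasets.

    Sentence tokens count 1.0 each; an isolated labeled word counts
    word_weight (hypothetical value). Each labeled word also adds its
    NOT_ form to the opposite label, which is a heuristic assumption.
    """
    opposite = {"positive": "negative", "negative": "positive"}
    counts = {"positive": Counter(), "negative": Counter()}
    for text, label in sentences:
        for tok in tokenize_with_negation(text):
            counts[label][tok] += 1.0
    for word, label in words:
        w = word.lower()
        counts[label][w] += word_weight
        counts[opposite[label]]["NOT_" + w] += word_weight
    return counts


def score(text, counts):
    """Sum of (positive count - negative count) over the tokens of text.
    A result above 0 leans positive, below 0 leans negative."""
    return sum(
        counts["positive"][tok] - counts["negative"][tok]
        for tok in tokenize_with_negation(text)
    )


sentences = [("The movie was not great", "negative"), ("A great cast", "positive")]
words = [("great", "positive"), ("awful", "negative")]
counts = build_weighted_counts(sentences, words)
print(score("great", counts), score("not great", counts))
```

With this setup a word from the word dataset dominates the evidence from any single sentence, and “not great” gets its own entry instead of sharing one with “great”. Whether a fixed weight and a simple label flip are reasonable is exactly what I am unsure about.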