I have two types of labeled datasets:

  1. Sentences with “positive” or “negative” labels.

  2. Isolated words with “positive” or “negative” labels.

In a very simplistic “bag of words” setting, my intuition is that words and sentences should be treated differently: for example, a positive word from the dataset of isolated words should carry more weight than the same word learned from the sentences. (I'm not sure whether this is right or whether I'm missing something.)
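To make that intuition concrete, here is a minimal sketch of what I have in mind, not a tested recipe. It assumes a naive-Bayes-style log-odds score with add-one smoothing, and treats each isolated word as a one-word document counted `lexicon_weight` times. The function names and the `lexicon_weight` parameter are made up for illustration.

```python
import math
from collections import Counter

def train_weighted_counts(sentences, lexicon, lexicon_weight=3.0):
    """Per-class token counts; each lexicon word adds lexicon_weight (an assumed knob)."""
    counts = {"positive": Counter(), "negative": Counter()}
    for text, label in sentences:
        for token in text.lower().split():
            counts[label][token] += 1.0
    for word, label in lexicon:
        counts[label][word.lower()] += lexicon_weight
    return counts

def log_odds(counts, token, smoothing=1.0):
    """Smoothed log ratio of P(token | positive) to P(token | negative)."""
    vocab = set(counts["positive"]) | set(counts["negative"])
    pos_total = sum(counts["positive"].values()) + smoothing * len(vocab)
    neg_total = sum(counts["negative"].values()) + smoothing * len(vocab)
    p = (counts["positive"][token] + smoothing) / pos_total
    n = (counts["negative"][token] + smoothing) / neg_total
    return math.log(p / n)

def score(counts, text):
    """Sum of token log-odds; above zero leans positive, below zero leans negative."""
    return sum(log_odds(counts, t) for t in text.lower().split())
```

With `lexicon_weight=1.0` the isolated words are simply extra one-word training examples; larger values make them dominate the counts learned from sentences. Whether any value above 1 actually helps is exactly what I am unsure about, so it would presumably have to be tuned on held-out data.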

How would I integrate both datasets into my model? Should I just add entries to my bag of words containing the words from the type 2 dataset? And how would I deal with the negation of those (“” and “not ”, for instance)?
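For the negation part, the only idea I have so far is the common heuristic of prefixing every token after a negator with `NOT_` until the next punctuation mark, and then adding a `NOT_` entry with the flipped label for each isolated word. A rough sketch follows; the negator list, the helper names, and the assumption that negation simply flips polarity are all mine, and “good” is used purely as a placeholder word.

```python
import re

NEGATORS = {"not", "no", "never"}   # assumed, deliberately tiny list
PUNCTUATION = set(".,!?;")

def mark_negation(text):
    """Prefix tokens that follow a negator with NOT_ until the next punctuation mark."""
    tokens = re.findall(r"[a-z']+|[.,!?;]", text.lower())
    out, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False          # negation scope ends at punctuation
            continue
        if tok in NEGATORS or tok.endswith("n't"):
            negated = True           # drop the negator itself, mark what follows
            continue
        out.append("NOT_" + tok if negated else tok)
    return out

def expand_lexicon(lexicon):
    """Add a NOT_ entry with the opposite label for each word (crude assumption)."""
    flip = {"positive": "negative", "negative": "positive"}
    expanded = []
    for word, label in lexicon:
        expanded.append((word, label))
        expanded.append(("NOT_" + word, flip[label]))
    return expanded
```

For example, `mark_negation("This is not good, but fun.")` gives `['this', 'is', 'NOT_good', 'but', 'fun']`, so the negated form becomes its own bag-of-words feature. I realise that flipping the label is a simplification, since a negated positive word is not always strongly negative.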

Thank you!

Source: https://www.reddit.com/r//comments/9gk4ib/d___with___/

