I have a dataset of sequences of different lengths (think sentences of words). I have a trained model (Variable-Length Markov Chain) which can predict the next item given some starting prefix. E.g. given “The boy ran in the ___”, it can predict “park”.

I want to evaluate the accuracy of my model. One way I thought of doing so is to take every sentence in my train/CV/test set, cut off the last word, and compare it against what the model predicts, giving a train/CV/test accuracy. For example, I would take the prefix “The man pulled out a ____”, knowing the next word is “gun”, and compare it to the model’s answer. Since there are many possible words, top-K accuracy might be a better measure than top-1 accuracy.
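To make the idea concrete, here is a minimal sketch of the evaluation I have in mind. It assumes the model can be wrapped in a function that takes a prefix and returns candidate next items ranked from most to least likely. The names `top_k_accuracy`, `predict_ranked`, and `make_bigram_predictor` are hypothetical, and the bigram counter is only a toy stand-in for my actual Variable-Length Markov Chain.

```python
from collections import Counter, defaultdict


def top_k_accuracy(sequences, predict_ranked, k=5):
    """Fraction of sequences whose held-out last item is among the top-k predictions.

    `predict_ranked` is a hypothetical wrapper around the trained model:
    it takes a prefix (list of items) and returns candidates, best first.
    """
    hits = 0
    total = 0
    for seq in sequences:
        if len(seq) < 2:
            continue  # need at least one prefix item and one target
        prefix, target = seq[:-1], seq[-1]
        candidates = predict_ranked(prefix)[:k]
        hits += target in candidates
        total += 1
    return hits / total if total else 0.0


def make_bigram_predictor(train_sequences):
    """Toy stand-in for the real model: ranks next items by bigram counts."""
    counts = defaultdict(Counter)
    for seq in train_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1

    def predict_ranked(prefix):
        # Only the last item of the prefix is used in this toy model.
        following = counts.get(prefix[-1], Counter())
        return [item for item, _ in following.most_common()]

    return predict_ranked


# Tiny illustrative data, invented for this sketch.
train = [
    ["the", "boy", "ran", "in", "the", "park"],
    ["the", "man", "ran", "in", "the", "street"],
    ["a", "dog", "ran", "in", "the", "park"],
]
test = [
    ["she", "sat", "in", "the", "park"],
    ["he", "ran", "in", "the", "street"],
]

predictor = make_bigram_predictor(train)
top1 = top_k_accuracy(test, predictor, k=1)
top4 = top_k_accuracy(test, predictor, k=4)
print(f"top-1: {top1:.2f}, top-4: {top4:.2f}")
```

The same function would be called once per split (train, CV, test) to get one accuracy per split. Note that this only scores the final item of each sequence; scoring every position of every sequence would be a variation on the same loop.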

What is the standard way of doing this?

