I’ve seen research on encoding sequences of frames in a video (for purposes like classification, etc.). In general, is there a limit on how long each example should be? I have a dataset where each data point consists of 3 images (taken 1-2 seconds apart), and I am trying to see whether the same approach can be used to encode them.

