Usually a gets as two (raw) images. I wonder if it would be also possible to train an Auto Encoder to be able to encode the images into a first. Instead of the Siamese Network two images, you could them feed them the corresponding representations of these images. I could imagine that this might work since the representation is sufficient to reconstruct the original image (i.e. it contains most of the information) is also should be sufficient to compare images. But I am not sure about that point since maybe the spatial layout gets lost in the AE process. Please let me know if you are aware of anybody doing that or something related.

My motivation: AFAIK (but please correct me if there is anything smarter, I am generally very interested in that question) after we have trained our Siamese network or some other Similarity metric learning network in order to find the image in our repository to which a given image is most similar to we have to do pairwise comparisons with every single image in our repository. Maybe this costly comparisons could be sped up if the repository is held in compressed form (as latent representation created by the AE).

Source link
thanks you RSS link


Please enter your comment!
Please enter your name here