I ran out of memory running my network on a single GPU (12 GB memory). My network is a ConvLSTM with n timesteps, and at each timestep I pass a single image, for now.
The network weights are the same at every timestep, but the feature maps generated at each timestep lead to large memory usage.
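
To make the memory concern concrete, here is a rough back-of-the-envelope sketch. It assumes float32 values and that the output of every layer is kept for every timestep, as happens during training with backpropagation through time. The layer shapes and the helper names `feature_map_bytes` and `activation_bytes` are hypothetical, chosen only for illustration; the real figure would also include the gate activations inside each cell, the weights, gradients and framework workspace, so treat this as a lower bound.

```python
def feature_map_bytes(channels, height, width, bytes_per_value=4):
    """Bytes needed to store one feature map of shape (channels, height, width)."""
    return channels * height * width * bytes_per_value


def activation_bytes(layer_shapes, n_timesteps, batch_size=1):
    """Bytes needed to keep every layer's output for every timestep."""
    per_step = sum(feature_map_bytes(c, h, w) for c, h, w in layer_shapes)
    return per_step * n_timesteps * batch_size


# Hypothetical (channels, height, width) output shapes, one per layer.
layer_shapes = [(64, 256, 256), (128, 128, 128), (256, 64, 64)]

for n in (10, 20, 40):
    gib = activation_bytes(layer_shapes, n_timesteps=n) / 2**30
    print(f"n = {n:3d} timesteps: at least {gib:.2f} GiB of stored feature maps")
```

The weights are shared across timesteps, so their size stays fixed, while the stored feature maps grow linearly with n.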

Should I execute each timestep on a different GPU, or should I split the network into layers/components and execute all the timesteps of a given component on a single GPU?
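
The two options can be written down as placement maps from (timestep, layer) to a GPU index. The sketch below is only an illustration, not an answer: it assumes a stacked recurrent network in which layer l at timestep t reads from layer l-1 at the same timestep and from layer l at timestep t-1, and it assumes contiguous blocks of timesteps or layers per GPU. The function names are hypothetical.

```python
def timestep_split(n_timesteps, n_layers, n_gpus):
    """Option 1: all layers of a timestep run on the same GPU."""
    return {(t, l): t * n_gpus // n_timesteps
            for t in range(n_timesteps) for l in range(n_layers)}


def layer_split(n_timesteps, n_layers, n_gpus):
    """Option 2: a layer runs on the same GPU for every timestep."""
    return {(t, l): l * n_gpus // n_layers
            for t in range(n_timesteps) for l in range(n_layers)}


def cross_gpu_transfers(placement, n_timesteps, n_layers):
    """Count recurrent (t-1 -> t) and feed-forward (l-1 -> l) edges that cross GPUs."""
    count = 0
    for t in range(n_timesteps):
        for l in range(n_layers):
            here = placement[(t, l)]
            if t > 0 and placement[(t - 1, l)] != here:
                count += 1  # hidden state has to move between GPUs
            if l > 0 and placement[(t, l - 1)] != here:
                count += 1  # layer input has to move between GPUs
    return count


n_timesteps, n_layers, n_gpus = 6, 3, 3  # hypothetical sizes
for name, split in (("timestep split", timestep_split), ("layer split", layer_split)):
    placement = split(n_timesteps, n_layers, n_gpus)
    transfers = cross_gpu_transfers(placement, n_timesteps, n_layers)
    print(f"{name}: {transfers} cross-GPU transfers")
```

Which option moves less data depends on n, the number of layers and the tensor sizes. Note also that with the timestep split the shared weights have to be present on every GPU, whereas with the layer split each GPU holds only the weights of its own layers but stores their feature maps for all timesteps.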

Source: https://www.reddit.com/r//comments/7v6g6g/d__convlstm_into_multiple_/

