First of all, thank you for implementing v2 of this paper and for maintaining it.
** warning - I am mainly a keras/tf user **
If I am reading this correctly, x_offset is the original latent space warped (non-rigidly) by the offsets found by p_conv. So x_offset has shape [batch_size x height x width x features]. The warp happens only in the height and width dimensions (naturally). You then apply a regular convolution on top of that.
From reading the paper, I think the authors intended the offsets to be unique for each filter pixel. That is, the procedure should be:
1. Find the offsets.
2. Fetch the feature space per filter pixel (should be [batch_size x height x width x features x filter size]).
3. Multiply each feature by the relevant weight.
This way, two nearby pixels in the latent space can overlap if they want to.
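To make the three steps concrete, here is a minimal NumPy sketch of what I mean. It is not the repository's code: the function name `deform_conv_per_tap` is hypothetical, there is no batch dimension, the per-tap axis comes before the feature axis, the offsets are integers (the paper interpolates bilinearly at fractional offsets), and out-of-range positions are clamped to the border. All of these are simplifying assumptions on my part.

```python
import numpy as np

def deform_conv_per_tap(x, offsets, weight):
    """Hypothetical sketch of a per-tap deformable convolution.

    x:       (H, W, C) feature map
    offsets: (H, W, K, 2) integer (dy, dx) for every output pixel and filter tap
    weight:  (K, C, F) one (C, F) matrix per filter tap, with K = k * k
    returns: (H, W, F)
    """
    H, W, _ = x.shape
    K = weight.shape[0]
    k = int(round(np.sqrt(K)))
    offsets = np.asarray(offsets).astype(int)

    # Regular k x k grid of tap positions relative to the centre pixel.
    r = np.arange(k) - k // 2
    base = np.stack(np.meshgrid(r, r, indexing="ij"), axis=-1).reshape(K, 2)
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")

    # Step 1: sampling position = pixel + regular tap position + offset.
    py = ys[:, :, None] + base[None, None, :, 0] + offsets[..., 0]
    px = xs[:, :, None] + base[None, None, :, 1] + offsets[..., 1]
    # Assumption: clamp to the border instead of zero padding.
    py = np.clip(py, 0, H - 1)
    px = np.clip(px, 0, W - 1)

    # Step 2: fetch the features per filter tap -> (H, W, K, C).
    sampled = x[py, px]

    # Step 3: multiply each tap by its weight, sum over taps and channels.
    return np.einsum("hwkc,kcf->hwf", sampled, weight)
```

Because every tap has its own offset, the sampling positions of two neighbouring output pixels are free to land on the same input location.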
Am I wrong? It seems that all of the implementations online do something similar to what you did, so I assume I am wrong.
Thanks,
Dan
Hi Dan, I totally agree with your reasoning and the three steps above. The code implements the same idea: the offsets are unique for each conv filter pixel.
I'd like to point out that in the code, after the reshape, x_offset has shape [b, c, h x kernel_size, w x kernel_size]. The final conv layer (whose stride is the same as kernel_size) then keeps the output the same shape as the input x, which is [b, c, h, w].
:)
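For anyone following along, here is a small NumPy sketch of why the reshape followed by a conv with stride equal to kernel_size gives the same result as weighting each sampled tap separately. The helper names `reshape_taps` and `strided_conv` and the row-major ordering of the taps are assumptions made for illustration, not necessarily what the repository does.

```python
import numpy as np

def reshape_taps(x_offset, ks):
    """(b, c, h, w, ks*ks) -> (b, c, h*ks, w*ks).

    Each output pixel's ks*ks sampled values become one ks x ks tile.
    Assumes the taps are stored in row-major order.
    """
    b, c, h, w, _ = x_offset.shape
    t = x_offset.reshape(b, c, h, w, ks, ks)
    t = t.transpose(0, 1, 2, 4, 3, 5)  # (b, c, h, ky, w, kx)
    return t.reshape(b, c, h * ks, w * ks)

def strided_conv(x, weight, ks):
    """Convolution with kernel_size == stride == ks and no padding.

    x: (b, c, h*ks, w*ks), weight: (f, c, ks, ks) -> (b, f, h, w)
    """
    b, c, H, W = x.shape
    tiles = x.reshape(b, c, H // ks, ks, W // ks, ks)
    return np.einsum("bchiwj,fcij->bfhw", tiles, weight)

def per_tap_sum(x_offset, weight):
    """Reference: weight every sampled tap directly, then sum."""
    f, c, ks, _ = weight.shape
    return np.einsum("bchwn,fcn->bfhw", x_offset, weight.reshape(f, c, ks * ks))

rng = np.random.default_rng(0)
ks = 3
x_offset = rng.normal(size=(2, 4, 5, 6, ks * ks))  # b=2, c=4, h=5, w=6
weight = rng.normal(size=(8, 4, ks, ks))           # f=8 output channels

out = strided_conv(reshape_taps(x_offset, ks), weight, ks)
print(out.shape, np.allclose(out, per_tap_sum(x_offset, weight)))
```

Since the stride equals the kernel size, each tile is visited exactly once, so the kernel never mixes samples that belong to different output pixels.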