Or is there a simpler way to train a Conformer model with CTC loss and then run block-wise inference for online streaming transcription? If so, do we have to segment the labels per block? (That would go against the CTC design.)
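On the decoding side, at least for greedy CTC, it seems block-wise inference should not need per-block labels. The frame posteriors of each block can be collapsed incrementally, as long as the last frame label is carried across block boundaries so that a token spanning two blocks is not emitted twice. Below is a minimal sketch under these assumptions: blank has ID 0, and the model already yields per-frame argmax IDs for each block. `StreamingGreedyCTC` and `offline_greedy_ctc` are hypothetical names, not an existing API.

```python
BLANK = 0  # assumption: the CTC blank token has ID 0


class StreamingGreedyCTC:
    """Greedy CTC collapse applied one block of frames at a time (hypothetical helper)."""

    def __init__(self, blank=BLANK):
        self.blank = blank
        self.prev = blank  # last frame label seen, carried across block boundaries
        self.tokens = []   # full transcript emitted so far

    def push_block(self, frame_ids):
        """Consume per-frame argmax IDs for one block and return newly emitted tokens."""
        new = []
        for t in frame_ids:
            # Standard CTC rule: emit only non-blank labels that differ from the previous frame.
            if t != self.blank and t != self.prev:
                new.append(t)
            self.prev = t
        self.tokens.extend(new)
        return new


def offline_greedy_ctc(frame_ids, blank=BLANK):
    """Reference: collapse the whole utterance in one go."""
    return StreamingGreedyCTC(blank).push_block(frame_ids)


if __name__ == "__main__":
    frames = [0, 3, 3, 0, 3, 5, 5, 5, 0, 0, 7, 7]
    dec = StreamingGreedyCTC()
    for start in range(0, len(frames), 5):  # blocks of 5 frames split the "5 5 5" run
        print(dec.push_block(frames[start:start + 5]))
    print(dec.tokens == offline_greedy_ctc(frames))
```

Because only one label of state crosses the boundary, the streamed transcript matches offline greedy decoding for any block size. The open question is really the encoder: each block's posteriors must be computed without looking at future audio.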
This page from NVIDIA seems to suggest that you just need to partition your data into blocks, and that attention within each block does not need to be causal. Is that right? If so, can you still train with CTC as usual?
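If I read that correctly, the "blocks" would be enforced with an attention mask on the encoder rather than by cutting the data: a frame attends to every frame in its own block (no causal mask inside the block) and to some number of previous blocks, but never to future blocks. Here is a minimal sketch of such a mask; `chunk_attention_mask`, `chunk_size` and `left_chunks` are hypothetical names, and True here means "may attend".

```python
def chunk_attention_mask(num_frames, chunk_size, left_chunks=-1):
    """Return mask[i][j] == True if query frame i may attend to key frame j.

    Frames attend bidirectionally inside their own chunk and to up to
    `left_chunks` previous chunks (-1 means all previous chunks). Future
    chunks are always masked out.
    """
    mask = []
    for i in range(num_frames):
        ci = i // chunk_size  # chunk index of the query frame
        first = 0 if left_chunks < 0 else max(0, ci - left_chunks)
        mask.append([first <= j // chunk_size <= ci for j in range(num_frames)])
    return mask


if __name__ == "__main__":
    for row in chunk_attention_mask(6, chunk_size=2, left_chunks=1):
        print("".join("x" if allowed else "." for allowed in row))
```

Since the mask only restricts the encoder, the CTC loss would presumably still be computed over the full output sequence against the full, unsegmented transcript. Two caveats, as far as I understand: PyTorch's `nn.MultiheadAttention` treats a boolean `attn_mask` value of True as "not allowed", so the mask would need to be inverted there; and the Conformer convolution module also sees future frames unless it is made causal (or its right context limited), so masking attention alone would not fully prevent look-ahead.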
Could you add support for a streaming Conformer along the lines this paper proposes?