Hi! If the model you mean is CohereForAI/c4ai-command-r-v01, we believe it's possible. It uses standard RoPE. We took a quick look at its implementation in Hugging Face's Transformers library, and it is quite similar to Llama's, so you can use our Llama implementation as a reference when modifying Cohere's code.
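For reference, here is a minimal pure-Python sketch of standard RoPE applied to a single query or key vector. It uses the "rotate_half" layout that Llama uses in Transformers, where dimension i is paired with dimension i + head_dim/2. This is an illustration, not code from either repository; Cohere's implementation may pair dimensions differently (for example, interleaving adjacent dimensions), so check its modeling file before reusing any Llama-specific indexing.

```python
import math

def rope_inv_freq(head_dim: int, theta: float) -> list[float]:
    # One inverse frequency per dimension pair: theta^(-2i/d).
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]

def apply_rope(x: list[float], pos: int, theta: float) -> list[float]:
    """Rotate vector x (length head_dim) to encode position `pos`.

    Pairs dimension i with i + head_dim/2 (Llama-style rotate_half layout).
    """
    d = len(x)
    half = d // 2
    out = [0.0] * d
    for i, f in enumerate(rope_inv_freq(d, theta)):
        angle = pos * f
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s          # x_i * cos - x_{i+half} * sin
        out[i + half] = a * s + b * c   # x_{i+half} * cos + x_i * sin
    return out

def dot(u: list[float], v: list[float]) -> float:
    return sum(p * q for p, q in zip(u, v))

if __name__ == "__main__":
    q = [0.3, -1.2, 0.5, 2.0, 0.1, 0.7, -0.4, 1.1]
    k = [1.0, 0.2, -0.3, 0.8, 0.5, -1.5, 0.9, 0.0]
    # The attention score depends only on the relative distance (here 2).
    print(dot(apply_rope(q, 5, 10000.0), apply_rope(k, 3, 10000.0)))
    print(dot(apply_rope(q, 12, 10000.0), apply_rope(k, 10, 10000.0)))
```

The two printed scores agree up to floating-point error. This relative-position property is what lets methods that remap positions work on any model that uses standard RoPE, regardless of the exact dimension layout.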
One thing that could matter is that CohereForAI/c4ai-command-r-v01 uses a very large RoPE theta (8,000,000.0), which is much larger than that of most other models. Because of this, the empirical rule for choosing good hyperparameters (group size and neighbor window) may not hold. You may need to try several combinations to find a better one.
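A rough way to see why theta matters: RoPE rotates dimension pair i at inverse frequency theta^(-2i/d), so its wavelength is 2π·theta^(2i/d) tokens. A larger theta stretches these wavelengths, so more dimension pairs complete less than one full turn over a given span. The sketch below compares theta = 10,000 (the usual Llama 2 value) with Command-R's 8,000,000. The head_dim of 128 and the 4096-token span are assumptions chosen only for illustration, not values read from the model config. The sketch does not tell you which group size or neighbor window to use; it only shows that the positional behavior the empirical rule was tuned on differs a lot.

```python
import math

def rope_wavelengths(head_dim: int, theta: float) -> list[float]:
    # Wavelength (in tokens) of pair i: 2*pi / theta^(-2i/d) = 2*pi * theta^(2i/d).
    return [2 * math.pi * theta ** (2 * i / head_dim) for i in range(head_dim // 2)]

def pairs_longer_than(head_dim: int, theta: float, length: int) -> int:
    # Number of pairs whose angle does not complete one full turn within `length` tokens.
    return sum(1 for w in rope_wavelengths(head_dim, theta) if w > length)

if __name__ == "__main__":
    HEAD_DIM = 128  # assumption for illustration
    SPAN = 4096     # hypothetical span length for comparison
    for theta in (10_000.0, 8_000_000.0):
        waves = rope_wavelengths(HEAD_DIM, theta)
        print(
            f"theta={theta:>12,.1f}  "
            f"longest wavelength ~ {waves[-1]:,.0f} tokens  "
            f"pairs slower than {SPAN} tokens: {pairs_longer_than(HEAD_DIM, theta, SPAN)}"
        )
```

With the larger theta, far more dimension pairs rotate slowly over the same span, which is one reason hyperparameters tuned on theta = 10,000 models may not transfer directly.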
Is it possible to adapt this to Cohere Command-R models?