WEASEL+MUSE large number of features #102
Actually it's not that surprising. In a nutshell, this algorithm extracts many features and keeps only the best ones. Also, if the window size is very large compared to the number of time points, the algorithm can only extract a very small number of subsequences for that window size. Because the algorithm counts words (each subsequence is transformed into a word), the number of non-zero values is very small while the number of features is very large. You have two main approaches to decrease the number of features:
Hope this helps you a bit.
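To make the sparsity argument concrete, here is a back-of-the-envelope sketch. It is not pyts's implementation. It assumes a sliding window with step 1 (unless `window_step` is given), counts only unigrams, and treats every string of `word_size` symbols over `n_bins` letters as a possible feature. Under those assumptions, each subsequence yields one word, so a row can have at most as many non-zero counts as there are subsequences.

```python
def n_subsequences(n_timestamps: int, window_size: int, window_step: int = 1) -> int:
    """Number of sliding-window subsequences (one word each) for a single window size."""
    if window_size > n_timestamps:
        return 0
    return (n_timestamps - window_size) // window_step + 1


def max_nonzero_fraction(n_timestamps: int, window_size: int, word_size: int,
                         n_bins: int, window_step: int = 1) -> float:
    """Upper bound on the fraction of non-zero unigram counts for one window size.

    At most n_subsequences distinct words can occur, out of n_bins ** word_size
    possible words (simplified accounting, assumed for illustration).
    """
    vocab = n_bins ** word_size
    n_words = n_subsequences(n_timestamps, window_size, window_step)
    return min(n_words, vocab) / vocab


# Hypothetical series of 100 time points, word_size=4, n_bins=4 (256 possible words).
for w in (10, 50, 90):
    print(w, n_subsequences(100, w), round(max_nonzero_fraction(100, w, 4, 4), 3))
```

With a window of 90 points on a 100-point series, only 11 words are extracted, so at most 11 of the 256 columns for that window can be non-zero.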
Thank you for the explanation. I was able to reduce the size of the feature space by adjusting the values as you suggested: decreasing the word size and setting two values for the window size instead of a range. However, this hurts classification accuracy: when I keep the 650,000 features I get excellent accuracy, but it is lower otherwise.
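The sketch below shows why `word_size` and the number of window sizes have such a strong effect. It computes a worst-case column count before any feature filtering, using simplified accounting that is assumed here, not taken from pyts:

* each window size contributes `n_bins ** word_size` possible unigrams, plus that number squared if bigrams are counted;
* words from different dimensions, derived series, and window sizes are treated as distinct features;
* `series_per_dim=2` is a hypothetical stand-in for MUSE processing both the raw series and its derivative.

The real count after the chi-squared filter is much smaller. The sketch only illustrates how fast the bound grows.

```python
def feature_upper_bound(n_dims: int, n_windows: int, word_size: int, n_bins: int,
                        bigrams: bool = True, series_per_dim: int = 2) -> int:
    """Worst-case number of bag-of-words columns before any feature filtering.

    Hypothetical accounting: every (dimension, derived series, window size, word)
    combination is a separate column.
    """
    vocab = n_bins ** word_size                      # possible unigrams per window
    per_window = vocab + (vocab ** 2 if bigrams else 0)
    return n_dims * series_per_dim * n_windows * per_window


# Hypothetical 6-dimensional dataset, n_bins=4.
print(feature_upper_bound(6, n_windows=5, word_size=4, n_bins=4))  # 3947520
print(feature_upper_bound(6, n_windows=2, word_size=3, n_bins=4))  # 99840
```

Going from word size 4 with five windows to word size 3 with two windows shrinks the bound by roughly a factor of 40, which matches the direction of the change described above.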
Great that it's working well with the first set of hyper-parameter values. I don't know if it's necessary to mention it, but you must use cross-validation to evaluate the model: it's really easy to overfit any machine learning algorithm on a dataset with 1,500 samples and 650,000 features.
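Because WEASEL+MUSE selects features using the labels, the transformer has to be refit inside every fold. Otherwise the test fold leaks into feature selection. In practice, scikit-learn's `cross_val_score` with a `Pipeline` wrapping the transformer and the classifier handles this. The standard-library sketch below only shows the mechanics.

The `fit_predict` callable is a hypothetical interface, and `majority_baseline` is a toy stand-in for "fit the transformer and classifier on the training fold, then predict". The folds are contiguous and unshuffled, so shuffle or stratify first if samples are ordered by class.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def kfold_indices(n_samples: int, n_splits: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into n_splits contiguous (train, test) index pairs."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    # The first (n_samples % n_splits) folds get one extra sample.
    sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
             for i in range(n_splits)]
    splits, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, test))
        start += size
    return splits


def cross_val_accuracy(fit_predict: Callable, X: Sequence, y: Sequence,
                       n_splits: int = 5) -> List[float]:
    """Accuracy per fold; anything that looks at labels is fit on the train fold only."""
    scores = []
    for train, test in kfold_indices(len(X), n_splits):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return scores


def majority_baseline(X_train, y_train, X_test):
    """Toy model: always predict the most frequent training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)
```

For example, `cross_val_accuracy(majority_baseline, X, y, n_splits=5)` returns one accuracy per fold. A large gap between training accuracy and these fold scores is the usual sign of overfitting.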
Description
When I use WEASELMUSE for multivariate time series classification, the transformer outputs a very large number of features (650,000). Also, the histogram counts are all zero for some examples. Is this expected?
Steps/Code to Reproduce
I used my own dataset, and the resulting X_weasel was an ndarray of shape 1500 x 650000.
The 1500 makes sense, since that is the number of examples I have, but 650,000 features seems large.
I used the code below. When I run the same code on the BasicMotions dataset from the example, I get similar results: a large number of features, and some examples whose counts are all zero, so their histogram has nothing to plot.
Versions
NumPy 1.20.3
SciPy 1.6.3
Scikit-Learn 0.24.2
Numba 0.53.1
Pyts 0.11.0