-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
XAFS/pre_edge.py _finde0 recieves numpy arrays and pandas series #544
Comments
@jonvanbuskirk0 yeah, Pandas Series work as Numpy arrays until they don't. ;) Larch does have pandas as a dependency, but it is currently only used in one place, and not for working with Series or Dataframes, but for Styling output... Pandas is not anywhere in the
which might also work with other "almost-like-a-numpy-array" objects, including xarray and plain lists. The challenge is how many functions need to do these tests. I'll guess that |
@jonvanbuskirk0, as already said by Matt, we do not really support Pandas objects as input, so the quick solution to your issue is to simply pass a Numpy array to @newville I think we should remove the Pandas dependency, but this could be discussed at a next developers meeting.
Personally, I am reluctant in manually type checking into the code because there is an elegant solution already provided by Python, which is the use of type hinting. We should go for it and convert as much as possible Larch code with this coding style. For example, in the particular case of import numpy as np
from typing import Union, Any
def _finde0(energy: np.ndarray[np.float64, Any], mu_input: np.ndarray[np.float64, Any], estep: Union[float, None] = None, use_smooth: bool = True): @newville what do you think? |
@maurov Working toward dropping Pandas (it is used only in I am OK with using type hints here and other places. I am not opposed - they just don't come that naturally to me, and I've been a slow adopter. Maybe using them more is a good New Year's goal. Many of the "xafs functions" allow a Group or arrays. This could also be handled with type hints or a more complex decorator. I am also OK with saying that So, I would say "all of the above". |
@newville I also agree on all. I will add here only few comments, but for the sake of coordination and clarity, I will open specific issues. I will assign to myself those I can handle by next release:
Yes, type hints will be very beneficial in validation and testing. I would push for it. Furthermore, we could use
Yes, |
@maurov I like all of that. I would be OK with moving toward a more formal "XASGroup" (or similar name), and also a separate "XRFGroup" and/or "XESGroup" (where, for example, the array called "energy" might mean "emitted" instead of "incident"), and have "the xafs functions" work only on XASGroup. I would still want to keep plain functions over methods for back-comppatibility: call autobk on an XASGroup rather than create an XASGroup and call its autobk method. Having both would be OK. And, right, "remove_nans2()" sort of became "clean input arrays" -- maybe that should be clearer and either a decorator or method of XASGroup.... |
Yes, this is what I have in mind too. The I may propose closing this very specific issue and discuss this in a separate thread. |
Hi,
Thanks for curating such a great set of tools. I've been using the XAFS package, and running into an error with the find_e0 function, which redirects to the _finde0 function, and hits the error on line 80 of pre_edge.py, which is "mu = mu_input[ordered]". It seems that I was accidentally passing pandas data series, and the find_e0 function was only converting these to numpy arrays some of the time. The function accepts either a numpy array (seemingly the intended data type) or a pandas series, which causes a particularly undescriptive error. I've found that importing pandas into pre_edge.py and adding an if statement to check for pandas series and convert to numpy array fixes the error. The if statement that works for me is:
import pandas as pd # placed at the front of the file, and quite slow, unfortunately
if isinstance(mu_input, pd.core.series.Series): # check if pandas series
mu_input = np.array(mu_input.values) # extract values and convert to numpy array
placed directly before line 80 in pre_edge.py
Of course it would be best to simply only pass numpy arrays, but this addition could be very useful for minimizing the debugging others might have to do.
Thanks!
The text was updated successfully, but these errors were encountered: