Support for source code check #13

Open
GuillaumeDesforges opened this issue May 13, 2019 · 4 comments

@GuillaumeDesforges

Hi, I have an idea for a feature that would really be helpful, especially in a data science experimentation workflow.

Add a boolean argument to the wrapper function, for instance inspect_source. When it is set to True, use the inspect module to read the source code of the function and use its hash the same way you do for the function arguments.
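To make the idea concrete, here is a minimal sketch of what hashing the source alongside the arguments could look like. It is not cachier's implementation: `source_hash` and `make_cache_key` are hypothetical names, and the fallback to the compiled code object (for functions whose source text cannot be located) is an assumption of this sketch.

```python
import hashlib
import inspect
import marshal


def source_hash(func):
    """Return a SHA-256 hex digest identifying the function's current source.

    Hypothetical helper. If the source text cannot be retrieved (for example
    for a function defined in an interactive session), fall back to the
    serialized code object, which also changes when the body changes.
    """
    try:
        payload = inspect.getsource(func).encode("utf-8")
    except (OSError, TypeError):
        payload = marshal.dumps(func.__code__)
    return hashlib.sha256(payload).hexdigest()


def make_cache_key(func, args, kwargs):
    """Combine the source hash with the call arguments into a single key.

    Assumes the arguments have a stable repr(); a real cache would need a
    more careful argument hashing scheme.
    """
    arg_part = repr((args, sorted(kwargs.items())))
    combined = source_hash(func) + "|" + arg_part
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()
```

With a key built this way, editing the body of the decorated function changes the key, so old cache entries are simply never looked up again.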

It would help a lot!

@shaypal5
Collaborator

shaypal5 commented May 30, 2019

Hi @GuillaumeDesforges !

That's a great idea! I would love to help you write and add this to the package, if you want to see this feature come to life. :)

@GuillaumeDesforges
Author

GuillaumeDesforges commented Jun 3, 2019

Thanks @shaypal5 for the enthusiastic reply :)

The feature is a bit tricky to implement. It must be well thought out to prevent very dangerous situations.

For instance, announcing that the source code is checked for changes before running the caching operations means that the user will expect modifications to cascade. Say you have:

from cachier import cachier


def add_some(x):
  return x + 1

@cachier(inspect_source=True)
def some_heavy_operation(x):
  x = add_some(x)
  return x

def run():
  result = some_heavy_operation(1)
  print(result)

run() # prints 2

If the value in add_some changes from 1 to 2, recomputation is necessary, even though the source of some_heavy_operation itself has not changed.

However, we also don't want to systematically check the source code of functions called, especially if they are from a package and do not change, because that would cause a huge overhead.

My guess would be that rather than a boolean parameter inspect_source, it could be preferable to set it to a list of functions and classes to inspect, so that users can define the behaviour themselves.
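A rough sketch of that variant, under stated assumptions: `combined_fingerprint` and its `depends_on` parameter are hypothetical names (nothing like this exists in cachier), and the fallback to the serialized code object only works for plain functions, not classes.

```python
import hashlib
import inspect
import marshal


def _fingerprint_bytes(obj):
    """Bytes that change whenever the object's definition changes."""
    try:
        # Works for functions and classes whose source file can be located.
        return inspect.getsource(obj).encode("utf-8")
    except (OSError, TypeError):
        # Assumption of this sketch: without source text, only plain
        # functions are supported, through their code object.
        return marshal.dumps(obj.__code__)


def combined_fingerprint(func, depends_on=()):
    """Hash the function together with the objects the user listed."""
    digest = hashlib.sha256()
    for obj in (func, *depends_on):
        digest.update(_fingerprint_bytes(obj))
        digest.update(b"\0")  # separator between objects
    return digest.hexdigest()
```

In the example above, the user would list add_some as a dependency of some_heavy_operation, so that editing add_some changes the fingerprint. Anything not listed is ignored, which keeps the overhead under the user's control but also means a forgotten dependency silently yields stale results.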

The more I think about it, the more it feels like a bad idea™ ...

I would be glad to hear your thoughts !

@NickCrews

> However, we also don't want to systematically check the source code of functions called, especially if they are from a package and do not change, because that would cause a huge overhead.

In the general case, it would be impossible to follow the chain of functions called and verify that they are the same. This is the halting problem: in general, you can't tell what a program will do without actually running it.

I would be curious what the exact use case is that you are describing, for instance what inspired you in the first place?

@GuillaumeDesforges
Author

GuillaumeDesforges commented Apr 24, 2020

> > However, we also don't want to systematically check the source code of functions called, especially if they are from a package and do not change, because that would cause a huge overhead.
>
> In the general case, it would be impossible to follow the chain of functions called and verify that they are the same. This is the halting problem: in general, you can't tell what a program will do without actually running it.
>
> I would be curious what the exact use case is that you are describing, for instance what inspired you in the first place?

Yes, a fully working mechanism wouldn't be possible, but an approximation would be: statically track the source code of anything that might be called, wherever that can be determined, much like linters do.
It would not be perfect, since it would distinguish functions by how they are written rather than by what they do (which is completely different), but it would cover most use cases.
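As a sketch of such an approximation (an illustration only, with hypothetical names `reachable_functions` and `transitive_fingerprint`): look at the global names a function's code object refers to, follow those that resolve to functions defined in the same module, and hash everything found. It assumes plain module-level functions; it misses calls through methods, other modules, or dynamic dispatch, and it may follow a name that is merely mentioned and never called.

```python
import hashlib
import inspect
import marshal
import types


def _fingerprint_bytes(func):
    """Source text if available, otherwise the serialized code object."""
    try:
        return inspect.getsource(func).encode("utf-8")
    except (OSError, TypeError):
        return marshal.dumps(func.__code__)


def reachable_functions(func, found=None):
    """Collect func plus same-module functions it references by name."""
    if found is None:
        found = []
    if any(f is func for f in found):
        return found  # already visited (also guards against recursion)
    found.append(func)
    for name in func.__code__.co_names:
        target = func.__globals__.get(name)
        same_module = getattr(target, "__module__", None) == func.__module__
        if isinstance(target, types.FunctionType) and same_module:
            reachable_functions(target, found)
    return found


def transitive_fingerprint(func):
    """Hash func together with everything reachable_functions finds."""
    digest = hashlib.sha256()
    ordered = sorted(reachable_functions(func), key=lambda f: f.__qualname__)
    for f in ordered:
        digest.update(_fingerprint_bytes(f))
        digest.update(b"\0")
    return digest.hexdigest()


# The example from earlier in the thread, without the decorator.
def add_some(x):
    return x + 1


def some_heavy_operation(x):
    x = add_some(x)
    return x


fingerprint = transitive_fingerprint(some_heavy_operation)
```

Here the fingerprint of some_heavy_operation covers add_some as well, so editing add_some would invalidate the cache, while functions imported from installed packages are never inspected.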

The use case is simple. In data science experimentation, it is common to build an experiment brick by brick, and storing intermediate results makes it faster to test the next brick, which you build directly on top of the previous functions (instead of writing to some file and loading it manually).

Some tools like DVC provide means to do that, but in a very heavy way in my opinion.

I'm not doing things like that anymore and won't have time to work on it unfortunately.
