Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tidy local imports (PyInf#11558) #273

Open
ArvidJB opened this issue Nov 21, 2023 · 9 comments
Open

Tidy local imports (PyInf#11558) #273

ArvidJB opened this issue Nov 21, 2023 · 9 comments
Assignees

Comments

@ArvidJB
Copy link
Collaborator

ArvidJB commented Nov 21, 2023

Currently tidy-imports only optimizes global imports. How difficult would it be to also optimize "local" imports, i.e., imports within a function? Example:

$ cat foo.py
import os

def foo():
    import os
    import sys
    return os.getenv('FOO')
$ tidy-imports foo.py
[PYFLYBY] /tmp/foo.py: removed unused 'import os'
[PYFLYBY] /tmp/foo.py: added mandatory 'from __future__ import annotations'
--- /tmp/foo.py   2023-11-21 16:37:32.473138334 -0500
+++ /tmp/tmpgciyp7p6    2023-11-21 16:38:48.351697332 -0500
@@ -1,4 +1,5 @@
-import os
+from __future__ import annotations
+

 def foo():
     import os

Replace /tmp/foo.py? [y/N]

Here we should

  1. detect that import sys is unused in foo and suggest to remove it
  2. detect that import os is present both at global as well as local scope.
    a. I think here it's correct to remove the global import since the only usage is in a function where os is imported locally.
    b. But in general if there's a global import not covered by local imports everywhere we should remove the local import.

Can you estimate how much work this would be?

@ArvidJB ArvidJB changed the title Tidy local imports Tidy local imports (PyInf#11558) Dec 1, 2023
@Carreau
Copy link
Collaborator

Carreau commented Aug 6, 2024

I've been asked to come back to this.

My feeling with the architecture of pyflyby is that it's going to be complex, and just estimating the work would need significant work to figure out what is needed.

I'll try to look into in over the next few weeks to see what I can do, and try to come up with a plan/estimate.

@ArvidJB
Copy link
Collaborator Author

ArvidJB commented Sep 5, 2024

Hi @Carreau, do you have an estimate how much work this would be?

@Carreau Carreau self-assigned this Sep 11, 2024
@Carreau
Copy link
Collaborator

Carreau commented Sep 11, 2024

I haven't been able to look into it, I'm trying to get the pyinstruments fixes first. I'll put that higher on the priority list.

@Carreau
Copy link
Collaborator

Carreau commented Oct 29, 2024

I'm looking into this.

detect that import os is present both at global as well as local scope. I think here it's correct to remove the global import since the only usage is in a function where os is imported locally.

I believe this is already the case; at least on some example I tried it works.

import os # will be removed; 


def foo():
    import os
    import sys

    return os.getenv("FOO")

I'll edit the original request to split request 2, into 2a and 2b.

@Carreau
Copy link
Collaborator

Carreau commented Oct 29, 2024

More on this also for myself once I resume the work later on.

import os # will be removed; 


def foo():
    import os
    import sys

    return os.getenv("FOO")

Taking the above example, detecting that sys is unused in foo should not be too complicated,
it seem we need to patch _NewScopeCtx context manager to find all the created unused imports when exiting; I have hardcoded modification that does the job;

  • I can do a cleanup and audit all the places where _NewScopeCtx is used; this should be a few more hours 5-8h, so likely done by next week.

With the current hardcoded "Assume sys is unused"; Pyflyby now detects it (at least in debug logs) but warns that import sys is not a global import and thus cannot be removed – so this was foreseen some time ago; This appear to be due to the fact that the source-to-source transformer is not made to remove non-top-level line; I'll look into how to do that.

  • I don't want to estimate that yet as I havent touch the source-to-source transformer so far.

Removing the local import os if the global is used is though on another level of difficulty;
as for a number of other issues an improvement requests there is a big limitation due to pyflyby architecture where the modifications are computed while the ast is walked; by the time we detect that the global import os is used; we already have lost any reference to the local one. This may require more heavy refactor.

I'm also a bit concerned if a local import is made to shadow a variable how this will affects the result.

import os # will be removed; 


def foo():
    os = 1
    import os # we risk marking local os import as unnecessary; but actually it shadows os = 1

    return os.getenv("FOO")


os.stat

I haven't thought on how to take care of this.

  • I suggest we move this to a subtask/separate issue

@Carreau
Copy link
Collaborator

Carreau commented Nov 11, 2024

Modifying non-local imports

I've looked into modifying non-global import more in depth.

Pyflyby SourceToSource transformer has only been designed to only handle top-level refactoring; And this is a deep assumption in pyflyby – the finest level of granularity is PythonBlock which is top-level only.

I think that updating pyflyby to allow non-top-level source updates is quite complicated; likely a multi-month project; and more of a complete rewrite than just an update.

There exists a few more recent frameworks that might allow what you wish;

  • https://github.com/PyCQA/autoflake does remove imports and unused variable; and rely on https://github.com/PyCQA/pyflakes to do so; This is the same ecosystem as https://github.com/PyCQA/flake8 ; I can try to do an in-process autoflake pass in pyflyby; (PyflybyPythonBlock to text, feed the text to autoflake, parse it back) though I'm not sure how much it will keep the formatting Pyflyby try to keep; and it will likely ignore any other pyflyby specific configuration/preference
  • Try a library like libcst, and make pyflyby work on top of it; This is still a multi-month rewrite but we don't reinvent the wheel.
  • The problem I see with both of the above is that the community is already moving aways from these as they are too slow; often in favor of https://github.com/astral-sh/ruff;
  • Ruff has it's own issue as it's written in rust; and sending patches to ruff to allow import aligning like pyflyby is a No as the developers explicitly refuse anything that differs from what black does.

In the options I don't really like;

  • try a heuristic; if module xxx is unused; can the file; and we are close to the correct line number and find ^\W+import\W+xxx$, simply remove the line from the text source.
  • This would not work for multiple import from same submodule:from foo import a, b, xxx, or multiline imports, but we can try to refine the heuristic.

Marking unnecessary local imports

As we talked about last Wednesday, with pyflyby architecture it's hard to prioritize local imports as we don't keep references to the ast while we are walking it; and so in the case:

import sys 


def f():
    import sys # hard to know we can remove as top-level is not yet mark as used.
    print(sys.version_info)

print(sys.argv)

Right now the importChecker keep a set of used/unused imports; I think I might be able to change that to a counter (and store a definite usage counter (sys.argv), and potential usage counter (global import sys would have a +1 potential from sys.version_info); and decide later on which import(s) are unnecessary.

Still keeping the caveat that right now it's unclear if we would be able to automatically remove the non-local imports, or which implementation choice we'd like.

I'll anyway cleanup the updates I have locally, and send a PR that will be a base for the detection of unused local imports.

@Carreau
Copy link
Collaborator

Carreau commented Nov 11, 2024

Removing non top-level import would potentially have another side-effect;

pyflyby currently make the first import in the following as unused:

if sys.version_info >= (3, 10):
    from types import NoneType, EllipsisType
else:
    NoneType = type(None)
    EllipsisType = type(Ellipsis)

As it does not understand control structures; and currently rely on the fact that it's not top-level to not remove it.

I'll have to adapt pyflyby to better track the context in which an import is marked as unused.

@quarl
Copy link
Member

quarl commented Dec 4, 2024

Thanks for investigating & overview.
Can you spec out a proposal for using libcst? And estimate effort to get to proof-of-concept quality?

@Carreau
Copy link
Collaborator

Carreau commented Jan 6, 2025

Just for openness, I will be off work from early March to September for personal reasons if all goes well this year, we are working on finding someone else that could both scope and does the work, as it seem unadvisable for me to take something like this at this time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants