Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search name suffixes #149

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from
Draft

Conversation

sizmailov
Copy link
Contributor

@sizmailov sizmailov commented May 21, 2020

Adds extra entries to search index according to camel and snake case styles.


v         v                     v                    # current serach start points
my_module.my_awesome___sub__module.StrangeGooFactory
   ^         ^         ^    ^             ^  ^       # added by this PR
   1         2         3    4             5  6

In this example only 6 entries will be added to index so index size grows linearly:

1 module
2 awesome___sub__module
3 sub__module
4 module
5 GooFactory
6 Factory

This PR is incomplete as don't have configuration options, etc. I think you have more sense of how it should be done.

demo

Peek 2020-05-22 02-10

@codecov
Copy link

codecov bot commented May 21, 2020

Codecov Report

Merging #149 into master will increase coverage by 0.01%.
The diff coverage is 97.22%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #149      +/-   ##
==========================================
+ Coverage   98.10%   98.11%   +0.01%     
==========================================
  Files          27       27              
  Lines        6477     6671     +194     
  Branches       44       44              
==========================================
+ Hits         6354     6545     +191     
- Misses        123      126       +3     
Impacted Files Coverage Δ
documentation/python.py 98.89% <94.73%> (-0.32%) ⬇️
documentation/doxygen.py 98.98% <100.00%> (-0.22%) ⬇️
plugins/m/code.py 92.21% <0.00%> (+0.09%) ⬆️
plugins/ansilexer.py 100.00% <0.00%> (+12.72%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 09db97d...43ff8e3. Read the comment docs.

@mosra mosra mentioned this pull request Jun 8, 2020
@sizmailov
Copy link
Contributor Author

This PR is marked as draft primarily because it unconditionally adds extra entries to search index without possibility retain older behavior. It might be undesirable for some users. I'm not sure how this part should be approached.

@mosra
Copy link
Owner

mosra commented Jun 9, 2020

Hm, forgot to reply here of all things. Sorry.

What's the underlying motivation here? Do you really want to be able to search by any word contained in each symbol, or was it (as you mentioned on gitter) mainly just a clear well-defined set of common prefixes that you want to strip (such as get_, set_, mylib_, pybind_ ...)? If it's any word, then I'm afraid bigger projects will have the search data size massively increased and running into problems like in #123 or #150.

As mentioned in #155, I'm planning to add an option listing prefixes that should be optional in the search, and then investigate implementation of fuzzy search based on initial letters (so in the above case you'd type for example myawesum to get to my_awesome_sub_module). That should be I think a good middle ground and short searches like su or mo won't list a ton of irrelevant results when searching for sum() or mod(). What do you think? :)

@sizmailov
Copy link
Contributor Author

My use case is close to any word search:

image

I agree, search result gets bigger and it makes harder to navigate through. Results sort algorithm needs an adjustment for this change (or upcoming fuzzy search).

The index gets bigger. By how much? Approximately by average number of words per symbol in initial table, so its in a range from 1 to 4, most likely close to 2. So it's not that big.

Even with those downsides accessibility is much higher. I think this PR can be merged only as temporal solution for relatively small projects until proper implementation arrives. Of course this option should be disabled by default.

@crisluengo
Copy link
Contributor

I tried this, it works wonderfully! Many thanks!

For my project, this is exactly the right solution to the biggest issue with search in m.css.

image

@sizmailov
Copy link
Contributor Author

Indeed, with this PR magnum docs hits limits just like in #123 even with changes from search-32 branch, so this is definitely not a perfect/uniform solution.

@mosra
Copy link
Owner

mosra commented Jun 10, 2020

I need to switch back to working on magnum so I can't put more time into this right now, but what I'm planning to do is adding a generic hook for processing/filtering search results, which would allow this to be done from the user side. Hastily written example:

# Filter function signature:
#
# path is the fully qualified name, for std::string::replace() it would be
#   ['std', 'string', 'replace']
# entry_type.name is one of 'PAGE', 'NAMESPACE', 'GROUP', 'CLASS', 'STRUCT',
#   'UNION', 'TYPEDEF', 'DIR', 'FILE', 'FUNC', 'DEFINE', 'ENUM', 'ENUM_VALUE',
#   'VAR'
#
# All parameters are passed as keyword args and more may be added in the 
# future, use **kwargs to ignore the ones you're not interested in.
#
# Returns a list of prefix lengths, empty list means the name shouldn't be 
# added to the search data. The implicit filter returns [0], meaning the result 
# is added just once, with no prefix stripped.
def filter(*, path: List[str], entry_type: EntryType, **kwargs) -> List[int]:
    return [0]
        
# Implement your filter func and put it to SEARCH_PREFIX_FILTER in conf.py
SEARCH_PREFIX_FILTER = my_filter_func

# Examples:

# Add everything except files and dirs beginning with impl_ to search data
def no_file_filter(name, entry_type, **kwargs):
    if entry_type.name in ['FILE', 'DIR'] and name.startswith('impl_'): return []
    return [0]

# Strip library namespace and get_ / set_ prefixes from names
def namespace_getset_prefix_filter(name, **kwargs):
    for i in ['mylib_', 'get_', 'set_']:
        if name.startswith(i): return [0, len(i)]
    return [0]

# Allows to search for parts of a snake-case name (I hope I copied this 
# correctly from the diff).
_snake_case_point_re = re.compile('_[^_]')
def snake_case_filter(name, **kwargs):
    return [0] + [m.start(0) + 1 for m in _snake_case_point_re.finditer(name)]

Further suggestions / ideas welcome. In particular, the fuzzy search suggested by @thomthom in #155 can't be implemented this way, that has to be done differently.

@sizmailov
Copy link
Contributor Author

sizmailov commented Jun 12, 2020

Filtering mechanism is a nice feature! The only thing I want to add is ability to combine two or more filters together, i.e. make interface composable.

#155 can't be implemented this way, that has to be done differently.

I've got same impression :)

@mosra
Copy link
Owner

mosra commented Jun 12, 2020

The only thing I want to add is ability to combine two or more filters together, i.e. make interface composable.

I'd say this probably has to be done on the user side, as I'm not sure about the semantics -- if one filter returns five prefixes and the other seven, which of them should it pick up? Should it have twelve, five, or seven prefixes in total? A dumb trivial "combiner" that could do an union of the prefixes would be

def filter_combiner(**kwargs):
    return set([filter(**kwargs) for filter in [no_file_filter, snake_case_filter, ...]])

which is in my opinion easy enough to be left up to the user. If there needs to be some other behavior (intersection instead of an union etc.), it has to be implemented differently, and I think this logic again doesn't belong to the script :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

3 participants