Search name suffixes #149
base: master
Conversation
Codecov Report
```
@@            Coverage Diff             @@
##           master     #149      +/-   ##
==========================================
+ Coverage   98.10%   98.11%   +0.01%
==========================================
  Files          27       27
  Lines        6477     6671     +194
  Branches       44       44
==========================================
+ Hits         6354     6545     +191
- Misses        123      126       +3
```
Continue to review full report at Codecov.
run parameters instead)
This PR is marked as draft primarily because it unconditionally adds extra entries to the search index, without any possibility of retaining the older behavior. That might be undesirable for some users. I'm not sure how this part should be approached.
Hm, forgot to reply here of all things. Sorry. What's the underlying motivation here? Do you really want to be able to search by any word contained in each symbol, or was it (as you mentioned on Gitter) mainly just a clear, well-defined set of common prefixes that you want stripped (such as …)?

As mentioned in #155, I'm planning to add an option listing prefixes that should be optional in the search, and then investigate an implementation of fuzzy search based on initial letters (so in the above case you'd type for example …).
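To make the initial-letters idea concrete, here's a minimal sketch of how a query could be derived from a symbol's word boundaries. The names and the splitting regex are made up for illustration; this is not the planned implementation:

```python
import re

def initials(name):
    # Split a camelCase / snake_case name into words and take each
    # word's first letter, lowercased.
    words = re.findall(r'[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])', name)
    return ''.join(w[0].lower() for w in words)

print(initials('generateFlatNormals'))   # gfn
print(initials('mesh_primitive_range'))  # mpr
```

A fuzzy search would then match a typed query like `gfn` against these derived strings instead of the full names.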
My use case is close to any-word search. I agree there are downsides:

- The search results get bigger, which makes them harder to navigate. The result sorting algorithm needs an adjustment for this change (or for the upcoming fuzzy search).
- The index gets bigger. By how much? Approximately by the average number of words per symbol in the initial table, so it's in a range from 1 to 4, most likely close to 2. So it's not that big.

Even with those downsides, accessibility is much higher. I think this PR can be merged only as a temporary solution for relatively small projects until a proper implementation arrives. Of course this option should be disabled by default.
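The "average number of words per symbol" estimate above can be sanity-checked with a few lines of Python. The symbol names and the word-splitting regex below are made up for illustration, not taken from any real index:

```python
import re

# Hypothetical symbol names standing in for a real search index.
names = ['Vector3', 'mesh_primitive_range', 'AbstractShaderProgram',
         'dot', 'cross_product']

def word_count(name):
    # Count words split at underscores and lower-to-upper transitions.
    return len(re.findall(r'[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])', name))

counts = [word_count(n) for n in names]
print(counts, sum(counts) / len(counts))  # [1, 3, 3, 1, 2] 2.0
```

With this toy sample the index would roughly double, consistent with the "most likely close to 2" estimate.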
Indeed, with this PR the magnum docs hit the limits just like in #123, even with the changes from …
I need to switch back to working on magnum, so I can't put more time into this right now, but what I'm planning to do is add a generic hook for processing/filtering search results, which would allow this to be done from the user side. Hastily written example:

```python
from typing import List

# Filter function signature:
#
# path is the fully qualified name, for std::string::replace() it would be
# ['std', 'string', 'replace']
# entry_type.name is one of 'PAGE', 'NAMESPACE', 'GROUP', 'CLASS', 'STRUCT',
# 'UNION', 'TYPEDEF', 'DIR', 'FILE', 'FUNC', 'DEFINE', 'ENUM', 'ENUM_VALUE',
# 'VAR'
#
# All parameters are passed as keyword args and more may be added in the
# future, use **kwargs to ignore the ones you're not interested in.
#
# Returns a list of prefix lengths, an empty list means the name shouldn't be
# added to the search data. The implicit filter returns [0], meaning the result
# is added just once, with no prefix stripped.
def filter(*, path: List[str], entry_type: EntryType, **kwargs) -> List[int]:
    return [0]

# Implement your filter func and assign it to SEARCH_PREFIX_FILTER in conf.py
SEARCH_PREFIX_FILTER = my_filter_func
```

Examples:

```python
import re

# Add everything except files and dirs beginning with impl_ to search data
def no_file_filter(name, entry_type, **kwargs):
    if entry_type.name in ['FILE', 'DIR'] and name.startswith('impl_'):
        return []
    return [0]

# Strip the library namespace and get_ / set_ prefixes from names
def namespace_getset_prefix_filter(name, **kwargs):
    for i in ['mylib_', 'get_', 'set_']:
        if name.startswith(i):
            return [0, len(i)]
    return [0]

# Allows searching for parts of a snake_case name (I hope I copied this
# correctly from the diff).
_snake_case_point_re = re.compile('_[^_]')

def snake_case_filter(name, **kwargs):
    return [0] + [m.start(0) + 1 for m in _snake_case_point_re.finditer(name)]
```

Further suggestions / ideas welcome. In particular, the fuzzy search suggested by @thomthom in #155 can't be implemented this way, that has to be done differently.
The filtering mechanism is a nice feature! The only thing I'd add is the ability to combine two or more filters together, i.e. make the interface composable.
I've got the same impression :)
I'd say this probably has to be done on the user side, as I'm not sure about the semantics -- if one filter returns five prefixes and the other seven, which of them should it pick? Should the result have twelve, five, or seven prefixes in total? A combiner taking the union could look like this:

```python
def filter_combiner(**kwargs):
    prefixes = set()
    for f in [no_file_filter, snake_case_filter]:  # ... more filters
        prefixes |= set(f(**kwargs))
    return sorted(prefixes)
```

which is in my opinion easy enough to be left up to the user. If there needs to be some other behavior (intersection instead of a union etc.), it has to be implemented differently, and I think this logic again doesn't belong in the script :)
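To illustrate the union semantics with self-contained toy filters (both filters and names here are made up, standing in for the real ones above):

```python
def always(name, **kwargs):
    # Baseline filter: index every name once, unstripped.
    return [0]

def strip_get(name, **kwargs):
    # Also index names with a leading 'get_' stripped (length 4).
    return [0, 4] if name.startswith('get_') else [0]

def filter_combiner(**kwargs):
    # Union of all offsets returned by each filter.
    prefixes = set()
    for f in [always, strip_get]:
        prefixes |= set(f(**kwargs))
    return sorted(prefixes)

print(filter_combiner(name='get_flags'))  # [0, 4]
print(filter_combiner(name='flags'))      # [0]
```

Taking the union means a name is indexed under every prefix variant that any filter asks for, and duplicates collapse automatically.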
Adds extra entries to the search index according to camelCase and snake_case naming styles.
In this example only 6 entries will be added to the index, so the index size grows linearly:
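The example itself didn't survive the page extraction, but the idea can be sketched as follows. This is an illustrative reimplementation with made-up names, not the PR's actual code; it splits at underscores and lower-to-upper transitions and emits one index entry per suffix:

```python
import re

# Word boundaries: after an underscore, or between a lowercase
# letter/digit and an uppercase letter.
_boundary_re = re.compile(r'_(?=[^_])|(?<=[a-z0-9])(?=[A-Z])')

def suffix_entries(name):
    starts = [0] + [m.end() for m in _boundary_re.finditer(name)]
    return [name[s:] for s in starts]

print(suffix_entries('generateFlatNormals'))
# ['generateFlatNormals', 'FlatNormals', 'Normals']
print(suffix_entries('mesh_primitive_range'))
# ['mesh_primitive_range', 'primitive_range', 'range']
```

Each symbol contributes one entry per word, so the index grows linearly with the total word count rather than combinatorially.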
This PR is incomplete as it doesn't have configuration options, etc. I think you have a better sense of how that should be done.
demo