Search name suffixes #149

sizmailov · 2020-05-21T23:19:01Z

Adds extra entries to search index according to camel and snake case styles.


v         v                     v                    # current serach start points
my_module.my_awesome___sub__module.StrangeGooFactory
   ^         ^         ^    ^             ^  ^       # added by this PR
   1         2         3    4             5  6

In this example only 6 entries will be added to index so index size grows linearly:

1 module
2 awesome___sub__module
3 sub__module
4 module
5 GooFactory
6 Factory

This PR is incomplete as don't have configuration options, etc. I think you have more sense of how it should be done.

demo

codecov · 2020-05-21T23:20:01Z

Codecov Report

Merging #149 into master will increase coverage by 0.01%.
The diff coverage is 97.22%.

@@            Coverage Diff             @@
##           master     #149      +/-   ##
==========================================
+ Coverage   98.10%   98.11%   +0.01%     
==========================================
  Files          27       27              
  Lines        6477     6671     +194     
  Branches       44       44              
==========================================
+ Hits         6354     6545     +191     
- Misses        123      126       +3

Impacted Files	Coverage Δ
documentation/python.py	`98.89% <94.73%> (-0.32%)`	⬇️
documentation/doxygen.py	`98.98% <100.00%> (-0.22%)`	⬇️
plugins/m/code.py	`92.21% <0.00%> (+0.09%)`	⬆️
plugins/ansilexer.py	`100.00% <0.00%> (+12.72%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 09db97d...43ff8e3. Read the comment docs.

run parameters instead)

sizmailov · 2020-06-09T14:15:00Z

This PR is marked as draft primarily because it unconditionally adds extra entries to search index without possibility retain older behavior. It might be undesirable for some users. I'm not sure how this part should be approached.

mosra · 2020-06-09T14:18:03Z

Hm, forgot to reply here of all things. Sorry.

What's the underlying motivation here? Do you really want to be able to search by any word contained in each symbol, or was it (as you mentioned on gitter) mainly just a clear well-defined set of common prefixes that you want to strip (such as get_, set_, mylib_, pybind_ ...)? If it's any word, then I'm afraid bigger projects will have the search data size massively increased and running into problems like in #123 or #150.

As mentioned in #155, I'm planning to add an option listing prefixes that should be optional in the search, and then investigate implementation of fuzzy search based on initial letters (so in the above case you'd type for example myawesum to get to my_awesome_sub_module). That should be I think a good middle ground and short searches like su or mo won't list a ton of irrelevant results when searching for sum() or mod(). What do you think? :)

sizmailov · 2020-06-09T14:59:03Z

My use case is close to any word search:

I agree, search result gets bigger and it makes harder to navigate through. Results sort algorithm needs an adjustment for this change (or upcoming fuzzy search).

The index gets bigger. By how much? Approximately by average number of words per symbol in initial table, so its in a range from 1 to 4, most likely close to 2. So it's not that big.

Even with those downsides accessibility is much higher. I think this PR can be merged only as temporal solution for relatively small projects until proper implementation arrives. Of course this option should be disabled by default.

crisluengo · 2020-06-09T15:54:34Z

I tried this, it works wonderfully! Many thanks!

For my project, this is exactly the right solution to the biggest issue with search in m.css.

sizmailov · 2020-06-09T21:22:25Z

Indeed, with this PR magnum docs hits limits just like in #123 even with changes from search-32 branch, so this is definitely not a perfect/uniform solution.

mosra · 2020-06-10T09:11:39Z

I need to switch back to working on magnum so I can't put more time into this right now, but what I'm planning to do is adding a generic hook for processing/filtering search results, which would allow this to be done from the user side. Hastily written example:

# Filter function signature:
#
# path is the fully qualified name, for std::string::replace() it would be
#   ['std', 'string', 'replace']
# entry_type.name is one of 'PAGE', 'NAMESPACE', 'GROUP', 'CLASS', 'STRUCT',
#   'UNION', 'TYPEDEF', 'DIR', 'FILE', 'FUNC', 'DEFINE', 'ENUM', 'ENUM_VALUE',
#   'VAR'
#
# All parameters are passed as keyword args and more may be added in the 
# future, use **kwargs to ignore the ones you're not interested in.
#
# Returns a list of prefix lengths, empty list means the name shouldn't be 
# added to the search data. The implicit filter returns [0], meaning the result 
# is added just once, with no prefix stripped.
def filter(*, path: List[str], entry_type: EntryType, **kwargs) -> List[int]:
    return [0]
        
# Implement your filter func and put it to SEARCH_PREFIX_FILTER in conf.py
SEARCH_PREFIX_FILTER = my_filter_func

# Examples:

# Add everything except files and dirs beginning with impl_ to search data
def no_file_filter(name, entry_type, **kwargs):
    if entry_type.name in ['FILE', 'DIR'] and name.startswith('impl_'): return []
    return [0]

# Strip library namespace and get_ / set_ prefixes from names
def namespace_getset_prefix_filter(name, **kwargs):
    for i in ['mylib_', 'get_', 'set_']:
        if name.startswith(i): return [0, len(i)]
    return [0]

# Allows to search for parts of a snake-case name (I hope I copied this 
# correctly from the diff).
_snake_case_point_re = re.compile('_[^_]')
def snake_case_filter(name, **kwargs):
    return [0] + [m.start(0) + 1 for m in _snake_case_point_re.finditer(name)]

Further suggestions / ideas welcome. In particular, the fuzzy search suggested by @thomthom in #155 can't be implemented this way, that has to be done differently.

sizmailov · 2020-06-12T07:42:40Z

Filtering mechanism is a nice feature! The only thing I want to add is ability to combine two or more filters together, i.e. make interface composable.

#155 can't be implemented this way, that has to be done differently.

I've got same impression :)

mosra · 2020-06-12T09:40:34Z

The only thing I want to add is ability to combine two or more filters together, i.e. make interface composable.

I'd say this probably has to be done on the user side, as I'm not sure about the semantics -- if one filter returns five prefixes and the other seven, which of them should it pick up? Should it have twelve, five, or seven prefixes in total? A ~~dumb~~ trivial "combiner" that could do an union of the prefixes would be

def filter_combiner(**kwargs):
    return set([filter(**kwargs) for filter in [no_file_filter, snake_case_filter, ...]])

which is in my opinion easy enough to be left up to the user. If there needs to be some other behavior (intersection instead of an union etc.), it has to be implemented differently, and I think this logic again doesn't belong to the script :)

sizmailov added 2 commits May 22, 2020 02:07

Add name suffixes to search index

817ee35

Expose name suffix search options to higher level function

ef1d802

sizmailov added 2 commits May 22, 2020 02:40

Adjust tests (might be better to keep tests untouched and adjust

60860f3

run parameters instead)

Remove debug print

45fd030

mosra mentioned this pull request Jun 8, 2020

Fuzzy searching? #155

Open

sizmailov added 3 commits June 9, 2020 16:38

Remove repeated key

0f94a3c

Add CamelCase/sanke_case suffixes to doxygen.py

29b40ec

Update doxygen search tests

43ff8e3

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Search name suffixes #149

Search name suffixes #149

sizmailov commented May 21, 2020 •

edited

Loading

codecov bot commented May 21, 2020 •

edited

Loading

sizmailov commented Jun 9, 2020

mosra commented Jun 9, 2020

sizmailov commented Jun 9, 2020

crisluengo commented Jun 9, 2020

sizmailov commented Jun 9, 2020

mosra commented Jun 10, 2020

sizmailov commented Jun 12, 2020 •

edited

Loading

mosra commented Jun 12, 2020

Search name suffixes #149

Are you sure you want to change the base?

Search name suffixes #149

Conversation

sizmailov commented May 21, 2020 • edited Loading

codecov bot commented May 21, 2020 • edited Loading

Codecov Report

sizmailov commented Jun 9, 2020

mosra commented Jun 9, 2020

sizmailov commented Jun 9, 2020

crisluengo commented Jun 9, 2020

sizmailov commented Jun 9, 2020

mosra commented Jun 10, 2020

sizmailov commented Jun 12, 2020 • edited Loading

mosra commented Jun 12, 2020

sizmailov commented May 21, 2020 •

edited

Loading

codecov bot commented May 21, 2020 •

edited

Loading

sizmailov commented Jun 12, 2020 •

edited

Loading