Following the HN article, I quickly tried the demo on the places set but came up with weird results:
How come a result that doesn't match the input string gets a better quality score than the one that includes the entry?
And how come another non-matching one has the same score just below it?
If it's due to length giving a closer match, then I don't understand the advantage of making it real time. (If I need to type my string almost entirely to make my targeted result come up in the visible list, then the "time gained" by the admittedly good performance is already lost.)
But anyway, performance is very good indeed; I'm just trying to figure out the use case.
m31coding changed the title from "Quality index: how come ?" to "Prefix match is ranked low" on Feb 18, 2024
Hi, thank you very much for reporting this observation. I think this is an interesting topic for many people. Carcasse and Carasso are good matches for the query carcasso because the words are similar. To be precise, the computed quality is the number of common n-grams between the query and the matched term, divided by the number of n-grams of the longer string.
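To make the quality formula concrete, here is a minimal sketch in Python. It assumes plain 3-grams on the lowercased string with no padding or other normalization, and the names `ngrams` and `quality` are made up for illustration. The library's actual n-gram size and preprocessing may differ, so the numbers will not necessarily match the demo.

```python
from collections import Counter


def ngrams(text: str, n: int = 3) -> Counter:
    """Multiset of character n-grams of the case-folded text (no padding: an assumption)."""
    s = text.casefold()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def quality(query: str, term: str, n: int = 3) -> float:
    """Common n-grams divided by the n-gram count of the longer string."""
    q, t = ngrams(query, n), ngrams(term, n)
    common = sum((q & t).values())          # multiset intersection
    longer = max(sum(q.values()), sum(t.values()))
    return common / longer if longer else 0.0


print(quality("carcasso", "Carcasse"))      # 5 of 6 trigrams in common
print(quality("carcasso", "Carcassonne"))   # 6 of 9 trigrams in common
```

Under these assumptions the shorter, similar word scores 5/6 while the true prefix match scores only 6/9, because the extra trigrams of the longer term enlarge the denominator.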
Most fuzzy searchers are based on such string similarities, related heuristics, or well-defined string metrics. That being said, I can definitely see where you are coming from. A correctly typed prefix could potentially yield a higher-quality match than what is currently computed.
One solution is to implement a search controller that queries a fuzzy searcher as well as a suffix array searcher. The latter is good at finding prefix and suffix matches. The matches from both can then be de-duplicated and combined. However, I consider this approach to be out of scope for this library.
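The combining step of such a controller could look roughly like the Python sketch below. Everything here is hypothetical: `combine_matches` is not part of the library, matches are modelled as plain `(term, quality)` tuples, and it is assumed that the suffix array searcher reports exact prefix or suffix hits with a quality of 1.0.

```python
def combine_matches(fuzzy, exact):
    """Merge two match lists, keeping the best quality per term, best first.

    Both arguments are iterables of (term, quality) tuples (a hypothetical shape).
    """
    best = {}
    for term, q in list(fuzzy) + list(exact):
        if term not in best or q > best[term]:
            best[term] = q
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative inputs, not real searcher output.
fuzzy_hits = [("Carcasse", 0.7), ("Carcassonne", 0.64)]
exact_hits = [("Carcassonne", 1.0)]
print(combine_matches(fuzzy_hits, exact_hits))
```

Keeping the maximum quality per term is one possible design choice; a controller could equally well list the exact hits first and append the remaining fuzzy hits.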
To get closer to your desired behaviour with the library as is, you could boost the qualities of prefix matches like so:
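A minimal sketch of such a boost, written in Python rather than against the library's API. The `Match` type and `boost_prefix_matches` function are hypothetical names, and raising the quality to the power of an exponent is inferred from the explanation that follows. It assumes qualities lie in the range 0 to 1, so an exponent below 1 can only raise them.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Match:
    term: str        # matched term (hypothetical shape)
    quality: float   # similarity in [0, 1]


def boost_prefix_matches(query, matches, exponent=0.5):
    """Raise the quality of matches whose term starts with the query, then re-sort."""
    q = query.casefold()
    boosted = [
        replace(m, quality=m.quality ** exponent)
        if m.term.casefold().startswith(q)
        else m
        for m in matches
    ]
    return sorted(boosted, key=lambda m: m.quality, reverse=True)


# Illustrative qualities, not actual library output.
hits = [Match("Carcasse", 0.7), Match("Carcassonne", 0.64)]
for m in boost_prefix_matches("carcasso", hits):
    print(m.term, round(m.quality, 2))
```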
The lower the exponent, the more weight is given to prefix matches. With my choice of 0.5, Carcassonne is at rank 1 with a quality of 0.8 for the query carcasso. The quality of the other results doesn't change because their prefixes don't match.
I will try to find time in Q2 to investigate whether something like this can be implemented in the core of the searcher in a proper and performant way.
m31coding changed the title from "Prefix match is ranked low" to "Prefix match is ranked lower than expected" on Feb 18, 2024