Replies: 19 comments 22 replies
-
Hi Joe! Search is hard! 😂 I agree it's not providing great results here, but I believe it is working as designed at the moment. Now, there's a good argument that that design is wrong, but I'll give a bit of background. First things first: if you have an exact match on a package name, that package gets rated the top search result regardless of anything else. After that, all packages that match are listed in reverse score order. So it's everything that matches (including matching on a partial keyword, as CollectionViewPagingLayout does), ordered by our package score. That's not ideal by any measure, but it's what we have right now. Search is on our list of things to look at, and maybe it's getting towards that time. It's potentially a huge task though. 😅 You can find the query builder in the source code. I wish I had a better answer for you.
-
All good - thanks for pointing me into the code, I'll poke around a bit more. And yeah, I'm all too aware that search is seemingly simple and brutally complex, especially for subjective results such as what I was reporting. I assume any alternative solution should rely as much as possible on what's available and provided by the database? (That is, available to return via a SQL query of some form?) Postgres does have some full-text search capabilities, although they need to be properly configured, so I might poke there to see how hard it would be to set up.
-
I wasn't suggesting you tackle it, just that you might be interested 👍 I'll look forward to a fully redesigned and reimplemented search system by the time I wake up! 😂
-
It won't happen that fast, but I'll poke around a bit - I'm curious what's available, and it's a good excuse to lean into Vapor a bit.
-
Did a bit of digging, and I managed to search on a word that happens to be quirky when it comes to matching in English. "Ping" overlaps a bit excessively with keywords that it shouldn't match with. So the good news is that most search terms likely won't exhibit this pathological behavior. I took a bit of time to understand the current query code. A potential solution is to leverage the Postgres built-in full-text search capabilities, specifically the to_tsvector and to_tsquery functions, which normalize words into lexemes.
You can then compare against them using the @@ match operator. I did some digging to verify that most of the search terms were in English (the stemming/normalizing is VERY specific to language), so any enhancements here would need to be aware of the language in use. In scanning through a data dump, I didn't see any keywords that were in a language other than English, but there are a number of instances where the summary (also searched) is in another language - a couple that I think were Mandarin, one in Japanese, etc. Those were easy to spot because of the different alphabet. I also scanned, but not exhaustively, for European languages (guessing German, French, or Italian might be there) - but didn't see any samples in my previewing scan (just visual). Given how the search is coded today, I suspect it's possible to apply the Postgres stemming/word normalization functions to keywordMatchQueryBuilder, replacing the where clause with some updated logic that compares the search terms vs. the keywords after they're normalized. That said, I'm not very familiar with SQLKit, so I'm sorting out how to convert my raw SQL experiments into Swift code using SQLKit to even see if it would help/work - and how much of a performance impact my crazy idea would have. Right now, the code's doing the rough equivalent of an ILIKE substring match.
(Not sure how to handle converting the keywords array into the tsvector type, but I think that can be done - idea being to slap them all together, then run the stemmer over the whole set.) As a broader solution, if the language of the summary is English, this same concept might be useful for searching the contents of the summary field that's maintained locally. That raises the question of support for other languages, but it might still be an interesting option.
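To make the stemming idea concrete, here's a tiny Python sketch (purely illustrative - Postgres actually uses the much more careful Snowball stemmers) of how suffix stripping normalizes word forms, and why a short word like "ping" needs a length guard to survive intact:

```python
# Toy stemmer: NOT the Snowball algorithm Postgres uses - just an
# illustration of how suffix stripping normalizes word forms so that
# different inflections of a word match each other.
def naive_stem(word: str) -> str:
    w = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Only strip when a reasonable stem (3+ chars) remains,
        # which is what keeps short words like "ping" intact.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: len(w) - len(suffix)]
    return w

keywords = ["stopping", "watching", "ping", "pings"]
print({kw: naive_stem(kw) for kw in keywords})
# A search for "ping" would match the keyword "pings" after stemming,
# because both normalize to the same lexeme.
```

The same idea, applied on the database side, is what `to_tsvector`/`to_tsquery` do before comparing terms.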
-
I don't have access to that link, but I'd love to see it. I'd imagine it maps embarrassingly closely to GitHub stars right now. Something I'd love to fix.
-
I'd agree we probably don't have many (or any) keywords in the database that are in the Latin alphabet but are not English, apart from a handful of specific terms. I'd be interested to see this move forward into something we can test (if that's not too much more work). I've no problem with things that make English-language search better as long as they don't break searches for other languages completely. From a quick look through some data, here are some searches that might be good to test:
-
@daveverwer Actually - the distribution is nothing like the GitHub stars setup. It's a pretty uniform curve from what I'm seeing. I suck at using Observable, so I've fixed up and published the link - apparently the earlier one is "just for me?" - anyway: https://observablehq.com/d/acc9f659f95e8e58/ should be visible for you now. I made it unlisted. The test searches are perfect, thank you!!
-
That's fascinating, thanks Joe! What an amazing tool Observable is, too.
-
I've been stepping back from fiddling with the server code a bit to try and wrap some numbers around "good search results/bad search results" - and more specifically, how to make some reasonable estimate of whether a search result implementation is better (or worse). Fortunately, there's a good bit of prior research on this topic, and there are a couple of algorithms that I think would be good to use. I wanted to present them here for discussion, if anyone else is interested in the topic:

**Measures**

The two key measures - which are implicitly balanced against each other in any search - are precision and recall.
You can overbalance in either direction, and "best" is a balancing act. All of this is based on subjective measures - "Is this specific result relevant to my search?" - which puts it very much in the eye of the beholder. There are a couple of other quirks with these measurements - specifically recall - in that you rather have to "know" the whole set of documents that could be relevant to get to a resulting percentage. Also know that these measures can be gamed.

There's a secondary algorithm that's more useful for fine-tuning, evolved from web-search studies over the past 20+ years, called "mean reciprocal rank" - which weights "is it relevant or not" by the position in the search results, giving more credit for relevant results that appear higher in the returned list. This banks on the whole idea of "above the fold" and browser-based searches, where us humans are far less likely to iterate through paginated search results to see what else might have been uncovered. The weighting mechanisms are often as not matched with a non-binary choice of relevance - instead of "yes/no", in some search situations it's quite reasonable to see "yes, definitely", "somewhat", and "no" - or even finer gradations. This kind of result is more focused on broader narrative text and phrase searching, but the graduated levels of relevance might still be very useful in the Package Index context.

**Current Index Search**

The current search primarily uses SQL ILIKE pattern matching.

**Measuring the current for comparisons**

Ultimately, measuring relevance means someone (or better - multiple someones) going through some searches and rating the results. The most ideal situation is the person doing the search providing a ranking, but in practice that's usually not very feasible.
(There are some algorithmic feedback loops involving click-through recording that try to fill this role, but that's a level of effort I suspect is unwarranted for the index.) That said, I think a few people all independently making evaluations of search results, using our personal experience to do a best effort at inferring the search intent, would result in reasonable values. To that end, I'm cobbling up a utility app that can make and store searches, and getting set to add the capability to store relevance rankings on the individual results, with the idea that I'd like to do some "comparison" searching between different techniques on the server (so comparing the hosted version to something I run locally, with a reasonably close database dump) - or to allow anyone else to do the same. You could even compare searches over time, but since the index is fairly constantly growing and changing with submissions, you'd have to be more careful with assertions about the quality of search. (The app is open and public on GitHub if you're curious or want to poke: https://github.com/heckj/SPISearch.) My idea was to capture the searches, add the rankings, and have the app help provide the measures to compare searching against the current implementation vs. something you might be running locally to test - getting measured values of the comparison in terms of the measures I outlined above.

**The ask**

While I'm getting the utility app functional, I wanted to get community feedback on the measures I was planning to use - to see if there are other measures or algorithms that anyone would like to suggest - and to see what people thought of graduated relevance (relevance measured as a value 0...something, as opposed to a binary "yes/no") for making measurements.
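For concreteness, here's a small Python sketch (my own illustration, not part of SPISearch) of the measures described above - precision, recall, and mean reciprocal rank over a set of rated searches:

```python
def precision(relevant: set, retrieved: list) -> float:
    """Fraction of the returned results that are relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for doc in retrieved if doc in relevant) / len(retrieved)

def recall(relevant: set, retrieved: list) -> float:
    """Fraction of all relevant documents that were returned.
    Requires knowing the full relevant set, which is the hard part."""
    if not relevant:
        return 0.0
    return sum(1 for doc in retrieved if doc in relevant) / len(relevant)

def mean_reciprocal_rank(searches: list) -> float:
    """searches is a list of (retrieved, relevant) pairs; each search
    scores 1/rank of its first relevant result, rewarding hits near
    the top of the list, averaged across the whole set."""
    total = 0.0
    for retrieved, relevant in searches:
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(searches)

# Example: a "ping" search where SwiftyPing (the one relevant package)
# shows up at position 14 - reciprocal rank credits it only 1/14.
retrieved = [f"pkg{i}" for i in range(1, 14)] + ["SwiftyPing"]
relevant = {"SwiftyPing"}
print(precision(relevant, retrieved))                  # 1/14
print(recall(relevant, retrieved))                     # 1.0
print(mean_reciprocal_rank([(retrieved, relevant)]))   # 1/14
```

Graduated relevance would replace the binary `doc in relevant` test with a per-document grade, feeding into something like discounted cumulative gain instead of MRR.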
-
I have an initial trial in a branch that I looked at with metrics, starting with just the keyword matching mechanisms. There's a quirk here that's not unexpected, but not ideal, that I wanted to point out. When using the tsvector to normalize and then compare words, there's a common set that seems to get dropped: compound words that include terms prefixed by either NS or UI. I spotted this while reviewing the keyword search changes, as the keyword 'uibezierpath' was dropped for the keyword search 'bezier' when using the tsvector comparison (vs. ILIKE, which is what's happening currently). I'm noodling on some ideas to allow for this, at least for the most common prefixes that we'll encounter ('NS', 'UI'), to hopefully preserve the recall. The stemmer mechanism isn't exposed in Postgres, intentionally so - but I have some whack ideas for keeping the stock Postgres infrastructure and pre-processing in some limited fashions to help alleviate this. It might not be worth it, but I'm kind of curious what I can get away with within this context. This initial test didn't really expose what or how non-English words or alphabets might be impacted. That's next on my investigation list.
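One shape the pre-processing idea could take (a sketch of my own, not what's in the branch) is expanding keywords with the common Apple prefixes stripped before they're handed to `to_tsvector`, so 'uibezierpath' also indexes as 'bezierpath'. The prefix list and length guard here are assumptions for illustration:

```python
# Hypothetical keyword-expansion pass; 'NS' and 'UI' are the prefixes
# called out above, and the length guard avoids mangling short words
# that merely start with those letters.
PREFIXES = ("ns", "ui")

def expand_keyword(keyword: str) -> list[str]:
    kw = keyword.lower()
    variants = [kw]
    for prefix in PREFIXES:
        if kw.startswith(prefix) and len(kw) > len(prefix) + 2:
            variants.append(kw[len(prefix):])
    return variants

print(expand_keyword("UIBezierPath"))  # ['uibezierpath', 'bezierpath']
print(expand_keyword("ping"))          # ['ping']
```

Both variants would then be concatenated into the text passed to `to_tsvector`, so a search for 'bezier' can stem-match the 'bezierpath' variant without losing the original compound keyword.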
-
Something functional (although NOT clean) is up in a draft PR so we can start picking apart what's happening, get it out of my head, and step forward from my local experimental branch: #1905

One thing I wanted to call out is that I'm not currently enabling any of the fancier boolean-search capabilities that we could be using. There are multiple variations in Postgres for generating a tsquery from user-provided text. There is the capability to use AND, OR, NOT, and "followed by" operators when constructing a tsquery. The initial PR doesn't do any boosting of rank by score - but I did do experiments and multiple iterations (mostly directly in SQL) to verify that - contrary to the documentation - Postgres tends to return a ranking value maxing out at 1.0, with a minimum value of either 0 or 1e-20. The documentation showed ranking examples stepping up to '3' in some cases, which I've been unable to reproduce. I did comb through (to my limited ability) the Postgres source for the ranking functions. Based on that, my thinking is a score-based boost (for otherwise equivalent rankings) might be best achieved by normalizing score into a 0...0.1 range and directly adding that onto the rank. The exact factor is a guess - it might overwhelm some relevance values, but I think that's low enough that it would end up sorting any "irrelevant" results by quality-of-package. I haven't measured all the various numbers with my utility SPISearch app yet, but will do so and provide the results - either here, or within the PR, if that would be more meaningful.
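The proposed boost can be sketched like this (my own illustration under stated assumptions: the 0...0.1 factor is the guess from above, and `max_score` is a hypothetical normalization constant, not SPI's actual maximum score):

```python
def boosted_rank(ts_rank_value: float, score: int, max_score: int = 100) -> float:
    """Add a package-quality boost, normalized into 0...0.1, onto the
    ts_rank relevance value (which empirically tops out around 1.0)."""
    boost = 0.1 * (score / max_score) if max_score > 0 else 0.0
    return ts_rank_value + boost

# Two packages with identical relevance now sort by package score...
print(boosted_rank(0.5, 80) > boosted_rank(0.5, 20))   # True
# ...but a clearly more relevant package still beats a popular one.
print(boosted_rank(0.9, 0) > boosted_rank(0.5, 100))   # True
```

Keeping the boost an order of magnitude below the rank's 0...1.0 range is what lets it act as a tie-breaker rather than a primary sort key.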
-
I had a large pile of exploratory SQL queries and learning comments that seem like they'd be better here, so I'm pulling that out from the PR work and stashing it here in case anyone wants to reference it. These are exploratory SQL commands, specific to Postgres, that leverage the existing search table.

select to_tsvector('stopping watching ping') @@ to_tsquery('ping')

returns true. The @@ operator applies a query to a vector to determine if the query would match. The coalesce function ensures that a NULL result from a column just results in an empty string:

select * from search where to_tsvector(coalesce(summary, '') || coalesce(package_name, '')) @@ to_tsquery('ping')

select package_name, to_tsvector(ARRAY_TO_STRING(keywords, ' ')) from search

The || concatenating operator above concatenates the strings, which then get converted into a tsvector.

SELECT package_name, keywords_vector, ts_rank(keywords_vector, query) AS rank
FROM search, to_tsquery('bezier') query, to_tsvector(ARRAY_TO_STRING(keywords, ' ')) keywords_vector
WHERE query @@ keywords_vector
ORDER BY rank DESC;

Exploring the vector spaces:

SELECT package_name, keywords, combined_vector, query, ts_rank(combined_vector, query) AS rank
FROM search,
plainto_tsquery('bezier') query,
to_tsvector(ARRAY_TO_STRING(keywords, ' ')) keywords_vector,
to_tsvector(package_name) name_vector,
to_tsvector(summary) summary_vector,
to_tsvector(coalesce(summary, '') || coalesce(package_name, '') || coalesce(ARRAY_TO_STRING(keywords, ' '), '')) combined_vector
WHERE query @@ combined_vector
ORDER BY rank DESC;

Exploring weighting values and how they apply:

SELECT package_name, keywords, combined_vector, query, ts_rank(combined_vector, query) AS rank
FROM search,
to_tsquery('ux & (bezier | arrow) ') query,
to_tsvector(ARRAY_TO_STRING(keywords, ' ')) keywords_vector,
to_tsvector(package_name) name_vector,
to_tsvector(summary) summary_vector,
to_tsvector(coalesce(summary, '') || coalesce(package_name, '') || coalesce(ARRAY_TO_STRING(keywords, ' '), '')) combined_vector
WHERE query @@ combined_vector
ORDER BY rank DESC;

-- vs.

SELECT package_name, keywords, combined_vector, query, ts_rank(combined_vector, query) AS rank
FROM search,
to_tsquery('ux & (bezier | arrow) ') query,
to_tsvector(ARRAY_TO_STRING(keywords, ' ')) keywords_vector,
to_tsvector(package_name) name_vector,
to_tsvector(summary) summary_vector,
setweight(to_tsvector(coalesce(summary, '') || coalesce(package_name, '') || coalesce(ARRAY_TO_STRING(keywords, ' '), '')), 'A') combined_vector
WHERE query @@ combined_vector
ORDER BY rank DESC;

The results of the above, varying the weighting to see the effect:
Contrary to what the documentation example shows, I couldn't arrange a rank result that exceeded 1.0. Using tsvector @@ tsquery as the boolean to indicate a match drops some compound keywords after normalization - as an example, the query for 'bezier' doesn't match the keyword 'uibezierpath'. One option to work around this is specifying all possible values in a thesaurus, which augments the dictionary used when normalizing words:

SELECT package_name, ts_rank(combined_vector, query) AS rank, combined_vector
FROM search,
to_tsquery('ux & (bezier | arrow) ') query,
to_tsvector(ARRAY_TO_STRING(keywords, ' ')) keywords_vector,
to_tsvector(package_name) name_vector,
to_tsvector(summary) summary_vector,
setweight(to_tsvector(coalesce(summary, '') || coalesce(package_name, '') || coalesce(ARRAY_TO_STRING(keywords, ' '), '')), 'A') combined_vector
WHERE query @@ combined_vector
ORDER BY rank DESC;

The ranking values returned from exploration:
Same sample, except for
Sample for
Sample for
Sample for
Sample for
I think we want to use
The normalization factor is always related to the document length, which in our case isn't all that meaningful. On the plus side, we can get a rank for a non-matching term that reliably comes back as 0, so we can potentially drop the WHERE match filter entirely:

SELECT query, a_vector, ts_rank(a_vector, query) AS rank
FROM plainto_tsquery('fiddlesticks') query,
to_tsvector('Heres my example package of swift package manager') a_vector
-- WHERE query @@ a_vector
ORDER BY rank DESC;

The union query implemented in the first cut of this:

SELECT * FROM (
(
SELECT DISTINCT 'author' AS "match_type", NULL AS "keyword", NULL::UUID AS "package_id", NULL AS "package_name", NULL AS "repo_name", "repo_owner", NULL::INT AS "score", NULL AS "summary", NULL::INT AS "stars", NULL AS "license", NULL::TIMESTAMP AS "last_commit_date", NULL::TIMESTAMP AS "last_activity_at", NULL::TEXT[] AS "keywords", NULL::BOOL AS "has_docs", LEVENSHTEIN("repo_owner", 'bezier') AS "levenshtein_dist", ts_rank("tsvector", "tsquery") AS "tsrankvalue"
FROM "search", plainto_tsquery('bezier') AS "tsquery", setweight(to_tsvector("repo_owner"), 'A') AS "tsvector"
WHERE "repo_owner" ILIKE '%bezier%'
ORDER BY "levenshtein_dist"
LIMIT 50
)
UNION ALL (
SELECT DISTINCT 'keyword' AS "match_type", "keyword", NULL::UUID AS "package_id", NULL AS "package_name", NULL AS "repo_name", NULL AS "repo_owner", NULL::INT AS "score", NULL AS "summary", NULL::INT AS "stars", NULL AS "license", NULL::TIMESTAMP AS "last_commit_date", NULL::TIMESTAMP AS "last_activity_at", NULL::TEXT[] AS "keywords", NULL::BOOL AS "has_docs", LEVENSHTEIN("keyword", 'bezier') AS "levenshtein_dist", ts_rank("tsvector", "tsquery") AS "tsrankvalue"
FROM "search", UNNEST("keywords") AS "keyword", plainto_tsquery('bezier') AS "tsquery", setweight(to_tsvector("keyword"), 'B') AS "tsvector"
WHERE "keyword" ILIKE '%bezier%'
ORDER BY "tsrankvalue"
LIMIT 50
)
UNION ALL (
SELECT 'package' AS "match_type", NULL AS "keyword", "package_id", "package_name", "repo_name", "repo_owner", "score", "summary", "stars", "license", "last_commit_date", "last_activity_at", "keywords", "has_docs", NULL::INT AS "levenshtein_dist", ts_rank("tsvector", "tsquery") AS "tsrankvalue"
FROM "search", plainto_tsquery('bezier') AS "tsquery", setweight(to_tsvector(coalesce(summary, '') || coalesce(package_name, '') || coalesce(ARRAY_TO_STRING(keywords, ' '), '')), 'A') AS "tsvector"
WHERE CONCAT_WS(' ', "package_name", COALESCE("summary", ''), "repo_name", "repo_owner", ARRAY_TO_STRING("keywords", ' ')) ~* 'bezier' AND "repo_owner" IS NOT NULL AND "repo_name" IS NOT NULL
ORDER BY LOWER("package_name") = 'bezier' DESC, "tsrankvalue" DESC, "score" DESC, "package_name" ASC
LIMIT 21 OFFSET 0
)
) AS "t"
With the inputs: ["bezier", "bezier", "%bezier%", "bezier", "bezier", "%bezier%", "bezier", "bezier", "bezier"]
Experiments in seeing the tsvector weighted, and then coalesced:

SELECT setweight(to_tsvector(coalesce(summary, '')), 'A') || setweight(to_tsvector(coalesce(ARRAY_TO_STRING(keywords, ' '), '')), 'B')
FROM search
WHERE package_name = 'SCNBezier' Just the summary vector, weighted 'A':
With keywords coalesced into it, weighted 'B':
-
Just adding another example to this discussion: I was trying to search for FTP libraries, but "ftp" matches "SwiftPM" (as a substring) and many other variations of it, so it was very difficult to find the library I was looking for.
-
Here's another good test search that I just did for real, knowing the package was there and expecting it to be the top result: https://swiftpackageindex.com/search?query=introsp - I was expecting to find Introspect at the top. I bet it will be with the changes.
-
Mmm, for a partial search string this feels like a good result, with solid matches in positions 1 and 2. I'll try the branch tomorrow, but I'd be surprised if it fared better, since "introsp" is not a word. Note that searching for "introspect" would have given you position 1.
-
No notable improvement (but not worse either). Because it's a partial word, the lexeme normalization won't give it any specific boosting here, so it's effectively falling back to the more grep-like finding mechanism. (Tested on the current branch.)
-
This is now up on staging!
-
This is incredible work. Thanks so much for all your hard work, @heckj (and to @finestructure, for shepherding it in!) 🎉
-
Please describe the bug
The search behavior seems to downplay the text in the title or description for relevance as compared to other elements. For example, a search for "ping" - which you'd expect to highlight SwiftyPing (https://swiftpackageindex.com/samiyr/SwiftyPing) - shows a plethora of other elements ranked above that package. I suspect what's happening, based on what's visible, is that "ping" is being used to look up matching keywords, and the prioritized list is being provided by what matches those keywords, where I suspect it would be better to prioritize the package's name and/or description above keyword matching results. (In this case, I knew SwiftyPing was in the results - but I was surprised that doing a search for "ping" would result in it being at item 14, since as far as I can tell it's the only package that includes any "ping" functionality in the index.)
Explain the steps needed to reproduce the bug
What was the expected behaviour?
I expected that SwiftyPing would be the top listed package
Screenshots
The screenshot below shows the top hits (as of Jun 29, 2022) for "ping":

What web browser were you using? (if relevant)
If you're reporting a bug with the front end, please include your browser information below:
Additional context
Is there anything else we might find useful?