-
-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect and report incorrectly delimited strings #161
Conversation
Strings in pofiles are always delimited on both ends with double-quotes. The POFile parser in polib didn't check for this, and therefore happily accepted invalid msgstr/msgid/etc, potentially loosing some of the contents of the file. In such cases, the first or last character of the string would be lost, as *they* would be considered the string delimiters. This commit adds a check to the POFile parser to ensure strings are always delimited by double quotes on both ends. After adding it, I spotted a couple of offending po file contents in the tests, which have also been fixed. Additionally, a new test had been added to ensure these cases are caught. The new test indeed fails if the new check is removed. This issue was found while investigating an error produced by the "powrap" tool while running it over the po files for the Spanish translation of the CPython documentation. The tool failed check one of our files because gettext's `msgcat` utility failed to parse the file. Upon closer inspection I realised the error in our pofile, which was caught by gettext but not polib. Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
Codecov ReportAll modified and coverable lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## master #161 +/- ##
==========================================
+ Coverage 95.71% 95.73% +0.01%
==========================================
Files 1 1
Lines 864 868 +4
==========================================
+ Hits 827 831 +4
Misses 37 37 ☔ View full report in Codecov by Sentry. |
A msgstr was missing its opening double quotes, which gettext's msgcat complained about (but powrap and polib didn't). See https://git.afpy.org/AFPy/powrap/pulls/4 and izimobil/polib#161 for PRs to fix both of those. Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
En `library/re.po` había una entrada que no estaba delineada correctamente con comillas dobles (si ven el diff entero es la última entrada en el diff, o pueden ver simplemente el primer commit de este PR). Esto hacía que `powrap --check` se saltara el archivo y no lo validara. Esto, a su vez, ocurría porque la utilidad `msgcat` de `gettext` identificaba el error de sintaxis, y fallaba al ser ejecutada. `powrap` no consideraba esos errores al momento de calcular el exit code del proceso, y por lo tanto el archivo no sólo seguía siendo inválido, sino que tampoco era verificado. De igual forma, el archivo no podía ser wrapeado correctamente usando `powrap library/re.po`. Ya abrí un PR contra `powrap` para cambiar este comportamiento en https://git.afpy.org/AFPy/powrap/pulls/4 (actualización: el PR ya fue mergeado, y una nuevs versión de powrap fue publicada, pornlo que también actualicé en este PR nuestra dependencia de powrap, además del pre-commit hook de powrap). Por otro lado, el resto de nuestras herramientas *no* consideraban este archivo como inválido, Esto es porque `polib` no hacía la validación correspondiente, e incorrectamente parseaba la entrada. También abrí un PR contra polib para esto en izimobil/polib#161. Actualización: en el intertanto también me di cuenta de que el paquete `babel` sufre del mismo problema, yo incorrectamente había asumido que babel dependía de polib; PR creada contra babel: python-babel/babel#1151. Después de corregir el error de sintaxis, ejecuté powrap de tal manera que ahora `library/re.po` está bien formateado. --------- Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
Gentle ping |
Another gentle ping, thanks! |
Sorry for the delay thanks for your contribution, merged. |
Strings in pofiles are always delimited on both ends with double-quotes. The POFile parser in polib didn't check for this, and therefore happily accepted invalid msgstr/msgid/etc, potentially loosing some of the contents of the file. In such cases, the first or last character of the string would be lost, as they would be considered the string delimiters.
This commit adds a check to the POFile parser to ensure strings are always delimited by double quotes on both ends. After adding it, I spotted a couple of offending po file contents in the tests, which have also been fixed. Additionally, a new test had been added to ensure these cases are caught. The new test indeed fails if the new check is removed.
This issue was found while investigating an error produced by the "powrap" tool while running it over the po files for the Spanish translation of the CPython documentation. The tool failed check one of our files because gettext's
msgcat
utility failed to parse the file. Upon closer inspection I realised the error in our pofile, which was caught by gettext but not polib.Update: I also realised the babel package suffers from the same issue, so I've created a PR addressing the same issue there: python-babel/babel#1151