Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect and report incorrectly delimited strings #161

Merged
merged 1 commit into from
Jan 10, 2025

Conversation

rtobar
Copy link
Contributor

@rtobar rtobar commented Nov 13, 2024

Strings in pofiles are always delimited on both ends with double-quotes. The POFile parser in polib didn't check for this, and therefore happily accepted invalid msgstr/msgid/etc, potentially loosing some of the contents of the file. In such cases, the first or last character of the string would be lost, as they would be considered the string delimiters.

This commit adds a check to the POFile parser to ensure strings are always delimited by double quotes on both ends. After adding it, I spotted a couple of offending po file contents in the tests, which have also been fixed. Additionally, a new test had been added to ensure these cases are caught. The new test indeed fails if the new check is removed.

This issue was found while investigating an error produced by the "powrap" tool while running it over the po files for the Spanish translation of the CPython documentation. The tool failed check one of our files because gettext's msgcat utility failed to parse the file. Upon closer inspection I realised the error in our pofile, which was caught by gettext but not polib.

Update: I also realised the babel package suffers from the same issue, so I've created a PR addressing the same issue there: python-babel/babel#1151

Strings in pofiles are always delimited on both ends with double-quotes.
The POFile parser in polib didn't check for this, and therefore happily
accepted invalid msgstr/msgid/etc, potentially loosing some of the
contents of the file. In such cases, the first or last character of the
string would be lost, as *they* would be considered the string delimiters.

This commit adds a check to the POFile parser to ensure strings are
always delimited by double quotes on both ends. After adding it, I
spotted a couple of offending po file contents in the tests, which have
also been fixed. Additionally, a new test had been added to ensure these
cases are caught. The new test indeed fails if the new check is removed.

This issue was found while investigating an error produced by the
"powrap" tool while running it over the po files for the Spanish
translation of the CPython documentation. The tool failed check one of
our files because gettext's `msgcat` utility failed to parse the file.
Upon closer inspection I realised the error in our pofile, which was
caught by gettext but not polib.

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
@codecov-commenter
Copy link

codecov-commenter commented Nov 13, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.73%. Comparing base (85f6d1c) to head (9c760ba).

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #161      +/-   ##
==========================================
+ Coverage   95.71%   95.73%   +0.01%     
==========================================
  Files           1        1              
  Lines         864      868       +4     
==========================================
+ Hits          827      831       +4     
  Misses         37       37              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

rtobar added a commit to rtobar/python-docs-es that referenced this pull request Nov 13, 2024
A msgstr was missing its opening double quotes, which gettext's msgcat
complained about (but powrap and polib didn't).

See https://git.afpy.org/AFPy/powrap/pulls/4 and
izimobil/polib#161 for PRs to fix both of those.

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
rtobar added a commit to python/python-docs-es that referenced this pull request Nov 18, 2024
En `library/re.po` había una entrada que no estaba delineada
correctamente con comillas dobles (si ven el diff entero es la última
entrada en el diff, o pueden ver simplemente el primer commit de este
PR). Esto hacía que `powrap --check` se saltara el archivo y no lo
validara. Esto, a su vez, ocurría porque la utilidad `msgcat` de
`gettext` identificaba el error de sintaxis, y fallaba al ser ejecutada.
`powrap` no consideraba esos errores al momento de calcular el exit code
del proceso, y por lo tanto el archivo no sólo seguía siendo inválido,
sino que tampoco era verificado. De igual forma, el archivo no podía ser
wrapeado correctamente usando `powrap library/re.po`.

Ya abrí un PR contra `powrap` para cambiar este comportamiento en
https://git.afpy.org/AFPy/powrap/pulls/4 (actualización: el PR ya fue
mergeado, y una nuevs versión de powrap fue publicada, pornlo que
también actualicé en este PR nuestra dependencia de powrap, además del
pre-commit hook de powrap).

Por otro lado, el resto de nuestras herramientas *no* consideraban este
archivo como inválido, Esto es porque `polib` no hacía la validación
correspondiente, e incorrectamente parseaba la entrada. También abrí un
PR contra polib para esto en izimobil/polib#161.
Actualización: en el intertanto también me di cuenta de que el paquete
`babel` sufre del mismo problema, yo incorrectamente había asumido que
babel dependía de polib; PR creada contra babel:
python-babel/babel#1151.

Después de corregir el error de sintaxis, ejecuté powrap de tal manera
que ahora `library/re.po` está bien formateado.

---------

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
@rtobar
Copy link
Contributor Author

rtobar commented Nov 21, 2024

Gentle ping

@rtobar
Copy link
Contributor Author

rtobar commented Dec 25, 2024

Another gentle ping, thanks!

@izimobil izimobil merged commit e290181 into izimobil:master Jan 10, 2025
1 check passed
@izimobil
Copy link
Owner

Sorry for the delay thanks for your contribution, merged.

@rtobar rtobar deleted the delimited-strings branch January 10, 2025 09:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants