-
Notifications
You must be signed in to change notification settings - Fork 701
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #9507 Describe accurately acceptable package names #9508
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Terrific, thank you! Only soft cosmetic suggestions from me.
Could someone explain how Unicode package names are supposed to work? I’d expect that the issue is in parser, not in documentation. |
@Bodigrim, the short answer is that Unicode package names work like other package names with the exception that Hackage will not accept them for 'publication'. Looking at Cabal's code, it seems to have a long history of accepting Unicode package names (by using -- | Selects alphabetic or numeric digit Unicode characters.
--
-- Note that numeric digits outside the ASCII range are selected by this
-- function but not by 'isDigit'. Such digits may be part of identifiers
-- but are not used by the printer and reader to represent numbers.
isAlphaNum :: Char -> Bool |
@mpilgrem Does |
Also Unicode package names are a terrible idea from security perspective: |
@Bodigrim, on |
I should say, I have no skin in the game on what the Cabal package description format specification should say and/or what the Cabal library/tool should do (given the specification). My objective is only that the Cabal User Guide should describe accurately what, in fact, Cabal accepts. If that is a 'bug' (perhaps a long-standing one) the Guide should warn users accordingly. |
I agree with @mpilgrem here: the docs should explain what is the current state of affairs faithfully. If you don't like that state, that would be another ticket to open. On the other hand. The docs may gently steer people away from Unicode names by, for instance, mentioning that you can't publish them on Hackage and they are not widely used and may have security implications. Note that this patch clarifies the docs in several ways, not only w.r.t. the Unicode business. |
4aa738e
to
fdb8ad4
Compare
I have put through the suggestions of @ulysses4ever. I have also done two other things:
|
fyi, docs from this pr are rendered here: https://cabal--9508.org.readthedocs.build/en/9508/ |
@ulysses4ever, many thanks for the pointer. I can confirm that the |
@Bodigrim, returning to let tarPath isDir fp =
case Tar.toTarPath isDir (forceUtf8Enc (pkgIdName FP.</> fp)) of
Left e -> prettyThrowIO $ ToTarPathException e
Right tp -> pure tp
-- convert a String of proper characters to a String of bytes in UTF8
-- encoding masquerading as characters. This is necessary for tricking the
-- tar package into proper character encoding.
forceUtf8Enc = S8.unpack . T.encodeUtf8 . T.pack In my reading, I came accross open issue: #3758 (Lack of support for non-ASCII characters in cabal sdist) and haskell/tar#6 (Paths with unicode characters in them?). |
ba6d1c4
to
05305db
Compare
I have improved/corrected the formula for |
Hey @mpilgrem ! Do you still want to merge this? |
@ulysses4ever, thanks for the ping. I'll return to this and see what is the current position. |
3c1b3b2
to
a329900
Compare
I've run into a hurdle. Cabal's CI has a step where it generates a template for that part of the manual, and then compares it to the actual manual contents. The step that generates the template assumes that each item will be described with some narrative followed by a single 'formula'. That is no longer the case for my suggested correction, which was formatted to be useful to people and has two 'formulas':
Part of the template-generating code is in a package reUnqualComponent :: GrammarRegex a
reUnqualComponent = RENamed "unqual-name" $
REMunch1 (reChar '-') component
where
component
= REMunch reEps (RECharSet csAlphaNum)
-- currently the parser accepts "csAlphaNum `difference` "0123456789"
-- which is larger set than CS.alpha
--
-- Hackage rejects non ANSI names, so it's not so relevant.
<> RECharSet CS.alpha
<> REMunch reEps (RECharSet csAlphaNum) The code comment seems to reference the problem with the manual that my changes to it are seeking to fix. Perhaps the path of least resistence is to refactor my two 'formulas' into one 'formula', even if that makes them more difficult for people to read. However, that means I will have to work out what is the grammar used in EDIT: I took a different path, and moved the alpha-num-not-digit 'formula' to the introduction to the relevant section of the manual page. |
05d055a
to
0ab62e1
Compare
7a8a994
to
d02ed81
Compare
@ulysses4ever, from my perspective, this is good to be merged. |
Terrific and thank you! Do you want to do the honor and put the merge-me label on it @mpilgrem ?
Incidentally, that's what I was going to suggest. |
Also extinguishes false positive in typo detection. Also aims to make corresponding changes to buildinfo-reference-generator, including the `Cabal-described` package. Adds `alphanumNotDigit` and now specifies `alphanum` by inserting digits into `alphanumNotDigit`. The description of `alpha-num-not-digit` is added to the introduction to the section on 'Non-terminals' to accommodate the limitation of buildinfo-reference-generator that only a single 'formula' can follow a single paragraph of narrative for each non-terminal production.
See:
Cabal accepts package names that the Cabal User Guide says are invalid #9507
Patches conform to the coding conventions.