-
Notifications
You must be signed in to change notification settings - Fork 147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cookie-octet reality check #2185
Comments
(so 0x80-FF are gone again) |
Ah thanks for pointing that out. I think they are slightly less problematic than the more common ASCII bytes though. |
Hi @bagder, Section 4 defines how a well behaved server should format their cookies. Unfortunately not all servers are well behaved so in the interest of compatibility UAs are encouraged to implement a more permissive syntax as mentioned in Section 5.
Section-5.4 goes into more detail.
I would imagine it's due to potential parsing errors and other ambiguities, but that's just an educated guess. |
What's the point in specifying a format that isn't followed by implementations? Isn't it much smarter to align that with what is actually in use in the world so that the spec can help implementers? |
By which implementations? The spec exists for UAs consuming cookies as well as the servers producing them. Note that I wasn't around when the original decision to split the formats in this manner was made, so take this all with a grain of salt. But with my knowledge of the space I believe it's because different parts of the cookie ecosystem have different goals and need somewhat different guidance to reach those goals. Generally speaking though I assume that implementers want their work to be compatible with their counterparts' implementation:
Each side, server and UA, should ideally try to align on their own format within the spec but they can't always assume their counterpart will. E.x.: Maybe a user is running some ancient browser to visit a site or maybe someone on the latest Firefox version wants to visit a site that hasn't been updated since the 90s. (And finally, if you don't care about being widely compatible you can do whatever you want.) |
The split-description is incomprehensible and just makes readers of the spec to get it wrong and not read both descriptions (I know I'm not alone in repeatedly doing this mistake). There is just one Right now, the spec specifies a way how servers should send the cookie, but none of the popular browsers care about that format and happily accept cookies not following it, because you need to read the second format for the same header to see that a receiver can be much more liberal. So the single header field has... two formats? I care deeply about compatibility. That's why I submitted this issue. I think the current way of specifying it makes us all a disservice. |
The very reason existing cookie receivers don't use the same strict checks as is documented for the server side is of course because many servers don't comply. Otherwise there wouldn't be a need to write them differently. |
It looks like #1759 is a fix for this. |
To make sure I understand: |
I think the way it currently is phrased easily misleads readers of the spec. I think it should be improved to reduce that risk. I don't think there is a server-syntax and a separate client-syntax of the same header field. And basically, since the server-syntax clearly isn't obeyed in the wild, it should be fixed to better match reality. |
So then is your proposal to remove the syntax as described currently in section 4 and replace its usage with the syntax as described in Section 5.4? I.e.: To recommend that all implementers of cookies (UA and servers alike) use the section 5.4 syntax? |
Yes. In spirit with how HTTP header fields are documented/specified in other HTTP related RFCs: yes. One header field, one syntax in one place. |
I agree that having a single syntax to refer to would be ideal, unfortunately the web has not been ideal with respect to cookies with many erroneous and incomplete implementations having been released on both sides along with simply outdated versions (likely with many still in use today). Many of the designs and considerations that have gone into the spec have been to maximize compatibility, including backwards compatibility, between servers and UAs in the real world and that has required some concessions which includes the server/UA syntax differences. ("Lax-Allowing-Unsafe" enforcement is another example.) The chief benefit of using a single syntax appears to be ease of implementation, mainly locating the correct grammar within the document, with a minor benefit (for the servers) of a slightly expanded character set. These seem appealing but I expect there will be subtle problems arising from allowing more characters: Both of the benefits of a single syntax seem to me to be minor compared to the compatibility risks. I acknowledge that there are existing servers that do not follow the stricter syntax, but I feel we should not encourage even more to do so. I understand that a UA inadvertently implementing the server syntax is frustrating, maybe there is something we can do to reduce the chances of that happening. Perhaps a more obvious note or some other strong indication that there are different requirements for different implementors. |
That's exactly what brought me to file this issue to begin with. The current spec combined with the current state of widely used implementations cause cookie interop problems. I argue these problems already exist and I propose a way that could work to reduce them. I cannot agree with your risk assessment. |
Clearly the servers of the world don't follow the stricter specified server-side syntax since if they did, the client parser could use and insist on the same syntax and they don't: since they can't. It means servers don't adhere to that syntax. Why then document that syntax? |
It seems we agree that there are user agents (UAs) already that will not accept cookies created with the more lax syntax, so then I do not follow how encouraging more servers to send those types of cookies will reduce problems.
Some servers do not, many do. For example, all my current github.com cookies follow the stricter server syntax. If a server does not care about being widely compatible then they are free to do whatever they want. The spec is for servers that would like to be widely compatible across UA vendors and versions and provides a syntax to accomplish that. This includes UAs that will reject cookies created with the more lax syntax.
They could, if the UA was not interested in being widely compatible across servers and was willing to accept the breakage that that decision incurs. Again, the spec is interested in compatibility and provides a syntax for UAs interested in being widely compatible with many servers. Since it’s possible some servers produce cookie strings that differ from the spec’s suggested server syntax, UAs are encouraged to accept those.
The current spec, when implemented correctly, will result in minimal interoperability problems for both servers and UAs. Compatibility for each side means something slightly different and so their requirements differ accordingly. Whether or not it's easy to implement the spec correctly is a different question. Cookies are messy and come with plenty of historical baggage. Forcing a single syntax will result in breakage and problems for one side or another. |
I strongly disagree. The client-side "lenient" syntax is already what dictates how cookies work on the web. |
Your proposal removes compatibility with existing UAs that reject cookies made with the lax syntax. What benefits do you see that outweigh this loss of compatibility? |
Please be specific and tell me exactly what these UAs are that you speak of? The existing problem with the way the current spec is written, is that existing servers send cookies that contain octets that are fine according to the client-syntax (but not according to the server-syntax). The client-syntax that by the way is used by all the major browsers for example. If we want to write a cookie using HTTP client that interops and works with the same set of cookies that the browsers support, we need to mimic what they accept. That's interop. You seem to speak of clients that explicitly decide to not interop with existing current widely deployed cookie clients and servers. Well, such clients already decided to not interop so they might just as well continue working the way they do. I don't see how us fixing the spec will hurt them. They already decided to run their own race. |
Assuming I'm following this discussion correctly, I think I agree with @bagder. The cookies in the wild conform to the format accepted by major browsers. It seems like the right thing to do is specify that format as the standard. If there are clients that today do not accept cookies that are currently in use in the web, then they are already not interoperable. |
I have no data indicating which/how many UAs would be adversely affected by this nor which/how many servers are already producing cookies that don’t match the stricter syntax. While some servers certainly send spec non-compliant cookies I’m not aware of any indication that this happens a majority of the time nor from a majority of servers, is it really so common to be considered a de facto standard? Perhaps these are the non-interoperable entities that we shouldn’t be encouraging. I’ve checked a number of my open tabs and haven’t found any non-compliant cookies, it seems all of them conform to the stricter server syntax. Any data to this effect would be helpful (possibly browsers could collect it via metrics, it seems straightforward enough). At the moment though, maybe this change will affect zero users, maybe it’ll affect millions, we don’t know; there are a lot of smaller UAs in the wild. So what is the benefit of this change that makes taking that risk worthwhile? Are servers deprived of something they need that this’ll remedy? Will this provide some security improvement? Something else? I’m unclear what the benefits are that make this appealing in the face of potentially breaking users. The spec isn’t unfamiliar with painful changes that are for the better, samesite by default is a great example of this. But that change had clear benefits that made the pain it caused worthwhile. What is this proposal providing? In addition to the possible UA breakage, the promotion of the lax syntax will explicitly allow foot gun behavior that UAs attempt to deal with. This includes the likes of Servers also benefit from abiding by the stricter syntax. Doing so vastly increases the chances the cookies they receive back from different UAs are identical and what the server intended. The UA syntax covers a lot of corner cases and depending on the UA’s implementation it’s possible for subtle differences and bugs to work their way in which could result in Browse A’s cookie not matching browser B’s and/or not matching what the server wanted in the first place. This has been a problem as recently as the past year. All of this is why I’m erring on the side of caution and saying that we should keep the syntaxes as is, for now at least.
My statements were intended for UAs that attempted to implement the spec but are perhaps out of date, have a bugged implementation, or whatever else good faith difference which causes them to reject some or all cookies not created with the stricter server syntax. |
I maintain that my proposal doesn't have any bad impact at all. It helps cookie clients interop better. I maintain the opposite: the current way of writing the spec is hard to read and makes cookie interop harder to accomplish than it needs to be. Cookie clients can already and always opt to drop/discard any cookie at any time, so clients that chose to be more picky and to use stricter checks are fine to do so and if they do that today they already opted to not interoperate with several major cookie clients - and that is fine. But it also indicates that a modified specification won't break anything since such non-interop clients already decided that interop is not top prio and they already decide to not work with for examples the popular browsers. I think a main purpose for this spec is to document how to interoperate. Help for example newcomers in the field how they should write their parsers.
That seems utterly strange to me. If you are concerned about those variations, then why document that clients can allow them and how come the major browsers allow them? Shouldn't then both servers and clients use the stricter character set? The stricter character set that isn't what is used by most implementers I realize we're only going in circles here. |
An alternative would be to have a single syntax, but to use obs-text along with some prose explaining the implications for senders and receivers. |
A single syntax with some notes to implementers seems like a good idea. It
makes it clear that there is only one syntax while providing background for
implementers. SGTM.
…On Sun, Jul 17, 2022 at 3:12 PM Mark Nottingham ***@***.***> wrote:
An alternative would be to have a single syntax, but to use obs-text
<https://httpwg.org/specs/rfc9110.html#fields.values> along with some
prose explaining the implications for senders and receivers.
Message ID: ***@***.***>
|
The reasons I’ve seen so far to support a single syntax have presupposed two important factors:
I’m unaware of any data supporting either claim and do not understand why we should make those assumptions. If we are wrong then it's ultimately the users that will feel the pain. While I support the effort for an easier to read spec, I don’t understand the desire for a single syntax as an end goal.
This is something I’m very interested in remedying and I believe it can be done so in a way that has zero risk of breaking users. A more prominent warning in section 4.1.1 indicating that UAs have a different set of requirements for example. We can almost certainly make the spec less confusing to implementors without materially changing it. Mnot’s suggestion seems interesting, particularly the obs- notation, and I think it’s an improvement to the current proposal. However, given that implementors have previously run into problems by not fully reading the relevant sections I’m not entirely confident that extra prose will be enough to discourage someone from blindly implementing the grammar, although maybe its proximity will help. I also wonder if the additional obs- notation will make the grammar unwieldy, making it harder for both servers and UA to implement correctly. In either case this option still runs the risk of breaking users so I suggest a different tactic such as above.
I agree it’s important, noting that we want this interoperability so that the system is useful to our users. It follows then that we shouldn’t do anything that risks harming our users’ experience unless we have a good reason to. So without any benefits to changing the syntax we're much better off editing the spec for clarity. |
Perhaps it'd be good to discuss this in the meeting... |
I presume that the "big browsers" are pretty big cookie consumers in the world. I would guess they handle more cookies than most other user agents. They are most liberal in what they accept. If we want to interop with them, we need to do the same. Why are browsers this relaxed on what octets they accept? Presumably because of how browsers always work: there are some servers somewhere in use that would make things break if they would go stricter. And they also need to interop with the other major browsers that have the same laxed parsers. I don't think anyone, except perhaps one of the browser vendors, can produce data/numbers for cookie use. And we already know how they parse cookies. |
@mnot |
Yes, it's going to be a tight meeting. You could either do a side meeting, or wait for the interim-in-planning. |
I think it's correct to close this as a dupe |
Hello,
In draft-10 section 4.1.1 we see:
This means that space, comma and double-quotes for example are invalid contents in cookie values and names. Why?
In RFC 6265 the same section says:
(the difference is
%x80-FF
which now is explicitly allowed)Firefox does not ignore all those and claims "parity with Chrome" on this.
If we ignore such cookies, we break compatibility with two major browsers. If we don't, we don't follow the spec.
The text was updated successfully, but these errors were encountered: