stop_times.txt: Interpretation of times during daylight savings time transition #325

Open
npaun opened this issue May 5, 2022 · 11 comments
Labels: Change: Clarification; GTFS Schedule

npaun commented May 5, 2022

In GTFS, specifying trip schedules during the transition into and out of daylight savings time is complex.

Background

Given the definition of time provided in the specification,

Time in the HH:MM:SS format (H:MM:SS is also accepted). The time is measured from "noon minus 12h" of the service day (effectively midnight except for days on which daylight savings time changes occur). For times occurring after midnight, enter the time as a value greater than 24:00:00 in HH:MM:SS local time for the day on which the trip schedule begins.
Example: 14:30:00 for 2:30PM or 25:35:00 for 1:35AM on the next day.

Applying this rule to the date DST begins (March 13, 2022 in my region) and the date it ends (Nov 6, 2022), we obtain the following chart:

Table 1

GTFS time | DST begins: date represented | GTFS time | DST ends: date represented
----------|------------------------------|-----------|----------------------------
N/A       | N/A                          | -01:00    | Sun Nov 6 00:00:00 PDT 2022
00:00     | Sat Mar 12 23:00:00 PST 2022 | 00:00     | Sun Nov 6 01:00:00 PDT 2022
01:00     | Sun Mar 13 00:00:00 PST 2022 | 01:00     | Sun Nov 6 01:00:00 PST 2022
02:00     | Sun Mar 13 01:00:00 PST 2022 | 02:00     | Sun Nov 6 02:00:00 PST 2022
03:00     | Sun Mar 13 03:00:00 PDT 2022 | 03:00     | Sun Nov 6 03:00:00 PST 2022
04:00     | Sun Mar 13 04:00:00 PDT 2022 | 04:00     | Sun Nov 6 04:00:00 PST 2022
05:00     | Sun Mar 13 05:00:00 PDT 2022 | 05:00     | Sun Nov 6 05:00:00 PST 2022
06:00     | Sun Mar 13 06:00:00 PDT 2022 | 06:00     | Sun Nov 6 06:00:00 PST 2022
07:00     | Sun Mar 13 07:00:00 PDT 2022 | 07:00     | Sun Nov 6 07:00:00 PST 2022
08:00     | Sun Mar 13 08:00:00 PDT 2022 | 08:00     | Sun Nov 6 08:00:00 PST 2022
09:00     | Sun Mar 13 09:00:00 PDT 2022 | 09:00     | Sun Nov 6 09:00:00 PST 2022
10:00     | Sun Mar 13 10:00:00 PDT 2022 | 10:00     | Sun Nov 6 10:00:00 PST 2022
11:00     | Sun Mar 13 11:00:00 PDT 2022 | 11:00     | Sun Nov 6 11:00:00 PST 2022
12:00     | Sun Mar 13 12:00:00 PDT 2022 | 12:00     | Sun Nov 6 12:00:00 PST 2022
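
The chart follows mechanically from the "noon minus 12h" rule. Here is a minimal Python sketch of that rule, assuming the standard-library zoneinfo module and an installed tz database. The helper name gtfs_time_to_local is hypothetical and not part of any GTFS library.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def gtfs_time_to_local(service_date: date, gtfs_time: str, tz_name: str) -> datetime:
    """Resolve an HH:MM:SS GTFS time on a service day to an aware local datetime."""
    tz = ZoneInfo(tz_name)
    hours, minutes, seconds = (int(part) for part in gtfs_time.split(":"))
    noon = datetime.combine(service_date, time(12), tzinfo=tz)
    # Do the arithmetic in UTC so that hours are elapsed time, not wall-clock time.
    reference = noon.astimezone(timezone.utc) - timedelta(hours=12)
    elapsed = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return (reference + elapsed).astimezone(tz)


# Print the first rows of the chart for both transition days.
for service_date in (date(2022, 3, 13), date(2022, 11, 6)):
    for hour in range(4):
        gtfs_time = f"{hour:02d}:00:00"
        local = gtfs_time_to_local(service_date, gtfs_time, "America/Los_Angeles")
        print(service_date, gtfs_time, "->", local.isoformat())
```

The subtraction is done in UTC on purpose: Python treats arithmetic on an aware datetime as wall-clock arithmetic, which would hide the shift.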

This has some unusual implications, which I am not certain are understood in the same way by all data producers and consumers.

  • It is impossible to express the time "Nov 6 00:00" as part of the Nov 6 service day, unless negative hours are used. The specification never clearly defines or prohibits this practice.

  • Consider a shuttle that runs once an hour on :15 past the hour, every day of the week. One could use this very simple schedule for the entire year:

Time
00:15
01:15
...
22:15
23:15

But this would actually create two duplicative trips on "Mar 12 23:15". When DST ends, it would skip the "Nov 6 00:15" run. Unfortunately, I've rarely encountered producers creating special services for the DST transition days.
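
Both effects can be checked with a short Python sketch. It assumes the zoneinfo module with an installed tz database and the America/Los_Angeles timezone used in Table 1. The names SCHEDULE, to_utc and departures are hypothetical.

```python
from collections import Counter
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

TZ = ZoneInfo("America/Los_Angeles")
SCHEDULE = [f"{hour:02d}:15:00" for hour in range(24)]  # the year-round shuttle


def to_utc(service_date: date, gtfs_time: str) -> datetime:
    """Resolve a GTFS time to a UTC instant using the 'noon minus 12h' rule."""
    hours, minutes, seconds = (int(part) for part in gtfs_time.split(":"))
    noon = datetime.combine(service_date, time(12), tzinfo=TZ)
    reference = noon.astimezone(timezone.utc) - timedelta(hours=12)
    return reference + timedelta(hours=hours, minutes=minutes, seconds=seconds)


def departures(*service_dates: date) -> list[datetime]:
    return [to_utc(day, t) for day in service_dates for t in SCHEDULE]


# DST begins: two service days map a run onto the same instant.
counts = Counter(departures(date(2022, 3, 12), date(2022, 3, 13)))
duplicated = [t.astimezone(TZ).isoformat() for t, n in counts.items() if n > 1]

# DST ends: consecutive runs more than one hour apart reveal a missing run.
runs = sorted(departures(date(2022, 11, 5), date(2022, 11, 6)))
missing = [
    (earlier + timedelta(hours=1)).astimezone(TZ).isoformat()
    for earlier, later in zip(runs, runs[1:])
    if later - earlier > timedelta(hours=1)
]

print(duplicated)  # ['2022-03-12T23:15:00-08:00']
print(missing)     # ['2022-11-06T00:15:00-07:00']
```

Instants are compared in UTC because two aware datetimes that share the same tzinfo object are compared by wall-clock value only.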

Questions

  • Producers: How do you model your data during those special hours when the DST transition is occurring? Aside from GTFS, how are service changes during those times communicated to riders?
  • Consumers: Have you run into issues with DST while processing data, and do you use any workarounds internally?

Potential changes

To reduce confusion, the specification could state that all trips during the DST transition (e.g. 00:00-03:00 Mar 13 this year in my timezone) shall be ignored. Producers would be required to use times between 24:00-27:00 Mar 12 instead. However, this is just a starting point for discussion and I hope that we can collaborate to find a good solution to this problem as a community.
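
To illustrate why the 24:00-27:00 range on the previous service day sidesteps the problem, the Python sketch below resolves the same nominal departure both ways. It assumes the zoneinfo module with an installed tz database. The helper resolve is hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def resolve(service_date: date, gtfs_time: str, tz_name: str) -> datetime:
    """Resolve a GTFS time (hours may exceed 24) to an aware local datetime."""
    tz = ZoneInfo(tz_name)
    hours, minutes, seconds = (int(part) for part in gtfs_time.split(":"))
    noon = datetime.combine(service_date, time(12), tzinfo=tz)
    reference = noon.astimezone(timezone.utc) - timedelta(hours=12)
    elapsed = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return (reference + elapsed).astimezone(tz)


tz_name = "America/Los_Angeles"

# 24:15 on the Mar 12 service day is measured from an unshifted reference point.
previous_day = resolve(date(2022, 3, 12), "24:15:00", tz_name)
# 00:15 on the Mar 13 service day is measured from a reference one hour early.
transition_day = resolve(date(2022, 3, 13), "00:15:00", tz_name)

print(previous_day.isoformat())    # 2022-03-13T00:15:00-08:00
print(transition_day.isoformat())  # 2022-03-12T23:15:00-08:00
```

Only the first form lands on the wall-clock time a rider would expect from "00:15 on Mar 13".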

@derhuerst

related: #15 (comment)
I also wrote this down a while ago, much like you did: https://gist.github.com/derhuerst/574edc94981a21ef0ce90713f1cff7f6

@e-lo

e-lo commented May 12, 2022

@npaun and @derhuerst I'm not super familiar with the DST issue (as much as the timezone one) so please feel free to update my #328 to reflect a solution that meets your needs!

@derhuerst

derhuerst commented May 13, 2022

First, I'll give my perspective on your specific statements:

It is impossible to express the time "Nov 6 00:00" as part of the Nov 6 service day, unless negative hours are used. The specification never clearly defines or prohibits this practice.

Indeed, we must allow negative GTFS Time values to allow people expressing e.g. 2022-11-06T00:30-07:00.

Consider a shuttle that runs once an hour on :15 past the hour, every day of the week. […] But this would actually create two duplicative trips on "Mar 12 23:15". When DST ends, it would skip the "Nov 6 00:15" run. Unfortunately, I've rarely encountered producers creating special services for the DST transition days.

Yes, but DST <-> standard time switches will almost always have to be handled in a special way, because AFAIK almost all public transport timetables use either headways (explicitly as part of the published timetable, or implicitly from an operations perspective) or at least recurring wall clock times.


My opinion phrased in a more general way:

  • While GTFS should be as easy to consume as reasonably possible, its design should not encourage wrong interpretations/processing of the data. DST <-> standard time switches are just one example of this, stop_timezone/agency_timezone (Clarify time value: timezones + values spanning midnight #328, Clarify implied timezone for time values in stop_times.txt #322) another.
  • As we have elaborated, GTFS Time – as it is currently defined – doesn't actually follow "wall clock time semantics" as one might assume from the word "time". Rather, it is an offset from "noon - 12h".
  • Redefining the GTFS Time to work like "wall clock time" would not only break many existing datasets, it also wouldn't save us from having to define an additional trip during DST end. That aside, my feeling is that it would probably open a whole new range of unintended edge cases. Except in the specific cases discussed in this Issue, the current definition elegantly allows defining stop times for longer periods of days with a concise markup.
  • I don't see how any other way of defining stop times would be less complex than the current one. For example, you cannot process a "wall clock time" anyway without knowing the date and timezone; it is inherently a complex operation!

I propose to:

  • rename Time to TimeOffset (but keep {arrival,departure,start,end}_time as is) to make clear the DST implications,
  • allow Time values to be negative, but only if the resulting date+time still refers to the service day. (Does that make sense?)
  • add a rule that, when processing a GTFS dataset, duplicate "runs" (as in "same trip_id, same point in time") solely caused by the DST start should be filtered out,
  • add a reminder to specify an additional "run" during DST end.
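
The second point can be sketched in Python as follows. This is one possible reading of "still refers to the service day", namely that the resolved local date must equal the service date. It is not agreed wording. The helpers resolve and negative_time_allowed are hypothetical, and the code assumes the zoneinfo module with an installed tz database.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def resolve(service_date: date, gtfs_time: str, tz_name: str) -> datetime:
    """Resolve a possibly negative GTFS time to an aware local datetime."""
    tz = ZoneInfo(tz_name)
    sign = -1 if gtfs_time.startswith("-") else 1
    hours, minutes, seconds = (int(part) for part in gtfs_time.lstrip("-").split(":"))
    noon = datetime.combine(service_date, time(12), tzinfo=tz)
    reference = noon.astimezone(timezone.utc) - timedelta(hours=12)
    elapsed = sign * timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return (reference + elapsed).astimezone(tz)


def negative_time_allowed(service_date: date, gtfs_time: str, tz_name: str) -> bool:
    """Assumed rule: a negative value must still fall on the service date locally."""
    return resolve(service_date, gtfs_time, tz_name).date() == service_date


tz_name = "America/Los_Angeles"
print(resolve(date(2022, 11, 6), "-00:30:00", tz_name).isoformat())
print(negative_time_allowed(date(2022, 11, 6), "-01:00:00", tz_name))  # DST ends
print(negative_time_allowed(date(2022, 11, 7), "-00:30:00", tz_name))  # ordinary day
```

Under this reading, negative values are only usable on days where the reference point falls after local midnight, such as the day DST ends.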

cc @juliuste

@MartinH-open

I don't know how GTFS currently identifies a zone that applies daylight saving. Each agency carries a timezone attribute to identify which zone it uses in GTFS data.
E.g. in Germany we have a daylight saving zone (MESZ, English: CEST). This time has a +2h offset to UTC. Therefore all timestamps for Germany during summer time use a timezone often labeled [UTC+2]. But "UTC+2" is not enough to identify that daylight saving is applied. E.g. many African countries in this timezone have no daylight saving at all.
So each agency needs to clearly specify its specific timezones. In Germany this might imply two specifications: MEZ (English: CET) and MESZ (English: CEST).
GTFS users need to learn from external sources when the switch day and time are for the specified timezones.
Today this is not part of the GTFS specifications AFAIK.

@derhuerst

I don't know how GTFS currently identifies a zone that applies daylight saving. Each agency carries a timezone attribute to identify which zone it uses in GTFS data.
[...] But "UTC+2" is not enough to identify that daylight saving is applied. E.g. many African countries in this timezone have no daylight saving at all.
[…]
GTFS users need to learn from external sources when the switch day and time are for the specified timezones.
Today this is not part of the GTFS specifications AFAIK.

Unfortunately, the terms "time(zone) offset" and "time zone" are not used very precisely; the Wikipedia articles "Time zone" and "List of time zones" are good examples of this.

But from my experience, modern technical systems have settled on time zone identifiers as defined by the tz database. Its time zone definitions include all relevant information when and how shifts occur.

The GTFS Timezone field type uses tz identifiers:

Timezone – TZ timezone from the https://www.iana.org/time-zones. Timezone names never contain the space character but may contain an underscore. Refer to https://en.wikipedia.org/wiki/List_of_tz_zones for a list of valid values.
https://gtfs.org/schedule/reference/
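
As a small illustration of how a tz identifier carries the shift rules that a bare "UTC+2" label lacks, here is a Python sketch using the zoneinfo module (an installed tz database is assumed). The choice of Europe/Berlin and Africa/Johannesburg as examples is mine, and utc_offset_hours is a hypothetical helper.

```python
from datetime import date, datetime, time, timedelta
from zoneinfo import ZoneInfo


def utc_offset_hours(tz_name: str, day: date) -> float:
    """UTC offset, in hours, that applies at local noon on the given day."""
    noon = datetime.combine(day, time(12), tzinfo=ZoneInfo(tz_name))
    return noon.utcoffset() / timedelta(hours=1)


winter, summer = date(2022, 1, 15), date(2022, 7, 15)

for tz_name in ("Europe/Berlin", "Africa/Johannesburg"):
    print(tz_name, utc_offset_hours(tz_name, winter), utc_offset_hours(tz_name, summer))
```

Both zones are at UTC+2 in July, but only the Berlin identifier yields a different offset in January.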

@npaun
Contributor Author

npaun commented May 27, 2022

@derhuerst

Thank you for taking the time to consider this topic in detail.

Backwards compatible changes

Regarding the suggestions you've proposed:

  1. Renaming Time to TimeOffset

Agreed -- this would slightly improve the clarity of the spec.

  2. Allow Time values to be negative, but only if the resulting date+time still refers to the service day.

Agreed, as this is necessary in order to express those times.

  3. Add a rule that, when processing a GTFS dataset, duplicate "runs" (as in "same trip_id, same point in time") solely caused by the DST start should be filtered out,

Ideally, I'd love to have a heuristic that consumers can use to filter out redundant runs. But I'm not sure what definition we could use.

For instance if we had this situation:

Trip Start time Route Service
1 00:15 20 Sundays
2 23:15 20 Saturdays

Then 1 and 2 would be mapped onto the same time on the date DST begins. What if trip 2 instead started at 23:16? It wouldn't change the real-world conclusion but would also be different algorithmically. I feel we'd need a quite extensive definition of duplication to solve this. What do you think?

  4. Add a reminder to specify an additional "run" during DST end.

Agreed.


On top of these, I'd like to add a few suggestions as well.

  5. Add a section to the spec reminding agencies that runs during the DST transition are special cases, and that extra trips may need to be added on those days.

This section would include a small example showing cases where the reference point is not midnight, similar to the section we already have for clarifying the effect of block_id. It would probably include 'Table 1' from my initial message on this issue.

  6. It is recommended that trips occurring during the period of DST transition be expressed using the previous service day with times greater than 24:00:00.

This would cover midnight-3am on the date DST begins and the date DST ends, and I think could help make matters less confusing.


I propose to open a PR next week to add suggestions 1, 2, 4, 5, 6 to the spec.

I don't think we can do much more to improve the situation in a backwards compatible way. Perhaps we could consider adding additional fields relating to behaviour in DST, but I haven't come up with anything yet.

@timMillet
Contributor

timMillet commented May 27, 2022

Suggestion for resolving the point 3.:

"On the service day on which DST begins, GTFS consumers MAY remove trips starting before 01:00:00 that are assigned to the same route_id and that have the same combination of (stop_id, stop_sequence) in stop_times.txt as trips starting after 22:59:59 on the service day before."

What do you all think?

@derhuerst

Suggestion for resolving the point 3.:

"On the service day on which DST begins, GTFS consumers MAY remove trips starting before 01:00:00 that are assigned to the same route_id and that have the same combination of (stop_id, stop_sequence) in stop_times.txt as trips starting after 22:59:59 on the service day before."

This hard-codes the time offset by which the DST shifts; I'm not sure if it always is 1h, and I strongly prefer using a definition that doesn't rely on it being 1h. Same for the time when the DST shift occurs.
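
The size of the shift can be read from the tz database instead of being hard-coded. The Python sketch below assumes the zoneinfo module with an installed tz database. It compares the UTC offset at local noon on consecutive days, so it reports any offset change, not only DST. The helper names are hypothetical. Australia/Lord_Howe is used as an example of a zone whose DST shift is 30 minutes.

```python
from datetime import date, datetime, time, timedelta
from zoneinfo import ZoneInfo


def utc_offset_at_noon(day: date, tz_name: str) -> timedelta:
    return datetime.combine(day, time(12), tzinfo=ZoneInfo(tz_name)).utcoffset()


def shift_on(day: date, tz_name: str) -> timedelta:
    """Offset change between local noon of the previous day and of this day."""
    previous = day - timedelta(days=1)
    return utc_offset_at_noon(day, tz_name) - utc_offset_at_noon(previous, tz_name)


def shifts_in_year(year: int, tz_name: str) -> dict[date, timedelta]:
    """Map each date in the year that has a non-zero shift to that shift."""
    found = {}
    day = date(year, 1, 1)
    while day.year == year:
        shift = shift_on(day, tz_name)
        if shift:  # timedelta(0) is falsy
            found[day] = shift
        day += timedelta(days=1)
    return found


for tz_name in ("America/Los_Angeles", "Australia/Lord_Howe"):
    for day, shift in shifts_in_year(2022, tz_name).items():
        print(tz_name, day, shift.total_seconds() / 60, "minutes")
```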

Also, as the GTFS spec will gain more complex rules over time for deciding whether two stop times are identical (I'm thinking about GTFS-Flex, GTFS-RT, etc.), we should try to find a phrasing that doesn't have to be adapted to them.

What do you think about the following definition? I have put the timezone-related part in brackets, because we might want to discuss such phrasing in #322/#328.

"GTFS consumers MAY remove each trip on the service date where the DST begins after the DST shift, if there is an equivalent (same route_id, stop_ids & stop_sequences, and all other rules applying) trip before the DST shift that effectively starts at the same moment [, taking stop_timezone & agency_timezone into account].
For example, with a timezone of America/Los_Angeles, a trip starting at 00:44:55 on service date 20220313 MAY be removed if there is an equivalent trip starting at 23:44:55 on 20220312."
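
A rough Python sketch of this definition follows, under loud assumptions. The Run record, the trip and stop identifiers, and drop_equivalent_runs are hypothetical. "All other rules applying" is not modelled. The filter does not check that the coincidence is caused by DST, so it would also merge a 24:30:00 run with a 00:30:00 run on the next ordinary service day. The code assumes the zoneinfo module with an installed tz database.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import NamedTuple
from zoneinfo import ZoneInfo


class Run(NamedTuple):
    trip_id: str
    route_id: str
    stops: tuple[str, ...]  # stop_ids in stop_sequence order
    service_date: date
    start_time: str         # GTFS time at the first stop


def start_instant(run: Run, tz_name: str) -> datetime:
    """UTC instant at which the run starts, per the 'noon minus 12h' rule."""
    hours, minutes, seconds = (int(part) for part in run.start_time.split(":"))
    noon = datetime.combine(run.service_date, time(12), tzinfo=ZoneInfo(tz_name))
    reference = noon.astimezone(timezone.utc) - timedelta(hours=12)
    return reference + timedelta(hours=hours, minutes=minutes, seconds=seconds)


def drop_equivalent_runs(runs: list[Run], tz_name: str) -> list[Run]:
    """Keep the earliest-service-date run for each (route, stops, instant) key."""
    kept, seen = [], set()
    for run in sorted(runs, key=lambda r: r.service_date):
        key = (run.route_id, run.stops, start_instant(run, tz_name))
        if key not in seen:
            seen.add(key)
            kept.append(run)
    return kept


runs = [
    Run("sun-early", "20", ("A", "B"), date(2022, 3, 13), "00:44:55"),
    Run("sat-late", "20", ("A", "B"), date(2022, 3, 12), "23:44:55"),
]
kept = drop_equivalent_runs(runs, "America/Los_Angeles")
print([run.trip_id for run in kept])  # ['sat-late']
```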


I just noticed that we should clarify what happens to other entities referencing the removed stop times, e.g. GTFS-RT StopTimeUpdates. Are consumers allowed to drop them? Should they apply both trips' updates to the remaining one?

@derhuerst

What if trip 2 instead started at 23:16? It wouldn't change the real-world conclusion but would also be different algorithmically.

I have a different interpretation: If there were two buses (of the same route_id etc.) with slightly varying start times on a different date than the DST begin, I would assume them to be two separate physical "runs", so I would apply the same interpretation to the DST shift.

@npaun
Contributor Author

npaun commented Jun 2, 2022

@derhuerst, @timMillet

I've revised the wording based on your suggestions:

GTFS consumers MAY remove duplicated trips occurring on the service day on which DST begins or ends, between midnight and the time of transition, as defined by the timezone specified by [agency.agency_timezone | stop.stop_timezone]. A trip is a duplicate of another if they have the same route_id and combination of (stop_id, stop_sequence), and both trips [overlap in time | start at the same time].
For example, with a timezone of America/Los_Angeles, a trip starting at 00:44:55 on service date 20220313 MAY be removed if there is an equivalent trip starting at 23:44:55 on 20220312.

(Alternate wordings are in brackets)

  • agency_timezone vs stop_timezone: It seems that discussion on this topic has stalled but the status quo seems to be agency_timezone, so I'd choose this option.
  • overlap in time vs start at the same time: While requiring the same start time is more targeted, I worry that in many cases the equivalent trip doesn't start at exactly the same time (see examples below).
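
The difference between the two bracketed wordings can be sketched in Python. The example reuses the earlier Saturday 23:16 / Sunday 00:15 case. The 40-minute trip durations are invented for illustration, and interval, same_start and overlaps are hypothetical helpers. The code assumes the zoneinfo module with an installed tz database.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

TZ = ZoneInfo("America/Los_Angeles")
Interval = tuple[datetime, datetime]


def to_utc(service_date: date, gtfs_time: str) -> datetime:
    hours, minutes, seconds = (int(part) for part in gtfs_time.split(":"))
    noon = datetime.combine(service_date, time(12), tzinfo=TZ)
    reference = noon.astimezone(timezone.utc) - timedelta(hours=12)
    return reference + timedelta(hours=hours, minutes=minutes, seconds=seconds)


def interval(service_date: date, start: str, end: str) -> Interval:
    """First departure and last arrival of a trip, as UTC instants."""
    return to_utc(service_date, start), to_utc(service_date, end)


def same_start(a: Interval, b: Interval) -> bool:
    return a[0] == b[0]


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


# Hypothetical 40-minute trips around the start of DST.
saturday = interval(date(2022, 3, 12), "23:16:00", "23:56:00")
sunday = interval(date(2022, 3, 13), "00:15:00", "00:55:00")

print(same_start(saturday, sunday))  # False: the starts are one minute apart
print(overlaps(saturday, sunday))    # True: the trips run at almost the same time
```

The overlap test catches near-duplicates, but it would also flag closely spaced trips that are genuinely separate runs, which is the trade-off under discussion.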

Currently the static specification doesn't seem to mention GTFS-RT at all, so I'm wondering where would be the best place to add that info about StopTimeUpdates.

Examples

Often in North America, agencies will provide relatively frequent service, without using clock-face scheduling.


[Image: stm-51]

Montreal (route 51): When DST starts, the 23:58 and 00:55 trips would end up superimposed.


[Image: bctk-8]

Kelowna (route 8): When DST starts, the last two trips would conflict.

emmambd added the GTFS Schedule label on May 10, 2023
Sergiodero added the Change: Clarification label on Jan 31, 2024