Safety Data new API #2734

atalyaalon · 2024-12-05T17:38:51Z

API template of existing documentation using OPEN API 3.0.0 in this pr.
Existing paths: /involved, /involved/groupby, /city
Regarding hotspots and users, we will implement them in the API after we finish building the current API for the existing state.
You can play with documentation (mock data) here

Please review the API documentation created (of existing situation) together with existing tables in Mongo DB ("accidents" represent involved and "cities" represet the CBS cities)
Please suggest a modified version of API together with Design for safety data tables, based on our existing DB and tables.
You can use the API template I created with swaggerhub, or use another template for your choice.

I want you to take into consideration:

Existing mongo db table containing involved named "accidents" and can be found here) - you should have access
I assume that with the new API will need to perform some FE adjustments. That's OK, we'll adjust but we will consult Dror to make sure changes are minor and won't require a huge additional work.
In current implementation of API querying, we have "AND" between filter fields and "OR" between chosen field values
Performance should be good, just like current Safety Data API.

My suggestions and thoughts:

cbs_cities table:

In our current CBS cbs_cities table, I think we're missing only population field (other fields in mongoDB cities table are not used at the moment). believe you can add the population field from data_gov?
I suggest that FE will query cities info based on yishuv_symbol

cbs_streets table:

I suggest adding an API for it, or better - using our existing API
I suggest that FE will query streets info based on yishuv_symbol and street symbol.
There for either: this won't enable to query a specific street in multiple cities, but I think that's ok for now (if we want to enable this, we will find workaround that is still based on
streets aliasing - (dataset can be found here) - I think that if we do, it will be the last feature since it's not a requirement from the municipality at the moment.

cbs_involved_for_safety_data table:

Existing API does not enable filtering by all involved parameters (for example weather field ), even though it exists in mongoDB. I think we should enable as much flexibility as possible to filter by all relevant fields.
However, we can also start "lean" and only keep existing functionality with relevant fields, and later on to add additional fields that are not present.
Will we create a new table? (or perhaps we'll understand we want to use involved_markers_hebrew with improved structure?)
If we create a new table, I assume this table will be similar to involved_markers_hebrew, just w/o the hebrew fields. Perhaps it's better to save in this table only with integer codes and link it with relevant hebrew views (star schema) for a light table.
I wonder if pagination is needed when data is large, let's discuss our options (I believe that the curr '/accidents' API pulling from mongo's "accidents" table is not paginated).
Note the difference between current field vehicle_vehicle_type_hebrew that contains the involve's vehicle_type AND vehicles that contains a list of all involved vehicles using the following mapping that contains grouping of all vehicles from the specific accident:

vehicle_type=1 ->"רכב נוסעים פרטי"
vehicle_type=2 -> "טרנזיט"
vehicle_type=3 -> "טנדר"
vehicle_type  in (4, 5, 6, 7, 24 ,25) -> "משאית"
vehicle_type  in (8, 9, 10, 19) -> "אופנוע"
vehicle_type in (11 ,18) -> 'אוטובוס'
vehicle_type = 12 -> 'מונית' 
vehicle_type = 13 -> 'רכב עבודה' 
vehicle_type = 14 -> 'טרקטור' 
vehicle_type = 15 -> 'אופניים' 
vehicle_type = 16 -> 'רכבת'
vehicle_type = 17 -> 'אחר ולא ידוע'
vehicle_type = 21 -> 'קורקינט חשמלי'
vehicle_type = 22 -> 'קלנועית חשמלית' 
vehicle_type = 23 -> 'אופניים חשמליים'

There are mappings done in the FE, for example here in injured_type, 4,5 are mapped to motorcycle, 6,7 to cyclist. Should we keep these mapping in FE? or transfer them to BE?

The text was updated successfully, but these errors were encountered:

ziv17 · 2024-12-09T16:55:20Z

Hi @atalyaalon ,
Regarding
"Existing mongo db table containing involved named "accidents" and can be found here) - you should have access"
I did not find anything there. It looks empty (I tried both my yahoo and gmail accounts).

ziv17 · 2024-12-10T18:29:54Z

Hi @atalyaalon ,
regarding viewing the MongoDB tables. To which of my emails did you grant access? I tried logging in with both of my emails and did not see anything. Maybe I do not know where to look

ziv17 · 2024-12-21T06:13:30Z

Hi @atalyaalon ,
Regarding the _id field that is returned in the /involved request: what does it identify?

Should it be the same for the same person that is involved in two different accidents that are returned in the same query?
Should it be the same for the same person that is involved in the same accident that are returned in two different queries?
Will just specifying the order of the returned items in each query suffice?

ziv17 · 2024-12-26T12:30:42Z

Hi @atalyaalon , @citizen-dror
Regarding the involved query parameters:

The parameters have short names, that are not always self explanatory. If we want these names to stay, I will use a translation table from these short names to the DB column names we use in Anyway.
In the API document, several parameters can be assigned a list of values. I understand that such a list will be specified in the URL using multiple appearances of the parameter name, with a single value each time.

ziv17 · 2024-12-29T06:41:15Z

Hi @atalyaalon , @citizen-dror
Regarding error handling,
In case of an error in a parameter, e.g. unknown parameter, or illegal value, Should the API return 404, or write error to log and return the answer as if that parameter was not specified?

atalyaalon · 2025-01-01T14:07:03Z

Calculated fields:
@citizen-dror FYI - I listed the calculated fields that are needed in the API.

We need the following fields / fields modifications in API.

@ziv17 the implementation of the fields below can be done in either python or sql (with SQLAlchemy), they can be done either with offline calculations (and mid tables if needed) OR during API query itself (some of the logic are very light and can be done during query) depends on your choice of implementation.

The most important things are:

data consistency. That tables are updated as a whole (same as cache tables, same as markers_heberw and involved_markers_hebrew, in all we have data consistency)
API Performance is kept ~the same. Both of involved (today's accident) query and of group by queries.

The calculated fields:

vehicle_type_short_hebrew
This field is not used by itself in API but we will use it for calculation of the other fields listed below, hence I added its query.
This field is created by this mapping - see query
Note that current mapping does not depend on year / provider_code fields. I verified it.

SELECT id,
       YEAR,
       provider_code,
       vehicle_type_hebrew,
       CASE
           WHEN id = 2 THEN 'טרנזיט'
           WHEN id = 3 THEN 'טנדר'
           WHEN id IN (4,
                       5,
                       6,
                       7,
                       24,
                       25) THEN 'משאית'
           WHEN id IN (8,
                       9,
                       10,
                       19) THEN 'אופנוע'
           WHEN id IN (11,
                       18) THEN 'אוטובוס'
           ELSE vehicle_type_hebrew
       END AS vehicle_type_short_hebrew
FROM vehicle_type
ORDER BY YEAR DESC, provider_code,
                    id

injured_type_hebrew we send in API should be the injured_type_enriched_hebrew field in this query.
Note that current mapping does not depend on year / provider_code fields. I verified it.
Also note that even though we change the Hebrew field values, the FE is still querying by injured_type integer values and we keep it this way for now.

SELECT injured_type,
       injured_type_hebrew,
       vehicle_type,
       vehicle_type_hebrew,
       vehicle_type_short_hebrew,
       CASE
           WHEN injured_type IN (NULL,
                                 0) THEN CONCAT(SPLIT_PART('מעורב שלא נפגע', ' - ', 1), ' - ', vehicle_type_short_hebrew)
           WHEN vehicle_type IS NOT NULL THEN CONCAT(SPLIT_PART(injured_type_hebrew, ' - ', 1), ' - ', vehicle_type_short_hebrew)
           ELSE injured_type_hebrew
       END AS injured_type_enriched_hebrew
FROM
  (SELECT DISTINCT injured_type,
                   vehicle_type,
                   vehicle_type_hebrew,
                   injured_type_hebrew,
                   CASE
                       WHEN vehicle_type = 2 THEN 'טרנזיט'
                       WHEN vehicle_type = 3 THEN 'טנדר'
                       WHEN vehicle_type IN (4,
                                             5,
                                             6,
                                             7,
                                             24,
                                             25) THEN 'משאית'
                       WHEN vehicle_type IN (8,
                                             9,
                                             10,
                                             19) THEN 'אופנוע'
                       WHEN vehicle_type IN (11,
                                             18) THEN 'אוטובוס'
                       ELSE vehicle_type_hebrew
                   END AS vehicle_type_short_hebrew
   FROM involved
   LEFT JOIN vehicle_type ON involved.vehicle_type = vehicle_type.id
   AND involved.accident_year = vehicle_type.year
   AND involved.provider_code = vehicle_type.provider_code
   LEFT JOIN injured_type ON involved.injured_type = injured_type.id
   AND involved.accident_year = injured_type.year
   AND involved.provider_code = injured_type.provider_code) IVT

Note that I added in the query the calculation of vehicle_type_short_hebrew

vehicle_vehicle_type_hebrew we send in API should be the vehicle_type_enriched_hebrew field in this query
Note that current mapping does not depend on year / provider_code fields. I verified it.
Also note that even though we change the Hebrew field values, the FE is still querying by vehicle_type integer values and we keep it this way for now.

SELECT injured_type,
       injured_type_hebrew,
       vehicle_type,
       vehicle_type_hebrew,
       CASE
           WHEN injured_type = 1 THEN injured_type_hebrew
           ELSE vehicle_type_hebrew
       END AS vehicle_type_enriched_hebrew
FROM
  (SELECT DISTINCT injured_type,
                   vehicle_type,
                   vehicle_type_hebrew,
                   injured_type_hebrew
   FROM involved
   LEFT JOIN vehicle_type ON involved.vehicle_type = vehicle_type.id
   AND involved.accident_year = vehicle_type.year
   AND involved.provider_code = vehicle_type.provider_code
   LEFT JOIN injured_type ON involved.injured_type = injured_type.id
   AND involved.accident_year = injured_type.year
   AND involved.provider_code = injured_type.provider_code) IVT

vehicles we send in API should be the vehicles_distinct_concat_hebrew field in this query. The grouping is per accident (which is unique by accident_id, provider_code, accident_year - the three of them are primary keys in markers table which contains the accidents)

SELECT accident_id,
       provider_code,
       accident_year,
       STRING_AGG(vehicle_type_short_hebrew, ', '
                  ORDER BY vehicle_type_short_hebrew) AS vehicles_distinct_concat_hebrew
FROM
  (SELECT DISTINCT 
          vehicles.accident_id,
          vehicles.provider_code,
          vehicles.accident_year,
          CASE
              WHEN vehicle_type = 2 THEN 'טרנזיט'
              WHEN vehicle_type = 3 THEN 'טנדר'
              WHEN vehicle_type IN (4,
                                    5,
                                    6,
                                    7,
                                    24,
                                    25) THEN 'משאית'
              WHEN vehicle_type IN (8,
                                    9,
                                    10,
                                    19) THEN 'אופנוע'
              WHEN vehicle_type IN (11,
                                    18) THEN 'אוטובוס'
              ELSE vehicle_type_hebrew
          END AS vehicle_type_short_hebrew
   FROM vehicles
   LEFT JOIN vehicle_type ON vehicles.vehicle_type = vehicle_type.id
   AND vehicles.accident_year = vehicle_type.year
   AND vehicles.provider_code = vehicle_type.provider_code) IVT
GROUP BY accident_id,
         provider_code,
         accident_year

atalyaalon · 2025-01-01T14:10:00Z

Hi @atalyaalon , Regarding the _id field that is returned in the /involved request: what does it identify?

Should it be the same for the same person that is involved in two different accidents that are returned in the same query?

Should it be the same for the same person that is involved in the same accident that are returned in two different queries?

Will just specifying the order of the returned items in each query suffice?

As discussed, it's the id field in involved, which is unique for each involve in the table (different from involved_id which is unique only within a specific accident)

atalyaalon · 2025-01-01T14:15:36Z

Hi @atalyaalon , @citizen-dror Regarding the involved query parameters:

The parameters have short names, that are not always self explanatory. If we want these names to stay, I will use a translation table from these short names to the DB column names we use in Anyway.

In the API document, several parameters can be assigned a list of values. I understand that such a list will be specified in the URL using multiple appearances of the parameter name, with a single value each time.

Yes. I think that mapping in our Backed from short names to long names is a good idea
Today in API, both options work for BE, but FE sends it with commas and not with multiple appearances.
We can either handle both cases, or ask @citizen-dror to change it to a preferred option

atalyaalon · 2025-01-01T14:18:00Z

Hi @atalyaalon , @citizen-dror Regarding error handling, In case of an error in a parameter, e.g. unknown parameter, or illegal value, Should the API return 404, or write error to log and return the answer as if that parameter was not specified?

I think the API should return 404, @citizen-dror what do you think?

ziv17 · 2025-01-02T15:57:27Z

Hi @atalyaalon , Regarding the calculated fields:

What are the names of the calculated fields that are returned by the /involved query?
Is there a calculated field that is used as a query parameter?
Is it OK to separate the grouping of the values and the conversion to Hebrew? I prefer to place in the tables numeric values.
Regarding 3, and 4. Do the values injured_type_enriched_hebrew and vehicle_type_enriched_hebrew depend only on injured_type and vehicle_type?
I did not see the vehicles table being used in these queries. Do we need it?

atalyaalon · 2025-01-09T10:29:06Z

Hi @atalyaalon , Regarding the calculated fields:

What are the names of the calculated fields that are returned by the /involved query?
following @citizen-dror's comment, the previous injured_type_enriched_hebrew is splitted into two fields:

vehicle_type_short_hebrew (by this mapping)
injured_type_short_hebrew - with the logic of injured_type_short_hebrew in this query
vehicle_vehicle_type_hebrew - with the logic of vehicle_type_enriched_hebrew field in this query
vehicles with the logic of vehicles in this query
I suggest the we'll also keep the existing field injured_type_hebrew in the API for now, until @citizen-dror will approve its removal.

Is there a calculated field that is used as a query parameter?

Yes.
Existing injured_type_hebrew (in API it's injt filter) (that we'll keep in API)
vehicle_vehicle_type_hebrew (in API it's vcl filter)
vehicles (in API it's vcli filter)

Is it OK to separate the grouping of the values and the conversion to Hebrew? I prefer to place in the tables numeric values.

Yes, I prefer it too, and ideally it will be built with localization in mind, we want to add localization to this API to English in the coming future.

Regarding 3, and 4. Do the values injured_type_enriched_hebrew and vehicle_type_enriched_hebrew depend only on injured_type and vehicle_type?

As mentioned in 1 - we separated injured_type_enriched_hebrew into two fields.

I did not see the vehicles table being used in these queries. Do we need it?

It is used, in field vehicles in this query

atalyaalon assigned ziv17 Dec 5, 2024

atalyaalon linked a pull request Dec 29, 2024 that will close this issue

Safety data schema #2735

Open

atalyaalon linked a pull request Jan 1, 2025 that will close this issue

Create safety_data_api.yaml #2733

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Safety Data new API #2734

Safety Data new API #2734

atalyaalon commented Dec 5, 2024 •

edited

Loading

ziv17 commented Dec 9, 2024

ziv17 commented Dec 10, 2024

ziv17 commented Dec 21, 2024

ziv17 commented Dec 26, 2024

ziv17 commented Dec 29, 2024

atalyaalon commented Jan 1, 2025 •

edited

Loading

atalyaalon commented Jan 1, 2025 •

edited

Loading

atalyaalon commented Jan 1, 2025

atalyaalon commented Jan 1, 2025

ziv17 commented Jan 2, 2025

atalyaalon commented Jan 9, 2025 •

edited

Loading

Safety Data new API #2734

Safety Data new API #2734

Comments

atalyaalon commented Dec 5, 2024 • edited Loading

ziv17 commented Dec 9, 2024

ziv17 commented Dec 10, 2024

ziv17 commented Dec 21, 2024

ziv17 commented Dec 26, 2024

ziv17 commented Dec 29, 2024

atalyaalon commented Jan 1, 2025 • edited Loading

atalyaalon commented Jan 1, 2025 • edited Loading

atalyaalon commented Jan 1, 2025

atalyaalon commented Jan 1, 2025

ziv17 commented Jan 2, 2025

atalyaalon commented Jan 9, 2025 • edited Loading

atalyaalon commented Dec 5, 2024 •

edited

Loading

atalyaalon commented Jan 1, 2025 •

edited

Loading

atalyaalon commented Jan 1, 2025 •

edited

Loading

atalyaalon commented Jan 9, 2025 •

edited

Loading