You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
@azimov -- CohortDiagnostics within Strategus continues to throw SQL errors in the VA environment, and getting these solved is a high-priority for finishing our ExampleStrategusStudy for tutorials (with @schuemie). Here is the SQL error, and I suspect it is occurring in the SUM().
DBMS:
sql server
Error:
com.microsoft.sqlserver.jdbc.SQLServerException: Arithmetic overflow error converting expression to data type int.
SQL:
SELECT t.cohort_definition_id cohort_id,
c.cohort_definition_id comparator_cohort_id,
CAST(1 AS INT) time_id,
COUNT_BIG(DISTINCT c.subject_id) subjects,
-- present in both target and comparator
COUNT_BIG(DISTINCT CASE
WHEN c.cohort_start_date < DATEADD(day, -9999, t.cohort_start_date)
THEN c.subject_id
ELSE NULL
END) sub_cs_before_ts,
-- comparator cohort start date before target start date (offset) [How many subjects in comparator cohort start prior to first target cohort start]
COUNT_BIG(DISTINCT CASE
WHEN c.cohort_start_date = DATEADD(day, -9999, t.cohort_start_date)
THEN c.subject_id
ELSE NULL
END) sub_cs_on_ts,
-- comparator cohort start date on target start date (offset) [How many subjects in comparator cohort start with first target cohort start]
COUNT_BIG(DISTINCT CASE
WHEN c.cohort_start_date > DATEADD(day, -9999, t.cohort_start_date)
THEN c.subject_id
ELSE NULL
END) sub_cs_after_ts,
-- comparator cohort start date after target start date (offset) [How many subjects in comparator cohort start after first target cohort start]
COUNT_BIG(DISTINCT CASE
WHEN c.cohort_start_date < DATEADD(day, -9999, t.cohort_end_date)
THEN c.subject_id
ELSE NULL
END) sub_cs_before_te,
-- comparator cohort start date before target end date (offset) [How many subjects in comparator cohort start after first target cohort end]
COUNT_BIG(DISTINCT CASE
WHEN c.cohort_start_date = DATEADD(day, -9999, t.cohort_end_date)
THEN c.subject_id
ELSE NULL
END) sub_cs_on_te,
-- comparator cohort start date on target end date (offset) [How many subjects in comparator cohort start on first target cohort end]
COUNT_BIG(DISTINCT CASE
WHEN c.cohort_start_date > DATEADD(day, -9999, t.cohort_end_date)
THEN c.subject_id
ELSE NULL
END) sub_cs_after_te,
-- comparator cohort start date after target end date (offset) [How many subjects in comparator cohort start after first target cohort end]
COUNT_BIG(DISTINCT CASE
WHEN c.cohort_start_date >= DATEADD(day, -9999, t.cohort_start_date)
AND c.cohort_start_date <= DATEADD(day, 0, t.cohort_end_date)
THEN c.subject_id
ELSE NULL
END) sub_cs_window_t,
-- comparator cohort subjects start within period (incidence) relative to target start date and end date
-- [How many subjects in comparator cohort start within a window of days relative to first target start date and first target end date]
COUNT_BIG(DISTINCT CASE
WHEN c.cohort_end_date >= DATEADD(day, -9999, t.cohort_start_date)
AND c.cohort_end_date <= DATEADD(day, 0, t.cohort_end_date)
THEN c.subject_id
ELSE NULL
END) sub_ce_window_t,
-- comparator cohort subjects start within period (incidence) relative to target start date and end date
-- [How many subjects in comparator cohort start within a window of days relative to first target start date and first target end date]
COUNT_BIG(DISTINCT CASE
WHEN c.cohort_start_date >= DATEADD(day, -9999, t.cohort_start_date)
AND c.cohort_start_date <= DATEADD(day, 0, t.cohort_start_date)
THEN c.subject_id
ELSE NULL
END) sub_cs_window_ts,
-- comparator cohort subjects start within period (incidence) relative to target start date [How many subjects in comparator cohort start within a
-- window of days relative to first target start date]
COUNT_BIG(DISTINCT CASE
WHEN c.cohort_start_date >= DATEADD(day, -9999, t.cohort_end_date)
AND c.cohort_start_date <= DATEADD(day, 0, t.cohort_end_date)
THEN c.subject_id
ELSE NULL
END) sub_cs_window_te,
-- comparator cohort subjects start within period (incidence) relative to target end date [How many subjects in comparator cohort start within a
-- window of days relative to first target end date]
COUNT_BIG(DISTINCT CASE
WHEN c.cohort_end_date >= DATEADD(day, -9999, t.cohort_start_date)
AND c.cohort_end_date <= DATEADD(day, 0, t.cohort_start_date)
THEN c.subject_id
ELSE NULL
END) sub_ce_window_ts,
-- comparator cohort subjects end within period (incidence) relative to target start date [How many subjects in comparator cohort end within a
-- window of days relative to first target start date]
COUNT_BIG(DISTINCT CASE
WHEN c.cohort_end_date >= DATEADD(day, -9999, t.cohort_end_date)
AND c.cohort_end_date <= DATEADD(day, 0, t.cohort_end_date)
THEN c.subject_id
ELSE NULL
END) sub_ce_window_te,
-- comparator cohort subjects end within period (incidence) relative to target end date [How many subjects in comparator cohort end within a
-- window of days relative to first target end date]
COUNT_BIG(DISTINCT CASE
WHEN c.cohort_start_date >= DATEADD(day, -9999, t.cohort_start_date)
AND c.cohort_start_date <= DATEADD(day, 0, t.cohort_end_date)
AND c.cohort_end_date >= DATEADD(day, -9999, t.cohort_start_date)
AND c.cohort_end_date <= DATEADD(day, 0, t.cohort_end_date)
THEN c.subject_id
ELSE NULL
END) sub_c_within_t,
-- comparator cohort days within target (offset) days [How many subjects in comparator cohort have their entire cohort period within first target cohort period]
SUM((
CASE -- comparator cohort start date before target start date (offset)
WHEN c.cohort_start_date < DATEADD(day, -9999, t.cohort_start_date)
THEN datediff(dd,
c.cohort_start_date,
CASE --min of comparator end date/target start dates (offset)
WHEN c.cohort_end_date < DATEADD(day, -9999, t.cohort_start_date)
THEN c.cohort_end_date
ELSE DATEADD(day, -9999, t.cohort_start_date)
END)
ELSE 0
END
) + 1) c_days_before_ts,
-- comparator cohort days before target start date (offset)
SUM((
CASE -- comparator cohort start date before target end date (offset)
WHEN c.cohort_start_date < DATEADD(day, -9999, t.cohort_end_date)
THEN datediff(dd,
c.cohort_start_date,
CASE --min of comparator end date/target end dates (offset)
WHEN c.cohort_end_date < DATEADD(day, -9999, t.cohort_end_date)
THEN c.cohort_end_date
ELSE DATEADD(day, -9999, t.cohort_end_date)
END)
ELSE 0
END
) + 1) c_days_before_te,
-- comparator cohort days before target end date (offset)
SUM((
CASE -- comparator cohort days within target days (offset)
WHEN c.cohort_end_date >= DATEADD(day, -9999, t.cohort_start_date)
AND c.cohort_start_date <= DATEADD(day, 0, t.cohort_end_date)
THEN datediff(dd,
CASE --min of comparator start date/target start dates (offset)
WHEN c.cohort_start_date < DATEADD(day, -9999, t.cohort_start_date)
THEN DATEADD(day, -9999, t.cohort_start_date)
ELSE c.cohort_start_date
END,
CASE --min of comparator end date/target end dates (offset)
WHEN c.cohort_end_date > DATEADD(day, 0, t.cohort_end_date)
THEN DATEADD(day, 0, t.cohort_end_date)
ELSE c.cohort_end_date
END)
ELSE 0
END
) + 1) c_days_within_t_days,
-- comparator cohort days within target cohort days (offset)
SUM((
CASE -- comparator cohort end date after target start date (offset)
WHEN c.cohort_end_date > DATEADD(day, -9999, t.cohort_start_date)
THEN datediff(dd,
CASE --max of comparator start date/target start dates (offset)
WHEN c.cohort_start_date < DATEADD(day, -9999, t.cohort_start_date)
THEN DATEADD(day, -9999, t.cohort_start_date)
ELSE c.cohort_start_date
END,
c.cohort_end_date)
ELSE 0
END
) + 1) c_days_after_ts,
-- comparator cohort days after target start date (offset)
SUM((
CASE -- comparator cohort end date after target end date (offset)
WHEN c.cohort_end_date > DATEADD(day, -9999, t.cohort_end_date)
THEN datediff(dd,
CASE --max of comparator start date/target start dates (offset)
WHEN c.cohort_start_date < DATEADD(day, -9999, t.cohort_end_date)
THEN DATEADD(day, -9999, t.cohort_end_date)
ELSE c.cohort_start_date
END,
c.cohort_end_date)
ELSE 0
END
) + 1) c_days_after_te,
-- comparator cohort days after target end date (offset)
SUM(datediff(dd, DATEADD(day, -9999, t.cohort_start_date), DATEADD(day, 0, t.cohort_end_date)) + 1) t_days,
-- target cohort days (no offset)
SUM(datediff(dd, c.cohort_start_date, c.cohort_end_date) + 1) c_days
-- comparator cohort days (offset)
INTO #cohort_rel_output
FROM #target_cohort_table t
INNER JOIN ORD_OHDSI.bv.StrategusExample c ON c.subject_id = t.subject_id
AND c.cohort_definition_id != t.cohort_definition_id
-- comparator cohort overlaps with target cohort during the offset period
AND c.cohort_end_date >= DATEADD(day, -9999, t.cohort_start_date)
AND c.cohort_start_date <= DATEADD(day, 0, t.cohort_end_date)
WHERE t.cohort_definition_id IN (19021,19022,19023,19024,19059,19137,19021001,19022101,19137001)
AND c.cohort_definition_id IN (19022,19023,19024,19059,19137,19021001,19022101,19137001,19021)
GROUP BY t.cohort_definition_id,
c.cohort_definition_id
R version:
R version 4.4.1 (2024-06-14 ucrt)
Platform:
x86_64-w64-mingw32
Attached base packages:
- stats
- graphics
- grDevices
- datasets
- utils
- methods
- base
Other attached packages:
- Strategus (1.0.0)
- CohortGenerator (0.11.2)
- R6 (2.5.1)
- DatabaseConnector (6.3.2)
To sum using BIGINT each argument should (I believe) be first CAST to BIGINT, like SUM(CAST(column) AS BIGINT)
The text was updated successfully, but these errors were encountered:
(The reason we haven't seen this in for example J&J is because we were using RedShift, where DATEDIFF returns a BIGINT. Adding the CAST as Marc suggested would make sure the code runs on all platforms, even including DataBricks, where DATEDIFF also returns an INT).
@azimov : could we have a release soon that implements this? It should be a fairly straightforward change.
@msuchard@schuemie I have attempted a at hotfix for all datediffs and will attempt to release today. Please note that I will be away for some time from the 10th January but will try to fully test this before then
@msuchard@schuemie - Ok I have created a PR #1154 that converts all date diffs to big int and the release is ready to go (this is a cherry pick so develop will be in conflict). However, there are two points:
It would be good if this could be tested on the data where this failed - my hotfix is a stab in the dark as I cannot seem to reproduce the issue on my own.
I won't be around to fix any unforeseen consequences of this release
Regardless, its never ideal to release on a Friday
If you can test it on some suitable data, please feel empowered to merge.
@azimov --
CohortDiagnostics
withinStrategus
continues to throw SQL errors in the VA environment, and getting these solved is a high-priority for finishing ourExampleStrategusStudy
for tutorials (with @schuemie). Here is the SQL error, and I suspect it is occurring in theSUM()
.To sum using
BIGINT
each argument should (I believe) be firstCAST
toBIGINT
, likeSUM(CAST(column) AS BIGINT)
The text was updated successfully, but these errors were encountered: