Polish CodeQL and do some chores #2255

thorstenhater · 2024-02-07T08:27:26Z

Polish CodeQL integration:

Fix C++ warnings.
Ignore current Python warnings:
- tsplot is not processing html
- json/.../serve.py is not ours
Add a status badge to README
Bump GH Actions as required.
Black was unhappy about our new type stubs?!

halfflat · 2024-02-07T15:06:18Z

I was wondering about some of these size_t casts in calls to std::vector::reserve(); they create an inconsistency with prior checks in arb_assert and the explicit casting here makes it look like we're working around some sort of range issue.

Are these there to avoid a potential int32_t overflow? If so, we should consider using different types (viz. std::size_t or std::ptrdiff_t) to represent quantities such as n_sample etc.

For the changes in modcc/io/prefixbuf.cpp, storing the unsigned values from the indent stack and iword-stored tabwidth as size_t is a bit misleading: they really are unsigned values. And while there is a potential unsigned wraparound from their multiplication if the indent code is somehow being abused, I'd assert that it's safer to let it wrap in an unsigned then to start allocating strings of length greater than 4GB.

thorstenhater · 2024-02-08T08:31:27Z

Hi @halfflat,

they're here to silence CodeQL static analysis. I agree, that was a unnecessarily lazy approach to the affair.
The question back is then: Why was sample_size_type chosen to be a signed type in the first place?
We can just accept a bad_alloc on prefixbuf and the analyser's warning; one might argue though that
width and count of tabs should fit in an unsigned char ;).

halfflat · 2024-02-08T09:23:06Z

I don't recall why it was signed; I'd have to go look at the git blame :) It might have been because we use them to store differences. It's possible but less likely they get SIMDed somewhere and we don't yet have unsigned SIMD in the library.

…fixes.

halfflat · 2024-02-08T09:30:29Z

But on the CodeQL front, what is its complaint? Potential signed overflow, or is it complaining about implicit conversion of signed to size_t in the reserve() calls? If the latter, I don't think that should be something that's flagged: it's a reasonable thing to allow if we are confident that the non-negative precondition is met, and casting to an unsigned type first doesn't fix the problem.

Is there some way of decorating arb_assert so as to make that assertion visible to CodeQL? E.g. if we had

    int n = something.last-something.first;
    arb_assert(n>=0); // macro includes some magic for CodeQL
    vec.reserve(n); // this is fine because we made an assertion that n>=0

it would all be fine?

halfflat · 2024-02-08T09:32:18Z

modcc/io/prefixbuf.cpp

-        pbuf->prefix = std::string(tabs*tabwidth, ' ');
+        if (auto ptr = static_cast<indent_stack*>(s.pword(index));
+            ptr && !ptr->empty()) {
+            unsigned n_tab = ptr->top();


Argh! This is terribly misleading formatting.
Edit: what was so bad with unsigned tabs = (!stack_ptr || stack_ptr->empty())? 0: stack_ptr->top();?

I knew you were going to love it. Isn't it beautiful how the second line of the condition and the first line of the body line up?

It has a certain horrible charm to it.

thorstenhater · 2024-02-08T09:34:53Z

Possibly, but I do not think it's worth it even. CodeQL is technically correct in pointing out that possibility, but
it doesn't flag the project as endangered. It will complain, though, if a PR introduces new problems.

The urge to 'fix' this was merely my completionist attitude.

halfflat · 2024-02-08T09:37:02Z

test/unit/test_sde.cpp

@@ -377,8 +377,8 @@ class sde_recipe: public simple_recipe_base {
 // generic advance method used for all pertinent mechanisms
 // first argument indicates the number of random variables
 void advance_common(unsigned int n_rv, arb_mechanism_ppack* pp) {
-    const auto width = pp->width;
-    arb_value_type* ptr = archive_ptr->claim(std::size_t{width} * n_rv);
+    auto width = static_cast<size_t>(pp->width);


Why not just size_t width = pp->width?

halfflat · 2024-02-08T09:39:29Z

Regarding the CodeQL flagging the size_t conversion, I'd say we should either ignore it or address it in a robust way (with actual range checks). My inclination is to ignore it.

thorstenhater · 2024-02-21T20:00:44Z

modcc/io/prefixbuf.cpp

+            ptr && !ptr->empty()) {
+            unsigned n_tab = ptr->top();
+            unsigned width = s.iword(index);
+            pbuf->prefix = std::string(n_tab*width, ' ');


boeschf · 2024-02-15T08:21:39Z

arbor/communication/dry_run_context.cpp

-        using count_type = typename gathered_vector<cell_gid_type>::count_type;
-
-        count_type local_size = local_gids.size();
+        size_t local_size = local_gids.size();


std::size_t?

Yes and all other instance in that file.

boeschf · 2024-02-15T08:28:34Z

test/unit/test_sde.cpp

@@ -377,8 +377,8 @@ class sde_recipe: public simple_recipe_base {
 // generic advance method used for all pertinent mechanisms
 // first argument indicates the number of random variables
 void advance_common(unsigned int n_rv, arb_mechanism_ppack* pp) {
-    const auto width = pp->width;
-    arb_value_type* ptr = archive_ptr->claim(width * n_rv);
+    std::size_t width = static_cast<size_t>(pp->width);


Suggested change

std::size_t width = static_cast<size_t>(pp->width);

auto width = static_cast<std::size_t>(pp->width);

Yes, that's fine, too.

boeschf · 2024-02-15T08:30:33Z

arbor/include/arbor/common_types.hpp

@@ -138,8 +138,8 @@ using time_type = double;
 constexpr time_type terminal_time = std::numeric_limits<time_type>::max();

 // For holding counts and indexes into generated sample data.
-
-using sample_size_type = std::int32_t;
+using sample_index_type = std::int32_t;


where is this type actually used?

It -- currently -- isn't, but it's useful when we are considering things like:

sample_size_type a = ..., b = ..., c = a - b;

due to the potential overflow.

boeschf · 2024-02-15T12:59:47Z

arbor/communication/dry_run_context.cpp

+        for (size_t i = 0; i < num_ranks_; i++) {
            util::append(gathered_gids, local_gids);
        }

-        for (count_type i = 0; i < num_ranks_; i++) {
-            for (count_type j = i*local_size; j < (i+1)*local_size; j++){
+        for (size_t i = 0; i < num_ranks_; i++) {


2 more std::size_ts

halfflat

One query and some requests for changing how we deal with the wrapping product warning. Generally, I suspect the cost of a range-checking assert is cheap and provides the guarantees we need while casting up and down the unsigned integer sizes keeps the potential bug while changing its clothes.

Currently arb_assert is a build-time option, and I'm wondering if we could perhaps split its functionality into asserts which should always be checked and those which are really there to be enabled when we're tracking down bugs in our code logic.

halfflat · 2024-02-19T04:03:02Z

arbor/cable_cell_group.cpp

-    arb_assert((unsigned)n_raw_per_sample==p.weight.size());
+    sample_size_type n_sample = (sc.end_offset - sc.begin_offset)/n_raw_per_sample;
+    arb_assert((sc.end_offset - sc.begin_offset)==n_sample*n_raw_per_sample);
+    arb_assert((unsigned)n_raw_per_sample == p.weight.size());


Are these changes to make CodeQL happy, or just for readability?

Readability. I can get behind a*b + c for similarities with mathematical notation, but =, ==, and especially - tend to merge visually into one token. Or: Too similar to snake_case for my liking.
As much of the code, especially newer parts is in this style, I tend to adjust these expression in passing when editing files. Same for removing redundant headers, which my IDE picks up on these days.

halfflat · 2024-02-19T04:25:48Z

arbor/communication/dry_run_context.cpp

@@ -26,7 +24,7 @@ struct dry_run_context_impl {
        count_type local_size = local_spikes.size();

        std::vector<spike> gathered_spikes;
-        gathered_spikes.reserve(local_size*num_ranks_);
+        gathered_spikes.reserve(std::size_t{local_size}*num_ranks_);


We shouldn't change the (effective) maximum size of the gathered_spikes vector without also changing the range of the indices in the loop 35-37.

We ultimately make a partition of the indices into the variable partition which has count_type as its value type, and this vector is used to construct the gathered_vector return type, which also expects things in terms of count_type. I believe the correct fix is not to cast local_size or use std::size_t here but instead

Add an assertion which confirms that the local_size*num_ranks_ doesn't wrap in count_type, and

Uses std::size_t{local_size*num_ranks_} to keep CodeQL happy or else we tell CodeQL that this is not a bug (my preference).

halfflat · 2024-02-19T04:28:50Z

arbor/communication/dry_run_context.cpp

        std::vector<count_type> partition;
-        for (count_type i = 0; i <= num_ranks_; i++) {
+        for (std::size_t i = 0; i <= num_ranks_; i++) {
            partition.push_back(static_cast<count_type>(i*local_size));


There's no point doing everything in size_t only to cast back down to count_type here; the alterations just make the potential bug more obscure. IMO the only correct resolution is to have an assert that checks that the product doesn't wrap and then to keep using count_type.

test/unit/test_sde.cpp

thorstenhater added 3 commits February 7, 2024 09:26

Tentatively fix one instance.

e6d0e14

Add status badge, bump actions, and fix CodeQL warnings

61fff65

Satisfy the linter.

9f5d14e

thorstenhater changed the title ~~Tentatively fix one instance.~~ Polish CodeQL and do some chores Feb 7, 2024

thorstenhater marked this pull request as ready for review February 7, 2024 09:26

thorstenhater requested a review from boeschf February 7, 2024 09:28

et tu, Cmake

8b0c6f9

Make sample size type unsigned.

a170cfa

Accept that prefixbuf might fail to allocate and clean the remaining …

da0d437

…fixes.

halfflat reviewed Feb 8, 2024

View reviewed changes

github-advanced-security bot found potential problems Feb 8, 2024

View reviewed changes

thorstenhater requested a review from halfflat February 14, 2024 18:33

Make a less irritating formatting happen, use explicit types.

d48759d

boeschf reviewed Feb 15, 2024

View reviewed changes

Be more precise with size_t.

9d25ce6

thorstenhater requested a review from boeschf February 15, 2024 11:04

boeschf reviewed Feb 15, 2024

View reviewed changes

halfflat suggested changes Feb 19, 2024

View reviewed changes

thorstenhater added 2 commits February 21, 2024 20:09

Revert to unsighed types.

883065e

One last.

2e2f329

github-advanced-security bot found potential problems Feb 21, 2024

View reviewed changes

test/unit/test_sde.cpp Dismissed Show dismissed Hide dismissed

boeschf approved these changes Feb 22, 2024

View reviewed changes

thorstenhater merged commit f720597 into arbor-sim:master Feb 22, 2024
24 checks passed

thorstenhater deleted the qa/codeql-phase-2 branch February 22, 2024 15:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Polish CodeQL and do some chores #2255

Polish CodeQL and do some chores #2255

thorstenhater commented Feb 7, 2024 •

edited

Loading

halfflat commented Feb 7, 2024

thorstenhater commented Feb 8, 2024 •

edited

Loading

halfflat commented Feb 8, 2024

halfflat commented Feb 8, 2024

halfflat Feb 8, 2024 •

edited

Loading

thorstenhater Feb 8, 2024 •

edited

Loading

halfflat Feb 8, 2024

thorstenhater Feb 21, 2024

thorstenhater commented Feb 8, 2024

halfflat Feb 8, 2024

halfflat commented Feb 8, 2024

thorstenhater Feb 21, 2024

boeschf Feb 15, 2024

thorstenhater Feb 15, 2024

boeschf Feb 15, 2024

thorstenhater Feb 15, 2024

boeschf Feb 15, 2024

thorstenhater Feb 15, 2024

boeschf Feb 15, 2024

halfflat left a comment

halfflat Feb 19, 2024

thorstenhater Feb 21, 2024

halfflat Feb 19, 2024

halfflat Feb 19, 2024

	std::size_t width = static_cast<size_t>(pp->width);
	auto width = static_cast<std::size_t>(pp->width);

Polish CodeQL and do some chores #2255

Polish CodeQL and do some chores #2255

Conversation

thorstenhater commented Feb 7, 2024 • edited Loading

halfflat commented Feb 7, 2024

thorstenhater commented Feb 8, 2024 • edited Loading

halfflat commented Feb 8, 2024

halfflat commented Feb 8, 2024

halfflat Feb 8, 2024 • edited Loading

Choose a reason for hiding this comment

thorstenhater Feb 8, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thorstenhater commented Feb 8, 2024

Choose a reason for hiding this comment

halfflat commented Feb 8, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

halfflat left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thorstenhater commented Feb 7, 2024 •

edited

Loading

thorstenhater commented Feb 8, 2024 •

edited

Loading

halfflat Feb 8, 2024 •

edited

Loading

thorstenhater Feb 8, 2024 •

edited

Loading