Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Polish CodeQL and do some chores #2255

Merged
merged 10 commits into from
Feb 22, 2024

Conversation

thorstenhater
Copy link
Contributor

@thorstenhater thorstenhater commented Feb 7, 2024

Polish CodeQL integration:

  • Fix C++ warnings.
  • Ignore current Python warnings:
    • tsplot is not processing html
    • json/.../serve.py is not ours
  • Add a status badge to README
  • Bump GH Actions as required.
  • Black was unhappy about our new type stubs?!

@thorstenhater thorstenhater changed the title Tentatively fix one instance. Polish CodeQL and do some chores Feb 7, 2024
@thorstenhater thorstenhater marked this pull request as ready for review February 7, 2024 09:26
@thorstenhater thorstenhater requested a review from boeschf February 7, 2024 09:28
@halfflat
Copy link
Contributor

halfflat commented Feb 7, 2024

I was wondering about some of these size_t casts in calls to std::vector::reserve(); they create an inconsistency with prior checks in arb_assert and the explicit casting here makes it look like we're working around some sort of range issue.

Are these there to avoid a potential int32_t overflow? If so, we should consider using different types (viz. std::size_t or std::ptrdiff_t) to represent quantities such as n_sample etc.

For the changes in modcc/io/prefixbuf.cpp, storing the unsigned values from the indent stack and iword-stored tabwidth as size_t is a bit misleading: they really are unsigned values. And while there is a potential unsigned wraparound from their multiplication if the indent code is somehow being abused, I'd assert that it's safer to let it wrap in an unsigned then to start allocating strings of length greater than 4GB.

@thorstenhater
Copy link
Contributor Author

thorstenhater commented Feb 8, 2024

Hi @halfflat,

they're here to silence CodeQL static analysis. I agree, that was a unnecessarily lazy approach to the affair.
The question back is then: Why was sample_size_type chosen to be a signed type in the first place?
We can just accept a bad_alloc on prefixbuf and the analyser's warning; one might argue though that
width and count of tabs should fit in an unsigned char ;).

@halfflat
Copy link
Contributor

halfflat commented Feb 8, 2024

I don't recall why it was signed; I'd have to go look at the git blame :) It might have been because we use them to store differences. It's possible but less likely they get SIMDed somewhere and we don't yet have unsigned SIMD in the library.

@halfflat
Copy link
Contributor

halfflat commented Feb 8, 2024

But on the CodeQL front, what is its complaint? Potential signed overflow, or is it complaining about implicit conversion of signed to size_t in the reserve() calls? If the latter, I don't think that should be something that's flagged: it's a reasonable thing to allow if we are confident that the non-negative precondition is met, and casting to an unsigned type first doesn't fix the problem.

Is there some way of decorating arb_assert so as to make that assertion visible to CodeQL? E.g. if we had

    int n = something.last-something.first;
    arb_assert(n>=0); // macro includes some magic for CodeQL
    vec.reserve(n); // this is fine because we made an assertion that n>=0

it would all be fine?

pbuf->prefix = std::string(tabs*tabwidth, ' ');
if (auto ptr = static_cast<indent_stack*>(s.pword(index));
ptr && !ptr->empty()) {
unsigned n_tab = ptr->top();
Copy link
Contributor

@halfflat halfflat Feb 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh! This is terribly misleading formatting.
Edit: what was so bad with unsigned tabs = (!stack_ptr || stack_ptr->empty())? 0: stack_ptr->top();?

Copy link
Contributor Author

@thorstenhater thorstenhater Feb 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I knew you were going to love it. Isn't it beautiful how the second line of the condition and the first line of the body line up?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has a certain horrible charm to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it.

@thorstenhater
Copy link
Contributor Author

Possibly, but I do not think it's worth it even. CodeQL is technically correct in pointing out that possibility, but
it doesn't flag the project as endangered. It will complain, though, if a PR introduces new problems.

The urge to 'fix' this was merely my completionist attitude.

@@ -377,8 +377,8 @@ class sde_recipe: public simple_recipe_base {
// generic advance method used for all pertinent mechanisms
// first argument indicates the number of random variables
void advance_common(unsigned int n_rv, arb_mechanism_ppack* pp) {
const auto width = pp->width;
arb_value_type* ptr = archive_ptr->claim(std::size_t{width} * n_rv);
auto width = static_cast<size_t>(pp->width);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just size_t width = pp->width?

@halfflat
Copy link
Contributor

halfflat commented Feb 8, 2024

Regarding the CodeQL flagging the size_t conversion, I'd say we should either ignore it or address it in a robust way (with actual range checks). My inclination is to ignore it.

ptr && !ptr->empty()) {
unsigned n_tab = ptr->top();
unsigned width = s.iword(index);
pbuf->prefix = std::string(n_tab*width, ' ');

Check failure

Code scanning / CodeQL

Multiplication result converted to larger type High

Multiplication result may overflow 'unsigned int' before it is converted to 'size_type'.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed.

using count_type = typename gathered_vector<cell_gid_type>::count_type;

count_type local_size = local_gids.size();
size_t local_size = local_gids.size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::size_t?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and all other instance in that file.

@@ -377,8 +377,8 @@ class sde_recipe: public simple_recipe_base {
// generic advance method used for all pertinent mechanisms
// first argument indicates the number of random variables
void advance_common(unsigned int n_rv, arb_mechanism_ppack* pp) {
const auto width = pp->width;
arb_value_type* ptr = archive_ptr->claim(width * n_rv);
std::size_t width = static_cast<size_t>(pp->width);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::size_t width = static_cast<size_t>(pp->width);
auto width = static_cast<std::size_t>(pp->width);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's fine, too.

@@ -138,8 +138,8 @@ using time_type = double;
constexpr time_type terminal_time = std::numeric_limits<time_type>::max();

// For holding counts and indexes into generated sample data.

using sample_size_type = std::int32_t;
using sample_index_type = std::int32_t;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this type actually used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It -- currently -- isn't, but it's useful when we are considering things like:

sample_size_type a = ..., b = ..., c = a - b;

due to the potential overflow.

Comment on lines 55 to 59
for (size_t i = 0; i < num_ranks_; i++) {
util::append(gathered_gids, local_gids);
}

for (count_type i = 0; i < num_ranks_; i++) {
for (count_type j = i*local_size; j < (i+1)*local_size; j++){
for (size_t i = 0; i < num_ranks_; i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 more std::size_ts

Copy link
Contributor

@halfflat halfflat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One query and some requests for changing how we deal with the wrapping product warning. Generally, I suspect the cost of a range-checking assert is cheap and provides the guarantees we need while casting up and down the unsigned integer sizes keeps the potential bug while changing its clothes.

Currently arb_assert is a build-time option, and I'm wondering if we could perhaps split its functionality into asserts which should always be checked and those which are really there to be enabled when we're tracking down bugs in our code logic.

arb_assert((unsigned)n_raw_per_sample==p.weight.size());
sample_size_type n_sample = (sc.end_offset - sc.begin_offset)/n_raw_per_sample;
arb_assert((sc.end_offset - sc.begin_offset)==n_sample*n_raw_per_sample);
arb_assert((unsigned)n_raw_per_sample == p.weight.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these changes to make CodeQL happy, or just for readability?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Readability. I can get behind a*b + c for similarities with mathematical notation, but =, ==, and especially - tend to merge visually into one token. Or: Too similar to snake_case for my liking.
As much of the code, especially newer parts is in this style, I tend to adjust these expression in passing when editing files. Same for removing redundant headers, which my IDE picks up on these days.

@@ -26,7 +24,7 @@ struct dry_run_context_impl {
count_type local_size = local_spikes.size();

std::vector<spike> gathered_spikes;
gathered_spikes.reserve(local_size*num_ranks_);
gathered_spikes.reserve(std::size_t{local_size}*num_ranks_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't change the (effective) maximum size of the gathered_spikes vector without also changing the range of the indices in the loop 35-37.

We ultimately make a partition of the indices into the variable partition which has count_type as its value type, and this vector is used to construct the gathered_vector return type, which also expects things in terms of count_type. I believe the correct fix is not to cast local_size or use std::size_t here but instead

  1. Add an assertion which confirms that the local_size*num_ranks_ doesn't wrap in count_type, and
  2. Uses std::size_t{local_size*num_ranks_} to keep CodeQL happy or else we tell CodeQL that this is not a bug (my preference).

std::vector<count_type> partition;
for (count_type i = 0; i <= num_ranks_; i++) {
for (std::size_t i = 0; i <= num_ranks_; i++) {
partition.push_back(static_cast<count_type>(i*local_size));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no point doing everything in size_t only to cast back down to count_type here; the alterations just make the potential bug more obscure. IMO the only correct resolution is to have an assert that checks that the product doesn't wrap and then to keep using count_type.

test/unit/test_sde.cpp Dismissed Show dismissed Hide dismissed
@thorstenhater thorstenhater merged commit f720597 into arbor-sim:master Feb 22, 2024
24 checks passed
@thorstenhater thorstenhater deleted the qa/codeql-phase-2 branch February 22, 2024 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants