-
-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
encoding/base32: Add RFC-compliant error handling and improve reliability #4641
Conversation
Replace assertions with proper error handling in base32.decode() to allow programs to handle invalid input gracefully rather than crashing. The function now returns ([]byte, Error) instead of just []byte.
Fix buffer allocation size calculation and add proper bounds checking to ensure output buffer has sufficient space. This fixes crashes that could occur with inputs like "AA" and other edge cases where the output buffer was too small. Remove #no_bounds_check as proper bounds checking is necessary for safe error handling. The small performance trade-off is worth the improved robustness.
Add test suite based on RFC 4648 test vectors and validation rules: - Add section 10 test vectors for valid encoding/decoding - Add test cases for invalid character handling (section 3.2) - Add test cases for padding validation (section 4) - Add test cases for length requirements (section 6) The test vectors verify that: - Empty string encodes/decodes correctly - Standard cases like "foo" -> "MZXW6===" work - Invalid characters are rejected - Missing or malformed padding is detected - Invalid lengths are caught
Rework base32.decode() to properly handle all cases per RFC 4648: - Fix error detection order: - Check minimum length first (Invalid_Length) - Check character validity (Invalid_Character) - Check padding and structure (Malformed_Input) - Fix padding validation: - Add required padding length checks (2=6, 4=4, 5=3, 7=1 chars) - Ensure padding only appears at end - Fix handling of unpadded inputs - Fix buffer handling: - Proper output buffer size calculation - Add bounds checking for buffer access - Add proper buffer validation For example: - "M" correctly returns Invalid_Length (too short) - "mzxq====" correctly returns Invalid_Character (lowercase) - "MZXQ=" correctly returns Malformed_Input (wrong padding) - Unpadded input lengths must be multiples of 8 These changes make the decode function fully compliant with RFC 4648 requirements while providing proper error handling.
Fix memory handling throughout base32 package: - Make padding map package-level constant (to avoid repeated allocs) - Use passed allocator in encode's make() call - Add defer delete for allocated memory in encode - Add proper cleanup in test cases - Fix memory cleanup of output buffers The changes ensure consistent allocator usage and cleanup in both implementation and tests.
Replace package-level map with a simple switch statement for padding validation. This eliminates allocations we can't properly free while maintaining the same validation logic.
Add support for custom alphabet validation through an optional validation function parameter. The default validation follows RFC 4648 base32 alphabet rules (A-Z, 2-7). This properly supports the documented ability to use custom alphabets.
Fix encoding to properly use provided encoding table parameter instead of hardcoded `ENC_TABLE`. This makes encode properly support custom alphabets as documented.
The decoding table was only 224 bytes which caused type mismatches when using custom alphabets, so expand with zeroes to cover full byte range while maintaining the same decoding logic.
Move existing test procedures to a dedicated test file for better code organization and maintainability.
Add test_base32_roundtrip() to verify the encode->decode roundtrip preserves data integrity. This test helps ensure our base32 implementation correctly handles the full encode->decode cycle without data loss or corruption.
Remove premature deallocation of the output buffer which was causing use-after-free behavior. The returned string needs to take ownership of this memory, but the defer delete was freeing it before the string could be used. This fixes issues with encoding that were introduced by overly aggressive memory cleanup in 93238db.
Add test case to verify custom alphabet support. The test uses a decimal-uppercase alphabet (0-9, A-V) to test both encoding and decoding with custom tables, including validation. This ensures the encode and decode functions work correctly with custom encoding tables and validation functions as documented.
Add defer delete for encoded strings across all test procedures to ensure proper cleanup and prevent memory leaks. This completes the memory management improvements started in 591dd87.
60f794a
to
d6f4412
Compare
Fix incorrect RFC 4648 section references: - Add RFC URL reference at package level - Update Error enum documentation to reference correct sections: - Invalid_Character: Section 3.3 (non-alphabet characters) - Invalid_Length: Section 6 (base32 block size requirements) - Malformed_Input: Section 3.2 (padding) - Fix test file section references to match correct sections This ensures all RFC references are accurate and adds a link to the source RFC for reference.
encoding/base64 needs the same treatment I guess? |
Yup! That is why I mentioned it under "Future Work". If this happens to get merged, I will work on encoding/base64 as well. |
Missed it ngl 🙂 |
Add `@(rodata)` attribute to `ENC_TABLE` and `DEC_TABLE` to mark them as read-only data. This places these tables in the read-only section of the executable, protecting them from modification during program execution.
// Validate padding and length | ||
data_len := len(data) | ||
padding_count := 0 | ||
for i := data_len - 1; i >= 0; i -= 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#no_bounds_check
can be used here
// Check for proper padding and length combinations | ||
if padding_count > 0 { | ||
// Verify no padding in the middle | ||
for i := 0; i < data_len - padding_count; i += 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#no_bounds_check
can be used here
} | ||
|
||
@private | ||
_encode :: proc(out, data: []byte, ENC_TBL := ENC_TABLE, allocator := context.allocator) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can #no_bounds_check
can be used for this proc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two minor nitpicks. Looks good to me otherwise. Also thanks for adding a test suite for the package.
content_len := data_len - padding_count | ||
mod8 := content_len % 8 | ||
required_padding: int | ||
switch mod8 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this works fine, but I might be a bit more explicit about the values here:
// Remainders of 2, 4, 5, and 7 need to be padded to 8 bytes.
switch mod8 {
case 2, 4, 5, 7: required_padding = 8 - mod8
case: required_padding = 0
}
Alternatively, you could use a LUT:
// Remainders of 2, 4, 5, and 7 need to be padded to 8 bytes.
PAD_LENGTH := [8]int{0, 0, 6, 0, 4, 3, 0, 1}
required_padding := PAD_LENGTH[mod8]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Should we not keep the explicit mapping as it directly documents the relationship between character count and required padding (e.g., "2 chars need 6 padding chars"). Using 8 - mod8
or a LUT would obscure this relationship by hiding the actual padding requirements behind a formula or indirect mapping. The current form makes the RFC requirements more apparent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't read this particular RFC, so I'll defer to you on that. But to me at least, 6, 4, 3 and 1 felt like magic numbers, even though they were clearly derived from 8 - mod8
, so the 8 - mod8
felt more explicit than "these are the magic numbers". If you follow my train of thought.
But that could be added to the comments instead?
// Requirement is for 2, 4, 5 and 7 remaining characters to be padded to 8. So the values here are `8 - mod8`.
switch mod8 {
case 2: required_padding = 6 // 2 chars need 6 padding chars
case 4: required_padding = 4 // 4 chars need 4 padding chars
case 5: required_padding = 3 // 5 chars need 3 padding chars
case 7: required_padding = 1 // 7 chars need 1 padding char
case: required_padding = 0
}
I've been considering writing a TOTP implementation for |
I would be interested in contributing my TOTP implementation to I haven't yet pushed the changes that utilize the improved error handling for the As a side note, I noticed there are many uses of I've also been working on a minimal libcurl wrapper, available at odin-curl. The goal is to provide libcurl functionality with idiomatic Odin features where appropriate, rather than just a direct FFI binding. I'm happy to collaborate on Discord (I've joined but haven't been very active), though I should mention that I may be away for a couple of months due to health reasons. If the TOTP work can't wait, feel free to use what you need from my code - it's completely fine. :) |
Great. I'll give it a read soon.
Indeed.
Many of those are intentional, as they deal with programmer/user error. If a certain state is only possible because of misuse of an API, not untrusted input, then an assert is sensible. It's the kind of feedback a programmer can't ignore and will helpfully tell them they've neglected to pass an appropriately sized buffer, for example. So the rule of thumb we've settled on for newer implementations is to have asserts for "contract violations", and errors for well, errors.
We'll have to have a think about whether that fits. As a binding it would have to live in
At your leisure. Health comes first. |
Exactly, that's what I was thinking as well. :)
Yeah, I know, it belongs to |
Thank you a lot! :) |
Thank you for the package's cleanup. :-) Don't mind the CI error. That spurious one occasionally happens on the RISCV target, and I don't really feel like special casing the test to ignore it on that target. |
The current implementation of base32.decode() uses assertions for (incomplete) validation, which causes programs to crash on invalid input. This PR replaces assertions with proper error handling according to RFC 4648, allowing users of the package to handle invalid input gracefully.
Motivation: While implementing a pure Odin implementation of TOTP (Time-based One-Time Password), I discovered that base32.decode() would segfault on invalid input due to assertions. As a core library package, it should provide proper error handling to allow users to gracefully handle decode errors, which these changes enable.
Problem Description
#no_bounds_check
) prevents proper error handlingThe original implementation prioritized performance by disabling bounds checking, but this tradeoff is not appropriate for a core library package. The small or potentially negligible performance overhead from bounds checking is worth the improved safety and usability.
Implementation Details
The error enum:
Before (crashes on invalid input):
After (error handling available):
Key implementation changes:
([]byte, Error)
instead of just[]byte
#no_bounds_check
directive for safetyDesign Decisions
Bounds Checking:
#no_bounds_check
for performanceError Handling:
(nil, .None)
by conventionTesting:
Breaking Changes
Function Signature:
([]byte, Error)
instead of[]byte
Performance:
Error Handling:
Testing
Test coverage includes:
All tests pass with 100% coverage of the new functionality and error cases.
Future Work
Thank you for considering this contribution.
Happy New Year! :)