-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Refactor PyUtf8Str #6047
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor PyUtf8Str #6047
Conversation
WalkthroughThis change introduces a new UTF-8 validated string type ( Changes
Sequence Diagram(s)sequenceDiagram
participant VM
participant UserCode
participant PyStr
participant PyUtf8Str
UserCode->>VM: Call function expecting string (e.g., execute SQL)
VM->>PyUtf8Str: Accepts PyUtf8StrRef (UTF-8 validated)
PyUtf8Str-->>VM: Provides &str (guaranteed valid UTF-8)
VM->>UserCode: Executes logic without extra UTF-8 validation
Note over VM,PyUtf8Str: If PyStr received, VM calls try_into_utf8 for validation
PyStr->>PyUtf8Str: try_into_utf8(vm)
alt Valid UTF-8
PyUtf8Str-->>VM: Success
else Invalid UTF-8
PyStr-->>VM: Error (UnicodeEncodeError)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
vm/src/builtins/str.rs (1)
1510-1513: Consider using pointer casts instead of transmuteWhile the transmute is safe due to
repr(transparent), using pointer casts is more explicit about the operation being performed.pub fn try_into_utf8(self, vm: &VirtualMachine) -> PyResult<PyRef<PyUtf8Str>> { self.ensure_valid_utf8(vm)?; - Ok(unsafe { mem::transmute::<PyRef<PyStr>, PyRef<PyUtf8Str>>(self) }) + Ok(unsafe { PyRef::map_unchecked(self, |s| &*(s as *const PyStr as *const PyUtf8Str)) }) }
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (26)
stdlib/src/sqlite.rs(13 hunks)vm/src/builtins/builtin_func.rs(2 hunks)vm/src/builtins/mod.rs(1 hunks)vm/src/builtins/namespace.rs(1 hunks)vm/src/builtins/slice.rs(1 hunks)vm/src/builtins/str.rs(11 hunks)vm/src/builtins/super.rs(2 hunks)vm/src/builtins/type.rs(1 hunks)vm/src/exceptions.rs(3 hunks)vm/src/format.rs(1 hunks)vm/src/object/core.rs(2 hunks)vm/src/object/payload.rs(2 hunks)vm/src/protocol/object.rs(3 hunks)vm/src/stdlib/codecs.rs(2 hunks)vm/src/stdlib/ctypes/structure.rs(1 hunks)vm/src/stdlib/io.rs(10 hunks)vm/src/stdlib/operator.rs(3 hunks)vm/src/stdlib/posix.rs(2 hunks)vm/src/stdlib/pwd.rs(2 hunks)vm/src/stdlib/sys.rs(1 hunks)vm/src/stdlib/time.rs(3 hunks)vm/src/types/slot.rs(4 hunks)vm/src/utils.rs(2 hunks)vm/src/vm/mod.rs(1 hunks)vm/src/vm/vm_ops.rs(2 hunks)vm/src/warn.rs(5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style (cargo fmtto format)
Always run clippy to lint code (cargo clippy) before completing tasks. Fix any warnings or lints that are introduced by your changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
vm/src/builtins/builtin_func.rsvm/src/stdlib/posix.rsvm/src/format.rsvm/src/utils.rsvm/src/stdlib/operator.rsvm/src/builtins/slice.rsvm/src/builtins/mod.rsvm/src/vm/vm_ops.rsvm/src/stdlib/pwd.rsvm/src/stdlib/sys.rsvm/src/warn.rsvm/src/stdlib/time.rsvm/src/stdlib/ctypes/structure.rsvm/src/builtins/namespace.rsvm/src/exceptions.rsvm/src/object/payload.rsvm/src/builtins/super.rsvm/src/types/slot.rsvm/src/stdlib/codecs.rsvm/src/vm/mod.rsvm/src/protocol/object.rsvm/src/stdlib/io.rsvm/src/builtins/type.rsvm/src/object/core.rsvm/src/builtins/str.rsstdlib/src/sqlite.rs
🧠 Learnings (20)
vm/src/builtins/builtin_func.rs (3)
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust
Learnt from: moreal
PR: #5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.
Learnt from: moreal
PR: #5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.
vm/src/stdlib/posix.rs (3)
Learnt from: moreal
PR: #5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.
Learnt from: moreal
PR: #5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust
vm/src/format.rs (3)
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: In most cases, Python code should not be edited. Bug fixes should be made through Rust code modifications only
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to Lib/test/**/*.py : When tests fail due to unsupported syntax, keep the test as @unittest.expectedFailure, document that it requires PEP 695 support, and focus on tests that can be fixed through Rust code changes only
vm/src/utils.rs (1)
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust
vm/src/stdlib/operator.rs (3)
Learnt from: moreal
PR: #5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust
Learnt from: moreal
PR: #5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.
vm/src/builtins/mod.rs (1)
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust
vm/src/vm/vm_ops.rs (1)
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust
vm/src/stdlib/pwd.rs (1)
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust
vm/src/stdlib/sys.rs (2)
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to Lib/test/**/*.py : When tests fail due to unsupported syntax, keep the test as @unittest.expectedFailure, document that it requires PEP 695 support, and focus on tests that can be fixed through Rust code changes only
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Follow Rust best practices for error handling and memory management
vm/src/stdlib/time.rs (1)
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust
vm/src/exceptions.rs (2)
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to Lib/test/**/*.py : When tests fail due to unsupported syntax, keep the test as @unittest.expectedFailure, document that it requires PEP 695 support, and focus on tests that can be fixed through Rust code changes only
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Follow Rust best practices for error handling and memory management
vm/src/object/payload.rs (1)
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust
vm/src/builtins/super.rs (1)
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust
vm/src/types/slot.rs (1)
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust
vm/src/stdlib/codecs.rs (1)
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust
vm/src/protocol/object.rs (1)
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust
vm/src/stdlib/io.rs (1)
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust
vm/src/builtins/type.rs (1)
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust
vm/src/builtins/str.rs (1)
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust
stdlib/src/sqlite.rs (1)
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust
🧬 Code Graph Analysis (4)
vm/src/stdlib/sys.rs (1)
vm/src/builtins/str.rs (1)
to_str(434-436)
vm/src/stdlib/time.rs (4)
vm/src/vm/vm_ops.rs (1)
format(495-520)vm/src/builtins/str.rs (3)
format(936-940)as_str(430-432)as_str(2046-2053)vm/src/stdlib/builtins.rs (1)
format(393-399)common/src/str.rs (1)
as_str(262-266)
vm/src/object/core.rs (2)
vm/src/object/payload.rs (2)
try_downcast_from(33-49)downcastable_from(29-31)vm/src/builtins/str.rs (3)
try_downcast_from(1973-1976)downcastable_from(1965-1971)a(1571-1571)
vm/src/builtins/str.rs (4)
common/src/str.rs (5)
as_str(262-266)is_utf8(43-45)is_empty(296-298)char_len(301-306)char_len(337-339)vm/src/object/traverse.rs (1)
try_traverse(17-17)vm/src/object/payload.rs (4)
payload_type_id(23-25)std(24-24)downcastable_from(29-31)try_downcast_from(33-49)wtf8/src/lib.rs (2)
as_str(837-839)is_utf8(983-985)
🔇 Additional comments (74)
vm/src/utils.rs (1)
5-5: LGTM! Properly extends ToCString trait for PyUtf8Str.The import addition and trait implementation correctly enable PyUtf8Str to be converted to C strings, maintaining consistency with the existing PyStr implementation. This is essential for C API interoperability in modules like SQLite.
Also applies to: 38-38
vm/src/format.rs (1)
115-115: LGTM! Correctly uses WTF-8 string view.The change from
as_ref()toas_wtf8()is more explicit about the encoding and aligns with the broader refactor to handle UTF-8/WTF-8 string representations consistently across the codebase.vm/src/builtins/slice.rs (1)
295-295: LGTM! Beneficial code simplification.Removing the unnecessary
.as_str()calls simplifies the code sincePyStrRefimplements theDisplaytrait and can be used directly in theformat!macro. This improves readability while maintaining the same functionality.Also applies to: 300-300
vm/src/stdlib/ctypes/structure.rs (1)
42-42: LGTM! Streamlined string conversion.The direct call to
to_string()eliminates the unnecessary intermediate.as_str()step sincePyStrimplements theToStringtrait directly. This simplifies the code while maintaining the same functionality.vm/src/builtins/mod.rs (1)
62-62: LGTM! Properly exposes new UTF-8 string types.Adding
PyUtf8StrandPyUtf8StrRefto the public exports enables their use across the codebase as part of the UTF-8 validation refactor. The naming follows established conventions and placement with other string types is appropriate.vm/src/vm/mod.rs (1)
475-475: LGTM! Consistent with UTF-8 string handling improvements.This change aligns with the broader refactoring to use UTF-8 validated string types and removes the need for explicit
.as_str()conversion when printing error messages.vm/src/builtins/builtin_func.rs (2)
5-5: LGTM! Necessary import for WTF-8 string handling.The import of
Wtf8supports the updated debug formatting that uses WTF-8 string types consistently.
31-31: LGTM! Consistent WTF-8 formatting for debug output.The change ensures consistent type handling in debug formatting by using
Wtf8::new("<unknown>")instead of a plain string literal, matching theWtf8type returned bym.as_wtf8()when the module is present.vm/src/stdlib/pwd.rs (2)
9-9: LGTM! Updated import for UTF-8 validated string types.The import change supports the function signature update to use
PyUtf8StrReffor better type safety.
57-57: LGTM! Improved type safety with UTF-8 validated strings.Changing the parameter type from
PyStrReftoPyUtf8StrRefenforces UTF-8 validation at the API boundary, which is appropriate for usernames and aligns with the broader migration to UTF-8 validated string types.vm/src/builtins/namespace.rs (1)
92-94: LGTM! Improved string representation handling with WTF-8.The change from
as_str()toas_wtf8()improves robustness by handling strings that may contain surrogate code points or ill-formed UTF-8 sequences, while maintaining the same representation logic for namespace objects.vm/src/stdlib/posix.rs (2)
27-27: LGTM: Import addition supports UTF-8 validated strings.The addition of
PyUtf8StrRefto the import list is consistent with the PR's objective to introduce UTF-8 validated string types across the codebase.
2245-2245: LGTM: UTF-8 validated string conversion improves type safety.The change from
PyStrRef::try_from_objecttoPyUtf8StrRef::try_from_objectensures UTF-8 validity at the parameter level for system configuration name parsing. This eliminates the need for fallible UTF-8 conversions while maintaining the same parsing logic.vm/src/stdlib/codecs.rs (2)
10-10: LGTM: Import addition supports UTF-8 validated strings.The addition of
PyUtf8StrRefto the import list enables the use of UTF-8 validated string references in the codec lookup functionality.
26-29: LGTM: UTF-8 validated encoding parameter simplifies codec lookup.The changes improve the
lookupfunction by:
- Accepting
PyUtf8StrRefparameter to ensure UTF-8 validity of encoding names- Using
encoding.as_str()directly instead of the fallibletry_to_str(vm)?conversion- Eliminating potential error handling for UTF-8 validation
This is consistent with the broader refactoring to use UTF-8 validated string references across the codebase.
vm/src/builtins/type.rs (1)
343-343: LGTM: Efficient byte-level comparison for ASCII slot names.This optimization replaces UTF-8 string slice comparisons with direct byte comparisons when checking for Python special method names (dunder methods). Since we're only checking for ASCII characters (
__), this is both safe and more efficient, avoiding unnecessary UTF-8 validation overhead.Also applies to: 349-349
vm/src/builtins/super.rs (1)
107-107: LGTM: Consistent byte-level comparison for__class__checks.Both changes replace string slice comparisons with byte slice comparisons when checking for the
__class__identifier. Since__class__is pure ASCII, this optimization is safe and more efficient, avoiding UTF-8 validation overhead while maintaining identical behavior.Also applies to: 165-165
vm/src/stdlib/operator.rs (3)
8-9: LGTM: Clean import organization.Good organization of imports - adding
Wtf8and groupingEitherwith other function imports improves readability and prepares for WTF-8 string handling.
327-327: LGTM: Simplified ASCII check.Direct call to
is_ascii()onPyStrRefis cleaner and more readable than the previous approach, maintaining the same functionality with better code clarity.
374-382: LGTM: Optimized attribute name handling with WTF-8 support.This change optimizes dotted attribute name processing by:
- Working at the byte level for splitting (more efficient)
- Using WTF-8 conversion to handle potentially ill-formed UTF-8 sequences
- Maintaining compatibility with the existing API
The approach is consistent with the broader string handling optimizations throughout the codebase and properly handles edge cases with non-standard string encodings.
vm/src/vm/vm_ops.rs (2)
4-5: LGTM: Necessary imports for UTF-8 string support.Adding
PyRefandPyUtf8Strimports to support the newformat_utf8method. The imports are properly organized and necessary for the new functionality.
521-523: LGTM: Well-designed UTF-8 formatting method.The new
format_utf8method provides a convenient API for formatting objects into UTF-8 validated strings. The implementation:
- Reuses the existing
formatmethod appropriately- Uses fallible conversion (
try_into_utf8) to ensure UTF-8 validity- Follows the established pattern for UTF-8 string type usage
- Provides clear error handling through the
PyResultreturn typeThis complements the broader UTF-8 string type introduction across the codebase.
vm/src/stdlib/time.rs (3)
38-38: LGTM: Import addition is correct.The addition of
PyUtf8StrRefto the imports is necessary and appropriate for thestrftimefunction signature change.
347-351: LGTM: Function signature update is semantically correct.Changing the
formatparameter fromPyStrReftoPyUtf8StrRefis appropriate sincestrftimeformat strings should be valid UTF-8. This ensures the function receives UTF-8 validated input, eliminating the need for redundant validation.
362-363: LGTM: Direct UTF-8 string access leverages type guarantees.The changes correctly utilize the UTF-8 validity guarantee of
PyUtf8StrRef:
format.as_str()safely provides direct access to the UTF-8 string sliceformat.to_string()appropriately handles the error fallback caseThis simplifies the code by eliminating redundant UTF-8 validation that was previously required with
PyStrRef.vm/src/stdlib/sys.rs (1)
715-715: LGTM: Improved error handling for strings with surrogates.The change from
str.as_str()tostr.to_str().unwrap_or("<str with surrogate>")is a defensive programming improvement that:
- Safely handles strings containing surrogates without panicking
- Provides a clear, descriptive fallback message for debugging
- Prevents crashes in error handling paths when dealing with malformed UTF-8
This aligns well with the broader goal of safer string handling throughout the codebase.
vm/src/warn.rs (5)
125-125: LGTM: UTF-8 validation for warning action strings.Changing from
str(vm)tostr_utf8(vm)is appropriate for warning action strings since they should be valid UTF-8 (e.g., "error", "ignore", "default"). This provides the necessary UTF-8 guarantee for downstream processing.
204-205: LGTM: Robust filename handling for potentially invalid UTF-8.The changes improve filename handling robustness:
- Line 204: Using
as_bytes().ends_with(b".py")provides byte-level comparison that doesn't require UTF-8 validity- Line 205: Using
as_wtf8()for slicing safely handles filenames with potentially invalid UTF-8 sequencesThis prevents crashes when processing filenames containing invalid UTF-8, which can occur in real-world scenarios.
235-235: LGTM: WTF-8 handling for warning message text.Changing from
as_str()toas_wtf8()appropriately handles warning messages that might contain invalid UTF-8 sequences. WTF-8 can safely represent any byte sequence, preventing potential panics when processing malformed warning text.
281-281: LGTM: Consistent UTF-8 validation for action comparisons.The changes from
str(vm)?.as_str()tostr_utf8(vm)?.as_str()are consistent with the UTF-8 validation approach for warning actions. Since these comparisons are against string literals ("error", "ignore"), UTF-8 validation is appropriate and ensures type consistency.Also applies to: 285-285
348-348: LGTM: Simplified string formatting without explicit conversion.Removing the
.as_str()call is a good simplification sincewriteln!can handlePyStrRefdirectly. This also avoids potential panics if the warning text contains surrogates, making the code more robust.vm/src/object/payload.rs (2)
29-31: LGTM: Simple and correct downcast viability check.The
downcastable_frommethod provides a clean interface for checking if an object can be downcasted to the payload type. The implementation correctly usespayload_type_id()for type comparison, following established patterns in the codebase.
33-49: LGTM: Well-designed error handling for safe downcasting.The
try_downcast_frommethod implementation demonstrates excellent Rust practices:
- Type Safety: Uses
downcastable_fromcheck before proceeding- Performance: Cold function optimization prevents inlining the error path
- Consistency: Leverages
vm.new_downcast_type_errorfor uniform error messaging- Ergonomics: Returns
PyResult<()>following established VM patternsThis provides a safe, efficient foundation for downcasting operations throughout the VM with proper error reporting.
vm/src/protocol/object.rs (6)
8-8: LGTM! Import addition aligns with UTF-8 string type introduction.The addition of
PyUtf8Strto the imports is consistent with the broader refactoring to introduce UTF-8 validated string types across the codebase.
331-333: LGTM! UTF-8 validated repr method implementation is correct.The
repr_utf8method properly delegates to the existingreprmethod and converts the result to a UTF-8 validated string usingtry_into_utf8. This follows the established pattern for providing UTF-8 validated variants.
335-335: LGTM! Explicit return type clarification.Making the return type explicitly
PyRef<PyStr>instead of implicit helps distinguish between the generic string method and its UTF-8 validated counterpart (repr_utf8).
353-353: LGTM! Consistent usage of UTF-8 validated repr.The
asciimethod now correctly usesrepr_utf8()instead ofrepr(), which is appropriate since ASCII conversion requires valid UTF-8 input. This ensures type safety and UTF-8 validation.
358-360: LGTM! UTF-8 validated str method implementation is correct.The
str_utf8method follows the same pattern asrepr_utf8, properly delegating to the existingstrmethod and converting to UTF-8 validated string. This provides a clean API for callers who need UTF-8 guarantees.
361-361: LGTM! Explicit return type clarification.Similar to the
reprmethod, making the return type explicitlyPyRef<PyStr>helps distinguish this generic method from its UTF-8 validated counterpart (str_utf8).vm/src/exceptions.rs (2)
201-205: LGTM! Cleaner filename handling with direct match.The refactoring simplifies the filename handling by using a direct
matchstatement instead of chainingunwrap_or_elseandmap. This is more readable and maintainable while preserving the same behavior - using the provided filename or defaulting to"<string>".
1497-1497: LGTM! Improved control flow with single return point.The refactoring consolidates the return logic to use a single return point by assigning the result to a local
strvariable. This is a good practice in Rust as it:
- Reduces code duplication
- Makes the control flow clearer
- Maintains the same functionality while improving readability
The logic remains identical - return the formatted OS error string for 2-argument cases, otherwise delegate to the base
__str__method.Also applies to: 1517-1517, 1519-1521
vm/src/types/slot.rs (3)
172-172: LGTM! Type alias improvement for better clarity.The change from
PyStrReftoPyRef<PyStr>improves type explicitness and aligns with the broader refactoring effort to distinguish string types in the codebase.
253-253: LGTM! Function signature updated consistently.The return type change aligns with the
StringifyFunctype alias update and maintains consistency throughout the codebase.
980-980: LGTM! Trait method signatures updated consistently.All
Representabletrait methods have been updated to use the more explicitPyRef<PyStr>return type, maintaining consistency with theStringifyFunctype alias and the broader string type refactoring effort.Also applies to: 989-989, 994-994
vm/src/object/core.rs (4)
544-547: LGTM! Safe downcasting method with proper error handling.The new
try_downcastmethod provides a safe alternative to the existingdowncastmethod by usingT::try_downcast_fromfor validation and VM-based error generation. The unsafedowncast_uncheckedis only called after successful validation, which is the correct pattern.
728-731: LGTM! Internal method for exposing TypeId.The
typeidmethod provides necessary access to the object's internalTypeIdfor downcasting operations. Thepub(crate)visibility is appropriate for this internal implementation detail.
736-737: LGTM! Delegating downcast check to payload type.The updated
downcastablemethod delegates toT::downcastable_from(self), allowing payload types to customize their downcasting logic. This is particularly useful for types likePyUtf8Strthat need additional validation beyond type ID matching.
740-746: LGTM! Safe reference-based downcasting method.The new
try_downcast_refmethod provides a safe reference-based alternative todowncast_refwith VM-based error handling. The implementation correctly validates throughT::try_downcast_frombefore performing the unsafe cast, maintaining consistency with the other safe downcasting methods.vm/src/stdlib/io.rs (6)
123-123: LGTM! Import addition is correct.The addition of
PyUtf8StrRefto the imports is necessary for the UTF-8 string refactoring and is properly placed alphabetically.
1582-1582: LGTM! Consistent return type updates for repr methods.The changes from
PyStrReftoPyRef<PyStr>for bothslot_reprand__repr__methods are consistent with the broader string type refactoring. The implementation logic remains unchanged, maintaining functionality while improving type safety.Also applies to: 1595-1595
2317-2317: LGTM! Improved string access pattern.The change from
encoding.try_to_str(vm)?toencoding.as_str()is correct and more efficient. Since the parameter is nowPyUtf8StrRef(UTF-8 validated), the fallible conversion is no longer needed, and direct string slice access is appropriate.
2417-2417: LGTM! Consistent UTF-8 string access pattern.The change from
encoding.try_to_str(vm)?toencoding.as_str()follows the same correct pattern as other changes in the file, taking advantage of UTF-8 pre-validation to eliminate unnecessary error handling.
3895-3895: LGTM! Systematic UTF-8 string type adoption.The changes systematically replace
PyStrRefwithPyUtf8StrReffor mode and encoding parameters in I/O-related structs:
IoOpenArgs.mode: Now uses UTF-8 validated strings for file modesOpenArgs.encoding: Now uses UTF-8 validated strings for encoding parametersFileIOArgs.mode: Consistent with the pattern- Import addition in fileio module enables the new types
These changes are consistent, well-structured, and align with the UTF-8 validation goals of the refactoring.
Also applies to: 3921-3921, 4133-4133, 4260-4260
4298-4299: LGTM! Completion of UTF-8 string refactoring pattern.The changes correctly complete the UTF-8 string adoption:
- Line 4298: Using
PyUtf8Str::from("rb")instead ofPyStr::from("rb")is appropriate since "rb" is valid UTF-8- Line 4299: Direct string access with
as_str()replaces fallible conversion, taking advantage of UTF-8 pre-validationThese changes are consistent with the systematic refactoring throughout the file.
vm/src/builtins/str.rs (7)
18-18: LGTM: Import addition for trait implementationThe addition of
MaybeTraverse,Traverse, andTraverseFnimports is necessary for the newPyUtf8Strtype implementation.
67-69: Type aliases properly definedThe type aliases follow the established pattern and provide convenient shorthand for both regular and UTF-8 validated string references.
354-354: Constructor return type correctly updatedThe change from
PyResult<PyRef<Self>>toPyResultallows for proper subclassing support.
438-455: Enhanced UTF-8 validation with better error reportingThe method is correctly made private and now provides detailed error information including the exact position of invalid surrogates.
954-974: Format method properly updated for UTF-8 handlingThe changes ensure format specifications are valid strings and the formatted result is UTF-8 validated before processing.
1475-1483: UTF-8 validation method correctly implementedThe
try_as_utf8method properly validates UTF-8 before performing the safe cast toPyUtf8Str.
1931-2087: Well-structured PyUtf8Str implementationThe
repr(transparent)wrapper design ensures zero overhead while maintaining type safety. The customdowncastable_fromimplementation correctly validates UTF-8 before allowing downcasts.stdlib/src/sqlite.rs (13)
62-62: LGTM: Correct import addition for UTF-8 validated string types.The addition of
PyUtf8StrandPyUtf8StrRefimports is necessary and appropriate for the systematic refactoring to use UTF-8 validated string types in SQL contexts.
855-855: LGTM: Correct parameter type change for SQL statements.Changing from
PyStrReftoPyUtf8StrRefis appropriate for SQL statement parameters, ensuring UTF-8 validation at the type level.
990-990: LGTM: Appropriate type change for SQL parameter.Using
PyUtf8StrReffor the SQL parameter ensures UTF-8 validation, which is essential for SQL statement execution.
1002-1002: LGTM: Consistent type change for SQL parameter.The change to
PyUtf8StrReffor the SQL parameter maintains consistency with other execute methods and ensures UTF-8 validation.
1014-1014: LGTM: Appropriate type change for SQL script parameter.Using
PyUtf8StrReffor the script parameter ensures UTF-8 validation for SQL scripts, which is necessary for proper execution.
1163-1163: LGTM: Correct type change for collation name.Using
PyUtf8StrReffor the collation name parameter ensures UTF-8 validation, which is appropriate for database collation names.
1494-1494: LGTM: Consistent type change for SQL parameter.The change to
PyUtf8StrRefmaintains consistency with Connection execute methods and ensures UTF-8 validation for SQL statements.
1566-1566: LGTM: Appropriate type change for SQL parameter.Using
PyUtf8StrReffor the SQL parameter ensures UTF-8 validation and maintains consistency across execute methods.
1640-1640: LGTM: Consistent type change for SQL script parameter.The change to
PyUtf8StrRefmaintains consistency with Connection::executescript and ensures UTF-8 validation for SQL scripts.
2376-2376: LGTM: Appropriate type change for Statement constructor.Using
PyUtf8StrReffor the SQL parameter in Statement::new ensures UTF-8 validation at the statement creation level, which is foundational for proper SQL handling.
2731-2731: LGTM: Correct UTF-8 conversion for string binding.The explicit conversion to UTF-8 using
try_as_utf8(vm)?is appropriate when binding PyStr parameters that haven't been upgraded to PyUtf8StrRef, ensuring UTF-8 validation is maintained.
2991-2991: LGTM: Consistent UTF-8 conversion for result handling.The explicit UTF-8 conversion using
try_as_utf8(vm)?maintains consistency with parameter binding and ensures UTF-8 validation for string results.
3072-3078: LGTM: Excellent improvement to eliminate redundant UTF-8 validation.The change from
PyStrRefto&PyUtf8Stris a significant improvement that:
- Provides compile-time UTF-8 validation guarantees
- Eliminates the need for runtime UTF-8 checks within the function
- Makes the code more robust and efficient
This represents the core benefit of the refactoring effort.
| self.to_str() | ||
| .expect("Interned PyStr should always be valid UTF-8") | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Verify UTF-8 validity is enforced at string interning time
The expect assumes interned strings are always valid UTF-8. This invariant should be enforced when strings are interned.
🏁 Script executed:
#!/bin/bash
# Search for string interning implementation to verify UTF-8 validation
ast-grep --pattern 'impl $_ {
$$$
fn intern($$$) {
$$$
}
$$$
}'
# Also search for intern_str or similar methods
rg -A 10 "intern_str|new_interned" --type rustLength of output: 35264
🏁 Script executed:
#!/bin/bash
# Locate string_pool usage in Context
rg -n "string_pool" vm/src/vm/context.rs
# Show beginning of context.rs to inspect field definitions
sed -n '1,200p' vm/src/vm/context.rs
# Search for StringPool type and its implementation
rg -n "struct StringPool" -n vm/src/vm
rg -n "impl StringPool" -n vm/src/vmLength of output: 5090
🏁 Script executed:
#!/bin/bash
# Locate the intern module file
rg --files-with-matches "InternableString" -t rust
# Inspect its contents to find StringPool implementation
FILE=$(rg --files-with-matches "InternableString" -t rust | head -n1)
echo "=== Showing top of $FILE ==="
sed -n '1,200p' "$FILE"
# Search for StringPool definition and intern method
rg -n "pub struct StringPool" -n "$FILE"
rg -n "impl StringPool" -n "$FILE"Length of output: 5331
🏁 Script executed:
#!/bin/bash
# Find InternableString and MaybeInternedString definitions in intern.rs
rg -n "trait InternableString" -n vm/src/intern.rs
rg -n "trait MaybeInternedString" -n vm/src/intern.rs
# Show the definitions around those lines
sed -n '250,350p' vm/src/intern.rsLength of output: 2393
Enforce UTF-8 validity when interning strings
The StringPool::intern implementation accepts Wtf8Buf (which may contain unpaired surrogates) without any check, so the later
self.to_str().expect("Interned PyStr should always be valid UTF-8")in vm/src/builtins/str.rs can still panic. We need to validate once at interning time to uphold the invariant.
• In vm/src/intern.rs, inside impl StringPool { pub unsafe fn intern<…> }, right after you get back the PyRefExact<PyStr> (e.g. before calling miss), add:
let str_ref = s.into_pyref_exact(typ);
+ // Validate that the underlying WTF-8 data is valid UTF-8 before inserting
+ let w = str_ref.as_wtf8();
+ debug_assert!(
+ w.to_str().is_some(),
+ "Tried to intern invalid UTF-8 string: {:?}",
+ w
+ );This guarantees that any panic in to_str().expect(…) truly reflects a broken invariant, not a late check.
🤖 Prompt for AI Agents
In vm/src/intern.rs inside the unsafe fn intern method, after obtaining the
PyRefExact<PyStr> and before calling miss, add a validation step to check that
the interned string is valid UTF-8. This ensures the invariant that interned
PyStr is always valid UTF-8 is enforced at interning time, preventing the later
to_str().expect(...) call in vm/src/builtins/str.rs from panicking unexpectedly.
Implement this by attempting to convert the interned string to &str and handling
any invalid UTF-8 by panicking or returning an error immediately.
Rewrite sqlite3 UTF8 validation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
vm/src/builtins/str.rs (2)
1345-1351: LGTM: Proper error handling for surrogatesThis change directly addresses the previous review concern about potential panics with surrogate code points. The method now:
- Returns
PyResult<String>instead ofStringfor proper error propagation- Validates UTF-8 using
try_as_utf8(vm)?before processing- Handles surrogates gracefully by returning an error instead of panicking
The TODO comment appropriately acknowledges that full WTF-8 support could be implemented in the future if needed.
2464-2466: Note: Underlying UTF-8 validation issue persistsWhile the expectation message is now more descriptive, this still has the potential to panic if invalid UTF-8 strings are interned. As noted in previous reviews, the invariant that "interned PyStr should always be valid UTF-8" should be enforced at string interning time rather than relying on runtime checks.
Consider the previous suggestion to validate UTF-8 during the interning process in
vm/src/intern.rsto prevent this expectation from ever failing.
🧹 Nitpick comments (1)
vm/src/macros.rs (1)
189-196: Consider adding debug assertions to validate the UTF-8 invariantThe unsafe unchecked downcast relies on the assumption that all VM identifiers are ASCII strings. While this is likely true in practice, consider adding a debug assertion to validate this invariant:
#[macro_export] macro_rules! identifier_utf8( ($as_ctx:expr, $name:ident) => { // Safety: All known identifiers are ascii strings. + #[cfg(debug_assertions)] + debug_assert!( + $as_ctx.as_ref().names.$name.as_object() + .downcast_ref::<$crate::builtins::PyStr>() + .unwrap().is_utf8(), + "identifier {} is not valid UTF-8", stringify!($name) + ); #[allow(clippy::macro_metavars_in_unsafe)] unsafe { $as_ctx.as_ref().names.$name.as_object().downcast_unchecked_ref::<$crate::builtins::PyUtf8Str>() } }; );This would help catch violations of the UTF-8 invariant during development while maintaining the performance benefits of unchecked downcasting in release builds.
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (28)
stdlib/src/sqlite.rs(13 hunks)vm/src/builtins/builtin_func.rs(2 hunks)vm/src/builtins/mod.rs(1 hunks)vm/src/builtins/namespace.rs(1 hunks)vm/src/builtins/slice.rs(1 hunks)vm/src/builtins/str.rs(12 hunks)vm/src/builtins/super.rs(2 hunks)vm/src/builtins/type.rs(1 hunks)vm/src/convert/try_from.rs(1 hunks)vm/src/exceptions.rs(3 hunks)vm/src/format.rs(1 hunks)vm/src/macros.rs(1 hunks)vm/src/object/core.rs(3 hunks)vm/src/object/payload.rs(2 hunks)vm/src/protocol/object.rs(3 hunks)vm/src/stdlib/codecs.rs(2 hunks)vm/src/stdlib/ctypes/structure.rs(1 hunks)vm/src/stdlib/io.rs(14 hunks)vm/src/stdlib/operator.rs(3 hunks)vm/src/stdlib/posix.rs(2 hunks)vm/src/stdlib/pwd.rs(2 hunks)vm/src/stdlib/sys.rs(1 hunks)vm/src/stdlib/time.rs(3 hunks)vm/src/types/slot.rs(4 hunks)vm/src/utils.rs(2 hunks)vm/src/vm/mod.rs(1 hunks)vm/src/vm/vm_ops.rs(2 hunks)vm/src/warn.rs(5 hunks)
✅ Files skipped from review due to trivial changes (3)
- vm/src/builtins/slice.rs
- vm/src/format.rs
- vm/src/exceptions.rs
🚧 Files skipped from review as they are similar to previous changes (22)
- vm/src/stdlib/ctypes/structure.rs
- vm/src/utils.rs
- vm/src/builtins/mod.rs
- vm/src/stdlib/pwd.rs
- vm/src/builtins/super.rs
- vm/src/builtins/namespace.rs
- vm/src/stdlib/operator.rs
- vm/src/warn.rs
- vm/src/builtins/type.rs
- vm/src/stdlib/time.rs
- vm/src/stdlib/posix.rs
- vm/src/stdlib/codecs.rs
- vm/src/builtins/builtin_func.rs
- vm/src/vm/mod.rs
- vm/src/protocol/object.rs
- vm/src/types/slot.rs
- vm/src/object/core.rs
- vm/src/vm/vm_ops.rs
- vm/src/object/payload.rs
- vm/src/stdlib/io.rs
- vm/src/stdlib/sys.rs
- stdlib/src/sqlite.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style (cargo fmtto format)
Always run clippy to lint code (cargo clippy) before completing tasks. Fix any warnings or lints that are introduced by your changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
vm/src/macros.rsvm/src/convert/try_from.rsvm/src/builtins/str.rs
🧠 Learnings (2)
vm/src/macros.rs (1)
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust
vm/src/builtins/str.rs (5)
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to Lib/test/**/*.py : When tests fail due to unsupported syntax, keep the test as @unittest.expectedFailure, document that it requires PEP 695 support, and focus on tests that can be fixed through Rust code changes only
Learnt from: moreal
PR: #5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust
Learnt from: moreal
PR: #5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.
Learnt from: CR
PR: RustPython/RustPython#0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-07-19T03:16:56.511Z
Learning: Applies to **/*.rs : Follow Rust best practices for error handling and memory management
🧬 Code Graph Analysis (2)
vm/src/macros.rs (2)
vm/src/vm/context.rs (1)
as_ref(634-636)vm/src/object/core.rs (4)
downcast_unchecked_ref(745-745)downcast_unchecked_ref(754-754)downcast_unchecked_ref(767-767)downcast_unchecked_ref(773-777)
vm/src/convert/try_from.rs (4)
vm/src/builtins/str.rs (6)
obj(1969-1969)obj(1975-1975)class(280-282)class(1638-1640)class(1958-1960)try_downcast_from(1974-1977)vm/src/vm/mod.rs (1)
class(567-576)vm/src/object/payload.rs (2)
class(51-51)try_downcast_from(33-49)vm/src/object/core.rs (1)
class(673-675)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Run tests under miri
- GitHub Check: Run snippets and cpython tests on wasm-wasi
- GitHub Check: Run rust tests (windows-latest)
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Run snippets and cpython tests (windows-latest)
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Check the WASM package and demo
- GitHub Check: Check Rust code with rustfmt and clippy
- GitHub Check: Ensure compilation on various targets
- GitHub Check: Run rust tests (macos-latest)
- GitHub Check: Run rust tests (ubuntu-latest)
🔇 Additional comments (6)
vm/src/convert/try_from.rs (1)
81-86: LGTM: Improved error handling with VM-aware validationThe refactored implementation correctly uses the new
try_downcast_frommethod to validate the downcast before performing the unsafe operation. This change:
- Maintains the performance optimization of
fast_isinstancefor type checking- Adds proper VM-aware validation through
try_downcast_from- Provides consistent error reporting across the codebase
- Only performs the unchecked downcast after validation succeeds
This aligns well with the new UTF-8 string validation infrastructure and improves type safety.
vm/src/builtins/str.rs (5)
67-68: LGTM: Good placement of type aliasesMoving the type aliases to the top of the file improves readability and discoverability. The new
PyUtf8StrReffollows the same naming convention as the existingPyStrRef.
438-455: LGTM: Comprehensive UTF-8 validation with proper error reportingThe
ensure_valid_utf8method provides thorough validation with:
- Proper position tracking for surrogate detection
- Python-compatible Unicode error messages
- Integration with the VM's error system
Making this method private is appropriate since it's an internal validation utility.
506-521: LGTM: Safe UTF-8 validation and casting methodsThese methods provide a safe interface for converting to
PyUtf8Str:
- Proper validation: Uses
ensure_valid_utf8to check for surrogates before casting- Safe casting: Only performs the unsafe cast after validation succeeds
- Correct lifetimes: Maintains proper lifetime relationships between input and output
- Transparent wrapper: The cast is safe due to
PyUtf8Strbeing a transparent wrapper aroundPyStrThe implementation follows good Rust practices for safe abstractions over unsafe operations.
970-990: LGTM: Added UTF-8 validation to string formattingThe addition of
try_into_utf8(vm)?on line 983 ensures that string formatting only operates on valid UTF-8 data. This prevents potential issues with surrogate code points during formatting operations and maintains consistency with the new UTF-8 validation approach.The error handling is properly propagated, and the change maintains backward compatibility while improving robustness.
1939-2101: Excellent implementation of PyUtf8Str wrapper typeThis is a well-designed UTF-8 validated string type that provides:
Type Safety:
- Transparent wrapper around
PyStrwith enforced UTF-8 invariants- Safe accessor methods that leverage the UTF-8 guarantee
- Proper downcast validation in
try_downcast_fromComprehensive API:
- Multiple
Fromimplementations for UTF-8 compatible types- Standard trait implementations (Debug, Display, Traverse, PyPayload)
- Upcast support via
as_pystr()method- Equality traits based on string content
Integration:
- Proper
payload_type_id()to share type identity withPyStr- Runtime validation in
downcastable_fromchecking UTF-8 validity- Safe unchecked operations justified by type invariants
The implementation follows Rust best practices for newtype wrappers and provides a solid foundation for UTF-8 validated string handling throughout the VM.
cc @ever0de
I am going to add a new validation step for PyUtf8Str. This is useful for sqlite3
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Style