Skip to content

Conversation

@ShaharNaveh
Copy link
Collaborator

@ShaharNaveh ShaharNaveh commented Dec 2, 2025

closes #6312

Summary by CodeRabbit

  • Refactor
    • Modernized core compiler infrastructure with expanded support for asynchronous operations, pattern matching, and enhanced exception handling throughout the bytecode compilation and execution pipeline.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 2, 2025

Walkthrough

This PR performs a comprehensive restructuring of the bytecode instruction set across the compiler, JIT, and VM, reorganizing opcode variants alphabetically and updating instruction handlers to support an expanded set of operations including async/context-manager support, advanced data-structure construction, pattern matching, and improved control flow and exception handling.

Changes

Cohort / File(s) Summary
Bytecode Instruction Enum Restructuring
crates/compiler-core/src/bytecode.rs
Major reorganization of the public Instruction enum with alphabetically sorted variants. Removed legacy opcodes and introduced new/renamed suite including async (BeforeAsyncWith, SetupAsyncWith, EndAsyncFor), container construction (BuildList, BuildTuple, BuildMap, BuildSet variants), pattern matching (MatchClass, MatchKeys, MatchMapping), and expanded control flow (JumpIfFalseOrPop, JumpIfTrueOrPop, PopBlock, PopException). Updated LAST_INSTRUCTION constant from ExtendedArg to YieldValue, affecting marshaling bounds and range checks.
JIT Instruction Handler Expansion
crates/jit/src/instructions.rs
Reworked FunctionCompiler::add_instruction with support for expanded instruction set including ExtendedArg, Jump variants, LoadFast/Global/Const, Pop operations, block management (SetupLoop, PopBlock), comparisons (CompareOperation with numeric and mixed-type paths), and double-double arithmetic (dd_\* functions). Added handling for UnpackSequence, BuildTuple, CallFunction variants, and enhanced error paths (NotSupported, BadBytecode).
VM Frame Execution Handler Rewrite
crates/vm/src/frame.rs
Extensive expansion of ExecutingFrame::execute_instruction with comprehensive handlers for: async operations (BeforeAsyncWith, GetAIter, GetANext, EndAsyncFor), container construction (BuildList, BuildTuple, BuildMap, BuildSet with tuple/iterable variants), pattern matching (MatchClass, MatchKeys, MatchMapping), method/attribute resolution (LoadMethod, LoadAttr, LoadBuildClass), finalization logic (WithCleanupStart/Finish, EndFinally, PopException), and call patterns (CallFunctionEx, CallMethodEx/Keyword/Positional). Includes detailed extraction logic and match_args protocol handling.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Enum variant completeness: Verify all instruction variants are correctly alphabetized and none are missing; cross-reference against CPython opcode_ids.h structure
  • Oparg type consistency: Ensure oparg handling and type conversions are uniform across bytecode.rs, instructions.rs (JIT), and frame.rs (VM)
  • Async/with/finally semantics: Validate BeforeAsyncWith, SetupAsyncWith, EndAsyncFor, WithCleanupStart/Finish, and exception preservation logic match Python semantics
  • Pattern matching protocol: Confirm match_args extraction and MATCH_SELF handling are correctly implemented
  • Control flow branching: Review PopJumpIfFalse/PopJumpIfTrue, JumpIfFalseOrPop, and block management for correctness
  • Edge cases: Pay attention to async protocol mismatches, attribute access failures, and exception unwinding paths

Possibly related PRs

Suggested reviewers

  • youknowone

Poem

🐰 Opcodes shuffled, sorted A to Z,
Async hops and patterns dance with glee,
Containers built, exceptions caught just right,
The bytecode garden blooms in Python's light!

Pre-merge checks and finishing touches

❌ Failed checks (3 warnings)
Check name Status Explanation Resolution
Title check ⚠️ Warning The title describes sorting the Instruction enum and match arms, but the changes are far broader, including significant opcode restructuring, new instruction variants, and extensive implementation work across multiple files. Update the title to reflect the actual scope: something like 'Restructure Instruction enum variants and expand opcode support across compiler, VM, and JIT' would better capture the comprehensive changes made.
Linked Issues check ⚠️ Warning The PR only partially addresses #6312 objectives. While enum variants appear reorganized, the changes include extensive new implementations (BeforeAsyncWith, async/with handling, pattern matching, advanced call patterns) far beyond sorting and converting named fields to tuple variants. Clarify whether the PR's expanded instruction set implementation is in scope for #6312 or should be split into separate issues. The linked issue focuses on sorting and field refactoring, not substantial new feature implementation.
Out of Scope Changes check ⚠️ Warning Significant out-of-scope changes detected. Beyond sorting and field conversion (#6312 scope), the PR introduces comprehensive new instruction handling for async operations (BeforeAsyncWith, GetAIter, GetANext, EndAsyncFor), pattern matching (MatchClass, MatchKeys, MatchMapping), advanced call patterns, and extensive new implementations in frame.rs and instructions.rs. Review and separate feature implementations from refactoring. Move new async, pattern-matching, and call-handling implementations to distinct issues/PRs unless they are explicitly part of #6312's scope with documented justification.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ShaharNaveh ShaharNaveh marked this pull request as ready for review December 2, 2025 12:52
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
crates/jit/src/instructions.rs (1)

337-357: Incomplete handling of InplaceLshift and InplaceRshift in shift operations.

The match pattern at line 337 only handles BinaryOperator::Lshift | BinaryOperator::Rshift, but the inner conditional at line 350 checks for BinaryOperator::Lshift | BinaryOperator::InplaceLshift. This means if InplaceLshift or InplaceRshift is passed, the outer match won't catch it, leading to NotSupported error.

Apply this diff to handle the inplace variants consistently:

                     (
-                        BinaryOperator::Lshift | BinaryOperator::Rshift,
+                        BinaryOperator::Lshift | BinaryOperator::InplaceLshift 
+                        | BinaryOperator::Rshift | BinaryOperator::InplaceRshift,
                         JitValue::Int(a),
                         JitValue::Int(b),
                     ) => {
crates/vm/src/frame.rs (1)

2087-2114: Unsafe pointer manipulation in execute_set_function_attribute.

The unsafe block at lines 2108-2111 casts away const-ness and mutates the function payload. While the comment claims this is safe because SetFunctionAttribute always follows MakeFunction, this pattern is fragile and could lead to UB if the invariant is violated.

Consider documenting this invariant more prominently or using a safer pattern like RefCell if the performance cost is acceptable.

crates/compiler-core/src/bytecode.rs (1)

1727-1729: MatchKeys stack effect comment appears incorrect.

The comment says "Pop 2 (subject, keys), push 3 (subject, keys_or_none, values_or_none)" but the actual effect is 1, which would mean net +1 to the stack. Looking at the VM implementation (frame.rs lines 1227-1293), MatchKeys doesn't pop subject and keys - it reads them and pushes one value (either values tuple or None). The stack effect of 1 is correct, but the comment is misleading.

Update the comment to accurately reflect the stack behavior:

-            MatchKeys => 1, // Pop 2 (subject, keys), push 3 (subject, keys_or_none, values_or_none)
+            MatchKeys => 1, // Read subject and keys from stack (no pop), push values tuple or None
🧹 Nitpick comments (1)
crates/jit/src/instructions.rs (1)

586-589: Resume instruction is stubbed with TODO.

The Resume instruction handler is effectively a no-op. While acceptable for an initial implementation, this should be tracked for future completion.

Would you like me to open an issue to track the completion of the Resume instruction implementation in the JIT?

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 563dc0f and b156296.

📒 Files selected for processing (3)
  • crates/compiler-core/src/bytecode.rs (3 hunks)
  • crates/jit/src/instructions.rs (2 hunks)
  • crates/vm/src/frame.rs (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/frame.rs
  • crates/jit/src/instructions.rs
  • crates/compiler-core/src/bytecode.rs
🧬 Code graph analysis (1)
crates/vm/src/frame.rs (1)
crates/vm/src/stdlib/builtins.rs (1)
  • __build_class__ (893-1062)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
🔇 Additional comments (16)
crates/jit/src/instructions.rs (4)

556-559: PopBlock instruction is stubbed with TODO.

Similar to Resume, PopBlock needs proper block management implementation for complete JIT support.


439-443: BuildTuple implementation looks correct.

The tuple building correctly pops elements from the stack and creates a JitValue::Tuple.


464-517: CompareOperation implementation handles int/bool/float comparisons correctly.

The implementation properly extends bools to i64 for comparison and uses appropriate integer/float comparison conditions.


641-655: UnpackSequence correctly validates tuple size.

The implementation appropriately checks that the tuple size matches the expected size before unpacking, and reverses the elements before extending the stack to maintain proper order.

crates/vm/src/frame.rs (6)

534-558: BeforeAsyncWith implementation looks correct.

The async context manager protocol is properly implemented: retrieves __aenter__, invokes it, then gets __aexit__ and pushes both onto the stack in the correct order.


1106-1225: MatchClass implementation is comprehensive.

The pattern matching for class patterns correctly handles:

  • Instance checking
  • __match_args__ extraction for positional arguments
  • MATCH_SELF behavior for built-in types
  • Keyword attribute extraction
  • Proper error handling for missing attributes

1227-1294: MatchKeys implementation correctly uses .get() method pattern.

The implementation properly follows CPython's approach using map.get(key, dummy) for atomic key checking, which handles collections.defaultdict and similar types correctly without triggering __missing__.


1067-1081: LoadMethod pushes three values: target, is_method flag, and func.

This correctly implements the method loading pattern where:

  1. target is the object for bound methods (or None for attributes)
  2. is_method boolean distinguishes bound methods from plain attributes
  3. func is the actual callable

The corresponding execute_method_call at lines 1876-1896 correctly handles both cases.


808-819: EndAsyncFor correctly handles StopAsyncIteration.

The implementation properly:

  1. Pops the exception
  2. Pops the except handler block
  3. Validates block level matches stack size
  4. Pops the async iterator
  5. Checks for StopAsyncIteration and clears the exception stack

1460-1476: SWAP instruction implementation is correct.

The implementation correctly matches CPython 3.11+ semantics. For SWAP(n), the code calculates j = len - index_val, which swaps TOS (at index len - 1) with the element at distance n below it (at index len - n). This implements the specified semantics where STACK[-n] and STACK[-1] are exchanged.

crates/compiler-core/src/bytecode.rs (6)

551-800: Instruction enum is alphabetically sorted.

The enum variants from BeforeAsyncWith through YieldValue appear to be properly sorted alphabetically, fulfilling the PR objective.


803-805: LAST_INSTRUCTION correctly updated to YieldValue.

Since YieldValue is the last variant alphabetically, this constant is correctly set for marshaling bounds checking.


618-620: ContainsOp uses tuple-style variant as intended.

The conversion from named struct-style (ContainsOp { invert: Arg<Invert> }) to tuple-style (ContainsOp(Arg<Invert>)) aligns with the PR objectives for simplifying enum variants.


1550-1569: label_arg correctly includes all jump instructions.

All instructions that contain Label arguments are properly listed in this match, including the newly added Break and Continue variants.


1800-1907: fmt_dis match arms are alphabetically sorted.

The display formatting match arms are properly sorted to match the instruction enum ordering, maintaining consistency throughout the codebase.


1674-1676: SetupAnnotation, SetupLoop, SetupFinally, EnterFinally, EndFinally all have stack effect 0.

These control flow setup instructions correctly report zero stack effect as they manage block state rather than stack values.

Copy link
Member

@youknowone youknowone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@youknowone youknowone merged commit d287d1e into RustPython:main Dec 2, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants