Skip to content

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Dec 29, 2025

close #6548

Summary by CodeRabbit

  • Improvements

    • Bytecode now records start and end source locations per instruction for finer-grained tracebacks and diagnostics.
    • Raise/Error reporting includes precise source ranges for better pinpointing.
  • Bug Fixes

    • Frame instruction offset reporting adjusted to a byte offset for accurate traceback positions.
  • New Features

    • Config option and CLI flag to enable/disable emitting debug ranges (enabled by default).

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 29, 2025

📝 Walkthrough

Walkthrough

This PR changes per-instruction source tracking from a single SourceLocation to start/end SourceLocation pairs across codegen, IR, bytecode, marshal, VM construction, and traceback; it also adds a CompileOpts.debug_ranges flag and propagates it to linetable generation and VM settings.

Changes

Cohort / File(s) Summary
Codegen / Compilation
crates/codegen/src/compile.rs
StmtRaise now includes range; _emit computes and attaches both location and end_location to InstructionInfo; finalize_code is called with &CompileOpts; CompileOpts gains pub debug_ranges: bool and a manual Default.
IR & Linetable
crates/codegen/src/ir.rs
InstructionInfo gains pub end_location: SourceLocation; CodeInfo::finalize_code now accepts &CompileOpts; generate_linetable signature changed to accept &[(SourceLocation, SourceLocation)] and debug_ranges, and encoding now uses start/end data.
Bytecode representation
crates/compiler-core/src/bytecode.rs
CodeObject.locations changed to Box<[(SourceLocation, SourceLocation)]>; display and line-extraction updated to read the start location (.0).
Serialization / Marshal
crates/compiler-core/src/marshal.rs
Serialization/deserialization updated to read/write start/end pairs per entry (four u32s per instruction) and collect into Box<[(SourceLocation, SourceLocation)]>.
VM: code objects & construction
crates/vm/src/builtins/code.rs
PyCode/CodeObject construction adjusted to new locations type Box<[(SourceLocation, SourceLocation)]>, currently populated as duplicated (loc, loc) per instruction.
VM: frames & traceback
crates/vm/src/builtins/frame.rs, crates/vm/src/frame.rs
Frame location helpers destructure (loc, _end) and use start location for reporting; lasti-based offsets now use frame.lasti() * 2 for byte-offset mapping.
VM settings & CLI
crates/vm/src/vm/mod.rs, crates/vm/src/vm/setting.rs, src/settings.rs
New code_debug_ranges: bool setting (default true) with CLI flag --no_debug_ranges and PYTHONNODEBUGRANGES env var; propagated into CompileOpts.debug_ranges.
Examples & Misc
examples/dis.rs, .cspell.dict/*
Example usage updated to set debug_ranges: true; dictionary token PYTHONNODEBUGRANGES added.

Sequence Diagram(s)

(No sequence diagram generated — changes are data-shape and API propagation focused, not a new multi-component control flow requiring sequential visualization.)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • code object linetable #6150: Modifies per-instruction location handling and linetable/IR APIs; strongly related to InstructionInfo and linetable changes.

Suggested reviewers

  • ShaharNaveh
  • coolreader18

Poem

🐰 I hopped through ranges, start and end in tow,
Each instruction wears a span, now watch it show,
Tracebacks trace finer, columns kept in sight,
I stitched the code with ranges — carrot-coded delight! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 43.48% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The PR title 'Fix traceback' is vague and does not clearly convey the specific nature of the change (adding end_location/end_lineno support for pytest compatibility). Consider a more specific title like 'Add end_lineno support to FrameSummary for pytest compatibility' to better reflect the primary change.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed The PR implements end_location tracking throughout the bytecode compilation and runtime stack, adding end_lineno attribute support to FrameSummary to fix the pytest AttributeError reported in issue #6548.
Out of Scope Changes check ✅ Passed All changes are scope-appropriate: bytecode location tracking (compile.rs, ir.rs, bytecode.rs, marshal.rs), runtime integration (vm/, frame.rs, builtins/), configuration flags, and supporting files are all necessary to implement end_location support for pytest compatibility.
✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone force-pushed the traceback branch 3 times, most recently from c3ed088 to a8ab1b4 Compare December 30, 2025 04:09
@youknowone youknowone marked this pull request as ready for review December 30, 2025 04:51
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/compiler-core/src/marshal.rs (1)

7-7: Consider incrementing FORMAT_VERSION for breaking marshal format change.

The locations serialization format changed from 2 u32 values per entry (line, character_offset) to 4 u32 values (start line, start char, end line, end char). This is a breaking change that will cause deserialization failures when loading old marshaled code objects. Consider incrementing FORMAT_VERSION from 4 to 5 to prevent incompatible bytecode from being loaded.

🔎 Suggested fix
-pub const FORMAT_VERSION: u32 = 4;
+pub const FORMAT_VERSION: u32 = 5;

Also applies to: 656-661

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 489289f and a8ab1b4.

⛔ Files ignored due to path filters (4)
  • Lib/test/test_code.py is excluded by !Lib/**
  • Lib/test/test_traceback.py is excluded by !Lib/**
  • Lib/test/test_zipimport.py is excluded by !Lib/**
  • Lib/traceback.py is excluded by !Lib/**
📒 Files selected for processing (7)
  • crates/codegen/src/compile.rs
  • crates/codegen/src/ir.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/compiler-core/src/marshal.rs
  • crates/vm/src/builtins/code.rs
  • crates/vm/src/builtins/frame.rs
  • crates/vm/src/frame.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/builtins/frame.rs
  • crates/vm/src/frame.rs
  • crates/compiler-core/src/marshal.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/vm/src/builtins/code.rs
  • crates/codegen/src/compile.rs
  • crates/codegen/src/ir.rs
🧬 Code graph analysis (1)
crates/codegen/src/compile.rs (1)
crates/compiler/src/lib.rs (1)
  • location (58-63)
🔇 Additional comments (10)
crates/vm/src/builtins/frame.rs (1)

60-62: LGTM - f_lasti byte offset conversion is correct.

The change to return self.lasti() * 2 correctly converts the instruction index to a byte offset, matching CPython's f_lasti behavior where each instruction word is 2 bytes.

crates/compiler-core/src/bytecode.rs (2)

258-261: Core type change for location pairs looks correct.

The change from Box<[SourceLocation]> to Box<[(SourceLocation, SourceLocation)]> enables tracking both start and end locations per instruction, which is necessary for pytest's end_lineno requirement.


1485-1492: LGTM - Display logic correctly uses start location.

The display functions appropriately access .0.line to use the start location for line number display, maintaining backward-compatible output.

crates/vm/src/builtins/code.rs (1)

469-484: Default locations use identical start/end - reasonable fallback.

When constructing a code object from PyCodeNewArgs, end location info isn't available, so creating (loc, loc) pairs where both start and end are identical is a sensible default. This maintains structural consistency while avoiding fabricated end location data.

crates/compiler-core/src/marshal.rs (1)

191-203: LGTM - Deserialization correctly handles location pairs.

The deserialization logic properly reads four u32 values per location entry to construct the (start, end) SourceLocation pair, maintaining consistency with the new data structure.

crates/vm/src/frame.rs (1)

387-394: The end location is captured but not exposed in PyTraceback.

The destructuring pattern (loc, _end_loc) correctly suppresses the unused variable warning (following Rust idiom), but if the PR objective is to provide pytest's end_lineno support in tracebacks, the PyTraceback struct would need an end_lineno field added to its definition and the field must be passed during construction.

crates/codegen/src/compile.rs (2)

1488-1505: Raise now uses the statement range for its source span; this is consistent and precise.

Binding range from StmtRaise and calling self.set_source_range(*range) right before emitting Instruction::Raise ensures the raise instruction’s start/end locations reflect the whole raise statement, not just the last compiled subexpression. This aligns with how other statements use set_source_range and should improve traceback accuracy without altering semantics.


5643-5653: End locations are correctly threaded into InstructionInfo in _emit.

Using the current TextRange to compute both location and end_location, then pushing them into InstructionInfo, cleanly enables per-instruction start/end tracking. Repeating the same pair for each code unit (including any EXTENDED_ARGs) keeps instructions and locations in lockstep and matches how the IR is later linearized.

crates/codegen/src/ir.rs (2)

416-517: Linetable generation correctly incorporates end locations; please keep it in sync with CPython’s encoding.

generate_linetable now takes (start,end) locations, computes line_delta and end_line_delta = end_line - line, and chooses between short, one-line, and long forms while encoding columns as varints of col+1/end_col+1. This matches the expected 3.11+ location table shape and should allow accurate end_lineno/end_colno reconstruction as long as callers maintain the invariant end_line >= line and monotonic character offsets from the TextRange. The control flow (chunking up to 8 instructions, grouping identical locations, and using signed deltas) looks consistent with CPython’s scheme.


84-91: InstructionInfo now carries end locations and locations are stored as (start,end) pairs; wiring looks sound.

Adding end_location: SourceLocation to InstructionInfo and switching locations to a Vec<(SourceLocation, SourceLocation)> that is populated via locations.extend(repeat_n((info.location, info.end_location), arg.instr_size())) keeps the per-bytecode-unit location data aligned with instructions and matches the new CodeObject.locations/generate_linetable expectations. This gives downstream traceback/FrameSummary code both lineno and end_lineno without changing control-flow semantics. All InstructionInfo construction sites correctly initialize end_location, and the linetable generation properly encodes both line deltas and end-line deltas.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
crates/vm/src/frame.rs (1)

174-176: Still need to guard current_location() against lasti() == 0 (panic risk) and confirm new traceback offset semantics

Frame::current_location() still does:

pub fn current_location(&self) -> SourceLocation {
    self.code.locations[self.lasti() as usize - 1].0
}

If lasti() is 0 (e.g., on a freshly-created or not-yet-started frame), self.lasti() as usize - 1 underflows to usize::MAX, causing an out-of-bounds index and panic. This was already raised in a previous review and remains unfixed with the new (start, end) layout. A minimal fix is to special-case lasti() == 0 and fall back to a safe location (e.g., the first entry or one synthesized from first_line_number) before indexing:

Suggested guard for `current_location()`
-    pub fn current_location(&self) -> SourceLocation {
-        self.code.locations[self.lasti() as usize - 1].0
-    }
+    pub fn current_location(&self) -> SourceLocation {
+        let lasti = self.lasti();
+        if lasti == 0 {
+            // Frame has not executed any instruction yet: use the first known
+            // location if present, or synthesize one from first_line_number.
+            if let Some((loc, _)) = self.code.locations.first() {
+                return *loc;
+            }
+            return SourceLocation {
+                line: self
+                    .code
+                    .first_line_number
+                    .unwrap_or(rustpython_compiler_core::OneIndexed::MIN),
+                character_offset: rustpython_compiler_core::OneIndexed::from_zero_indexed(0),
+            };
+        }
+        self.code.locations[lasti as usize - 1].0
+    }

Also, in handle_exception you now compute:

let (loc, _end_loc) = frame.code.locations[idx];
let new_traceback = PyTraceback::new(
    next,
    frame.object.to_owned(),
    frame.lasti() * 2,
    loc.line,
);

Using frame.lasti() * 2 as the offset should match “codeunit-bytes” addressing, but it’s tightly coupled to how tb_lasti and co_positions interpret offsets. Please make sure this lines up with the linetable decoding by running the traceback/pytest scenarios that motivated this PR.

Also applies to: 388-395

🧹 Nitpick comments (3)
src/settings.rs (1)

263-291: CLI/env plumbing for code_debug_ranges is correct; consider documenting the new -X

-X no_debug_ranges and PYTHONNODEBUGRANGES both correctly toggle settings.code_debug_ranges via the existing implementation_option/env_bool plumbing and respect -E/-I. As a follow-up, you might want to extend --help-xoptions / docs to mention no_debug_ranges and PYTHONNODEBUGRANGES, but the runtime behavior itself looks fine.

Also applies to: 294-300

crates/vm/src/builtins/code.rs (1)

470-486: Locations initialization as (loc, loc) is acceptable given existing FIXME

For PyCode::py_new, seeding locations as (loc, loc) for every instruction keeps prior behavior (single-line fallback based on firstlineno) while satisfying the new (start, end) shape. In replace, reusing self.code.locations.clone() is consistent with the existing “locations duplicate linetable” FIXME; the semantics aren’t worsened by the tuple change. Once you eventually switch everything to linetable-only, this field can indeed be retired.

Also applies to: 968-983

crates/codegen/src/compile.rs (1)

5653-5665: Core range tracking in _emit is correct; behavior now hinges on callers setting current_source_range

Computing both location and end_location from current_source_range.start() / .end() and recording them into InstructionInfo is the key step that enables per-instruction ranges throughout the pipeline. Given that:

  • compile_statement sets the range for each statement,
  • compile_expression sets the range for each expression, and
  • specialized sites (e.g. compile_with, Stmt::Raise) now override the range for their control/raise instructions,

the resulting start/end spans should be coherent. Just be aware that any future emission path that forgets to set current_source_range first will now produce misleading locations on both start and end, so it’s worth treating set_source_range as part of the “contract” for any new emission helpers.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a8ab1b4 and 095a0cc.

⛔ Files ignored due to path filters (4)
  • Lib/test/test_code.py is excluded by !Lib/**
  • Lib/test/test_traceback.py is excluded by !Lib/**
  • Lib/test/test_zipimport.py is excluded by !Lib/**
  • Lib/traceback.py is excluded by !Lib/**
📒 Files selected for processing (10)
  • crates/codegen/src/compile.rs
  • crates/codegen/src/ir.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/compiler-core/src/marshal.rs
  • crates/vm/src/builtins/code.rs
  • crates/vm/src/builtins/frame.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/vm/mod.rs
  • crates/vm/src/vm/setting.rs
  • src/settings.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/vm/src/builtins/frame.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/vm/setting.rs
  • crates/vm/src/frame.rs
  • crates/compiler-core/src/marshal.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/codegen/src/compile.rs
  • crates/vm/src/builtins/code.rs
  • src/settings.rs
  • crates/vm/src/vm/mod.rs
  • crates/codegen/src/ir.rs
🧬 Code graph analysis (2)
crates/vm/src/builtins/code.rs (1)
crates/compiler-core/src/marshal.rs (4)
  • len (191-203)
  • len (226-228)
  • len (233-235)
  • len (239-244)
src/settings.rs (1)
src/interpreter.rs (1)
  • settings (59-62)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run tests under miri
  • GitHub Check: Ensure compilation on various targets
🔇 Additional comments (8)
crates/compiler-core/src/bytecode.rs (1)

259-262: Start/end location tuple wiring looks consistent with existing usage

CodeObject.locations now stores (start, end) and all callers here consistently use the start half (.0) for line display and copy the whole slice through map_bag/map_clone_bag, preserving existing semantics while enabling richer ranges elsewhere. This keeps the previous invariant that locations.len() matches instructions.len(); just make sure downstream consumers relying on the public locations field are updated to the new tuple shape.

Also applies to: 1485-1494, 1562-1564, 1594-1596

crates/vm/src/vm/setting.rs (1)

60-62: New code_debug_ranges setting is well-integrated

The code_debug_ranges flag is documented, exposed on Settings, and defaulted to true in Default, which matches the intended -X no_debug_ranges semantics.

Also applies to: 170-207

crates/vm/src/vm/mod.rs (1)

505-513: debug_ranges propagated cleanly into CompileOpts

Linking CompileOpts.debug_ranges directly to self.state.config.settings.code_debug_ranges is the right place to surface the new setting into the compiler.

crates/codegen/src/compile.rs (4)

118-125: CompileOpts extension is reasonable and default behavior matches new flag semantics

The addition of debug_ranges: bool to CompileOpts plus the explicit Default impl (optimizations off, debug ranges on) cleanly exposes the new knob without breaking existing call sites that rely on Default::default(). This aligns with the CLI/env design where column/range info is enabled by default and disabled via a dedicated flag.

Also applies to: 127-134


1500-1503: Good: raise now uses the full statement range for its traceback location

Destructuring Stmt::Raise to get range and then calling self.set_source_range(*range) immediately before emitting Instruction::Raise ensures the raise instruction’s start/end positions reflect the whole raise ... from ... statement, not just the last expression that happened to be compiled. That’s exactly what you want for accurate traceback ranges and end_lineno computation.

Also applies to: 1516-1518


3231-3233: with-statement source range handling is improved and consistent

Capturing with_range = self.current_source_range at the start of compile_with and reapplying it:

  • before SetupWith/SetupAsyncWith,
  • before binding optional_vars,
  • before recursive nested compile_with calls, and
  • for the cleanup (PopBlock / EnterFinally / WithCleanup*),

makes the control-flow scaffolding for a with consistently point back to the overall with statement span, while still letting inner expressions use their own ranges. This should yield saner linetable entries and tracebacks around multi-context with blocks.

Also applies to: 3241-3242, 3259-3260, 3276-3277, 3282-3285


870-874: exit_scopefinalize_code(&self.opts) wiring is correct and consistent

There is only one call site to finalize_code in the codebase (at this location), and it correctly passes &self.opts to match the function signature. Optimization and debug-range behavior are properly tied to the compiler's options.

crates/codegen/src/ir.rs (1)

84-91: Per-instruction ranges and linetable encoding are sound; verification confirms both concerns are addressed

Adding end_location to InstructionInfo and wiring it through _emit into locations provides the correct data model for end_lineno / range-based tracebacks. The finalize_code API change to accept &CompileOpts and use opts.optimize / opts.debug_ranges is clean and keeps knobs centralized.

The generate_linetable implementation correctly follows CPython's 3.11+ scheme (NoColumns vs short/one-line/long encodings, chunking into up to 8 instructions, relative line deltas), and using (start, end) pairs gives the VM enough information for both line-only and full-range modes.

Both verification points pass:

  • core::iter::repeat_n is stable as of Rust 1.82.0; your MSRV is 1.89.0, so it's available.
  • The encoding safely assumes end_locationlocation with explicit checks (e.g., end_col >= col in short-form selection) and proper varint-based overflow handling in the long form, making reversed or zero-width ranges impossible.

Comment on lines 190 to 204
let len = rdr.read_u32()?;
let locations = (0..len)
.map(|_| {
Ok(SourceLocation {
let start = SourceLocation {
line: OneIndexed::new(rdr.read_u32()? as _).ok_or(MarshalError::InvalidLocation)?,
character_offset: OneIndexed::from_zero_indexed(rdr.read_u32()? as _),
})
};
let end = SourceLocation {
line: OneIndexed::new(rdr.read_u32()? as _).ok_or(MarshalError::InvalidLocation)?,
character_offset: OneIndexed::from_zero_indexed(rdr.read_u32()? as _),
};
Ok((start, end))
})
.collect::<Result<Box<[SourceLocation]>>>()?;
.collect::<Result<Box<[(SourceLocation, SourceLocation)]>>>()?;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Marshal format for locations changed without bumping FORMAT_VERSION

The code at lines 190-204 (deserialize) and 649-661 (serialize) is now symmetric and correctly treats each location entry as a (start, end) pair, reading and writing four u32 values per entry (start line, start column, end line, end column). However, FORMAT_VERSION remains at 4 with no version-based conditional logic.

If this represents a format change from prior versions (where locations might have had a different layout), then any marshalled code produced with earlier versions of FORMAT_VERSION 4 will be incompatible with the current reader—and vice versa. Existing frozen stdlib artifacts, cached bytecode, or external tools relying on the old layout would fail to deserialize.

To address this:

  • Bump FORMAT_VERSION and guard the (start, end) tuple handling on the new version, or
  • Add backward-compat deserialization that can handle both old and new layouts
  • Document whether marshal format 4 is intended to be stable across this change

Otherwise, if locations always had this layout, this concern does not apply.

🤖 Prompt for AI Agents
In crates/compiler-core/src/marshal.rs around lines 190-204 (reader) and 649-661
(writer), the marshal layout for `locations` was changed to a `(start, end)`
tuple of four u32s but FORMAT_VERSION was not bumped or handled; either bump
FORMAT_VERSION and gate the new read/write behavior on the new version, or add
backward-compatible logic: detect the FORMAT_VERSION during deserialization and
branch to the old layout reader when the version is older, and when serializing
emit the format corresponding to the chosen FORMAT_VERSION; update the serialize
code at 649-661 to be conditional on version, update any tests/fixtures and the
format documentation, and ensure unit tests cover both old and new
deserialization paths.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
crates/compiler-core/src/marshal.rs (1)

190-204: Marshal layout for locations changed, but FORMAT_VERSION stayed at 4

Deserialization now reads (start, end) as a pair of SourceLocations (four u32s per entry) and serialize_code writes the same shape, so the new reader/writer are symmetric. However, FORMAT_VERSION is still 4, with no version gating or backward-compat branch for older artifacts that encoded only a single location (two u32s) per entry.

If any existing .pyc/marshal payloads or frozen stdlib artifacts were produced with the old layout under version 4, they will now be incompatible with this reader, and newly written artifacts will be unreadable by older RustPython builds that also expect version 4.

You likely want to either:

  • Bump FORMAT_VERSION and tie the (start, end) shape to the new version, or
  • Explicitly state that marshal format 4 was never stable and all artifacts must be regenerated.

Also applies to: 655-661

🧹 Nitpick comments (3)
crates/codegen/src/compile.rs (1)

5655-5665: end_location tracking in _emit is consistent, but note potential cost of repeated to_source_code()

Recording both location and end_location per instruction from current_source_range is the right shape for range-aware tracebacks and aligns with the new IR/CodeObject layout. The implementation assumes current_source_range is always set before emitting, which holds given all uses of set_source_range. If you ever see to_source_code() show up hot in profiles, consider caching the SourceCode for source_file in the Compiler rather than recomputing it here.

crates/codegen/src/ir.rs (2)

178-222: Location vector construction looks correct; double‑check MSRV for core::iter::repeat_n

Using block_to_offset plus instr.arg.instr_size() to reserve and populate both instructions and locations, and repeating (info.location, info.end_location) once per code unit, keeps locations.len() in sync with the encoded bytecode length across extended-arg recompiles.

The only thing to confirm is that your MSRV includes core::iter::repeat_n; if not, this can be trivially replaced with std::iter::repeat((info.location, info.end_location)).take(arg.instr_size() as usize) or a small loop.


423-541: Linetable generation logic is reasonable; verify NoColumns path vs CPython’s no_debug_ranges behavior

The new generate_linetable correctly:

  • Groups consecutive instructions with identical (start,end) locations,
  • Limits each entry to a run of at most 8 instructions,
  • Computes line_delta relative to first_line/prev_line and end_line_delta for multi-line spans, and
  • Chooses between Short*/OneLine*/Long encodings with appropriate signed/unsigned varint encodings for deltas and columns when debug_ranges is enabled.

When debug_ranges is false, the function always emits PyCodeLocationInfoKind::NoColumns entries and writes only a signed line_delta varint, ignoring end_line and all column data. That matches the intent of “no column info”, but it’s worth double-checking against CPython’s 3.11+ encoding to ensure that disabling debug ranges is supposed to drop end-line information entirely rather than just column ranges.

I’d recommend:

  • Verifying this against CPython’s _PyLineTableWriter / PyCode_Addr2Location behavior for -X no_debug_ranges.
  • Adding or updating tests that round-trip a few multi-line constructs with debug_ranges on and off (e.g., disassembly snapshot tests or addr2location tests) to confirm compatibility.
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 095a0cc and 4103350.

⛔ Files ignored due to path filters (4)
  • Lib/test/test_code.py is excluded by !Lib/**
  • Lib/test/test_traceback.py is excluded by !Lib/**
  • Lib/test/test_zipimport.py is excluded by !Lib/**
  • Lib/traceback.py is excluded by !Lib/**
📒 Files selected for processing (12)
  • .cspell.dict/python-more.txt
  • crates/codegen/src/compile.rs
  • crates/codegen/src/ir.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/compiler-core/src/marshal.rs
  • crates/vm/src/builtins/code.rs
  • crates/vm/src/builtins/frame.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/vm/mod.rs
  • crates/vm/src/vm/setting.rs
  • examples/dis.rs
  • src/settings.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/settings.rs
  • crates/vm/src/vm/mod.rs
  • crates/compiler-core/src/bytecode.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/vm/setting.rs
  • crates/codegen/src/ir.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/builtins/code.rs
  • examples/dis.rs
  • crates/codegen/src/compile.rs
  • crates/compiler-core/src/marshal.rs
  • crates/vm/src/builtins/frame.rs
🧬 Code graph analysis (1)
crates/codegen/src/compile.rs (2)
crates/vm/src/vm/setting.rs (1)
  • default (172-207)
crates/compiler/src/lib.rs (1)
  • location (58-63)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run tests under miri
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
🔇 Additional comments (13)
.cspell.dict/python-more.txt (1)

181-181: LGTM!

The addition of PYTHONNODEBUGRANGES to the spell-check dictionary aligns with the new environment variable controlling debug range emission, mirroring CPython's behavior.

examples/dis.rs (1)

56-59: LGTM!

Explicit debug_ranges: true is appropriate for a disassembly tool where detailed source location information is valuable for debugging and inspection.

crates/vm/src/builtins/frame.rs (2)

60-63: LGTM!

Converting instruction index to byte offset (lasti * 2) correctly aligns with CPython's f_lasti semantics where each instruction unit is 2 bytes.


66-74: LGTM!

This fix properly handles the lasti() == 0 case to prevent the underflow panic that was identified in previous reviews. When execution hasn't started, it correctly falls back to co_firstlineno, mirroring CPython's PyCode_Addr2Line behavior.

crates/vm/src/vm/setting.rs (2)

60-61: LGTM!

The new code_debug_ranges field is well-documented and appropriately placed within the Settings struct, enabling control over column information in bytecode.


196-196: LGTM!

Defaulting code_debug_ranges to true aligns with CPython's behavior where debug ranges are enabled by default unless PYTHONNODEBUGRANGES is set.

crates/vm/src/frame.rs (2)

388-396: LGTM!

The location tuple destructuring (loc, _end_loc) correctly adapts to the new per-instruction (start, end) location pair format. Using frame.lasti() * 2 for the traceback offset properly converts from instruction index to byte offset for CPython compatibility.


174-176: The concern in this review is not valid. Both call sites of current_location() already guard against lasti == 0:

  1. crates/stdlib/src/faulthandler.rs:360 – guards with if frame.lasti() == 0 { ... } else { frame.current_location()...}
  2. crates/vm/src/builtins/frame.rs:72 – guards with if self.lasti() == 0 { ... } else { self.current_location()...}

The method is only invoked when lasti != 0, making the subtraction safe. No additional guard or documentation is needed.

Likely an incorrect or invalid review comment.

crates/vm/src/builtins/code.rs (1)

470-485: LGTM!

The locations initialization correctly adapts to the new (start, end) tuple format. Using (loc, loc) where start equals end is appropriate when precise end location information isn't available from the constructor arguments.

crates/codegen/src/compile.rs (3)

118-134: Explicit CompileOpts default with debug_ranges looks good

Switching to a manual Default impl and defaulting debug_ranges to true is consistent with the documented behavior and keeps existing callers (CompileOpts::default()) behaving as expected.


870-874: Passing full CompileOpts into finalize_code is appropriate

Routing &self.opts into finalize_code (instead of just an optimize level) lets IR finalization make consistent decisions (DCE + linetable) based on compile options without changing scope semantics.


1500-1518: Using StmtRaise.range for the Raise instruction location improves traceback accuracy

Capturing the range from Stmt::Raise and calling set_source_range(*range) right before emitting Instruction::Raise ensures the emitted raise is associated with the statement span instead of the last evaluated subexpression, which should give better tracebacks and exception locations.

crates/codegen/src/ir.rs (1)

84-91: end_location plumbed through InstructionInfo and finalize_code cleanly

Adding end_location to InstructionInfo and building locations: Vec<(SourceLocation, SourceLocation)> in finalize_code (then passing &locations plus opts.debug_ranges into generate_linetable) is a coherent way to support range-based debugging without disturbing existing stack-depth and metadata logic. This also ensures CodeObjects carry full (start,end) data irrespective of whether debug ranges are enabled in the linetable encoder.

Also applies to: 135-177

@youknowone youknowone merged commit e79a1a1 into RustPython:main Dec 30, 2025
13 checks passed
@youknowone youknowone deleted the traceback branch December 30, 2025 07:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

pytest: AttributeError: 'FrameSummary' object has no attribute 'end_lineno'

1 participant