
Conversation


@galkleinman galkleinman commented Aug 14, 2025

Resolves #642

Important

Updated OpenAI SDK to 5.x, improved error handling, testing, and dependency management in OpenAI instrumentation.

  • Behavior:
    • Updated OpenAI SDK to version 5.12.2 in package.json.
    • Enhanced manual instrumentation in OpenAIInstrumentation to support OpenAI 5.x.
    • Improved handling of tool_calls and function_calls in instrumentation.ts.
  • Error Handling:
    • Added comprehensive error handling in instrumentation.test.ts for Polly.js.
    • Improved runtime error handling for tool_calls in instrumentation.ts.
  • Testing:
    • Removed HAR network recording fixtures in instrumentation.test.ts.
    • Updated tests to be more resilient with relaxed assertions and guarded setup.
  • Dependencies:
    • Added @types/node-fetch and node-fetch as dependencies in package.json.
    • Updated pnpm-workspace.yaml with dependency overrides for form-data and tmp.

This description was created by Ellipsis for 86d5d21. You can customize this summary. It will automatically update as commits are pushed.


Summary by CodeRabbit

  • New Features
    • Expanded compatibility with newer OpenAI API/SDK versions (v4–v5+).
  • Bug Fixes
    • More reliable streaming behavior and tool-call handling.
    • Prevented incorrect span attributes for non-function tool calls.
  • Tests
    • Updated and added HAR recordings for chat, streaming, images, and logprobs scenarios.
    • Stabilized tests with improved Polly configuration and extended timeouts for image workflows.
  • Chores
    • Dependency updates and workspace overrides to address transitive package versions (form-data, tmp).
    • Updated client/runtime integrations to align with latest OpenAI SDK.


coderabbitai bot commented Aug 14, 2025

Walkthrough

Refactors OpenAI instrumentation for v4–v5+ compatibility, adjusts promise/stream handling and tool-call guards, updates tests and Polly setup, relocates/refreshes HAR recordings under test/recordings, and bumps dependencies (OpenAI, node-fetch). Adds workspace overrides for transitive deps.

Changes

Cohort / File(s) Summary
Core instrumentation update
packages/instrumentation-openai/src/instrumentation.ts
Introduces APIPromiseType unification, updates wrapping via _thenUnwrap, broadens version range (>=4 <6), changes manuallyInstrument(module: unknown), hardens tool/function guards, and tweaks streaming tool_call handling and span emission.
Test harness adjustments
packages/instrumentation-openai/test/instrumentation.test.ts
Extends Polly config, injects node-fetch into OpenAI client, enables instrumentation, guards Polly hooks, relaxes tool-call assertions, moves image calls to imageOpenai, and increases timeouts.
Dependency and workspace overrides
packages/instrumentation-openai/package.json, pnpm-workspace.yaml
Upgrades openai devDependency to 5.12.2; adds node-fetch and @types/node-fetch; adds overrides to force form-data >=4.0.4 and tmp >=0.2.4.
Removed legacy recordings
packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/.../recording.har
Deletes multiple Polly HAR fixtures for image edit/variation/generation and streaming/chat tests from the old recordings directory.
Added/updated test recordings
packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/.../recording.har
Adds and refreshes HAR fixtures for chat/completions (incl. streaming and logprobs) and image generation; updates headers, payloads, timings for OpenAI JS 5.12.2.

Sequence Diagram(s)

sequenceDiagram
  participant App
  participant Instrumentation
  participant OpenAI_SDK
  participant Tracer

  App->>Instrumentation: manuallyInstrument(module)
  Instrumentation-->>OpenAI_SDK: Patch methods (chat/completions/images)
  App->>OpenAI_SDK: invoke API (may stream)
  Instrumentation->>Tracer: start span
  OpenAI_SDK-->>Instrumentation: APIPromiseType (wrapped via _thenUnwrap)
  alt Streaming
    loop chunks
      OpenAI_SDK-->>Instrumentation: chunk (delta/tool_calls/logprobs)
      Instrumentation->>Tracer: record events/attrs (guard non-function tools)
    end
  else Non-stream
    OpenAI_SDK-->>Instrumentation: result
    Instrumentation->>Tracer: set attributes
  end
  Instrumentation->>Tracer: end span
  Instrumentation-->>App: result/stream proxy

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • nirga
  • doronkopit5

Poem

Hoppity logs and spans align,
I nibble bytes—compat divine.
Streams now flow, with tools in tow,
Promises unwrap—onward we go!
New HAR burrows, old ones fade,
Thump-thump: a stable trace parade.
🥕✨



@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed everything up to e0c4d82 in 2 minutes and 13 seconds.
  • Reviewed 2716 lines of code in 17 files
  • Skipped 1 file when reviewing.
  • Skipped posting 4 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/instrumentation-openai/test/instrumentation.test.ts:116
  • Draft comment:
    Token usage attributes are compared in some tests as strings (e.g., "15" in the chat test) and in others as numbers (e.g., 82 in the function-calling test). Consider normalizing attribute types to avoid type inconsistencies.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. packages/instrumentation-openai/test/instrumentation.test.ts:825
  • Draft comment:
    Hard-coded expected token counts (e.g., 1056 tokens for DALL-E 2 and 4160 tokens for DALL-E 3 HD) may be brittle. Consider using ranges or thresholds, or documenting why these exact numbers are expected, given that changes in the underlying tokenization library could affect these values.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. packages/instrumentation-openai/test/instrumentation.test.ts:182
  • Draft comment:
    Test for 'streaming chat with new API' is skipped. It might be helpful to include a comment on why it’s skipped and under what conditions it should be re-enabled.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. packages/instrumentation-openai/test/instrumentation.test.ts:676
  • Draft comment:
    The tests that check function and tool calling attributes contain several repeated assertions. Consider refactoring common span attribute validations into helper functions to reduce duplication and improve maintainability.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_5112oke7OkElsm7V

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.


@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed a7153ab in 51 seconds.
  • Reviewed 13 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 draft comment. View it below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/instrumentation-openai/src/instrumentation.ts:122
  • Draft comment:
    Consider adding a comment explaining why the version range is capped at <6. This ensures clarity on supported OpenAI SDK versions.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50%

Workflow ID: wflow_rrFVTXRPiVVxzOkr

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

🔭 Outside diff range comments (3)
packages/instrumentation-openai/src/instrumentation.ts (1)

483-497: Tool-call accumulation: assign stable fields (id, type), append only to arguments.

  • id should be assigned (not concatenated).
  • type should be assigned (already fixed here).
  • name may stream in segments, but most servers send it once. To be safe, set on first occurrence, append otherwise.
  • arguments should be concatenated, as it streams chunk by chunk.
           if (result.choices[0].message.tool_calls) {
             if (toolCall.id) {
-              result.choices[0].message.tool_calls[toolCall.index].id +=
-                toolCall.id;
+              (result.choices[0].message.tool_calls[toolCall.index] as any).id =
+                toolCall.id;
             }
             if (toolCall.type) {
               result.choices[0].message.tool_calls[toolCall.index].type =
                 toolCall.type;
             }
             if (toolCall.function?.name) {
-              (result.choices[0].message.tool_calls[toolCall.index] as any).function.name += toolCall.function.name;
+              const target = (result.choices[0].message.tool_calls[toolCall.index] as any);
+              if (!target.function.name) {
+                target.function.name = toolCall.function.name;
+              } else {
+                target.function.name += toolCall.function.name;
+              }
             }
             if (toolCall.function?.arguments) {
-              (result.choices[0].message.tool_calls[toolCall.index] as any).function.arguments += toolCall.function.arguments;
+              (result.choices[0].message.tool_calls[toolCall.index] as any).function.arguments +=
+                toolCall.function.arguments;
             }
           }
packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-function-calling_1703714575/recording.har (1)

82-85: Add a 5.x “tools” function-calling recording alongside legacy “functions”.

OpenAI 5.x uses tools: [{ type: 'function', function: { ... } }] and often the Responses API. To truly cover 5.x, add a HAR recording that exercises tools (non-streaming and streaming) so instrumentation spans/attributes are validated on the new shape.

Example request body you can capture:

{
  "model": "gpt-4o-mini",
  "messages": [{ "role": "user", "content": "What's the weather like in Boston?" }],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA" },
          "unit": { "type": "string", "enum": ["celsius", "fahrenheit"] }
        },
        "required": ["location"]
      }
    }
  }]
}

If you’ve already added such tests elsewhere, ignore this.

packages/instrumentation-openai/test/instrumentation.test.ts (1)

79-81: Ensure OpenAI v5.x Is Exercised in CI

Our inspection shows that every openai dependency is still on v4 and all HAR fixtures record “OpenAI/JS 4.x” in their user-agent. To truly validate v5 support, please address the following:

• Dependencies

  • packages/instrumentation-openai/package.json → "openai": "4.38.3"
  • packages/traceloop-sdk/package.json → "openai": "4.38.3"
  • packages/sample-app/package.json → "openai": "^4.104.0"

• HAR fixtures under
packages/instrumentation-openai/recordings still advertise user-agent v4

Recommendations:

  1. Bump openai to ^5.x in your package manifests (or introduce a v4/v5 matrix in CI).
  2. Re-record the HAR fixtures using the v5.x client so that user-agent reflects “OpenAI/JS 5.x”.
  3. Verify replay still passes under both major versions, if backward compatibility is required.
♻️ Duplicate comments (7)
packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-completion_322013537/recording.har (2)

65-67: Prevent header-driven fixture churn (x-stainless-runtime-version).

Same note as other HARs: exclude volatile headers from Polly matching to avoid re-recording when Node/SDK varies.
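
As a sketch of that suggestion (option values are hypothetical; the actual Polly setup lives in instrumentation.test.ts), the exclusion could be expressed via matchRequestsBy. The small pure helper below mimics the effect of an exclude list so it can be sanity-checked without Polly itself:

```typescript
// Hypothetical Polly matchRequestsBy fragment: ignore volatile SDK/runtime
// headers so fixtures survive Node or OpenAI SDK version bumps.
const matchRequestsBy = {
  headers: {
    exclude: ["x-stainless-runtime-version", "x-stainless-lang", "user-agent"],
  },
};

// Approximates what header-based request matching would compare after
// the exclude list is applied (case-insensitive on header names).
function headersForMatching(
  headers: Record<string, string>,
  exclude: string[],
): Record<string, string> {
  const keep: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    if (!exclude.includes(name.toLowerCase())) keep[name] = value;
  }
  return keep;
}

const matched = headersForMatching(
  {
    authorization: "Bearer test-token",
    "x-stainless-runtime-version": "v20.11.0",
    "content-type": "application/json",
  },
  matchRequestsBy.headers.exclude,
);
// matched keeps authorization and content-type only.
```

Note that disabling header matching entirely (headers: false) also achieves this, at the cost of matching on fewer request properties.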


90-96: Sanity-check base64 payload decodes to valid JSON.

Ensure the updated completion payload still decompresses to JSON; see the verification script in the other comment.

packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-function-calling_1703714575/recording.har (1)

90-96: Validate function-calling response payload integrity.

Same integrity check as other HARs; the base64 chunks should gunzip to valid JSON.

packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-streaming-chat_72904453/recording.har (2)

65-67: Exclude x-stainless-* headers from matching.

Streaming test fixtures also include volatile x-stainless-* headers; exclude them in Polly matchers to avoid re-recording.


189-193: Optional: mask set-cookie to reduce churn.

Same rationale as other HARs. If you keep it, prefer masking the value.

packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-calculate-correct-tokens-for-different-quality-levels_3292443992/recording.har (2)

90-96: Check JSON integrity for the first image-generation response.

Run the provided verification script to ensure the base64 payload decodes and parses cleanly.
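
A minimal sketch of such a verification, assuming the fixture stores a gzip-compressed JSON body as base64 chunks (the chunking scheme here is an assumption for illustration, not Polly's exact persistence format):

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// Concatenate base64 chunks, gunzip, and JSON-parse; any corruption or
// truncation in the fixture payload surfaces as a thrown error here.
function decodeHarPayload(base64Chunks: string[]): unknown {
  const compressed = Buffer.concat(
    base64Chunks.map((chunk) => Buffer.from(chunk, "base64")),
  );
  return JSON.parse(gunzipSync(compressed).toString("utf8"));
}

// Round-trip a sample payload to exercise the check.
const body = { id: "chatcmpl-123", choices: [{ index: 0 }] };
const gz = gzipSync(Buffer.from(JSON.stringify(body)));
// Split the compressed bytes into two base64 chunks, as a fixture might.
const mid = Math.floor(gz.length / 2);
const chunks = [
  gz.subarray(0, mid).toString("base64"),
  gz.subarray(mid).toString("base64"),
];
const decoded = decodeHarPayload(chunks) as typeof body;
```

Running this over each recording's content payload is enough to catch silent fixture corruption.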


281-287: Check JSON integrity for the second image-generation response.

Same as above; verify the multi-kB payload isn’t truncated/corrupted.

🧹 Nitpick comments (9)
packages/instrumentation-openai/src/instrumentation.ts (4)

120-123: Loosening the OpenAI version range is OK; consider adding a soft upper bound.

Changing to ">=3.1.0" enables 5.x, but it may unintentionally pick future major versions with breaking APIs. If you want to be conservative, consider "<6" until you explicitly test 6.x.

-      [">=3.1.0"],
+      [">=3.1.0 <6"],

432-437: Avoid broad any; prefer a minimal structural type for the streaming accumulator.

Casting the message to any works but loses type safety. A narrow type keeps compiler help without over-constraining.

-            } as any,
+            } as unknown as {
+              role: "assistant";
+              content: string;
+              tool_calls: Array<{
+                id: string;
+                type: "function" | string;
+                function?: { name: string; arguments: string };
+              }>;
+              // function_call may appear in older tool_call variants
+              function_call?: { name: string; arguments: string };
+            },

456-467: Function-call (legacy) streaming likely needs incremental accumulation.

You only set function_call when both name and arguments are present in the same delta. In practice, arguments stream separately. Consider accumulating name/arguments across chunks as you do for tool_calls.

If supporting legacy function_call streaming matters for 5.x, mirror the tool_calls logic: initialize function_call once present, then append arguments across subsequent deltas.
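
A hedged sketch of that accumulation, using minimal stand-in types rather than the SDK's own:

```typescript
// Stand-in delta/accumulator types; the real SDK types differ.
interface FunctionCallDelta {
  name?: string;
  arguments?: string;
}

interface AccumulatedFunctionCall {
  name: string;
  arguments: string;
}

// Initialize the accumulator on the first delta that carries function_call
// data; afterwards append only what streamed in (name typically arrives
// once, arguments chunk by chunk).
function accumulateFunctionCall(
  current: AccumulatedFunctionCall | undefined,
  delta: FunctionCallDelta | undefined,
): AccumulatedFunctionCall | undefined {
  if (!delta) return current;
  const acc = current ?? { name: "", arguments: "" };
  if (delta.name) acc.name = acc.name ? acc.name + delta.name : delta.name;
  if (delta.arguments) acc.arguments += delta.arguments;
  return acc;
}

// Deltas as they might arrive: name first, then arguments in pieces.
let call: AccumulatedFunctionCall | undefined;
for (const delta of [
  { name: "get_current_weather" },
  { arguments: '{"location":' },
  { arguments: '"Boston, MA"}' },
]) {
  call = accumulateFunctionCall(call, delta);
}
// call → { name: "get_current_weather", arguments: '{"location":"Boston, MA"}' }
```

This mirrors the tool_calls aggregation above, minus the per-index bookkeeping that tool_calls need.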


702-711: Attribute emission guard is good; additionally check function existence.

To avoid potential undefined access, also ensure toolCall.function is truthy.

-              if (toolCall.type === 'function' && 'function' in toolCall) {
+              if (
+                toolCall.type === 'function' &&
+                'function' in toolCall &&
+                (toolCall as any).function
+              ) {
packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-chat_1427850401/recording.har (1)

97-107: Avoid persisting response.cookies unless tests assert on them.

The same cookie appears in both response.cookies and the set-cookie header. If tests don’t assert cookies, drop this block to reduce churn and PII surface.

Apply this diff if cookies aren’t used in assertions:

-          "cookies": [
-            {
-              "domain": ".api.openai.com",
-              "httpOnly": true,
-              "name": "_cfuvid",
-              "path": "/",
-              "sameSite": "None",
-              "secure": true,
-              "value": "ptcvgRCTCq1IFGlU5o6Np6S2Sc3_QtrOzQVs_FAUe.A-1755159619967-0.0.1.1-604800000"
-            }
-          ],
+          "cookies": [],

If you need the presence but not the values, mask them via a beforePersist hook instead.

packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-function-calling_1703714575/recording.har (1)

190-193: Mask set-cookie values to reduce fixture churn (optional).

If no test asserts on cookie content, mask the value to a constant to keep diffs minimal between re-recordings.

Apply:

-              "value": "_cfuvid=taZ4ULbzwwbyarAUfW9FCwZOX6yXkJ7aLgBL2DEyid4-1755159626839-0.0.1.1-604800000; path=/; domain=.api.openai.com; HttpOnly; Secure; SameSite=None"
+              "value": "_cfuvid=<masked>; path=/; domain=.api.openai.com; HttpOnly; Secure; SameSite=None"

Alternatively, mask via a Polly beforePersist hook.
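
One possible shape for that sanitization, assuming Polly hands beforePersist a HAR-like entry (the types below approximate, rather than reproduce, the @pollyjs persister shapes):

```typescript
interface HarHeader {
  name: string;
  value: string;
}

interface HarEntryLike {
  response: {
    cookies: unknown[];
    headers: HarHeader[];
  };
}

// Drop recorded cookies and mask set-cookie values so re-recordings
// don't churn on session identifiers like _cfuvid.
function sanitizeEntry(entry: HarEntryLike): HarEntryLike {
  entry.response.cookies = [];
  for (const header of entry.response.headers) {
    if (header.name.toLowerCase() === "set-cookie") {
      header.value = "<masked>";
    }
  }
  return entry;
}

// Hypothetical wiring inside the test harness:
//   polly.server.any().on("beforePersist", (_req, recording) => {
//     sanitizeEntry(recording);
//   });

const entry = sanitizeEntry({
  response: {
    cookies: [{ name: "_cfuvid", value: "session-id" }],
    headers: [
      { name: "content-type", value: "application/json" },
      { name: "set-cookie", value: "_cfuvid=abc; path=/" },
    ],
  },
});
```

Masking at persist time keeps every future re-recording diff-stable regardless of the live cookie values.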

packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-calculate-correct-tokens-for-different-quality-levels_3292443992/recording.har (1)

158-165: Consider masking or dropping x-content-type-options/set-cookie in image tests.

Token-calculation tests shouldn’t depend on these response headers; masking/dropping them reduces fixture churn and file size.

Proposed minimal diff:

-            {
-              "name": "x-content-type-options",
-              "value": "nosniff"
-            },
-            {
-              "_fromType": "array",
-              "name": "set-cookie",
-              "value": "_cfuvid=duRL0kE4c3tQv8yDth8cAiYfQyZaMw8O8MpX1R4I88A-1755159649952-0.0.1.1-604800000; path=/; domain=.api.openai.com; HttpOnly; Secure; SameSite=None"
-            },
+            // removed volatile headers not used by the test

If any test asserts on these, skip this change.

packages/instrumentation-openai/test/instrumentation.test.ts (2)

417-420: Use strict numeric assertion for token counts.

Normalize the attribute to a number and assert strictly to avoid type drift across SDK versions.

-    assert.equal(
-      completionSpan.attributes[`${SpanAttributes.LLM_USAGE_PROMPT_TOKENS}`],
-      82,
-    );
+    assert.strictEqual(
+      Number(completionSpan.attributes[`${SpanAttributes.LLM_USAGE_PROMPT_TOKENS}`]),
+      82,
+    );

517-519: Prefer strict numeric assertion for prompt tokens.

Same rationale as above.

-    assert.equal(
-      completionSpan.attributes[`${SpanAttributes.LLM_USAGE_PROMPT_TOKENS}`],
-      82,
-    );
+    assert.strictEqual(
+      Number(completionSpan.attributes[`${SpanAttributes.LLM_USAGE_PROMPT_TOKENS}`]),
+      82,
+    );
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these settings in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e240945 and e0c4d82.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (17)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-calculate-correct-tokens-for-different-quality-levels_3292443992/recording.har (12 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-emit-logprobs-span-event-for-chat-completion_1028826680/recording.har (8 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-emit-logprobs-span-event-for-stream-chat-completion_2713905208/recording.har (7 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-chat_1427850401/recording.har (8 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-completion_322013537/recording.har (7 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-function-calling_1703714575/recording.har (8 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-image-editing_4216676336/recording.har (0 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-image-generation_2726543658/recording.har (6 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-image-variation_2237177737/recording.har (0 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-streaming-chat-with-new-API_1429213105/recording.har (0 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-streaming-chat_72904453/recording.har (7 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-streaming-completion_1484001317/recording.har (7 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-tool-calling_1747151373/recording.har (8 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-function_call-attributes-in-span-for-stream-completion-when-multiple-tools-cal_2278551756/recording.har (8 hunks)
  • packages/instrumentation-openai/src/instrumentation.ts (5 hunks)
  • packages/instrumentation-openai/test/instrumentation.test.ts (4 hunks)
  • pnpm-workspace.yaml (1 hunks)
💤 Files with no reviewable changes (3)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-image-variation_2237177737/recording.har
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-streaming-chat-with-new-API_1429213105/recording.har
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-image-editing_4216676336/recording.har
🧰 Additional context used
🧬 Code Graph Analysis (2)
packages/instrumentation-openai/src/instrumentation.ts (2)
packages/traceloop-sdk/src/lib/tracing/decorators.ts (1)
  • tool (276-282)
packages/ai-semantic-conventions/src/SemanticAttributes.ts (1)
  • SpanAttributes (17-59)
packages/instrumentation-openai/test/instrumentation.test.ts (1)
packages/ai-semantic-conventions/src/SemanticAttributes.ts (1)
  • SpanAttributes (17-59)
🔇 Additional comments (10)
packages/instrumentation-openai/src/instrumentation.ts (1)

368-381: Good guard: skip non-function tools to avoid crashes.

The additional checks around tool.type and tool.function are correct and prevent emitting invalid function attributes for non-function tools.

packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-emit-logprobs-span-event-for-chat-completion_1028826680/recording.har (1)

64-67: No sensitive data found in HAR recordings

I re-scanned all .har files for Authorization headers, API keys, and Bearer tokens. The only hits were CORS header‐filter entries (e.g. Access-Control-Allow-Headers including “Authorization”), not actual secret values. No Authorization or secret material is persisted.

packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-tool-calling_1747151373/recording.har (1)

64-67: HAR updates align with tool-calling flow; cookie/header churn expected.

Looks good. Same note: confirm no sensitive headers or tokens are recorded.

Use the same scan script from the previous comment to validate this directory as well.

Also applies to: 78-79, 90-96, 99-106, 110-112, 134-136, 146-148, 174-176, 189-193, 199-201, 211-227

packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-emit-logprobs-span-event-for-stream-chat-completion_2713905208/recording.har (1)

64-67: Streaming HAR updated with new chunking and timings; OK.

The event-stream payload changes and header differences are expected with new captures.

Also applies to: 78-79, 90-95, 104-111, 133-135, 145-147, 173-175, 188-192, 198-200, 206-222

packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-chat_1427850401/recording.har (2)

90-96: HAR payload integrity verified
All recording.har fixtures in packages/instrumentation-openai/recordings/**/recording.har successfully concatenate, gunzip (where applicable), and parse as valid JSON. No silent fixture corruption detected.


65-67: Polly setup already ignores volatile headers

  • In packages/instrumentation-openai/test/instrumentation.test.ts (around lines 59–61), the Polly configuration uses
    matchRequestsBy: { headers: false }
    which disables header matching altogether—covering x-stainless-runtime-version and preventing fixture churn.

No further changes needed.

packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-streaming-chat_72904453/recording.har (1)

90-95: No direct SSE parsing in instrumentation—ignore metadata-guard suggestion.

The instrumentation code wraps the client’s streaming promise and consumes parsed chunks (via for await), without manually inspecting SSE fields like service_tier, system_fingerprint, or obfuscation. The OpenAI SDK handles SSE parsing, and instrumentation only touches the high-level chunk object (content/role, finish_reason, tool_calls). No changes needed.

Likely an incorrect or invalid review comment.

packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-streaming-completion_1484001317/recording.har (1)

90-95: SSE stream payload update looks consistent and replay-safe.

Chunked event-stream text ends with [DONE] and preserves the expected completion semantics. No issues from a replay standpoint.

packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-image-generation_2726543658/recording.har (1)

90-96: Updated base64 response payload is valid and size-aligned.

content.encoding is base64 and size matches the compressed payload length. Fixture is consistent for replay.

packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-function_call-attributes-in-span-for-stream-completion-when-multiple-tools-cal_2278551756/recording.har (1)

94-95: Tool call streaming chunks are well-formed and reflect incremental arguments assembly.

The sequence shows interleaved tool_calls for two functions with arguments emitted across multiple chunks, terminating with finish_reason "tool_calls". This aligns with expected 4.x/5.x streaming behavior and should exercise the aggregator logic correctly.

Comment on lines 99 to 107
"domain": ".api.openai.com",
"httpOnly": true,
"name": "_cfuvid",
"path": "/",
"sameSite": "None",
"secure": true,
"value": "FCva9.SglahJ3Gfq39cVkMsP46zAGKCEz0lcn8D3jQM-1754899594529-0.0.1.1-604800000"
"value": "iqsdkpeQwMKFjPi0Iv8KlMvb85OgwfiB1PYVE_.dNgc-1755159640202-0.0.1.1-604800000"
}
],


🛠️ Refactor suggestion

Drop cookies/set-cookie from persisted HARs.

As in other recordings, _cfuvid cookie/set-cookie adds avoidable fixture churn and carries external session identifiers. Please strip them on persist.

See my patch suggestion in the test harness to sanitize cookies and volatile headers at persist time.

Also applies to: 162-165

Comment on lines 96 to 106
"cookies": [
{
"domain": ".api.openai.com",
"expires": "2025-08-11T08:36:13.000Z",
"httpOnly": true,
"name": "__cf_bm",
"path": "/",
"sameSite": "None",
"secure": true,
"value": "7BVkI_Oa29lnPsqHulgW245EtvG64HIEl2Fp7IxZp9I-1754899573-1.0.1.1-kdd4J1X.Jma63pkY2SjVLyddqX97N48JVjPmEOLzulzD.rZiIDkxc04zSITZbl26AssMI9omM7e18ei3dnc5W6H_4xiLGmvNjB4pZIPzlOM"
},
{
"domain": ".api.openai.com",
"httpOnly": true,
"name": "_cfuvid",
"path": "/",
"sameSite": "None",
"secure": true,
"value": "iINbtFQc.yhVxOLsB58zmJ_w777tMCbQwp8i5X9MDmg-1754899573294-0.0.1.1-604800000"
"value": "oJzlXzMbl99K.ngOr11DUoLQkD4oAL_M5u1cOyV1mDM-1755159623848-0.0.1.1-604800000"
}
],
Contributor

🛠️ Refactor suggestion

Sanitize cookies in fixtures to prevent churn and avoid storing unnecessary session identifiers.

The recorded cookies and set-cookie headers (_cfuvid) will change frequently and are not used by tests. Keeping them in fixtures increases diff noise and potential for flaky updates. Recommend dropping cookies and set-cookie during recording (see test harness suggestion in instrumentation.test.ts comment).

I’ve proposed a small change to beforePersist in the test file to strip cookies and volatile response headers.

Also applies to: 205-208

🤖 Prompt for AI Agents
In
packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-streaming-completion_1484001317/recording.har
around lines 96-106 (and similarly lines 205-208), remove cookie entries and any
Set-Cookie/volatile response headers from the persisted fixture to avoid churn;
update the test harness's beforePersist hook to strip the "cookies" array and
any "set-cookie" or other session-specific headers from response objects before
writing the HAR so recordings no longer include _cfuvid or equivalent session
identifiers.

Comment on lines 96 to 106
"cookies": [
{
"domain": ".api.openai.com",
"expires": "2025-08-11T08:36:18.000Z",
"httpOnly": true,
"name": "__cf_bm",
"path": "/",
"sameSite": "None",
"secure": true,
"value": "ir.cz4nw2WPG7xc6lLT75OWqcFetJxpYTAXh9GBFbNY-1754899578-1.0.1.1-16O6NrITpUlIk9x4K3fxzvyvyQhx0_N1cbxbVtmNaCLeXW_OoN12VvQlnCpIjABB.URY8yzVuyLaZ4wLy.3YxO8NJ.wkhoYX1y4V1gpMdpA"
},
{
"domain": ".api.openai.com",
"httpOnly": true,
"name": "_cfuvid",
"path": "/",
"sameSite": "None",
"secure": true,
"value": "daCGg_fpUU3jmxytoFsYXLoU4aQ1zvDMBbYagYlQBfM-1754899578893-0.0.1.1-604800000"
"value": "SfmiOSTiNKwNXeYI28mF.CeV7jmkS4JLhmbnWLlox0I-1755159629164-0.0.1.1-604800000"
}
],
Contributor

🛠️ Refactor suggestion

Avoid persisting cookies and highly volatile response headers.

The presence of _cfuvid cookie and set-cookie increases fixture volatility and is unnecessary. Recommend sanitizing cookies and volatile headers (date, cf-ray, x-request-id, etc.) in beforePersist.

Patch for the sanitizer is proposed in the test file comment below.

Also applies to: 188-192

🤖 Prompt for AI Agents
In
packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-function_call-attributes-in-span-for-stream-completion-when-multiple-tools-cal_2278551756/recording.har
around lines 96-106 (and also lines 188-192), the HAR fixture contains volatile
cookies and response headers (e.g., the _cfuvid cookie and Set-Cookie entries,
Date, cf-ray, x-request-id) that make tests brittle; update the test suite's
beforePersist sanitizer to strip or normalize cookies and remove volatile
response headers before saving fixtures — specifically remove entries with names
like _cfuvid or any Set-Cookie header and drop/replace Date, cf-ray,
x-request-id, and similar per-response headers so the persisted HARs are stable.

Comment on lines 87 to 94
if (this.polly) {
const { server } = this.polly as Polly;
server.any().on("beforePersist", (_req, recording) => {
recording.request.headers = recording.request.headers.filter(
({ name }: { name: string }) => name !== "authorization",
);
});
}
Contributor

🛠️ Refactor suggestion

Harden Polly sanitizer (strip cookies and volatile headers) and type Mocha context.

Nice guard for this.polly. Expand the sanitizer to drop cookies and volatile headers to reduce fixture churn and avoid persisting session identifiers. Also, type the Mocha context for better TS safety.

Apply this diff:

-  if (this.polly) {
-    const { server } = this.polly as Polly;
-    server.any().on("beforePersist", (_req, recording) => {
-      recording.request.headers = recording.request.headers.filter(
-        ({ name }: { name: string }) => name !== "authorization",
-      );
-    });
-  }
+  if (this.polly) {
+    const { server } = this.polly as Polly;
+    server.any().on("beforePersist", (_req, recording) => {
+      // Drop auth headers, cookies and highly-volatile response headers to reduce fixture churn
+      const dropHeader = (name: string) =>
+        [
+          "authorization",
+          "cookie",
+          "set-cookie",
+          "date",
+          "cf-ray",
+          "x-request-id",
+          "via",
+          "x-envoy-upstream-service-time",
+          "openai-processing-ms",
+        ].includes(name.toLowerCase());
+
+      recording.request.headers = (recording.request.headers || []).filter(
+        ({ name }: { name: string }) => !dropHeader(name),
+      );
+      recording.request.cookies = [];
+
+      recording.response.headers = (recording.response.headers || []).filter(
+        ({ name }: { name: string }) => !dropHeader(name),
+      );
+      recording.response.cookies = [];
+    });
+  }

Additionally, for stricter typing, consider changing the beforeEach signature to:
TypeScript (apply outside this range):
function beforeEach(this: Mocha.Context & { polly?: Polly }) { … }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (this.polly) {
const { server } = this.polly as Polly;
server.any().on("beforePersist", (_req, recording) => {
recording.request.headers = recording.request.headers.filter(
({ name }: { name: string }) => name !== "authorization",
);
});
}
if (this.polly) {
const { server } = this.polly as Polly;
server.any().on("beforePersist", (_req, recording) => {
// Drop auth headers, cookies and highly-volatile response headers to reduce fixture churn
const dropHeader = (name: string) =>
[
"authorization",
"cookie",
"set-cookie",
"date",
"cf-ray",
"x-request-id",
"via",
"x-envoy-upstream-service-time",
"openai-processing-ms",
].includes(name.toLowerCase());
recording.request.headers = (recording.request.headers || []).filter(
({ name }: { name: string }) => !dropHeader(name),
);
recording.request.cookies = [];
recording.response.headers = (recording.response.headers || []).filter(
({ name }: { name: string }) => !dropHeader(name),
);
recording.response.cookies = [];
});
}
🤖 Prompt for AI Agents
In packages/instrumentation-openai/test/instrumentation.test.ts around lines 87
to 94, the Polly sanitizer currently only strips the "authorization" header;
update the sanitizer to also remove "cookie" and other volatile headers (e.g.
x-request-id, x-correlation-id) by filtering out headers whose lower-cased name
matches that block, and keep the existing guard for this.polly. Also add
stronger typing for the Mocha beforeEach by changing its signature (outside this
range) to: function beforeEach(this: Mocha.Context & { polly?: Polly }) so
this.polly is typed as Polly.

Comment on lines 407 to 414
const functionCallArguments = JSON.parse(
completionSpan.attributes[
`${SpanAttributes.LLM_COMPLETIONS}.0.function_call.arguments`
]! as string,
);
assert.ok(functionCallArguments.location);
assert.ok(functionCallArguments.location.includes("Boston"));
assert.ok(
Contributor

🛠️ Refactor suggestion

Make argument parsing robust across SDK variants (string vs object).

Arguments may be emitted as a JSON string or already-object depending on SDK/version. Parse conditionally to avoid runtime errors.

-    const functionCallArguments = JSON.parse(
-      completionSpan.attributes[
-        `${SpanAttributes.LLM_COMPLETIONS}.0.function_call.arguments`
-      ]! as string,
-    );
-    assert.ok(functionCallArguments.location);
-    assert.ok(functionCallArguments.location.includes("Boston"));
+    const rawFunctionArgs =
+      completionSpan.attributes[
+        `${SpanAttributes.LLM_COMPLETIONS}.0.function_call.arguments`
+      ]!;
+    const functionCallArguments =
+      typeof rawFunctionArgs === "string" ? JSON.parse(rawFunctionArgs) : rawFunctionArgs;
+    assert.ok(functionCallArguments?.location);
+    assert.ok(String(functionCallArguments.location).includes("Boston"));
📝 Committable suggestion

Suggested change
const functionCallArguments = JSON.parse(
completionSpan.attributes[
`${SpanAttributes.LLM_COMPLETIONS}.0.function_call.arguments`
]! as string,
);
assert.ok(functionCallArguments.location);
assert.ok(functionCallArguments.location.includes("Boston"));
assert.ok(
const rawFunctionArgs =
completionSpan.attributes[
`${SpanAttributes.LLM_COMPLETIONS}.0.function_call.arguments`
]!;
const functionCallArguments =
typeof rawFunctionArgs === "string"
? JSON.parse(rawFunctionArgs)
: rawFunctionArgs;
assert.ok(functionCallArguments?.location);
assert.ok(String(functionCallArguments.location).includes("Boston"));
assert.ok(
🤖 Prompt for AI Agents
In packages/instrumentation-openai/test/instrumentation.test.ts around lines 407
to 414, the test assumes function call arguments are always a JSON string; make
parsing robust by checking the attribute value type before parsing: if it's a
string, JSON.parse it (with try/catch to surface parse errors), otherwise treat
it as an object; then assert on the resulting object (e.g., ensure location
exists and includes "Boston"). Ensure the code handles null/undefined safely and
fails the test with a clear message if parsing or type expectations are not met.

Comment on lines 506 to 513
const toolCallArguments = JSON.parse(
completionSpan.attributes[
`${SpanAttributes.LLM_COMPLETIONS}.0.tool_calls.0.arguments`
]! as string,
);
assert.ok(toolCallArguments.location);
assert.ok(toolCallArguments.location.includes("Boston"));
assert.ok(
Contributor

🛠️ Refactor suggestion

Harden tool call arguments parsing (string/object).

Mirror the conditional parsing to avoid failures if the SDK changes the type.

-    const toolCallArguments = JSON.parse(
-      completionSpan.attributes[
-        `${SpanAttributes.LLM_COMPLETIONS}.0.tool_calls.0.arguments`
-      ]! as string,
-    );
-    assert.ok(toolCallArguments.location);
-    assert.ok(toolCallArguments.location.includes("Boston"));
+    const rawToolArgs0 =
+      completionSpan.attributes[
+        `${SpanAttributes.LLM_COMPLETIONS}.0.tool_calls.0.arguments`
+      ]!;
+    const toolCallArguments =
+      typeof rawToolArgs0 === "string" ? JSON.parse(rawToolArgs0) : rawToolArgs0;
+    assert.ok(toolCallArguments?.location);
+    assert.ok(String(toolCallArguments.location).includes("Boston"));
📝 Committable suggestion

Suggested change
const toolCallArguments = JSON.parse(
completionSpan.attributes[
`${SpanAttributes.LLM_COMPLETIONS}.0.tool_calls.0.arguments`
]! as string,
);
assert.ok(toolCallArguments.location);
assert.ok(toolCallArguments.location.includes("Boston"));
assert.ok(
const rawToolArgs0 =
completionSpan.attributes[
`${SpanAttributes.LLM_COMPLETIONS}.0.tool_calls.0.arguments`
]!;
const toolCallArguments =
typeof rawToolArgs0 === "string" ? JSON.parse(rawToolArgs0) : rawToolArgs0;
assert.ok(toolCallArguments?.location);
assert.ok(String(toolCallArguments.location).includes("Boston"));
assert.ok(
🤖 Prompt for AI Agents
In packages/instrumentation-openai/test/instrumentation.test.ts around lines 506
to 513, the test assumes the tool call arguments attribute is always a JSON
string; make the parsing robust by first reading the attribute into a variable,
then if it's a string run JSON.parse on it, otherwise treat it as an object (no
parse), and then proceed with the existing assertions (checking location and
contents).

Comment on lines 600 to 607
const toolCall0Arguments = JSON.parse(
completionSpan.attributes[
`${SpanAttributes.LLM_COMPLETIONS}.0.tool_calls.0.arguments`
]! as string,
);
assert.ok(toolCall0Arguments.location);
assert.ok(toolCall0Arguments.location.includes("Boston"));
assert.strictEqual(
Contributor

🛠️ Refactor suggestion

Robust parsing for first streamed tool call arguments.

Guard against string/object variance.

-    const toolCall0Arguments = JSON.parse(
-      completionSpan.attributes[
-        `${SpanAttributes.LLM_COMPLETIONS}.0.tool_calls.0.arguments`
-      ]! as string,
-    );
-    assert.ok(toolCall0Arguments.location);
-    assert.ok(toolCall0Arguments.location.includes("Boston"));
+    const rawToolCall0 =
+      completionSpan.attributes[
+        `${SpanAttributes.LLM_COMPLETIONS}.0.tool_calls.0.arguments`
+      ]!;
+    const toolCall0Arguments =
+      typeof rawToolCall0 === "string" ? JSON.parse(rawToolCall0) : rawToolCall0;
+    assert.ok(toolCall0Arguments?.location);
+    assert.ok(String(toolCall0Arguments.location).includes("Boston"));
📝 Committable suggestion

Suggested change
const toolCall0Arguments = JSON.parse(
completionSpan.attributes[
`${SpanAttributes.LLM_COMPLETIONS}.0.tool_calls.0.arguments`
]! as string,
);
assert.ok(toolCall0Arguments.location);
assert.ok(toolCall0Arguments.location.includes("Boston"));
assert.strictEqual(
const rawToolCall0 =
completionSpan.attributes[
`${SpanAttributes.LLM_COMPLETIONS}.0.tool_calls.0.arguments`
]!;
const toolCall0Arguments =
typeof rawToolCall0 === "string" ? JSON.parse(rawToolCall0) : rawToolCall0;
assert.ok(toolCall0Arguments?.location);
assert.ok(String(toolCall0Arguments.location).includes("Boston"));
assert.strictEqual(
🤖 Prompt for AI Agents
In packages/instrumentation-openai/test/instrumentation.test.ts around lines 600
to 607, the test assumes the tool call arguments attribute is always a JSON
string and calls JSON.parse unguarded; update the test to handle both string and
object variants by first reading the attribute into a const, assert it's
defined, then if it's a string call JSON.parse, otherwise if it's already an
object use it directly (and handle any other unexpected types with a
fail/assert), then continue asserting location exists and contains "Boston".

Comment on lines 613 to 620
const toolCall1Arguments = JSON.parse(
completionSpan.attributes[
`${SpanAttributes.LLM_COMPLETIONS}.0.tool_calls.1.arguments`
]! as string,
);
assert.ok(toolCall1Arguments.location);
assert.ok(toolCall1Arguments.location.includes("Chicago"));
});
Contributor

🛠️ Refactor suggestion

Robust parsing for second streamed tool call arguments.

Same conditional parse as above for consistency.

-    const toolCall1Arguments = JSON.parse(
-      completionSpan.attributes[
-        `${SpanAttributes.LLM_COMPLETIONS}.0.tool_calls.1.arguments`
-      ]! as string,
-    );
-    assert.ok(toolCall1Arguments.location);
-    assert.ok(toolCall1Arguments.location.includes("Chicago"));
+    const rawToolCall1 =
+      completionSpan.attributes[
+        `${SpanAttributes.LLM_COMPLETIONS}.0.tool_calls.1.arguments`
+      ]!;
+    const toolCall1Arguments =
+      typeof rawToolCall1 === "string" ? JSON.parse(rawToolCall1) : rawToolCall1;
+    assert.ok(toolCall1Arguments?.location);
+    assert.ok(String(toolCall1Arguments.location).includes("Chicago"));
📝 Committable suggestion

Suggested change
const toolCall1Arguments = JSON.parse(
completionSpan.attributes[
`${SpanAttributes.LLM_COMPLETIONS}.0.tool_calls.1.arguments`
]! as string,
);
assert.ok(toolCall1Arguments.location);
assert.ok(toolCall1Arguments.location.includes("Chicago"));
});
const rawToolCall1 =
completionSpan.attributes[
`${SpanAttributes.LLM_COMPLETIONS}.0.tool_calls.1.arguments`
]!;
const toolCall1Arguments =
typeof rawToolCall1 === "string" ? JSON.parse(rawToolCall1) : rawToolCall1;
assert.ok(toolCall1Arguments?.location);
assert.ok(String(toolCall1Arguments.location).includes("Chicago"));
});
🤖 Prompt for AI Agents
In packages/instrumentation-openai/test/instrumentation.test.ts around lines 613
to 620, the code unconditionally JSON.parse's the second streamed tool call
arguments; change it to use the same conditional parsing pattern used earlier
(check the attribute exists and is a string before parsing, e.g. use a ternary
or if to set toolCall1Arguments = attribute ? JSON.parse(attribute) : undefined)
so parsing is robust and consistent, then assert on toolCall1Arguments.location
as before.

Comment on lines 4 to +6
overrides:
form-data@>=4.0.0 <4.0.4: '>=4.0.4'
tmp@<=0.2.3: '>=0.2.4'
Contributor

💡 Verification agent

🧩 Analysis chain

Workspace-level overrides look fine; consider pinning to fixed versions for determinism.

  • The overrides are sensible to avoid known issues in form-data <4.0.4 and tmp <=0.2.3.
  • For reproducibility, prefer exact versions (e.g., "4.0.4" and "0.2.4") over ranges, unless you intentionally want to float to newer patch releases.

Run this script to ensure pnpm recognizes workspace-level overrides and to spot any other occurrences of form-data/tmp that could be affected:


🏁 Script executed:

#!/bin/bash
# Show where form-data and tmp appear in lockfile and verify override effect
fd 'pnpm-lock.yaml' -x rg -n 'form-data|tmp' {}

# Sanity check: ensure overrides are set at workspace root package.json or workspace file
rg -n '"overrides"\s*:' -A 5 pnpm-workspace.yaml

Length of output: 1084


🏁 Script executed:

#!/bin/bash
# 1. Confirm that overrides are declared in pnpm-workspace.yaml
echo "---- pnpm-workspace.yaml overrides snippet ----"
rg -n '^overrides:' -A 5 pnpm-workspace.yaml || echo "No overrides block found"

# 2. Show the first 30 lines of pnpm-lock.yaml to inspect the overrides section
echo
echo "---- pnpm-lock.yaml header (first 30 lines) ----"
head -n 30 pnpm-lock.yaml

# 3. Verify that the tmp override entry exists in the lockfile
echo
echo "---- tmp override entries in pnpm-lock.yaml ----"
rg -n 'tmp@<=0.2.3:' pnpm-lock.yaml || echo "No tmp override entry found"

Length of output: 1287


Fix missing tmp override and pin to exact versions for determinism

It looks like your form-data override is applied (seen in pnpm-lock.yaml), but the tmp@<=0.2.3 override never made it into the lockfile. To ensure both packages are correctly overridden and reproducible:

• Adjust your overrides in pnpm-workspace.yaml to use exact versions and a catch-all pattern (which pnpm reliably picks up):

 pnpm-workspace.yaml
 overrides:
-  form-data@>=4.0.0 <4.0.4: '>=4.0.4'
-  tmp@<=0.2.3:           '>=0.2.4'
+  form-data@* :           '4.0.4'
+  tmp@*         :         '0.2.4'

• After updating, run pnpm install to regenerate pnpm-lock.yaml.
• Verify both overrides appear under the top-level overrides: block in pnpm-lock.yaml.

This will both guarantee your overrides take effect and lock to specific versions for full reproducibility.

📝 Committable suggestion

Suggested change
overrides:
form-data@>=4.0.0 <4.0.4: '>=4.0.4'
tmp@<=0.2.3: '>=0.2.4'
pnpm-workspace.yaml
overrides:
form-data@* : '4.0.4'
tmp@* : '0.2.4'
🤖 Prompt for AI Agents
In pnpm-workspace.yaml around lines 4 to 6, the tmp override is not being picked
up and your overrides should pin to exact versions and use a catch‑all pattern
so pnpm reliably applies them; update the overrides to use exact version pins
(e.g. form-data@4.0.4 and tmp@0.2.4) or a wildcard pattern that matches all
ranges, save the file, run pnpm install to regenerate pnpm-lock.yaml, and verify
both form-data and tmp appear under the top-level overrides block in
pnpm-lock.yaml.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🔭 Outside diff range comments (2)
packages/instrumentation-openai/src/instrumentation.ts (2)

367-381: Attribute index collision between functions[] and tools[]

You now emit tools under LLM_REQUEST_FUNCTIONS.* with their own 0-based index, which may overwrite attributes emitted for params.functions.*. Use an offset so tools follow functions, and guard undefined parameters to avoid setting undefined attributes.

Apply this diff:

           params.tools?.forEach((tool, index) => {
             if (tool.type !== 'function' || !('function' in tool) || !tool.function) {
               return;
             }

-            attributes[
-              `${SpanAttributes.LLM_REQUEST_FUNCTIONS}.${index}.name`
-            ] = tool.function.name;
-            attributes[
-              `${SpanAttributes.LLM_REQUEST_FUNCTIONS}.${index}.description`
-            ] = tool.function.description;
-            attributes[
-              `${SpanAttributes.LLM_REQUEST_FUNCTIONS}.${index}.arguments`
-            ] = JSON.stringify(tool.function.parameters);
+            attributes[
+              `${SpanAttributes.LLM_REQUEST_FUNCTIONS}.${(params.functions?.length ?? 0) + index}.name`
+            ] = tool.function.name;
+            attributes[
+              `${SpanAttributes.LLM_REQUEST_FUNCTIONS}.${(params.functions?.length ?? 0) + index}.description`
+            ] = tool.function.description;
+            attributes[
+              `${SpanAttributes.LLM_REQUEST_FUNCTIONS}.${(params.functions?.length ?? 0) + index}.arguments`
+            ] = JSON.stringify(tool.function.parameters ?? {});
           });

458-466: Streaming function_call assembly loses partial deltas

Requiring both name and arguments in the same chunk misses the common streaming pattern where they arrive separately. Build the function_call incrementally instead.

Apply this diff:

-        if (
-          chunk.choices[0]?.delta.function_call &&
-          chunk.choices[0]?.delta.function_call.arguments &&
-          chunk.choices[0]?.delta.function_call.name
-        ) {
-          // I needed to re-build the object so that Typescript will understand that `name` and `argument` are not null.
-          result.choices[0].message.function_call = {
-            name: chunk.choices[0].delta.function_call.name,
-            arguments: chunk.choices[0].delta.function_call.arguments,
-          };
-        }
+        if (chunk.choices[0]?.delta.function_call) {
+          // Build up function_call across chunks
+          if (!result.choices[0].message.function_call) {
+            result.choices[0].message.function_call = { name: "", arguments: "" };
+          }
+          if (chunk.choices[0].delta.function_call.name) {
+            result.choices[0].message.function_call.name +=
+              chunk.choices[0].delta.function_call.name;
+          }
+          if (chunk.choices[0].delta.function_call.arguments) {
+            result.choices[0].message.function_call.arguments +=
+              chunk.choices[0].delta.function_call.arguments;
+          }
+        }
🧹 Nitpick comments (2)
packages/instrumentation-openai/src/instrumentation.ts (2)

436-436: Avoid as any for the seeded streaming message (type-safety)

Casting the whole message to any hides mistakes. Consider a narrow local type for the mutable message shape to preserve type-safety.

For example:

type MutableAssistantMessage = {
  role: "assistant";
  content: string;
  tool_calls: Array<{
    id: string;
    type: "function";
    function: { name: string; arguments: string };
  }>;
  function_call?: { name: string; arguments: string };
};

Then initialize:

message: {
  role: "assistant",
  content: "",
  tool_calls: [],
} as MutableAssistantMessage

493-497: Tighten typing when mutating tool_calls; optionally avoid as any

The incremental += on function.name and function.arguments is fine for streaming. To reduce reliance on any, introduce a local MutableToolCall type and cast tool_calls once, instead of per-write casts; also ensure the entry has a function object before appending.

Example:

type MutableToolCall = { id: string; type: string; function?: { name: string; arguments: string } };
const calls = result.choices[0].message.tool_calls as unknown as MutableToolCall[];
const entry = calls[toolCall.index];
entry.function ??= { name: "", arguments: "" };
if (toolCall.function?.name) entry.function.name += toolCall.function.name;
if (toolCall.function?.arguments) entry.function.arguments += toolCall.function.arguments;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these settings in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e0c4d82 and a7153ab.

📒 Files selected for processing (1)
  • packages/instrumentation-openai/src/instrumentation.ts (5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/instrumentation-openai/src/instrumentation.ts (2)
packages/traceloop-sdk/src/lib/tracing/decorators.ts (1)
  • tool (276-282)
packages/ai-semantic-conventions/src/SemanticAttributes.ts (1)
  • SpanAttributes (17-59)
🔇 Additional comments (3)
packages/instrumentation-openai/src/instrumentation.ts (3)

122-122: OpenAI 5.x support: version range bump LGTM — confirm wrapper targets still hold in v5

Relaxing to ">=3.1.0 <6" is aligned with the PR goal. Please double-check (in CI matrix or locally) that the prototype paths you wrap (e.g., OpenAI.Chat.Completions.prototype.create) are stable across 5.x minors; the OpenAI SDK has been evolving rapidly.
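One cheap way to catch a moved prototype path early is a defensive existence check before patching. This is a hypothetical guard, not code from this PR; `openaiModule` stands in for the module object the instrumentation receives:

```typescript
// Hypothetical defensive check before wrapping OpenAI.Chat.Completions.prototype.create.
// Returns true only when the expected prototype path exposes a callable `create`.
function canWrapChatCreate(openaiModule: any): boolean {
  const proto = openaiModule?.Chat?.Completions?.prototype;
  return typeof proto?.create === "function";
}

// Simulated module shape matching the 4.x/5.x layout this instrumentation targets.
const fakeModule = {
  Chat: {
    Completions: class {
      create() {
        return "ok";
      }
    },
  },
};

console.log(canWrapChatCreate(fakeModule)); // true
console.log(canWrapChatCreate({})); // false
```

A guard like this lets the instrumentation log a diagnostic and skip patching instead of throwing when a future 5.x minor moves the method.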


488-491: Correct fix: assign tool_call.type, don’t concatenate

Switching to assignment for type is the right behavior for streaming tool_calls.


703-711: Good guard: emit tool_calls attributes only for function tools

This prevents spurious attributes for non-function tool types and aligns with the OpenAI tool schema.

result.choices[0].message.tool_calls[
toolCall.index
].function.name += toolCall.function.name;
(result.choices[0].message.tool_calls[toolCall.index] as any).function.name += toolCall.function.name;
Member

weird that it's += no? (I know that it used to be like that)

({ name }: { name: string }) => name !== "authorization",
);
});
if (this.polly) {
Member

nit: I don't think we need this change

]! as string,
),
{ location: "Boston" },
const functionCallArguments = JSON.parse(
Member

I'd revert it - to make sure it didn't change

],
"get_current_weather",
);
assert.deepEqual(
Member

same


overrides:
form-data@>=4.0.0 <4.0.4: '>=4.0.4'
tmp@<=0.2.3: '>=0.2.4'
Member

not needed

Contributor Author

it solves a vulnerability

@ellipsis-dev ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed 4e2a79b in 1 minute and 43 seconds. Click for details.
  • Reviewed 2854 lines of code in 13 files
  • Skipped 1 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/instrumentation-openai/package.json:58
  • Draft comment:
    Dependency on "openai" has been updated to version 5.12.2 to support SDK v5. This aligns with issue 642 and ensures compatibility with the new version.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative, providing details about a dependency update and its purpose. It does not offer a suggestion, ask for confirmation, or highlight a potential issue.
2. packages/instrumentation-openai/test/instrumentation.test.ts:407
  • Draft comment:
    Changed the assertions for function_call arguments: using assert.deepEqual on the parsed JSON (expecting exactly { location: 'Boston' }) instead of checking with substring. This makes the test more robust and precise with the new SDK behavior.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative, explaining the change made in the code. It doesn't provide a suggestion, ask for confirmation, or point out a potential issue. It violates the rule against making purely informative comments.
3. packages/instrumentation-openai/test/instrumentation.test.ts:597
  • Draft comment:
    Updated expected arguments for tool calls in tests for multiple tools. The first tool now expects { location: 'Boston, MA' } and the second { location: 'Chicago, IL' } via assert.deepEqual, reflecting the SDK v5 response structure.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_eMJPlJPYUKuBqPhi

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed a05c4dd in 1 minute and 1 second. Click for details.
  • Reviewed 43 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/instrumentation-openai/src/instrumentation.ts:367
  • Draft comment:
    Refactored multi-line if-condition for checking tool validity using consistent double quotes. This is a formatting improvement with no functional change.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. packages/instrumentation-openai/src/instrumentation.ts:490
  • Draft comment:
    Improved multi-line formatting when appending tool call function name and arguments; no logic change, just enhanced readability.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
3. packages/instrumentation-openai/src/instrumentation.ts:699
  • Draft comment:
    Changed the condition to use consistent double quotes for checking toolCall type before setting span attributes. This is a straightforward formatting update.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_XF0Oh4j7U57d78nG

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
packages/instrumentation-openai/package.json (1)

58-59: Upgrade instrumentation-openai Node engine requirement to ≥18 and correct the description typo

The OpenAI SDK v5+ requires Node 18+, and your project’s .nvmrc pins Node 20—so bumping the engine floor to 18 is safe. I didn’t find any GitHub Actions workflows (or other CI configs) in .github/workflows; please verify your CI runners (including any .travis.yml or .circleci/config.yml) use Node ≥18.

• packages/instrumentation-openai/package.json
– Update engines.node to ">=18" (was ">=14")
– Fix the “Instrumentaion” → “Instrumentation” typo in description
– (Optional) Change the OpenAI SDK devDependency to ^5.12.2 to pick up non-breaking fixes

 packages/instrumentation-openai/package.json
  4c4
-  "description": "OpenAI Instrumentaion",
+  "description": "OpenAI Instrumentation",
 23,25c23,25
-  "engines": {
-    "node": ">=14"
-  },
+  "engines": {
+    "node": ">=18"
+  },
 58c58
-    "openai": "5.12.2",
+    "openai": "^5.12.2",
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these settings in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a7153ab and 4e2a79b.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (13)
  • packages/instrumentation-openai/package.json (1 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-calculate-correct-tokens-for-different-quality-levels_3292443992/recording.har (0 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-emit-logprobs-span-event-for-chat-completion_1028826680/recording.har (0 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-emit-logprobs-span-event-for-stream-chat-completion_2713905208/recording.har (0 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-chat_1427850401/recording.har (0 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-completion_322013537/recording.har (0 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-function-calling_1703714575/recording.har (0 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-image-generation_2726543658/recording.har (0 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-streaming-chat_72904453/recording.har (0 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-streaming-completion_1484001317/recording.har (0 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-tool-calling_1747151373/recording.har (0 hunks)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-function_call-attributes-in-span-for-stream-completion-when-multiple-tools-cal_2278551756/recording.har (0 hunks)
  • packages/instrumentation-openai/test/instrumentation.test.ts (2 hunks)
💤 Files with no reviewable changes (11)
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-emit-logprobs-span-event-for-stream-chat-completion_2713905208/recording.har
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-image-generation_2726543658/recording.har
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-chat_1427850401/recording.har
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-streaming-chat_72904453/recording.har
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-function_call-attributes-in-span-for-stream-completion-when-multiple-tools-cal_2278551756/recording.har
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-tool-calling_1747151373/recording.har
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-streaming-completion_1484001317/recording.har
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-emit-logprobs-span-event-for-chat-completion_1028826680/recording.har
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-completion_322013537/recording.har
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-calculate-correct-tokens-for-different-quality-levels_3292443992/recording.har
  • packages/instrumentation-openai/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-function-calling_1703714575/recording.har
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/instrumentation-openai/test/instrumentation.test.ts

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed 755cd47 in 1 minute and 3 seconds. Click for details.
  • Reviewed 44 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/instrumentation-openai/src/instrumentation.ts:44
  • Draft comment:
    Define a local type alias APIPromiseType to support both OpenAI v4 and v5+; ensure that all promise objects provide the '_thenUnwrap' method at runtime.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
2. packages/instrumentation-openai/src/instrumentation.ts:419
  • Draft comment:
    Updated the promise type for chat streaming from APIPromise to APIPromiseType. Verify that the _thenUnwrap interface behaves consistently across SDK versions.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
3. packages/instrumentation-openai/src/instrumentation.ts:612
  • Draft comment:
    Changed the _wrapPromise method's promise parameter type to APIPromiseType. Confirm that the promise object’s '_thenUnwrap' method exists and works as expected with the new SDK versions.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None

Workflow ID: wflow_afJk6RzDEyOKqPoa

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🔭 Outside diff range comments (1)
packages/instrumentation-openai/src/instrumentation.ts (1)

611-618: Fallback to native Promise when _thenUnwrap is unavailable

Some environments/versions might not expose _thenUnwrap. Add a safe fallback using Promise.then to avoid runtime errors.

   private _wrapPromise<T>(
     type: "chat" | "completion",
     version: "v3" | "v4",
     span: Span,
-    promise: APIPromiseType<T>,
-  ): APIPromiseType<T> {
-    return promise._thenUnwrap((result) => {
+    promise: APIPromiseType<T>,
+  ): APIPromiseType<T> {
+    const unwrap =
+      typeof (promise as any)._thenUnwrap === "function"
+        ? (promise as any)._thenUnwrap.bind(promise)
+        : (onFulfilled: (value: T) => any) =>
+            (Promise.resolve(promise) as Promise<T>).then(onFulfilled);
+    return (unwrap((result: T) => {
       if (version === "v3") {
         if (type === "chat") {
           this._addLogProbsEvent(
             span,
             ((result as any).data as ChatCompletion).choices[0].logprobs,
           );
           this._endSpan({
             type,
             span,
             result: (result as any).data as ChatCompletion,
           });
         } else {
           this._addLogProbsEvent(
             span,
             ((result as any).data as Completion).choices[0].logprobs,
           );
           this._endSpan({
             type,
             span,
             result: (result as any).data as Completion,
           });
         }
       } else {
         if (type === "chat") {
           this._addLogProbsEvent(
             span,
             (result as ChatCompletion).choices[0].logprobs,
           );
           this._endSpan({ type, span, result: result as ChatCompletion });
         } else {
           this._addLogProbsEvent(
             span,
             (result as Completion).choices[0].logprobs,
           );
           this._endSpan({ type, span, result: result as Completion });
         }
       }
 
-      return result;
-    });
+      return result as T;
+    }) as unknown) as APIPromiseType<T>;
   }
♻️ Duplicate comments (1)
packages/instrumentation-openai/src/instrumentation.ts (1)

491-509: Streaming tool_calls assembly: avoid concatenating id and name; only append arguments

It’s incorrect to use += for id and function.name; those should be assigned, not concatenated. Arguments are streamed in chunks and should be concatenated. This also aligns with the prior feedback on "+=" being odd.

-            if (toolCall.id) {
-              result.choices[0].message.tool_calls[toolCall.index].id +=
-                toolCall.id;
-            }
+            if (toolCall.id) {
+              result.choices[0].message.tool_calls[toolCall.index].id =
+                toolCall.id;
+            }
             if (toolCall.type) {
               result.choices[0].message.tool_calls[toolCall.index].type =
                 toolCall.type;
             }
             if (toolCall.function?.name) {
-              (
-                result.choices[0].message.tool_calls[toolCall.index] as any
-              ).function.name += toolCall.function.name;
+              (
+                result.choices[0].message.tool_calls[toolCall.index] as any
+              ).function.name = toolCall.function.name;
             }
             if (toolCall.function?.arguments) {
               (
                 result.choices[0].message.tool_calls[toolCall.index] as any
               ).function.arguments += toolCall.function.arguments;
             }
🧹 Nitpick comments (2)
packages/instrumentation-openai/src/instrumentation.ts (2)

372-389: Guarding tools to only function-type is correct; consider attribute collision with legacy functions

The stricter guard avoids processing non-function tools — good. One consideration: you emit both params.functions and params.tools under the same attribute prefix (llm.request.functions). If both are provided (rare but possible), indices may collide and overwrite attributes.

  • Option A: Use a distinct namespace for tools (e.g., llm.request.tools).
  • Option B: Offset tool indices by params.functions?.length to avoid collisions.
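
Option B can be sketched as follows (a self-contained illustration; the `params` shapes, helper name, and the shared attribute prefix are assumptions for the example, not taken from the instrumentation source):

```typescript
// Hypothetical helper: emit names under one shared prefix, offsetting
// tool indices by the legacy `functions` count so indices never collide.
type FunctionDef = { name: string };
type ToolDef = { type: string; function?: FunctionDef };

function functionAttributeKeys(
  functions: FunctionDef[] | undefined,
  tools: ToolDef[] | undefined,
): Record<string, string> {
  const attrs: Record<string, string> = {};
  const prefix = "llm.request.functions";
  (functions ?? []).forEach((fn, i) => {
    attrs[`${prefix}.${i}.name`] = fn.name;
  });
  const offset = functions?.length ?? 0; // Option B: shift tool indices
  (tools ?? [])
    .filter((t) => t.type === "function" && t.function)
    .forEach((tool, i) => {
      attrs[`${prefix}.${offset + i}.name`] = tool.function!.name;
    });
  return attrs;
}

const attrs = functionAttributeKeys(
  [{ name: "legacy_fn" }],
  [{ type: "function", function: { name: "tool_fn" } }],
);
console.log(attrs["llm.request.functions.0.name"]); // legacy_fn
console.log(attrs["llm.request.functions.1.name"]); // tool_fn
```

With the offset applied, a request carrying both `functions` and `tools` yields distinct indices instead of overwriting attributes.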

445-445: Avoid broad as any on message if feasible

If you want to avoid the broad cast, introduce a narrow local type for the assistant message that includes optional function_call and tool_calls. This keeps type safety without relying on any.

Example (outside this range):

type AssistantMessage = NonNullable<ChatCompletion["choices"][number]>["message"] & {
  function_call?: { name: string; arguments: string };
  tool_calls?: Array<{
    id: string;
    type: "function";
    function: { name: string; arguments: string };
  }>;
};
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these settings in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a05c4dd and 755cd47.

📒 Files selected for processing (1)
  • packages/instrumentation-openai/src/instrumentation.ts (8 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/instrumentation-openai/src/instrumentation.ts (2)
packages/traceloop-sdk/src/lib/tracing/decorators.ts (1)
  • tool (276-282)
packages/ai-semantic-conventions/src/SemanticAttributes.ts (1)
  • SpanAttributes (17-59)
🔇 Additional comments (3)
packages/instrumentation-openai/src/instrumentation.ts (3)

422-429: Streaming promise typed to APIPromiseType is appropriate

Using APIPromiseType with the async iterator handles both v4/v5 shapes cleanly.


714-723: Verify the necessity of guarding against undefined toolCall.function
I wasn’t able to locate a ToolCall type definition in this package, so it’s unclear whether toolCall.function is ever guaranteed to be non-undefined when toolCall.type === "function". You should:

  • Confirm in your type definitions (or via the OpenAI response schema) that a function property is always present and non-undefined whenever type === "function".

  • If there’s any possibility of it being undefined, update the guard as suggested:

    - if (toolCall.type === "function" && "function" in toolCall) {
    + if (toolCall.type === "function" && "function" in toolCall && toolCall.function) {
        span.setAttribute(
          `${SpanAttributes.LLM_COMPLETIONS}.${index}.tool_calls.${toolIndex}.name`,
          toolCall.function.name,
        );
        span.setAttribute(
          `${SpanAttributes.LLM_COMPLETIONS}.${index}.tool_calls.${toolIndex}.arguments`,
          toolCall.function.arguments,
        );
      }

126-127: Verify OpenAI v5 patch paths
We relaxed the supported range to <6 to include v5, but since this package doesn’t ship the OpenAI client itself, please smoke-test against an openai@5 install to confirm that your patch targets still resolve:

  • openai.chat.completions.create (wrapped via OpenAI.Chat.Completions.prototype)
  • openai.completions.create (wrapped via OpenAI.Completions.prototype)

Run a quick script or REPL session importing openai@5, instantiate a client, and ensure spans are applied to both chat and completion calls.
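
A minimal shape check along these lines can confirm the patch targets resolve before wrapping (this uses a mocked module object standing in for `openai@5`; the class layout mirrors the SDK's but is an assumption of the sketch):

```typescript
// Mocked v5-style module layout (stand-in for the real openai package);
// the SDK is assumed to expose the same prototype paths.
class Completions {
  create() { return "completion"; }
}
class ChatCompletions {
  create() { return "chat"; }
}
const mockOpenAI = { Chat: { Completions: ChatCompletions }, Completions };

// Check that both patch targets the instrumentation wraps still resolve.
function patchTargetsResolve(mod: any): boolean {
  return (
    typeof mod?.Chat?.Completions?.prototype?.create === "function" &&
    typeof mod?.Completions?.prototype?.create === "function"
  );
}

console.log(patchTargetsResolve(mockOpenAI)); // true
console.log(patchTargetsResolve({})); // false
```

Running the same check against a real `openai@5` import (instead of the mock) is the actual smoke test requested above.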

Comment on lines +42 to +46
// Type definition for APIPromise - compatible with both OpenAI v4 and v5+
// The actual import is handled at runtime via require() calls in the _wrapPromise method
type APIPromiseType<T> = Promise<T> & {
  _thenUnwrap: <U>(onFulfilled: (value: T) => U) => APIPromiseType<U>;
};

🛠️ Refactor suggestion

Make _thenUnwrap optional and fix misleading comment about runtime import

  • The comment says the import is handled via require() calls in _wrapPromise, but there are no such calls. This is misleading.
  • Also, making _thenUnwrap optional provides a safe fallback when the underlying promise is a native Promise in some environments/versions.

Apply this diff to clarify the comment and make _thenUnwrap optional:

-// Type definition for APIPromise - compatible with both OpenAI v4 and v5+
-// The actual import is handled at runtime via require() calls in the _wrapPromise method
-type APIPromiseType<T> = Promise<T> & {
-  _thenUnwrap: <U>(onFulfilled: (value: T) => U) => APIPromiseType<U>;
-};
+// Lightweight shape of OpenAI's APIPromise to support v4/v5 without importing.
+// We only rely on _thenUnwrap when it exists at runtime.
+type APIPromiseType<T> = Promise<T> & {
+  _thenUnwrap?: <U>(onFulfilled: (value: T) => U) => APIPromiseType<U>;
+};
🤖 Prompt for AI Agents
In packages/instrumentation-openai/src/instrumentation.ts around lines 42 to 46,
the comment incorrectly states that an import is handled via require() calls in
_wrapPromise (there are no such calls) and the APIPromiseType marks _thenUnwrap
as required; change the comment to accurately state that the type aims to be
compatible with both OpenAI v4 and v5+ without claiming runtime require() usage,
and make _thenUnwrap optional on the type (i.e., _thenUnwrap?: <U>(onFulfilled:
(value: T) => U) => APIPromiseType<U>) so native Promises that lack this method
are safely supported. Ensure the updated comment is concise and the type change
compiles with existing usages.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Important

Looks good to me! 👍

Reviewed 6340b26 in 1 minute and 49 seconds. Click for details.
  • Reviewed 111 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/instrumentation-openai/src/instrumentation.ts:64
  • Draft comment:
    Changing the parameter type from 'typeof openai.OpenAI' to 'unknown' in manuallyInstrument reduces compile-time type safety. Consider using a more specific union or adding runtime checks to ensure the module has the expected structure.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
2. packages/instrumentation-openai/src/instrumentation.ts:69
  • Draft comment:
    The diff directly wraps openaiModule.Chat.Completions.prototype and openaiModule.Completions.prototype. This assumes these properties always exist. It would be safer to validate the module’s structure before applying instrumentation to prevent potential runtime errors in unexpected versions.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
3. packages/instrumentation-openai/src/instrumentation.ts:80
  • Draft comment:
    The removed branch for the old OpenAIApi (v3) and the revised wrapping for image-related methods now only apply when openaiModule.Images exists. Confirm that dropping backward compatibility for older versions is intentional and that all expected image endpoints (generate, edit, createVariation) are properly covered.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
4. packages/instrumentation-openai/src/instrumentation.ts:113
  • Draft comment:
    The instrumentation module version range has been updated from [">=3.1.0 <6"] to [">=4 <6"]. This change drops support for v3.1.0 and focuses on newer versions (supporting 5.x). Ensure that this aligns with the desired compatibility policies.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.

Workflow ID: wflow_pkTVEZJZi1KDZ6cH

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🔭 Outside diff range comments (2)
packages/instrumentation-openai/src/instrumentation.ts (2)

479-487: Do not append tool_call.id; assign it.

Using "+=" for tool_call.id risks duplicated IDs when the id is repeated across chunks. IDs are identifiers, not streamed tokens.

-            if (toolCall.id) {
-              result.choices[0].message.tool_calls[toolCall.index].id +=
-                toolCall.id;
-            }
+            if (toolCall.id) {
+              result.choices[0].message.tool_calls[toolCall.index].id =
+                toolCall.id;
+            }

599-646: Fallback to native Promise.then when _thenUnwrap is unavailable.

Calling _thenUnwrap unconditionally will throw when the underlying Promise isn't OpenAI's APIPromise (possible across versions/environments). Provide a safe fallback.

   private _wrapPromise<T>(
     type: "chat" | "completion",
     version: "v3" | "v4",
     span: Span,
-    promise: APIPromiseType<T>,
-  ): APIPromiseType<T> {
-    return promise._thenUnwrap((result) => {
+    promise: APIPromiseType<T>,
+  ): APIPromiseType<T> {
+    const chain =
+      typeof (promise as any)._thenUnwrap === "function"
+        ? (promise as any)._thenUnwrap.bind(promise)
+        : (onFulfilled: (value: T) => any) =>
+            (promise as Promise<T>).then(onFulfilled) as unknown as APIPromiseType<T>;
+
+    return chain((result: T) => {
       if (version === "v3") {
         if (type === "chat") {
           this._addLogProbsEvent(
             span,
             ((result as any).data as ChatCompletion).choices[0].logprobs,
           );
           this._endSpan({
             type,
             span,
             result: (result as any).data as ChatCompletion,
           });
         } else {
           this._addLogProbsEvent(
             span,
             ((result as any).data as Completion).choices[0].logprobs,
           );
           this._endSpan({
             type,
             span,
             result: (result as any).data as Completion,
           });
         }
       } else {
         if (type === "chat") {
           this._addLogProbsEvent(
             span,
             (result as ChatCompletion).choices[0].logprobs,
           );
           this._endSpan({ type, span, result: result as ChatCompletion });
         } else {
           this._addLogProbsEvent(
             span,
             (result as Completion).choices[0].logprobs,
           );
           this._endSpan({ type, span, result: result as Completion });
         }
       }
 
       return result;
-    });
+    }) as unknown as APIPromiseType<T>;
   }
♻️ Duplicate comments (1)
packages/instrumentation-openai/src/instrumentation.ts (1)

42-46: Make _thenUnwrap optional and fix misleading comment (still unresolved).

The comment claims a runtime import via require() in _wrapPromise, which does not exist here. Also, requiring _thenUnwrap at the type level breaks when a native Promise is returned (possible in some environments/versions).

Apply this diff to correct the comment and make _thenUnwrap optional:

-// Type definition for APIPromise - compatible with both OpenAI v4 and v5+
-// The actual import is handled at runtime via require() calls in the _wrapPromise method
-type APIPromiseType<T> = Promise<T> & {
-  _thenUnwrap: <U>(onFulfilled: (value: T) => U) => APIPromiseType<U>;
-};
+// Lightweight shape of OpenAI's APIPromise to support v4/v5 without importing.
+// We only rely on _thenUnwrap when it exists at runtime.
+type APIPromiseType<T> = Promise<T> & {
+  _thenUnwrap?: <U>(onFulfilled: (value: T) => U) => APIPromiseType<U>;
+};
🧹 Nitpick comments (2)
packages/instrumentation-openai/src/instrumentation.ts (2)

433-433: Avoid broad any; narrow the cast to the message type.

Casting the whole message as any reduces type safety. Prefer a narrower cast.

-            } as any,
+            } as Partial<ChatCompletion["choices"][number]["message"]>,

489-497: Reduce any-casts when accumulating function deltas.

You can avoid multiple any-casts by referencing the current tool call and using non-null assertions guarded by your earlier checks.

-            if (toolCall.function?.name) {
-              (
-                result.choices[0].message.tool_calls[toolCall.index] as any
-              ).function.name += toolCall.function.name;
-            }
-            if (toolCall.function?.arguments) {
-              (
-                result.choices[0].message.tool_calls[toolCall.index] as any
-              ).function.arguments += toolCall.function.arguments;
-            }
+            const current = result.choices[0].message.tool_calls![toolCall.index]!;
+            if (toolCall.function?.name) {
+              current.function!.name += toolCall.function.name;
+            }
+            if (toolCall.function?.arguments) {
+              current.function!.arguments += toolCall.function.arguments;
+            }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these settings in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 755cd47 and 6340b26.

📒 Files selected for processing (1)
  • packages/instrumentation-openai/src/instrumentation.ts (8 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/instrumentation-openai/src/instrumentation.ts (3)
packages/instrumentation-openai/src/image-wrappers.ts (3)
  • wrapImageGeneration (399-456)
  • wrapImageEdit (458-523)
  • wrapImageVariation (525-585)
packages/traceloop-sdk/src/lib/tracing/decorators.ts (1)
  • tool (276-282)
packages/ai-semantic-conventions/src/SemanticAttributes.ts (1)
  • SpanAttributes (17-59)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and test
🔇 Additional comments (4)
packages/instrumentation-openai/src/instrumentation.ts (4)

360-377: Good guard on tools to avoid runtime/type errors.

Filtering out non-function tools and validating tool.function prevents undefined access in v5+. This also aligns emitted attributes to function tools only.


410-417: APIPromiseType for streaming is reasonable (await resolves to Stream).

Typing the streaming promise as APIPromiseType<Stream<...>> fits v4/v5 behavior and works with for-await; combined with the _thenUnwrap fallback (see below), this should be robust.
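
The dual await/`_thenUnwrap` behavior can be illustrated with a small stand-in (a mock, not the SDK's actual `APIPromise`; the chunk type and helper name are invented for the example):

```typescript
// Stand-in for APIPromiseType<Stream<T>>: awaitable like a plain Promise,
// carries _thenUnwrap, and resolves to an async-iterable "stream".
function makeApiPromise<T>(chunks: T[]): Promise<AsyncIterable<T>> & {
  _thenUnwrap<U>(fn: (v: AsyncIterable<T>) => U): Promise<U>;
} {
  const stream: AsyncIterable<T> = {
    async *[Symbol.asyncIterator]() {
      for (const c of chunks) yield c;
    },
  };
  const base = Promise.resolve(stream);
  return Object.assign(base, {
    _thenUnwrap: <U>(fn: (v: AsyncIterable<T>) => U) => base.then(fn),
  });
}

async function main() {
  const received: string[] = [];
  const stream = await makeApiPromise(["he", "llo"]); // plain await works
  for await (const chunk of stream) received.push(chunk); // for-await works
  console.log(received.join("")); // hello
}

main();
```

Because the object is a genuine Promise with `_thenUnwrap` layered on top, both the instrumentation's unwrap path and ordinary consumer code behave the same way.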


703-711: Good: emit tool_call attributes only for function tools.

This avoids emitting attributes for non-function tools and protects against shape differences in v5.


64-79: Ignore guard suggestion for manuallyInstrument
After reviewing the implementation and usage across our instrumentations, manuallyInstrument is always invoked with the expected module shape (e.g. in manuallyInitInstrumentations we only call it when instrumentModules.openAI is provided). None of our other instrumentations (Anthropic, Cohere, etc.) add optional guarding, so adding those checks here would introduce inconsistency without clear benefit. Keep the current unguarded implementation.

Likely an incorrect or invalid review comment.

Comment on lines +80 to 108
+    if (openaiModule.Images) {
       this._wrap(
-        (module as any).OpenAIApi.prototype,
-        "createCompletion",
-        this.patchOpenAI("completion", "v3"),
+        openaiModule.Images.prototype,
+        "generate",
+        wrapImageGeneration(
+          this.tracer,
+          this._config.uploadBase64Image,
+          this._config,
+        ),
       );
-    } else {
       this._wrap(
-        module.Chat.Completions.prototype,
-        "create",
-        this.patchOpenAI("chat"),
+        openaiModule.Images.prototype,
+        "edit",
+        wrapImageEdit(
+          this.tracer,
+          this._config.uploadBase64Image,
+          this._config,
+        ),
       );
       this._wrap(
-        module.Completions.prototype,
-        "create",
-        this.patchOpenAI("completion"),
+        openaiModule.Images.prototype,
+        "createVariation",
+        wrapImageVariation(
+          this.tracer,
+          this._config.uploadBase64Image,
+          this._config,
+        ),
       );
-
-      if (module.Images) {
-        this._wrap(
-          module.Images.prototype,
-          "generate",
-          wrapImageGeneration(
-            this.tracer,
-            this._config.uploadBase64Image,
-            this._config,
-          ),
-        );
-        this._wrap(
-          module.Images.prototype,
-          "edit",
-          wrapImageEdit(
-            this.tracer,
-            this._config.uploadBase64Image,
-            this._config,
-          ),
-        );
-        this._wrap(
-          module.Images.prototype,
-          "createVariation",
-          wrapImageVariation(
-            this.tracer,
-            this._config.uploadBase64Image,
-            this._config,
-          ),
-        );
-      }
     }

🛠️ Refactor suggestion

Guard Images instrumentation and support namespaced module shape.

openaiModule.Images is assumed to exist; if a namespaced shape is passed (e.g., { OpenAI: { Images } }), or Images is absent, _wrap may throw.

Apply this diff to reuse the computed root and guard method presence:

-    if (openaiModule.Images) {
+    const root = (module as any)?.OpenAI ?? (module as any) ?? openaiModule;
+    if (root?.Images?.prototype) {
-      this._wrap(
-        openaiModule.Images.prototype,
+      this._wrap(
+        root.Images.prototype,
         "generate",
         wrapImageGeneration(
           this.tracer,
           this._config.uploadBase64Image,
           this._config,
         ),
       );
-      this._wrap(
-        openaiModule.Images.prototype,
+      this._wrap(
+        root.Images.prototype,
         "edit",
         wrapImageEdit(
           this.tracer,
           this._config.uploadBase64Image,
           this._config,
         ),
       );
-      this._wrap(
-        openaiModule.Images.prototype,
+      this._wrap(
+        root.Images.prototype,
         "createVariation",
         wrapImageVariation(
           this.tracer,
           this._config.uploadBase64Image,
           this._config,
         ),
       );

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In packages/instrumentation-openai/src/instrumentation.ts around lines 80 to
108, the code assumes openaiModule.Images exists and uses it directly which
breaks for namespaced module shapes (e.g., { OpenAI: { Images } }) or when
Images is absent; compute a moduleRoot = openaiModule.OpenAI ?? openaiModule,
then check that moduleRoot.Images is defined before attempting to wrap;
additionally verify the prototype and specific methods (generate, edit,
createVariation) exist on moduleRoot.Images.prototype before calling this._wrap
so you reuse the computed root and guard method presence to avoid throwing.
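The root-resolution guard described above can be sketched in isolation. This is a minimal, self-contained sketch with illustrative module shapes, not the SDK's actual typings:

```typescript
// Illustrative module shapes; the real openai module exports differ.
type ImagesClass = { prototype: Record<string, unknown> };
type OpenAIModule = { Images?: ImagesClass; OpenAI?: OpenAIModule };

// Resolve the root whether the module is flat or namespaced,
// and only report Images as wrappable when its prototype exists.
function resolveImages(mod: OpenAIModule): ImagesClass | undefined {
  const root = mod.OpenAI ?? mod;
  return root.Images?.prototype ? root.Images : undefined;
}

const flat: OpenAIModule = { Images: { prototype: { generate: () => null } } };
const namespaced: OpenAIModule = { OpenAI: flat };
console.log(resolveImages(flat) !== undefined); // true
console.log(resolveImages(namespaced) !== undefined); // true
console.log(resolveImages({}) !== undefined); // false
```

The same pattern extends to checking each method (`generate`, `edit`, `createVariation`) on the resolved prototype before calling `_wrap`.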

const module = new InstrumentationNodeModuleDefinition(
"openai",
[">=3.1.0 <5"],
[">=4 <6"],

💡 Verification agent

🧩 Analysis chain

Confirm intentional drop of auto-instrumentation for OpenAI v3.x and update docs accordingly.

The version range changed to ">=4 <6". patch() still contains v3 fallbacks, but init() will no longer auto-patch v3.x. If this is intentional, please update README and migration notes; otherwise, consider keeping ">=3.1.0 <6".

Run this script to spot lingering docs or code paths that still imply v3 auto-support:


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n -S 'OpenAIApi|createChatCompletion|createCompletion|v3\.1\.0|>=3\.1\.0 <5|v3' -g '!*node_modules' -g '!*dist'

Length of output: 87657


Confirm Intentional v3 Auto-Instrumentation Removal and Update Docs

It looks like init() now only targets “>=4 <6” (so v3.x will no longer be auto-patched), while the internal patchOpenAI(…, "v3") fallbacks remain available for manual use. If this change is deliberate, please:

• Update your README under packages/instrumentation-openai to call out that v3.x is no longer auto-instrumented and must be patched manually.
• Revise migration notes or CHANGELOG to document the breaking change (dropping v3.x auto-support, new version range “>=4 <6”).
• Optionally, add a helper snippet or example showing how to invoke patchOpenAI("chat", "v3") for users still on v3.x.

If v3.x should remain auto-patched, consider restoring the version range to “>=3.1.0 <6” in the init() matcher.

🤖 Prompt for AI Agents
In packages/instrumentation-openai/src/instrumentation.ts around line 114 the
init() matcher was changed to "[">=4 <6"]" which removes automatic v3.x
patching; if this is intentional, update packages/instrumentation-openai/README
to state v3.x is no longer auto-instrumented and must be patched manually, add a
CHANGELOG/migration note documenting the breaking change and new matcher range,
and add a short example snippet showing how to call patchOpenAI("chat", "v3")
for v3 users; if v3 auto-patching should remain, revert the matcher to include
v3 (e.g., ">=3.1.0 <6") in init().
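The practical effect of the matcher change can be illustrated with a tiny check. The real instrumentation evaluates full semver ranges; this sketch only compares the major version to show which installed SDK majors the new `[">=4 <6"]` matcher would auto-patch:

```typescript
// Illustrative only: which installed majors the new matcher [">=4 <6"] auto-patches.
function autoPatched(version: string): boolean {
  const major = Number(version.split(".")[0]);
  return major >= 4 && major < 6;
}

console.log(autoPatched("3.3.0")); // false: v3.x now requires manual patching
console.log(autoPatched("4.52.0")); // true
console.log(autoPatched("5.12.2")); // true
```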

@ellipsis-dev ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed 86d5d21 in 2 minutes and 45 seconds. Click for details.
  • Reviewed 2801 lines of code in 13 files
  • Skipped 1 files when reviewing.
  • Skipped posting 10 draft comments. View those below.
1. packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-calculate-correct-tokens-for-different-quality-levels_3292443992/recording.har:1
  • Draft comment:
    This HAR file appears to be an auto-generated network recording capturing the expected request and response for token calculation. Details match expectations. No issues were found.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative, stating that the HAR file matches expectations and no issues were found. It doesn't provide any actionable feedback or suggestions for improvement.
2. packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-emit-logprobs-span-event-for-chat-completion_1028826680/recording.har:1
  • Draft comment:
    This HAR file correctly captures the network exchange for a chat completion with logprobs enabled. The headers and base64 response are accurate and consistent with expectations.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply states that the HAR file captures the network exchange correctly, which does not align with the rules for useful comments.
3. packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-emit-logprobs-span-event-for-stream-chat-completion_2713905208/recording.har:1
  • Draft comment:
    The recording for stream chat completion with logprobs shows proper segmentation of event stream data and timing. No anomalies were observed.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply states that no anomalies were observed, which does not align with the rules for useful comments.
4. packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-chat_1427850401/recording.har:1
  • Draft comment:
    The HAR file for the chat completion test captures the expected request and response details, including headers and the base64 encoded response.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative, describing what the HAR file captures. It doesn't provide any actionable feedback or suggestions for improvement.
5. packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-completion_322013537/recording.har:1
  • Draft comment:
    This HAR recording for non-chat completion verifies that the prompt parameters are sent correctly and that the base64 encoded response is captured appropriately.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply describes what the HAR recording does, which is not necessary for the PR review process.
6. packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-tool-calling_1747151373/recording.har:1
  • Draft comment:
    The HAR file for tool calling in a chat shows the correct structure for function calls. The request and response capture the function definition and its arguments as anticipated.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply states that the HAR file shows the correct structure, which does not align with the rules for useful comments.
7. packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-image-generation_2726543658/recording.har:1
  • Draft comment:
    The image generation HAR file captures an API call for generating an image with accurate request body and response containing the generated image URL.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative, describing what the HAR file captures without providing any actionable feedback or suggestions. It doesn't ask for any specific changes or confirm any intentions, nor does it suggest improvements or optimizations.
8. packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-streaming-chat_72904453/recording.har:1
  • Draft comment:
    This HAR document for streaming chat records the expected event stream responses, with correctly segmented data chunks and proper timing information.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative and does not provide any actionable feedback or suggestions for improvement. It simply describes what the HAR document does, which is not necessary for a code review comment.
9. packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-streaming-completion_1484001317/recording.har:1
  • Draft comment:
    The HAR file for streaming completion (non-chat) captures the event stream responses with base64 encoding, and the token responses along with timing data appear consistent.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment appears to be purely informative, describing the contents of a HAR file without providing any actionable feedback or suggestions for improvement. It does not align with the rules for useful comments, as it does not ask for clarification, suggest changes, or highlight potential issues.
10. packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-function_call-attributes-in-span-for-stream-completion-when-multiple-tools-cal_2278551756/recording.har:1
  • Draft comment:
    This HAR file for testing function call behavior when multiple tools are involved in a streamed completion shows the correct sequence of function call events and the expected JSON fragments for function call arguments.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative, describing the contents of a HAR file without providing any actionable feedback or suggestions. It doesn't ask for any specific changes or confirm any intentions, which violates the rules for comments.

Workflow ID: wflow_XORbZTDTjzaiSumv


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🔭 Outside diff range comments (5)
packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-function-calling_1703714575/recording.har (1)

101-121: Sanitize volatile headers and cookies in HAR fixtures

Please update the Polly beforePersist hook in instrumentation.test.ts to strip out all sensitive or ephemeral headers and cookies, not just authorization. Specifically:

  • Filter both request.headers and response.headers to remove:
    • Authorization, Cookie, Set-Cookie, Content-Length, Date, X-Request-Id, CF-Ray
    • Any SDK/env-specific headers starting with x-stainless-
  • Clear out request.cookies and response.cookies arrays to drop Cloudflare cookies (__cf_bm, _cfuvid)
  • After merging, re-record tests or run a one-off sanitizer against existing .har fixtures

Suggested diff in packages/instrumentation-openai/test/instrumentation.test.ts:

--- a/packages/instrumentation-openai/test/instrumentation.test.ts
+++ b/packages/instrumentation-openai/test/instrumentation.test.ts
@@ beforeEach(function () {
-    server.any().on("beforePersist", (_req, recording) => {
-      recording.request.headers = recording.request.headers.filter(
-        ({ name }) => name !== "authorization",
-      );
-    });
+    server.any().on("beforePersist", (_req, recording) => {
+      const volatileNames = [
+        "authorization",
+        "cookie",
+        "set-cookie",
+        "content-length",
+        "date",
+        "x-request-id",
+        "cf-ray",
+      ];
+      const isStainless = (n: string) => n.toLowerCase().startsWith("x-stainless-");
+
+      const sanitize = (hdrs: Array<{ name: string; value: string }>) =>
+        hdrs.filter((h) => {
+          const name = h.name.toLowerCase();
+          return !volatileNames.includes(name) && !isStainless(h.name);
+        });
+
+      recording.request.headers  = sanitize(recording.request.headers);
+      recording.response.headers = sanitize(recording.response.headers);
+      recording.request.cookies  = [];
+      recording.response.cookies = [];
+    });

Apply this change, then re-record or sanitize existing fixtures to remove all volatile values before committing.

packages/instrumentation-openai/test/instrumentation.test.ts (2)

727-733: Fix undefined identifier: imageOpenai is not declared (even though test is skipped).

TypeScript will still flag this unresolved symbol. Use the existing openai client.

-    await imageOpenai.images.edit({
+    await openai.images.edit({

787-791: Same undefined identifier here — use openai.

-    await imageOpenai.images.createVariation({
+    await openai.images.createVariation({
packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-emit-logprobs-span-event-for-chat-completion_1028826680/recording.har (1)

103-121: Centralize Fixture Sanitization for Volatile Fields

Rather than manually patching each HAR, add a global sanitizer in your Polly setup to strip/redact these across all recordings. We’ve identified hundreds of fixtures that include volatile fields (cookies, set-cookie headers, Cloudflare tokens, request IDs, dates, project IDs, etc.).

In your test bootstrap (e.g. test/helpers/pollySetup.ts), configure a hook to drop them before persisting:

import { Polly } from '@pollyjs/core';

export function setupPolly() {
  return new Polly('RecordingTests', {
    adapters: ['node-http', 'xhr'],
    persister: 'fs',
    recordIfMissing: true,
    hooks: {
      afterPersist: recording => {
        for (const entry of recording.har.log.entries) {
          const { response } = entry;
          // remove all cookies
          response.cookies = [];
          // strip volatile headers
          const blacklist = [
            'set-cookie',
            'cookie',
            '__cf_bm',
            '_cfuvid',
            'x-request-id',
            'cf-ray',
            'date',
            'openai-project'
          ];
          response.headers = response.headers.filter(
            h => !blacklist.includes(h.name.toLowerCase())
          );
        }
      }
    }
  });
}

Next steps:

  • Import and call setupPolly() in every package’s test setup so all HARs are sanitized automatically.
  • One-time: regenerate affected recordings (or run a script to scrub existing fixtures).
  • Remove manual diffs from individual fixtures.

This will eliminate noise, prevent sensitive data leakage, and keep your snapshots stable.

packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-completion_322013537/recording.har (1)

153-159: Sanitize OpenAI and Traceloop identifiers in HAR fixtures

We audited all HAR recordings under packages/**/recordings and discovered real values leaking in tests:

  • openai-organization: "traceloop"
  • openai-project: "proj_tzz1TbPPOXaf6j9tEkVUBIAa"
  • authorization header: "Bearer tl_9981e7218948437584e08e7b724304d8" (in packages/traceloop-sdk/recordings/...)

These identifiers and tokens shouldn’t be committed verbatim. Please replace them with placeholders in your fixtures, for example:

  • openai-organization: <ORGANIZATION_ID>
  • openai-project: <PROJECT_ID>
  • authorization: <AUTH_TOKEN>

Affected files (patterns):

packages/instrumentation-*/**/recordings/**/*.har
packages/traceloop-sdk/recordings/**/*.har

♻️ Duplicate comments (5)
packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-chat_1427850401/recording.har (1)

101-121: Sanitize HAR: remove cookies and volatile headers to avoid fixture churn and leakage.

Same issues as other fixtures: cookies, date, request-id, cf-ray, content-length, and x-stainless-* should not be persisted. Please rely on the Polly sanitizer to drop them before persisting.

Use the repo-wide scan provided in my other comment to confirm removals.

Also applies to: 197-205, 125-127, 189-191, 219-221, 35-42, 50-57, 69-72, 83-83

packages/instrumentation-openai/test/instrumentation.test.ts (2)

112-129: Harden Polly sanitizer: drop cookies and volatile headers (security + stability).

You only remove authorization. Persisted HARs still contain cookies, request IDs, dates, cf-ray and x-stainless-* SDK/env headers, which are sensitive and/or highly volatile. Strip them to reduce churn and avoid leaking identifiers.

Apply this diff:

       const { server } = this.polly as Polly;
       server.any().on("beforePersist", (_req, recording) => {
-        recording.request.headers = recording.request.headers.filter(
-          ({ name }: { name: string }) => name !== "authorization",
-        );
+        // Drop auth, cookies and volatile headers to reduce fixture churn and avoid persisting identifiers
+        const dropHeader = (name: string) =>
+          [
+            "authorization",
+            "cookie",
+            "set-cookie",
+            "date",
+            "cf-ray",
+            "x-request-id",
+            "via",
+            "x-envoy-upstream-service-time",
+            "openai-processing-ms",
+            "content-length",
+            // OpenAI SDK/client/env specifics that change frequently
+            "user-agent",
+            "x-stainless-arch",
+            "x-stainless-lang",
+            "x-stainless-os",
+            "x-stainless-runtime",
+            "x-stainless-runtime-version",
+            "x-stainless-package-version",
+            "x-stainless-retry-count",
+          ].includes(name.toLowerCase());
+
+        recording.request.headers = (recording.request.headers || []).filter(
+          ({ name }: { name: string }) => !dropHeader(name),
+        );
+        recording.response.headers = (recording.response.headers || []).filter(
+          ({ name }: { name: string }) => !dropHeader(name),
+        );
+        // Drop cookies entirely
+        recording.request.cookies = [];
+        recording.response.cookies = [];
       });

Optionally also type the Mocha context:
TypeScript (apply outside this range):
function beforeEach(this: Mocha.Context & { polly?: Polly }) { … }


542-550: Make tool_call arguments parsing robust (string vs object).

The attribute may be a JSON string or an already-parsed object depending on SDK/version. Parse conditionally to avoid runtime errors.

-    const parsedArgs = JSON.parse(
-      completionSpan.attributes[
-        `${SpanAttributes.LLM_COMPLETIONS}.0.tool_calls.0.arguments`
-      ]! as string,
-    );
+    const rawArgs0 =
+      completionSpan.attributes[
+        `${SpanAttributes.LLM_COMPLETIONS}.0.tool_calls.0.arguments`
+      ]!;
+    const parsedArgs =
+      typeof rawArgs0 === "string" ? JSON.parse(rawArgs0) : rawArgs0;
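The guard in the diff above can be factored into a small helper. This is a sketch assuming the attribute value is either a JSON string (older SDKs) or an already-parsed object (newer SDKs), as the comment describes:

```typescript
// Parse tool_call arguments whether they arrive as a JSON string or an object.
function parseToolCallArgs(raw: unknown): Record<string, unknown> {
  return typeof raw === "string"
    ? JSON.parse(raw)
    : (raw as Record<string, unknown>);
}

console.log(parseToolCallArgs('{"location":"Boston"}').location); // "Boston"
console.log(parseToolCallArgs({ location: "Boston" }).location); // "Boston"
```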
packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-function_call-attributes-in-span-for-stream-completion-when-multiple-tools-cal_2278551756/recording.har (1)

101-121: Sanitize streaming fixture: strip cookies and volatile headers.

This recording also persists __cf_bm/_cfuvid cookies, date, x-request-id, cf-ray, content-length, and x-stainless-* headers. Please rely on the improved Polly sanitizer to drop them before persisting.

Use the scan script from my first HAR comment to confirm these headers/cookies are removed after re-recording.

Also applies to: 197-205, 124-126, 188-190, 218-220, 35-42, 50-57, 69-72, 83-83

packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-streaming-completion_1484001317/recording.har (1)

101-121: Remove cookies and volatile headers from this completion streaming HAR as well.

Same sanitization needed here to avoid committing sensitive cookies and unstable headers (date, x-request-id, cf-ray, content-length, x-stainless-*).

Reuse the repo-wide rg scan from earlier to validate that fixtures are clean.

Also applies to: 217-224, 124-126, 208-210, 234-236, 35-42, 50-57, 69-72, 83-84

🧹 Nitpick comments (5)
packages/instrumentation-openai/test/instrumentation.test.ts (2)

100-106: Silence noisy console.log in CI (guard with DEBUG).

Keep the helpful local hint but avoid CI noise by gating on an env var.

-    console.log("Using node-fetch for Polly.js compatibility");
+    if (process.env.DEBUG?.includes("polly")) {
+      console.log("Using node-fetch for Polly.js compatibility");
+    }

59-68: Heads-up: requestTimeout/socketTimeout = 0 can hang indefinitely.

This is fine when paired with per-test timeouts (you added for images). Consider adding a global safety timeout or narrowing this to recording mode to avoid deadlocks in CI for other tests.

packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-emit-logprobs-span-event-for-chat-completion_1028826680/recording.har (1)

69-72: Drop content-length from request headers to reduce fixture churn.

content-length will change anytime the JSON payload formatting changes. Prefer omitting it from recordings.

Apply this diff to remove it from this fixture:

-            {
-              "_fromType": "array",
-              "name": "content-length",
-              "value": "117"
-            },

Combined with not matching on headers, this makes re-records less brittle.

packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-completion_322013537/recording.har (2)

66-67: Avoid pinning runtime version in fixture to reduce churn.

Hardcoding x-stainless-runtime-version to a specific Node version makes the fixture needlessly environment-dependent.

Apply this diff to use a neutral placeholder (or consider dropping the header entirely if Polly matching doesn’t require it):

-              "value": "v20.10.0"
+              "value": "SANITIZED"

247-247: Normalize headersSize and timing fields.

headersSize, startedDateTime, time, and wait add noise and are irrelevant to test semantics; consider zeroing or fixing them.

Apply:

-          "headersSize": 1449,
+          "headersSize": 0,
@@
-        "startedDateTime": "2025-08-14T14:45:17.130Z",
-        "time": 2934,
+        "startedDateTime": "1970-01-01T00:00:00.000Z",
+        "time": 0,
@@
-          "wait": 2934
+          "wait": 0

Also applies to: 253-254, 262-263

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these settings in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 6340b26 and 86d5d21.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (13)
  • packages/instrumentation-openai/package.json (1 hunks)
  • packages/instrumentation-openai/test/instrumentation.test.ts (8 hunks)
  • packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-calculate-correct-tokens-for-different-quality-levels_3292443992/recording.har (1 hunks)
  • packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-emit-logprobs-span-event-for-chat-completion_1028826680/recording.har (12 hunks)
  • packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-emit-logprobs-span-event-for-stream-chat-completion_2713905208/recording.har (1 hunks)
  • packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-chat_1427850401/recording.har (12 hunks)
  • packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-completion_322013537/recording.har (12 hunks)
  • packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-function-calling_1703714575/recording.har (12 hunks)
  • packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-image-generation_2726543658/recording.har (1 hunks)
  • packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-streaming-chat_72904453/recording.har (1 hunks)
  • packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-streaming-completion_1484001317/recording.har (12 hunks)
  • packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-tool-calling_1747151373/recording.har (12 hunks)
  • packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-function_call-attributes-in-span-for-stream-completion-when-multiple-tools-cal_2278551756/recording.har (13 hunks)
✅ Files skipped from review due to trivial changes (5)
  • packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-calculate-correct-tokens-for-different-quality-levels_3292443992/recording.har
  • packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-streaming-chat_72904453/recording.har
  • packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-image-generation_2726543658/recording.har
  • packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-emit-logprobs-span-event-for-stream-chat-completion_2713905208/recording.har
  • packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-tool-calling_1747151373/recording.har
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/instrumentation-openai/package.json
🧰 Additional context used
🧬 Code Graph Analysis (1)
packages/instrumentation-openai/test/instrumentation.test.ts (1)
packages/ai-semantic-conventions/src/SemanticAttributes.ts (1)
  • SpanAttributes (17-59)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and test
🔇 Additional comments (6)
packages/instrumentation-openai/test/instrumentation.test.ts (1)

662-664: LGTM: Extended timeout for image generation.

Reasonable guard for a slow endpoint during recording/replay.

packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-emit-logprobs-span-event-for-chat-completion_1028826680/recording.har (1)

31-31: UA/version headers reflect SDK 5.12.2 — consistent with PR goal.

The updated User-Agent and x-stainless-package-version align with the SDK bump to 5.x compatibility.

Also applies to: 50-52

packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-completion_322013537/recording.har (4)

29-37: Headers updated for OpenAI JS v5.12.2 look correct.

The user-agent and x-stainless-* headers reflect the 5.x SDK and are consistent with the PR’s objectives.

Also applies to: 39-42, 50-57


89-90: Minified request body LGTM.

Compact JSON for postData.text is fine and stable for replay.


11-11: Non-actionable: _id changed.

HAR entry _id changes are expected when re-recording. No concerns.


95-101: HAR fixture payload validated successfully

The base64-encoded, gzip-compressed response body decompresses and parses as valid JSON, confirming the recording fixture is intact.

Comment on lines +31 to 37
"value": "OpenAI/JS 5.12.2"
},
{
"_fromType": "array",
"name": "x-stainless-lang",
"value": "js"
"name": "x-stainless-arch",
"value": "arm64"
},

🛠️ Refactor suggestion

Don’t match or persist dynamic request headers (user-agent, x-stainless-*, content-length).

These headers vary by SDK patch, machine arch, runtime, and body size. Recording them causes fixture churn; matching on them can break playback when SDK/runtime changes.

Recommended:

  • Ignore request headers in request matching (or at least ignore these specific ones).
  • Optionally strip them before persisting.

Example Polly config in your test setup:

// in test setup where Polly is configured
polly.configure({
  matchRequestsBy: {
    // Avoid matching on headers entirely (simple & robust)
    headers: false,
  },
});

If you must match on headers, provide an allowlist and exclude:

  • user-agent
  • content-length
  • x-stainless-arch
  • x-stainless-lang
  • x-stainless-package-version
  • x-stainless-retry-count
  • x-stainless-runtime-version

Also applies to: 40-42, 50-57, 66-67, 69-72, 83-83

🤖 Prompt for AI Agents
packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-emit-logprobs-span-event-for-chat-completion_1028826680/recording.har
lines ~31-37 (and similarly at 40-42, 50-57, 66-67, 69-72, 83): the recording
currently includes dynamic request headers (user-agent, x-stainless-*,
content-length) which cause fixture churn; update the test/polly setup to stop
matching or persisting these headers by disabling header matching
(matchRequestsBy.headers = false) or by stripping/blacklisting user-agent,
content-length and all x-stainless-* headers before saving the HAR so recordings
are stable across SDK/runtime variations.

Comment on lines +89 to 90
"text": "{\"messages\":[{\"role\":\"user\",\"content\":\"Tell me a joke about OpenTelemetry\"}],\"model\":\"gpt-3.5-turbo\",\"logprobs\":true}"
},

💡 Verification agent

🧩 Analysis chain

Future-proof the test model (gpt-3.5-turbo is deprecated).

For any future re-recording against the API, switch to a current model that supports logprobs to avoid breakage.

Example edit in-place (verify the chosen model supports logprobs before changing):

-            "text": "{\"messages\":[{\"role\":\"user\",\"content\":\"Tell me a joke about OpenTelemetry\"}],\"model\":\"gpt-3.5-turbo\",\"logprobs\":true}"
+            "text": "{\"messages\":[{\"role\":\"user\",\"content\":\"Tell me a joke about OpenTelemetry\"}],\"model\":\"gpt-4o-mini\",\"logprobs\":true}"

Would you like me to update the broader test suite to a current, logprobs-capable model and re-record the fixtures?


🌐 Web query:

Which current OpenAI chat completion models support the `logprobs` parameter as of now?

💡 Result:

Short answer — and a quick verification date:

  • As of August 14, 2025, OpenAI’s Chat Completions API exposes the logprobs parameter for chat completions (i.e., you can pass logprobs and top_logprobs). The API docs also call out a specific exception: the gpt-4-vision-preview model does not currently support the chat logprobs option. (community.openai.com)

Notes and next steps

  • In practice logprobs support is model-dependent (text-only chat models generally support it; some preview / vision / beta models may not). If you need a definitive per-model yes/no list I can check specific model names you care about (or run a small compatibility test request for any model and report whether the API accepts logprobs for that model).


Update deprecated test model to a current logprobs-compatible model

gpt-3.5-turbo has been deprecated and may break future recordings. As of August 14, 2025, all text-only chat models (except gpt-4-vision-preview) support logprobs. Please replace the model in the HAR fixture with a supported one (e.g. gpt-4o-mini) and re-record the test.

• File: packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-emit-logprobs-span-event-for-chat-completion_1028826680/recording.har
– Lines 89–90: update the model name

Example diff (verify gpt-4o-mini or your chosen model supports logprobs before re-recording):

-            "text": "{\"messages\":[{\"role\":\"user\",\"content\":\"Tell me a joke about OpenTelemetry\"}],\"model\":\"gpt-3.5-turbo\",\"logprobs\":true}"
+            "text": "{\"messages\":[{\"role\":\"user\",\"content\":\"Tell me a joke about OpenTelemetry\"}],\"model\":\"gpt-4o-mini\",\"logprobs\":true}"

Once updated, please re-record the fixture (and consider updating other logprobs tests to use a non-vision, text-only model).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"text": "{\"messages\":[{\"role\":\"user\",\"content\":\"Tell me a joke about OpenTelemetry\"}],\"model\":\"gpt-3.5-turbo\",\"logprobs\":true}"
},
"text": "{\"messages\":[{\"role\":\"user\",\"content\":\"Tell me a joke about OpenTelemetry\"}],\"model\":\"gpt-4o-mini\",\"logprobs\":true}"
},
🤖 Prompt for AI Agents
In
packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-emit-logprobs-span-event-for-chat-completion_1028826680/recording.har
around lines 89-90, the HAR fixture uses the deprecated model "gpt-3.5-turbo";
update that model field to a current text-only logprobs-compatible model such as
"gpt-4o-mini" (or another verified logprobs-supported model), save the HAR, then
re-run/re-record the test fixture so the recorded responses match the new model
and update any other logprobs fixtures similarly.

Comment on lines +69 to 72
"_fromType": "array",
"name": "content-length",
"value": "80"
},

💡 Verification agent

🧩 Analysis chain

Request header “content-length” is often environment-specific; consider removing or ignoring in matching.

This header tends to vary and can cause fixture churn. If your Polly config matches on headers, consider excluding this one; otherwise, it’s safe to keep.

If you prefer to sanitize it in the recording, apply:

-              "name": "content-length",
-              "value": "80"
+              "name": "content-length",
+              "value": "SANITIZED"

To verify the current matchRequestsBy settings, run:

Also applies to: 83-83


🏁 Script executed:

#!/bin/bash
# Search for Polly setup and matching config
rg -n -A3 -i 'new Polly|setupPolly|matchRequestsBy|matchBy|polly|pollyjs' -g '!node_modules'

Length of output: 102868


Exclude content-length from persisted headers to prevent fixture churn

Your Polly setup already uses matchRequestsBy: { headers: false }, so header differences won’t affect matching. However, recordings still include the content-length header, which can vary by environment and trigger unnecessary diffs. Update your beforePersist hook in packages/instrumentation-openai/test/instrumentation.test.ts (and any other packages with similar hooks) to filter out content-length as well as authorization:

packages/instrumentation-openai/test/instrumentation.test.ts
 server.any().on("beforePersist", (_req, recording) => {
-  recording.request.headers = recording.request.headers.filter(
-    ({ name }: { name: string }) => name !== "authorization"
-  );
+  recording.request.headers = recording.request.headers.filter(
+    ({ name }: { name: string }) =>
+      name.toLowerCase() !== "authorization" &&
+      name.toLowerCase() !== "content-length"
+  );
 });

If you’d rather sanitize existing recordings directly, apply a diff to each HAR:

-              "name": "content-length",
-              "value": "80"
+              "name": "content-length",
+              "value": "SANITIZED"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"_fromType": "array",
"name": "content-length",
"value": "80"
},
server.any().on("beforePersist", (_req, recording) => {
  recording.request.headers = recording.request.headers.filter(
    ({ name }: { name: string }) =>
      name.toLowerCase() !== "authorization" &&
      name.toLowerCase() !== "content-length"
  );
});
🤖 Prompt for AI Agents
In
packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-completion_322013537/recording.har
around lines 69 to 72, the persisted HAR contains a fluctuating "content-length"
header; update the beforePersist hook in
packages/instrumentation-openai/test/instrumentation.test.ts (and any similar
hooks) to remove "content-length" from recorded request/response headers in
addition to "authorization" so recordings no longer include that header, or
alternatively remove the "content-length" entries directly from existing HAR
recordings to prevent fixture churn.

Comment on lines +105 to 112
"expires": "2025-08-14T15:15:20.000Z",
"httpOnly": true,
"name": "__cf_bm",
"path": "/",
"sameSite": "None",
"secure": true,
"value": "dpDj_rzXilyhr6.Gr7BsgjxBWPXkH0QzxFYSJODSL1k-1754899572-1.0.1.1-Dm62jBQ.gvLCAHxZ8fwS7jknxKPsax7.PxNAd66oLv9PlHLplZwv0WKD4PRjjs6VJVQO12VLR6yDX79MAkUA9cRa2K0QQ05.tldNjsAB14U"
"value": "8Zo0W1WdNUEe5N8pLRcrqa6tcfMK1dmXFUFuKsvOciQ-1755182720-1.0.1.1-KC0VBaOYmgjrg8anx2pC19gIaGdhy_5UB3Tw9XHfgK_ehMDzMLdcCHYTKgixPGhAydPOuuNWHT37sV_7FR32jGcfox_SCkkVMgNwG9ZXCsQ"
},

⚠️ Potential issue

Redact cookies and Set-Cookie values; these are sensitive and highly ephemeral.

Committing actual cookie values (__cf_bm, _cfuvid) is unnecessary and risky. Redact or drop them to reduce exposure and churn.

Apply:

@@
-              "expires": "2025-08-14T15:15:20.000Z",
+              "expires": "REDACTED",
@@
-              "value": "8Zo0W1WdNUEe5N8pLRcrqa6tcfMK1dmXFUFuKsvOciQ-1755182720-1.0.1.1-KC0VBaOYmgjrg8anx2pC19gIaGdhy_5UB3Tw9XHfgK_ehMDzMLdcCHYTKgixPGhAydPOuuNWHT37sV_7FR32jGcfox_SCkkVMgNwG9ZXCsQ"
+              "value": "<redacted>"
@@
-              "value": "KcexFXGp8n4hoEyANEo3kEN_URABlGh9VeqUBny1ZjY-1755182720191-0.0.1.1-604800000"
+              "value": "<redacted>"
@@
-              "value": "__cf_bm=8Zo0W1WdNUEe5N8pLRcrqa6tcfMK1dmXFUFuKsvOciQ-1755182720-1.0.1.1-KC0VBaOYmgjrg8anx2pC19gIaGdhy_5UB3Tw9XHfgK_ehMDzMLdcCHYTKgixPGhAydPOuuNWHT37sV_7FR32jGcfox_SCkkVMgNwG9ZXCsQ; path=/; expires=Thu, 14-Aug-25 15:15:20 GMT; domain=.api.openai.com; HttpOnly; Secure; SameSite=None"
+              "value": "SANITIZED; path=/; expires=Thu, 01-Jan-70 00:00:00 GMT; domain=.api.openai.com; HttpOnly; Secure; SameSite=None"
@@
-              "value": "_cfuvid=KcexFXGp8n4hoEyANEo3kEN_URABlGh9VeqUBny1ZjY-1755182720191-0.0.1.1-604800000; path=/; domain=.api.openai.com; HttpOnly; Secure; SameSite=None"
+              "value": "SANITIZED; path=/; domain=.api.openai.com; HttpOnly; Secure; SameSite=None"

Also applies to: 114-121, 218-225

🤖 Prompt for AI Agents
In
packages/instrumentation-openai/test/recordings/Test-OpenAI-instrumentation_1770406427/should-set-attributes-in-span-for-completion_322013537/recording.har
around lines 105-112 (and also apply same change to 114-121 and 218-225), the
HAR contains real Set-Cookie cookie values (__cf_bm, _cfuvid) which are
sensitive and ephemeral; remove or redact the cookie "value" fields (or remove
the entire cookie entries) and replace them with a stable placeholder like
"[REDACTED]" or omit the value property so tests keep structure but no real
secrets are committed. Ensure the modified entries remain valid HAR JSON (keep
name, path, sameSite, secure flags if needed) and run the test recordings check
to confirm no structural changes break replay.
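The redaction described above can be expressed as two small pure functions over HAR fragments. This is a hedged sketch, not part of the instrumentation package: the field names follow the standard HAR 1.2 cookie/header shape, and the `Set-Cookie` rewrite assumes the `name=value; attr; attr` layout seen in this recording.

```typescript
// Sketch: blank out cookie values and rewrite Set-Cookie headers so only
// non-sensitive attributes (path, domain, flags) survive in the fixture.
type Cookie = { name: string; value: string; [k: string]: unknown };
type Header = { name: string; value: string };

function redactCookies(cookies: Cookie[]): Cookie[] {
  return cookies.map((c) => ({ ...c, value: "REDACTED" }));
}

function redactSetCookieHeaders(headers: Header[]): Header[] {
  return headers.map((h) =>
    h.name.toLowerCase() === "set-cookie"
      ? {
          ...h,
          // Drop the leading name=value pair, keep the attribute tail.
          value: "SANITIZED; " + h.value.split("; ").slice(1).join("; "),
        }
      : h
  );
}
```

Running both over the `cookies` array and the response `headers` array of each HAR entry keeps the structure valid for replay while removing the secret material.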

{
"name": "date",
"value": "Mon, 11 Aug 2025 08:06:12 GMT"
"value": "Thu, 14 Aug 2025 14:45:20 GMT"

🛠️ Refactor suggestion

Sanitize volatile response headers to reduce fixture churn.

These server-generated values change per request and don’t affect instrumentation assertions.

Apply:

-              "value": "Thu, 14 Aug 2025 14:45:20 GMT"
+              "value": "Thu, 01 Jan 1970 00:00:00 GMT"
@@
-              "value": "760"
+              "value": "0"
@@
-              "value": "envoy-router-68955c875d-87tvh"
+              "value": "SANITIZED"
@@
-              "value": "816"
+              "value": "0"
@@
-              "value": "req_2ca041b07b24913daf8c1e435a505663"
+              "value": "SANITIZED"
@@
-              "value": "96f13c2eebe7c22f-TLV"
+              "value": "SANITIZED"

Also applies to: 161-163, 177-179, 181-183, 209-211, 235-237
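The placeholder substitutions above can be generalized into one small sanitizer. This is a sketch under assumptions: the map below covers only the volatile header names visible in these recordings (date, request IDs, Cloudflare ray IDs, processing times); extend it for your own fixtures.

```typescript
// Sketch: map volatile response-header names to fixed placeholder values
// so re-recordings only diff on meaningful changes; stable headers pass
// through untouched.
const PLACEHOLDERS: Record<string, string> = {
  date: "Thu, 01 Jan 1970 00:00:00 GMT",
  "x-request-id": "SANITIZED",
  "cf-ray": "SANITIZED",
  "openai-processing-ms": "0",
};

function sanitizeResponseHeaders(
  headers: { name: string; value: string }[]
): { name: string; value: string }[] {
  return headers.map((h) => {
    const stable = PLACEHOLDERS[h.name.toLowerCase()];
    return stable === undefined ? h : { ...h, value: stable };
  });
}
```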

@galkleinman galkleinman merged commit 588bb01 into main Aug 14, 2025
5 checks passed


Development

Successfully merging this pull request may close these issues.

Support OpenAI SDK v5

3 participants