Skip to content

Conversation

@sid-rl
Copy link
Contributor

@sid-rl sid-rl commented Dec 1, 2025

No description provided.

@sid-rl sid-rl requested review from dines-rl and james-rl December 1, 2025 21:10
@jrvb-rl jrvb-rl self-requested a review December 1, 2025 21:40
Copy link
Contributor

@james-rl james-rl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments and questions in review

Comment on lines 17 to 24
"""Asynchronous wrapper around a scenario scorer resource."""

def __init__(
self,
client: AsyncRunloop,
scorer_id: str,
) -> None:
"""Initialize the wrapper.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments leak system internals and don't really help a caller understand how to use the code. Can you update them to say something more helpful?

Comment on lines +88 to +90
except InternalServerError:
# Backend may return 500 for validate endpoint - skip if this happens
pytest.skip("Backend returned 500 for scorer validate endpoint")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this exposing a real bug?

Comment on lines 35 to 40
result = scorer.get_info(
extra_headers={"X-Custom": "value"},
extra_query={"param": "value"},
extra_body={"key": "value"},
timeout=30.0,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these params both real and specific to the scorer?

If not, consider moving the flag propagation to a dedicated test file instead of duplicating it here

Comment on lines +83 to +90
try:
result = scorer.validate(
scoring_context={},
)
assert result is not None
except InternalServerError:
# Backend may return 500 for validate endpoint - skip if this happens
pytest.skip("Backend returned 500 for scorer validate endpoint")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as before -- we shouldn't be throwing 500 errors for validation failures; they should be handled better



class AsyncScorerOps:
"""High-level async manager for creating and managing scenario scorers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably good to describe as 'benchmark scenario scorers' since 'scenario' isn't particularly obvious outside the benchmark context.

"""Create a new scenario scorer.
:param params: See :typeddict:`~runloop_api_client.sdk._types.SDKScorerCreateParams` for available parameters
:return: Wrapper bound to the newly created scorer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Wrapper bound to..." is a bit of an odd and implementation-specific description. How about "Handle to the newly created scorer"?

"""Test create method."""
mock_async_client.scenarios.scorers.create = AsyncMock(return_value=scorer_view)

client = AsyncScorerOps(mock_async_client)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid confusion later, let's not use 'client' for this, since that looks more like the API client (so the name that made sense in the older tests doesn't really fit here). Something like 'ops' is fine for this.

Comment on lines 547 to 548
async def test_list(self, mock_async_client: AsyncMock, scorer_view: MockScorerView) -> None:
"""Test list method."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to test with 0, 1, and >1 values in the returned list.

Comment on lines 27 to 30
def test_id_property(self, mock_async_client: AsyncMock) -> None:
"""Test id property returns the scorer ID."""
scorer = AsyncScorer(mock_async_client, "scorer_123")
assert scorer.id == "scorer_123"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same as the test 2 cases above

Comment on lines 25 to 29
def test_id_property(self, mock_client: Mock) -> None:
"""Test id property returns the scorer ID."""
scorer = Scorer(mock_client, "scorer_123")
assert scorer.id == "scorer_123"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicate test

@sid-rl sid-rl merged commit 85f798f into next Dec 2, 2025
6 checks passed
@sid-rl sid-rl deleted the siddarth/scorer-sdk branch December 2, 2025 00:24
@stainless-app stainless-app bot mentioned this pull request Dec 2, 2025
stainless-app bot added a commit that referenced this pull request Dec 2, 2025
* chore: hide build context APIs

* fix(devbox): launch parameter typo

* fix(scorer): fixed RL_TEST_CONTEXT to RL_SCORER_CONTEXT

* fix(api): don't ignore devbox keep_alive, suspend and resume in api

* feat(blueprints): Add build context to the OpenAPI spec (#6494)

* chore(mounts): Update documentation for deprecated fields to direct the user to the replacement API

* chore(blueprints): Add build context examples (#694)

* feat(sdk): added scorer classes to sdk (#698)

* added scorer class (kept create and list as static methods for now since we don't know how we're creating scorers yet)

* refactored static methods to ScorerOps class

* fix example docstrings to use correct scorer create params

* scorer tests

* fixed scorer unit test parameters for update and validate

* update scorer and scorer ops docstrings to be more helpful while not exposing system internals

* update docs with scorer classes, methods and types

* remove verbose request options in unit test parameters

* rename client to ops in client test

* rename client test file to ops

* added list_empty, list_single and list_multiple unit tests to all ops class tests

* fix assert_called to assert_awaited

* remove duplicate tests

* release: 1.0.0

---------

Co-authored-by: stainless-app[bot] <142633134+stainless-app[bot]@users.noreply.github.com>
Co-authored-by: Adam Lesinski <adam@runloop.ai>
Co-authored-by: sid-rl <siddarth@runloop.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants