feat(sdk): added scorer classes to sdk #698

sid-rl · 2025-12-01T21:10:16Z

No description provided.

james-rl

Some comments and questions in review

james-rl · 2025-12-01T21:41:08Z

src/runloop_api_client/sdk/async_scorer.py

+    """Asynchronous wrapper around a scenario scorer resource."""
+
+    def __init__(
+        self,
+        client: AsyncRunloop,
+        scorer_id: str,
+    ) -> None:
+        """Initialize the wrapper.


These comments leak system internals and don't really help a caller understand how to use the code. Can you update them to say something more helpful?

james-rl · 2025-12-01T21:51:17Z

tests/smoketests/sdk/test_async_scorer.py

+        except InternalServerError:
+            # Backend may return 500 for validate endpoint - skip if this happens
+            pytest.skip("Backend returned 500 for scorer validate endpoint")


isn't this exposing a real bug?
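One way to keep a backend bug like this visible is to record it as an expected failure rather than a skip. The sketch below uses only the standard library; `InternalServerError` and `FailingScorer` are local stand-ins written for illustration, not the SDK's classes. In a pytest suite the analogous marker would be `pytest.mark.xfail(raises=..., strict=True)`.

```python
import io
import unittest


class InternalServerError(Exception):
    """Local stand-in for the SDK's HTTP 500 error type (hypothetical)."""


class FailingScorer:
    """Hypothetical stub that reproduces the backend bug: validate() always 500s."""

    def validate(self, scoring_context: dict) -> dict:
        raise InternalServerError("validate endpoint returned 500")


class ScorerValidateTest(unittest.TestCase):
    # Known backend bug: recorded as an expected failure instead of a skip.
    # Once validate() stops raising, this becomes an "unexpected success",
    # which fails the run and prompts removal of the marker.
    @unittest.expectedFailure
    def test_validate_returns_result(self) -> None:
        result = FailingScorer().validate(scoring_context={})
        self.assertIsNotNone(result)


def run_suite() -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ScorerValidateTest)
    # Send runner output to an in-memory buffer to keep the example quiet.
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)


outcome = run_suite()
```

Unlike a skip, the expected-failure record stops being valid the moment the endpoint is fixed, so the workaround cannot silently outlive the bug.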

james-rl · 2025-12-01T21:52:27Z

tests/sdk/test_scorer.py

+        result = scorer.get_info(
+            extra_headers={"X-Custom": "value"},
+            extra_query={"param": "value"},
+            extra_body={"key": "value"},
+            timeout=30.0,
+        )


are these params both real and specific to the scorer?

If not, consider moving the flag propagation to a dedicated test file instead of duplicating it here
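A dedicated propagation check could look roughly like the sketch below. `ScorerStub`, the helper `assert_options_forwarded`, and the underlying `scenarios.scorers.retrieve` call are assumptions made for illustration; only the four option names come from the test under review.

```python
from unittest.mock import Mock

# The request options exercised in the test under review.
REQUEST_OPTIONS = {
    "extra_headers": {"X-Custom": "value"},
    "extra_query": {"param": "value"},
    "extra_body": {"key": "value"},
    "timeout": 30.0,
}


class ScorerStub:
    """Hypothetical wrapper that forwards request options to the API client."""

    def __init__(self, client, scorer_id: str) -> None:
        self._client = client
        self._id = scorer_id

    def get_info(self, **options):
        # Assumed underlying call; the real SDK method name may differ.
        return self._client.scenarios.scorers.retrieve(self._id, **options)


def assert_options_forwarded(wrapper_call, api_method) -> None:
    """Call the wrapper with every option and check each one reached the API method."""
    wrapper_call(**REQUEST_OPTIONS)
    api_method.assert_called_once()
    _, kwargs = api_method.call_args
    for name, value in REQUEST_OPTIONS.items():
        assert kwargs[name] == value, f"{name} was not forwarded"


client = Mock()
scorer = ScorerStub(client, "scorer_123")
assert_options_forwarded(scorer.get_info, client.scenarios.scorers.retrieve)
```

With one shared helper, each resource's own tests can stick to resource-specific behaviour and leave option forwarding to a single file.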

james-rl · 2025-12-01T21:53:18Z

tests/smoketests/sdk/test_scorer.py

+        try:
+            result = scorer.validate(
+                scoring_context={},
+            )
+            assert result is not None
+        except InternalServerError:
+            # Backend may return 500 for validate endpoint - skip if this happens
+            pytest.skip("Backend returned 500 for scorer validate endpoint")


same comment as before -- we shouldn't be throwing 500 errors for validation failures; they should be handled better

jrvb-rl · 2025-12-01T22:12:01Z

src/runloop_api_client/sdk/async_.py



+class AsyncScorerOps:
+    """High-level async manager for creating and managing scenario scorers.


Probably good to describe as 'benchmark scenario scorers' since 'scenario' isn't particularly obvious outside the benchmark context.

jrvb-rl · 2025-12-01T22:17:54Z

src/runloop_api_client/sdk/async_.py

+        """Create a new scenario scorer.
+
+        :param params: See :typeddict:`~runloop_api_client.sdk._types.SDKScorerCreateParams` for available parameters
+        :return: Wrapper bound to the newly created scorer


"Wrapper bound to..." is a bit of an odd and implementation-specific description. How about "Handle to the newly created scorer"?
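Taken together, the two wording suggestions on this file might read as in the sketch below. `ScorerOpsSketch` and `ScorerHandle` are hypothetical stubs used only to carry the docstrings; this is not the text that landed in the PR.

```python
class ScorerHandle:
    """Hypothetical stand-in for the object returned to the caller."""

    def __init__(self, scorer_id: str) -> None:
        self.id = scorer_id


class ScorerOpsSketch:
    """Create and manage benchmark scenario scorers.

    Illustrative stub: the wording is the point, not the behaviour.
    """

    def create(self, **params: object) -> ScorerHandle:
        """Create a new benchmark scenario scorer.

        :param params: Scorer creation parameters
        :return: Handle to the newly created scorer
        """
        # A fixed ID keeps the stub deterministic; a real call would hit the API.
        return ScorerHandle("scorer_123")
```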

jrvb-rl · 2025-12-01T22:27:52Z

tests/sdk/test_async_clients.py

+        """Test create method."""
+        mock_async_client.scenarios.scorers.create = AsyncMock(return_value=scorer_view)
+
+        client = AsyncScorerOps(mock_async_client)


To avoid confusion later, let's not use 'client' for this, since that looks more like the API client (so the name that made sense in the older tests doesn't really fit here). Something like 'ops' is fine for this.

jrvb-rl · 2025-12-01T22:33:09Z

tests/sdk/test_async_clients.py

+    async def test_list(self, mock_async_client: AsyncMock, scorer_view: MockScorerView) -> None:
+        """Test list method."""


Would be good to test with 0, 1, and >1 values in the returned list.
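A minimal sketch of the three cases, using only the standard library. `AsyncScorerOpsStub` and the page shape (a `scorers` attribute holding items with an `id`) are assumptions for illustration; the real response type may differ.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock


class AsyncScorerOpsStub:
    """Hypothetical stand-in that lists scorers and returns their IDs."""

    def __init__(self, client) -> None:
        self._client = client

    async def list(self) -> list:
        page = await self._client.scenarios.scorers.list()
        return [item.id for item in page.scorers]


async def check_list(ids: list) -> list:
    """Run list() against a mocked client that returns one page with the given IDs."""
    client = AsyncMock()
    page = SimpleNamespace(scorers=[SimpleNamespace(id=i) for i in ids])
    client.scenarios.scorers.list = AsyncMock(return_value=page)

    result = await AsyncScorerOpsStub(client).list()

    # assert_awaited_once (not assert_called_once) confirms the coroutine was awaited.
    client.scenarios.scorers.list.assert_awaited_once()
    return result


# Empty, single-item, and multi-item pages.
CASES = ([], ["scorer_1"], ["scorer_1", "scorer_2", "scorer_3"])
results = [asyncio.run(check_list(list(ids))) for ids in CASES]
```

The empty and single-item cases are the ones most likely to expose off-by-one or "first element" assumptions in list handling.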

jrvb-rl · 2025-12-01T22:34:53Z

tests/sdk/test_async_scorer.py

+    def test_id_property(self, mock_async_client: AsyncMock) -> None:
+        """Test id property returns the scorer ID."""
+        scorer = AsyncScorer(mock_async_client, "scorer_123")
+        assert scorer.id == "scorer_123"


This duplicates the test two cases above.

jrvb-rl · 2025-12-01T22:42:05Z

tests/sdk/test_scorer.py

+    def test_id_property(self, mock_client: Mock) -> None:
+        """Test id property returns the scorer ID."""
+        scorer = Scorer(mock_client, "scorer_123")
+        assert scorer.id == "scorer_123"
+


duplicate test

* chore: hide build context APIs
* fix(devbox): launch parameter typo
* fix(scorer): fixed RL_TEST_CONTEXT to RL_SCORER_CONTEXT
* fix(api): don't ignore devbox keep_alive, suspend and resume in api
* feat(blueprints): Add build context to the OpenAPI spec (#6494)
* chore(mounts): Update documentation for deprecated fields to direct the user to the replacement API
* chore(blueprints): Add build context examples (#694)
* feat(sdk): added scorer classes to sdk (#698)
* added scorer class (kept create and list as static methods for now since we don't know how we're creating scorers yet)
* refactored static methods to ScorerOps class
* fix example docstrings to use correct scorer create params
* scorer tests
* fixed scorer unit test parameters for update and validate
* update scorer and scorer ops docstrings to be more helpful while not exposing system internals
* update docs with scorer classes, methods and types
* remove verbose request options in unit test parameters
* rename client to ops in client test
* rename client test file to ops
* added list_empty, list_single and list_multiple unit tests to all ops class tests
* fix assert_called to assert_awaited
* remove duplicate tests
* release: 1.0.0

---------

Co-authored-by: stainless-app[bot] <142633134+stainless-app[bot]@users.noreply.github.com>
Co-authored-by: Adam Lesinski <adam@runloop.ai>
Co-authored-by: sid-rl <siddarth@runloop.ai>

sid-rl added 4 commits December 1, 2025 13:38

added scorer class (kept create and list as static methods for now since we don't know how we're creating scorers yet)

5e37a2e

refactored static methods to ScorerOps class

94435a7

fix example docstrings to use correct scorer create params

d2910e7

scorer tests

4194ce1

sid-rl requested review from dines-rl and james-rl December 1, 2025 21:10

jrvb-rl self-requested a review December 1, 2025 21:40

james-rl reviewed Dec 1, 2025

sid-rl added 2 commits December 1, 2025 17:02

fixed scorer unit test parameters for update and validate

e8fdc37

update scorer and scorer ops docstrings to be more helpful while not exposing system internals

c1f617e

jrvb-rl approved these changes Dec 1, 2025

sid-rl added 7 commits December 1, 2025 18:02

update docs with scorer classes, methods and types

74ca76c

remove verbose request options in unit test parameters

c5bc7d3

rename client to ops in client test

e3e2c14

rename client test file to ops

e37d9e1

added list_empty, list_single and list_multiple unit tests to all ops class tests

bf11fd2

fix assert_called to assert_awaited

92b6130

remove duplicate tests

7b68fa4

sid-rl merged commit 85f798f into next Dec 2, 2025
6 checks passed

sid-rl deleted the siddarth/scorer-sdk branch December 2, 2025 00:24

stainless-app bot mentioned this pull request Dec 2, 2025

release: 1.0.0 #688

Merged
