-
Notifications
You must be signed in to change notification settings - Fork 2
feat(sdk): added scorer classes to sdk #698
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…nce we don't know how we're creating scorers yet)
james-rl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments and questions in review
| """Asynchronous wrapper around a scenario scorer resource.""" | ||
|
|
||
| def __init__( | ||
| self, | ||
| client: AsyncRunloop, | ||
| scorer_id: str, | ||
| ) -> None: | ||
| """Initialize the wrapper. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comments leak system internals and don't really help a caller understand how to use the code. Can you update them to say something more helpful?
| except InternalServerError: | ||
| # Backend may return 500 for validate endpoint - skip if this happens | ||
| pytest.skip("Backend returned 500 for scorer validate endpoint") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this exposing a real bug?
tests/sdk/test_scorer.py
Outdated
| result = scorer.get_info( | ||
| extra_headers={"X-Custom": "value"}, | ||
| extra_query={"param": "value"}, | ||
| extra_body={"key": "value"}, | ||
| timeout=30.0, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these params both real and specific to the scorer?
If not, consider moving the flag propagation to a dedicated test file instead of duplicating it here
| try: | ||
| result = scorer.validate( | ||
| scoring_context={}, | ||
| ) | ||
| assert result is not None | ||
| except InternalServerError: | ||
| # Backend may return 500 for validate endpoint - skip if this happens | ||
| pytest.skip("Backend returned 500 for scorer validate endpoint") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as before -- we shouldn't be throwing 500 errors for validation failures; they should be handled better
src/runloop_api_client/sdk/async_.py
Outdated
|
|
||
|
|
||
| class AsyncScorerOps: | ||
| """High-level async manager for creating and managing scenario scorers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably good to describe as 'benchmark scenario scorers' since 'scenario' isn't particularly obvious outside the benchmark context.
src/runloop_api_client/sdk/async_.py
Outdated
| """Create a new scenario scorer. | ||
| :param params: See :typeddict:`~runloop_api_client.sdk._types.SDKScorerCreateParams` for available parameters | ||
| :return: Wrapper bound to the newly created scorer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Wrapper bound to..." is a bit of an odd and implementation-specific description. How about "Handle to the newly created scorer"?
tests/sdk/test_async_clients.py
Outdated
| """Test create method.""" | ||
| mock_async_client.scenarios.scorers.create = AsyncMock(return_value=scorer_view) | ||
|
|
||
| client = AsyncScorerOps(mock_async_client) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid confusion later, let's not use 'client' for this, since that looks more like the API client (so the name that made sense in the older tests doesn't really fit here). Something like 'ops' is fine for this.
tests/sdk/test_async_clients.py
Outdated
| async def test_list(self, mock_async_client: AsyncMock, scorer_view: MockScorerView) -> None: | ||
| """Test list method.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to test with 0, 1, and >1 values in the returned list.
tests/sdk/test_async_scorer.py
Outdated
| def test_id_property(self, mock_async_client: AsyncMock) -> None: | ||
| """Test id property returns the scorer ID.""" | ||
| scorer = AsyncScorer(mock_async_client, "scorer_123") | ||
| assert scorer.id == "scorer_123" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same as the test 2 cases above
tests/sdk/test_scorer.py
Outdated
| def test_id_property(self, mock_client: Mock) -> None: | ||
| """Test id property returns the scorer ID.""" | ||
| scorer = Scorer(mock_client, "scorer_123") | ||
| assert scorer.id == "scorer_123" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicate test
* chore: hide build context APIs * fix(devbox): launch parameter typo * fix(scorer): fixed RL_TEST_CONTEXT to RL_SCORER_CONTEXT * fix(api): don't ignore devbox keep_alive, suspend and resume in api * feat(blueprints): Add build context to the OpenAPI spec (#6494) * chore(mounts): Update documentation for deprecated fields to direct the user to the replacement API * chore(blueprints): Add build context examples (#694) * feat(sdk): added scorer classes to sdk (#698) * added scorer class (kept create and list as static methods for now since we don't know how we're creating scorers yet) * refactored static methods to ScorerOps class * fix example docstrings to use correct scorer create params * scorer tests * fixed scorer unit test parameters for update and validate * update scorer and scorer ops docstrings to be more helpful while not exposing system internals * update docs with scorer classes, methods and types * remove verbose request options in unit test parameters * rename client to ops in client test * rename client test file to ops * added list_empty, list_single and list_multiple unit tests to all ops class tests * fix assert_called to assert_awaited * remove duplicate tests * release: 1.0.0 --------- Co-authored-by: stainless-app[bot] <142633134+stainless-app[bot]@users.noreply.github.com> Co-authored-by: Adam Lesinski <adam@runloop.ai> Co-authored-by: sid-rl <siddarth@runloop.ai>
No description provided.