refactor: Rework client timeout #384
Conversation
Updated some client endpoints with specific timeouts. Added tests. TODO: Add async tests. Resolve some issues with Mypy.
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (1)
src/apify_client/_http_client.py:176
- [nitpick] The exponential backoff calculation would benefit from a clarifying comment or helper function to ensure its intent is immediately clear to future maintainers.
timeout = min(self.timeout_secs, (timeout_secs or self.timeout_secs) * 2 ** (attempt - 1))
I guess you don't use timedelta because integers are already used elsewhere, correct?
Yes, I used the same format that was already being used in the rest of the repository.
Ok, LGTM then
@@ -68,7 +69,7 @@ def __init__(
     api_url: str | None = None,
     max_retries: int | None = 8,
     min_delay_between_retries_millis: int | None = 500,
-    timeout_secs: int | None = 360,
+    timeout_secs: int | None = DEFAULT_TIMEOUT,
So if you pass None, it will still use the default timeout?
Yes, this is the same as the previous implementation.
Hm. I understand it's BC and BC is sacred, but I would expect None to mean "no timeout"
To limit the scope of this change, I will avoid adding an unnecessary breaking change to this PR, but it could be added in its own PR if needed.
@@ -11,12 +11,13 @@
 class ResourceClient(BaseClient):
     """Base class for sub-clients manipulating a single resource."""

-    def _get(self) -> dict | None:
+    def _get(self, timeout_secs: int | None = None) -> dict | None:
This could have the default timeout constant as the default as well, right?
It could. I guess this only makes a difference if someone uses the client directly rather than through ApifyClient(A)sync. I am not strongly against it, but I lean towards not doing it, as having the same timeout constant used on several levels can create confusion about which one is actually being applied. Currently it is set on ApifyClient(A)sync and falls through all the way down to HTTPClient(Async).
@@ -34,7 +37,7 @@ def get(self) -> dict | None:
     Returns:
         The retrieved dataset, or None, if it does not exist.
     """
-    return self._get()
+    return self._get(timeout_secs=_SMALL_TIMEOUT)
It's strange to have no way to override this... Could we have a timeout parameter here as well?
Is there any use-case for this?
    _timeout_params,
)
@respx.mock
def test_specific_timeouts_for_specific_endpoints_sync(
So here we just test that the timeouts are set to the correct value, correct?
Yes, I think such a test can keep the sync and async clients synchronized, and it gives a nice overview of the timeouts.
Comparison of this change to master when used in [...]:
Without the change, the timeout issue caused 2 out of 10 crawler runs to be delayed by several minutes. Average runtime was 110 s.
With the change, the timeout issue caused only 1 out of 10 crawler runs to be delayed, and only by several seconds thanks to the reduced timeout. Average runtime was 47 s.
Add an optional timeout argument to the individual client calls. The timeout of each call is either this timeout or the default client timeout.
For a failed but retryable request, the timeout is multiplied by 2^(attempt - 1) on each consecutive attempt, up to the value of the client's default timeout.
Several storage-related endpoints now have a reduced timeout, mainly endpoints that are not expected to take long, or that are considered safe to run twice (idempotent).
Timeouts for the storage-related endpoints are all summarized in the new tests, in one place.
Addresses issues found in: apify/crawlee-python#1132