-
Notifications
You must be signed in to change notification settings - Fork 500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes #283 : Fixes CPU busy loop(s) seen in request_multiple. #284
base: develop
Are you sure you want to change the base?
Conversation
library/Requests/Transport/cURL.php
Outdated
} | ||
while ($status === CURLM_CALL_MULTI_PERFORM); | ||
while (true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the above approach may be incorrect. Maybe the part that was causing the busy loop wasn't the above do{}while
, but below
The below is probably a better alternative, and can get rid of the above.
while ($status === CURLM_CALL_MULTI_PERFORM);
+ // Insert a 50ms sleep here so that we don't busy loop and chew cpu
+ $selectRes = curl_multi_select($multihandle, 0.05);
+
$to_process = array();
7234c79
to
c7d8ee4
Compare
Codecov Report
@@ Coverage Diff @@
## master #284 +/- ##
==========================================
- Coverage 92.11% 91.73% -0.38%
==========================================
Files 21 21
Lines 1762 1767 +5
==========================================
- Hits 1623 1621 -2
- Misses 139 146 +7
Continue to review full report at Codecov.
|
I'm encountering brand new issues due to php/php-src#3509 (comment) On September 2018 for PHP 7.1.23, after I created this PR, curl_multi_select will now return
I think that the issue might be triggered by timeouts (e.g. connect timeouts on one request, while the other request doesn't timeout). https://bugs.php.net/bug.php?id=76480 mentions the following:
EDIT: Sorry, that would break older versions or code using older libcurl because it returned -1 on timeout previously. https://bugs.php.net/bug.php?id=77030 mentions a better approach $Descriptors = curl_multi_select( $Master, 0.1 );
if( $Descriptors === -1 && curl_multi_errno( $Master ) !== CURLM_OK ) {
Note that I'm not a curl_multi expert (I had a bit of experience using it from university, but that's it), and there may be edge cases I haven't considered |
#284 (comment) applied to an alternative version I was testing out locally without the usleep.
|
The failure mode I'm seeing in the latest php versions may be related to my use of the NOSIGNAL patch, if nobody else sees it |
In order to reproduce the issue I saw, I did the following:
https://gist.github.com/TysonAndre/46a6829d9d8de00e50226890a6a9a63f has the full logs and details about the options used, but the part of interest is that curl_multi_select isn't sleeping for nearly as long as I'd expect it to, and it is neither returning -1 nor returning an error in curl_multi_error.
The below link has the code changes I added to diagnose the issue and the (ineffective) mitigations I'm currently trying. A hardcoded usleep() of 1 millisecond would harm PHP response times if the service would normally complete in less than a millisecond /**
* Send multiple requests simultaneously
*
* @param array $requests Request data
* @param array $options Global options
* @return array Array of Requests_Response objects (may contain Requests_Exception or string responses as well)
*/
public function request_multiple($requests, $options) {
// If you're not requesting, we can't get any responses ¯\_(ツ)_/¯
if (empty($requests)) {
return array();
}
$multihandle = curl_multi_init();
$subrequests = array();
$subhandles = array();
$class = get_class($this);
foreach ($requests as $id => $request) {
$subrequests[$id] = new $class();
$subhandles[$id] = $subrequests[$id]->get_subrequest_handle($request['url'], $request['headers'], $request['data'], $request['options']);
$request['options']['hooks']->dispatch('curl.before_multi_add', array(&$subhandles[$id]));
curl_multi_add_handle($multihandle, $subhandles[$id]);
}
$completed = 0;
$responses = array();
$request['options']['hooks']->dispatch('curl.before_multi_exec', array(&$multihandle));
do {
$active = false;
$is_first_multi_exec = true;
do {
/*
if (!$is_first_multi_exec) {
// Sleep for 1 millisecond to avoid a busy loop that would chew CPU.
// XXX uncomment this to mitigate this a bit - there are better mitigations such as making this sleep after curl_multi_select if it returns 0 or -1.
usleep(1000);
}
*/
$start = microtime(true);
$status = curl_multi_exec($multihandle, $active);
fprintf(STDERR, "curl_multi_exec returned %s after %.6f seconds: curl_multi_errno=%s\n", json_encode($status), microtime(true) - $start, curl_multi_errno($multihandle));
$is_first_multi_exec = false;
}
while ($status === CURLM_CALL_MULTI_PERFORM);
// Insert a 50ms sleep here so that we don't busy loop and chew cpu
$start = microtime(true);
$selectRes = curl_multi_select($multihandle, 2.0);
fprintf(STDERR, "curl_multi_select returned %s after %.6f seconds: curl_multi_errno=%s\n", json_encode($selectRes), microtime(true) - $start, curl_multi_errno($multihandle));
fflush(STDERR);
$to_process = array();
// Read the information as needed
while ($done = curl_multi_info_read($multihandle)) {
echo "curl_multi_info_read returned a value:\n";
var_dump($done);
echo "curl error name: " . curl_strerror($done['result']) . "\n";
$key = array_search($done['handle'], $subhandles, true);
if (!isset($to_process[$key])) {
$to_process[$key] = $done;
}
}
// Parse the finished requests before we start getting the new ones
foreach ($to_process as $key => $done) {
$options = $requests[$key]['options'];
if (CURLE_OK !== $done['result']) {
//get error string for handle.
$reason = curl_error($done['handle']);
$exception = new Requests_Exception_Transport_cURL(
$reason,
Requests_Exception_Transport_cURL::EASY,
$done['handle'],
$done['result']
);
$responses[$key] = $exception;
$options['hooks']->dispatch('transport.internal.parse_error', array(&$responses[$key], $requests[$key]));
}
else {
$responses[$key] = $subrequests[$key]->process_response($subrequests[$key]->response_data, $options);
$options['hooks']->dispatch('transport.internal.parse_response', array(&$responses[$key], $requests[$key]));
}
curl_multi_remove_handle($multihandle, $done['handle']);
curl_close($done['handle']);
if (!is_string($responses[$key])) {
$options['hooks']->dispatch('multiple.request.complete', array(&$responses[$key], $key));
}
$completed++;
}
}
while ($active || $completed < count($subrequests));
$request['options']['hooks']->dispatch('curl.after_multi_exec', array(&$multihandle));
curl_multi_close($multihandle);
return $responses;
} The endpoint I used to test that connected immediately but failed to respond, running on localhost:
EDIT: This occurs whether or not I set CURLOPT_NOSIGNAL. I'm using a recent version of PHP 7.1 (>= 7.1.23 and curl >= 7.28.0.) |
See WordPress#284 for background information This will require more testing across various libcurl versions. I doubt that the timeout is necessary for curl_multi_select (calls select() if it can), but leaving in one just in case of bugs, so that it will end. - Haven't thoroughly checked for relevant libcurl bugs. Asynchronously wait for events with a short timeout if CURLM_CALL_MULTI_PERFORM fails. Use shorter timeouts when the total time elapsed so far is shorter. Make the largest possible manual usleep 2ms.
c7d8ee4
to
437cecf
Compare
Codecov Report
@@ Coverage Diff @@
## master #284 +/- ##
============================================
+ Coverage 92.12% 92.13% +0.01%
- Complexity 761 764 +3
============================================
Files 21 21
Lines 1764 1781 +17
============================================
+ Hits 1625 1641 +16
- Misses 139 140 +1
Continue to review full report at Codecov.
|
It seems like long-lived request_multiple operations are more affected by the sleep statements that were added.
It seems like the addition of usleep doesn't hurt short requests, but makes long requests take noticeably longer. I should only be performing the usleep if the number returned by curl_multi_select wasn't positive. https://curl.haxx.se/libcurl/c/curl_multi_wait.html
The implementation in php-src for curl_multi_select: #if LIBCURL_VERSION_NUM >= 0x071c00 /* Available since 7.28.0 */
error = curl_multi_wait(mh->multi, NULL, 0, (unsigned long) timeout * 1000.0, &numfds);
if (CURLM_OK != error) {
SAVE_CURLM_ERROR(mh, error);
RETURN_LONG(-1);
}
RETURN_LONG(numfds);
#else
_make_timeval_struct(&to, timeout);
FD_ZERO(&readfds);
FD_ZERO(&writefds);
FD_ZERO(&exceptfds);
error = curl_multi_fdset(mh->multi, &readfds, &writefds, &exceptfds, &maxfd);
SAVE_CURLM_ERROR(mh, error);
if (maxfd == -1) {
RETURN_LONG(-1);
}
RETURN_LONG(select(maxfd + 1, &readfds, &writefds, &exceptfds, &to));
#endif |
No further changes are planned. This seems to be working pretty well to mitigate edge cases causing cpu usage spikes in a fraction of requests.
|
@TysonAndre Hiya! Sorry for the long wait, but we're finally trying to get through the list of open PRs. We're currently planning two triage sessions for Requests 2.0 and would like to invite you to join us in one or both of these sessions to talk us through the PR and discuss it. The triage sessions are currently planned for:
Would you be available to join us during those times on one of those days ? Also: would you be willing and able to add some tests for these changes to the PR ? |
PR #493 is possibly a partial duplicate of this PR. |
This will require more testing across various libcurl versions.
I doubt that the timeout is necessary for curl_multi_select
(calls select() if it can),
but leaving in one just in case of bugs, so that it will end.
Asynchronously wait for events with a short timeout if CURLM_CALL_MULTI_PERFORM
Fixes #283