Fix HttpClientResponse body()/end() race#6064
Fix HttpClientResponse body()/end() race#6064snazy wants to merge 12 commits intoeclipse-vertx:masterfrom
Conversation
This change fixes the race in `HttpClientResponseImpl` described in eclipse-vertx#6038: `handleTrailers()`/`handleException()` can run before `body()`/`end()` creates the promise. If that happens, the completion signal is dropped and the future can hang. This is observed from the call site as a HTTP request timeo ut. This fix is only applied to `HttpClientResponseImpl`: - keep ended/failure state on `HttpClientResponseImpl` - in `body()`/`end()`, create the `HttpEventHandler` promise while holding `synchronized(conn)` - after unlock, replay completion/failure via `handleEnd()`/`handleException()` when needed Although `HttpEventHandler` looks very similar (like it _could_ be affected by the same/similar race condition), it is not affected. I intentionally did not change `HttpEventHandler`, because it is also used in server request processing. The only case when server request processing could be affected , is when `body()` is called from a _different_ thread while `end()` is being processed - that is very unlikely (feels like a misuse). Also added two new tests to `Http1xTest` that reliably fail without the change to `HttpClientResponseImpl` and reliably pass with that change. The added regression tests are `testResponseBodyAfterResponseEnd` and `testResponseEndAfterResponseEnd`. Fixes eclipse-vertx#6038
|
Backport for 5.0 : https://github.com/snazy/vert.x/pull/new/fix-race-6038-5.0 |
|
to simplify, ended should be a Throwable and we should use a specific throwable constant to signal proper ending, e.g. END_SENTINEL |
|
@vietj For the END_SENTINEL, you mean to throw that constant/stackless exception from |
| private Handler<HttpFrame> customFrameHandler; | ||
| private Handler<StreamPriority> priorityHandler; | ||
|
|
||
| static final class RequestEndedException extends IllegalStateException { |
There was a problem hiding this comment.
no need for a specific class here, we can simply use == comparison instead
| if (trailers != null) { | ||
| throw new IllegalStateException(); | ||
| if (ended == ENDED_SENTINEL) { | ||
| throw ENDED_SENTINEL; |
There was a problem hiding this comment.
we should throw a new IllegalStateException each time to give proper stack information and not try to reuse the sentinel
| HttpEventHandler handler; | ||
| synchronized (conn) { | ||
| if (trailers != null) { | ||
| if (ended == ENDED_SENTINEL) { |
There was a problem hiding this comment.
I think here we should ignore any exception if ended is not null and consider that receiving any excxeption will not lead to further event delivery
There was a problem hiding this comment.
Yea, I had that before (and I agree that would be the correct behavior), but some tests like Http1xTest.testInvalidTrailersInHttpClientResponse() started to fail (timeout), so I reverted the previous behavior. Mean, I can try to adopt the test. WDYT?
There was a problem hiding this comment.
Pushed 9a3a0ed with that change and updates to the affected tests.
| checkEnded(); | ||
| return eventHandler(true).end(); | ||
| public Future<Void> end() { | ||
| HttpEventHandler eventHandler; |
There was a problem hiding this comment.
I think this part should be reimplemented more cleanly with an actual Promise set of the client response that becomes failed/succeeded by events
There was a problem hiding this comment.
Hm, yea, a cleaner state handling would be nice. Let me give that a try.
| this.stream = stream; | ||
| this.conn = stream.connection(); | ||
| this.completion = request.context.promise(); | ||
| if (completion.future() != completion) { |
There was a problem hiding this comment.
in which case would we need that ?
There was a problem hiding this comment.
Just a defensive check, if the implementation changes.
The alternative would have been to always have a field Future completionFuture with the same object instance as the Promise completion.
There was a problem hiding this comment.
I don't think this is necessary as this always is the case
| // the completion-future. | ||
| synchronized (conn) { | ||
| this.trailers = trailers; | ||
| if (completionFuture().isComplete()) { |
There was a problem hiding this comment.
I think instead we should return when we call tryComplete which returns false, in that case it means we didn't complete the future, some other event did and we should not try to make more progress
There was a problem hiding this comment.
like it has been done in handleException
There was a problem hiding this comment.
If we return too early here, accesses to an escaped MultiMap from trailers() could miss the the trailers (after observing the completion).
There was a problem hiding this comment.
I think we can still make the update and then check return after state update
There was a problem hiding this comment.
Hm, that could mutate the trailers field/content after the response has been ended (failure). You think that's acceptable? (I guess you're worried about the extra synchronized cost of FutureImpl.isComplete()?)
| } else if (this.trailers != trailers) { | ||
| this.trailers.setAll(trailers); | ||
| } | ||
| if (!completion.tryComplete()) { |
There was a problem hiding this comment.
we should call completion.tryComplete after the synchronized block, because this can call handlers in cascade (side effects), synchronized block should only be used to ensure state visibility.
There was a problem hiding this comment.
Hm, I’m a bit worried, because moving it out of the lock could introduce races with body()/handler setup (missed or duplicate end delivery).
| if (trailers != null) { | ||
| // Only report the first exception, and generally only report when the | ||
| // response has not yet ended. | ||
| if (!completion.tryFail(e)) { |
There was a problem hiding this comment.
move tryFail before the synchronized block to avoid side effects within a synchronized block
There was a problem hiding this comment.
Similar here, moving tryFail() before the lock feels risky for the same ordering reasons.
This change fixes the race in
HttpClientResponseImpldescribed in #6038:handleTrailers()/handleException()can run beforebody()/end()creates the promise. If that happens, the completion signal is dropped and the future can hang. This is observed from the call site as a HTTP request timeo ut.This fix is only applied to
HttpClientResponseImpl:HttpClientResponseImplbody()/end(), create theHttpEventHandlerpromise while holdingsynchronized(conn)handleEnd()/handleException()when neededAlthough
HttpEventHandlerlooks very similar (like it could be affected by the same/similar race condition), it is not affected. I intentionally did not changeHttpEventHandler, because it is also used in server request processing. The only case when server request processing could be affected , is whenbody()is called from a different thread whileend()is being processed - that is very unlikely (feels like a misuse).Also added two new tests to
Http1xTestthat reliably fail without the change toHttpClientResponseImpland reliably pass with that change. The added regression tests aretestResponseBodyAfterResponseEndandtestResponseEndAfterResponseEnd.Fixes #6038