Skip to content

Fix HttpClientResponse body()/end() race#6064

Open
snazy wants to merge 12 commits intoeclipse-vertx:masterfrom
snazy:fix-race-6038-master
Open

Fix HttpClientResponse body()/end() race#6064
snazy wants to merge 12 commits intoeclipse-vertx:masterfrom
snazy:fix-race-6038-master

Conversation

@snazy
Copy link
Copy Markdown

@snazy snazy commented Apr 15, 2026

This change fixes the race in HttpClientResponseImpl described in #6038: handleTrailers()/handleException() can run before body()/end() creates the promise. If that happens, the completion signal is dropped and the future can hang. This is observed from the call site as a HTTP request timeo ut.

This fix is only applied to HttpClientResponseImpl:

  • keep ended/failure state on HttpClientResponseImpl
  • in body()/end(), create the HttpEventHandler promise while holding synchronized(conn)
  • after unlock, replay completion/failure via handleEnd()/handleException() when needed

Although HttpEventHandler looks very similar (like it could be affected by the same/similar race condition), it is not affected. I intentionally did not change HttpEventHandler, because it is also used in server request processing. The only case when server request processing could be affected , is when body() is called from a different thread while end() is being processed - that is very unlikely (feels like a misuse).

Also added two new tests to Http1xTest that reliably fail without the change to HttpClientResponseImpl and reliably pass with that change. The added regression tests are testResponseBodyAfterResponseEnd and testResponseEndAfterResponseEnd.

Fixes #6038

This change fixes the race in `HttpClientResponseImpl` described in eclipse-vertx#6038: `handleTrailers()`/`handleException()` can run before `body()`/`end()` creates the promise. If that happens, the completion signal is dropped and the future can hang. This is observed from the call site as a HTTP request timeo
ut.

This fix is only applied to `HttpClientResponseImpl`:
- keep ended/failure state on `HttpClientResponseImpl`
- in `body()`/`end()`, create the `HttpEventHandler` promise while holding `synchronized(conn)`
- after unlock, replay completion/failure via `handleEnd()`/`handleException()` when needed

Although `HttpEventHandler` looks very similar (like it _could_ be affected by the same/similar race condition), it is not affected. I intentionally did not change `HttpEventHandler`, because it is also used in server request processing. The only case when server request processing could be affected
, is when `body()` is called from a _different_ thread while `end()` is being processed - that is very unlikely (feels like a misuse).

Also added two new tests to `Http1xTest` that reliably fail without the change to `HttpClientResponseImpl` and reliably pass with that change. The added regression tests are `testResponseBodyAfterResponseEnd` and `testResponseEndAfterResponseEnd`.

Fixes eclipse-vertx#6038
@snazy
Copy link
Copy Markdown
Author

snazy commented Apr 15, 2026

@vietj
Copy link
Copy Markdown
Member

vietj commented Apr 15, 2026

to simplify, ended should be a Throwable and we should use a specific throwable constant to signal proper ending, e.g. END_SENTINEL

@snazy
Copy link
Copy Markdown
Author

snazy commented Apr 15, 2026

@vietj For the END_SENTINEL, you mean to throw that constant/stackless exception from checkEnded()?

private Handler<HttpFrame> customFrameHandler;
private Handler<StreamPriority> priorityHandler;

static final class RequestEndedException extends IllegalStateException {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for a specific class here, we can simply use == comparison instead

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, updated

if (trailers != null) {
throw new IllegalStateException();
if (ended == ENDED_SENTINEL) {
throw ENDED_SENTINEL;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should throw a new IllegalStateException each time to give proper stack information and not try to reuse the sentinel

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

HttpEventHandler handler;
synchronized (conn) {
if (trailers != null) {
if (ended == ENDED_SENTINEL) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here we should ignore any exception if ended is not null and consider that receiving any excxeption will not lead to further event delivery

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I had that before (and I agree that would be the correct behavior), but some tests like Http1xTest.testInvalidTrailersInHttpClientResponse() started to fail (timeout), so I reverted the previous behavior. Mean, I can try to adopt the test. WDYT?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed 9a3a0ed with that change and updates to the affected tests.

checkEnded();
return eventHandler(true).end();
public Future<Void> end() {
HttpEventHandler eventHandler;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this part should be reimplemented more cleanly with an actual Promise set of the client response that becomes failed/succeeded by events

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, yea, a cleaner state handling would be nice. Let me give that a try.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented via 0bbc30b

this.stream = stream;
this.conn = stream.connection();
this.completion = request.context.promise();
if (completion.future() != completion) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in which case would we need that ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a defensive check, if the implementation changes.
The alternative would have been to always have a field Future completionFuture with the same object instance as the Promise completion.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessary as this always is the case

// the completion-future.
synchronized (conn) {
this.trailers = trailers;
if (completionFuture().isComplete()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think instead we should return when we call tryComplete which returns false, in that case it means we didn't complete the future, some other event did and we should not try to make more progress

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like it has been done in handleException

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we return too early here, accesses to an escaped MultiMap from trailers() could miss the the trailers (after observing the completion).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can still make the update and then check return after state update

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that could mutate the trailers field/content after the response has been ended (failure). You think that's acceptable? (I guess you're worried about the extra synchronized cost of FutureImpl.isComplete()?)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that is acceptable

} else if (this.trailers != trailers) {
this.trailers.setAll(trailers);
}
if (!completion.tryComplete()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should call completion.tryComplete after the synchronized block, because this can call handlers in cascade (side effects), synchronized block should only be used to ensure state visibility.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I’m a bit worried, because moving it out of the lock could introduce races with body()/handler setup (missed or duplicate end delivery).

if (trailers != null) {
// Only report the first exception, and generally only report when the
// response has not yet ended.
if (!completion.tryFail(e)) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move tryFail before the synchronized block to avoid side effects within a synchronized block

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar here, moving tryFail() before the lock feels risky for the same ordering reasons.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Race condition in HttpClientResponse.body() / end()

2 participants