
Trace HTTPClient request execution #320

Draft

slashmo wants to merge 24 commits into tracing-development from feature/tracing
Conversation


@slashmo slashmo commented Dec 5, 2020

Motivation:

Context Propagation

In order to instrument distributed systems, metadata such as trace ids
must be propagated across network boundaries through HTTP headers.
As HTTPClient operates at one such boundary, it should take care of
injecting metadata into HTTP headers automatically using the configured
instrument.

Built-in tracing

Furthermore, HTTPClient should create a Span for executed requests
under the hood, so that users benefit from tracing effortlessly.

Modifications:

  • Inject instrumentation metadata into HTTP headers
  • Add HTTPClient method overloads accepting LoggingContext
  • Create Span for executed HTTP request

Result:

  • New HTTPClient method overloads accepting LoggingContext
  • Existing overloads accepting Logger construct a DefaultLoggingContext
  • Existing methods that neither take Logger nor LoggingContext construct
    a DefaultLoggingContext
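
A rough sketch of the injection step, using the 0.1-era swift-distributed-tracing APIs (module and helper names here are illustrative; only HTTPHeadersInjector corresponds to a type this PR adds):

```swift
import Baggage
import Instrumentation
import NIOHTTP1

// An Injector that writes instrumentation metadata into NIOHTTP1 headers,
// mirroring the HTTPHeadersInjector this PR adds.
struct HTTPHeadersInjector: Injector {
    func inject(_ value: String, forKey key: String, into headers: inout HTTPHeaders) {
        headers.replaceOrAdd(name: key, value: value)
    }
}

// Before executing a request, the client asks the globally configured
// instrument to inject the context's baggage (e.g. trace ids) into the
// outgoing headers.
func injectMetadata(from baggage: Baggage, into headers: inout HTTPHeaders) {
    InstrumentationSystem.instrument.inject(baggage, into: &headers, using: HTTPHeadersInjector())
}
```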

@swift-server-bot

Can one of the admins verify this patch?

4 similar comments

@slashmo
Author

slashmo commented Dec 5, 2020

I chatted with @ktoso earlier to discuss the manual context propagation, and we agreed that we probably shouldn't deprecate the "old" Logger-accepting API for each request overload. We don't want to push users too hard toward manual context passing, because that should ideally become unnecessary once the mentioned language changes have been made: https://github.com/apple/swift-distributed-tracing#important-note-on-adoption

@slashmo slashmo marked this pull request as draft December 5, 2020 17:11
@ktoso ktoso left a comment (Contributor)

So, since technically we're at 0.1 and something may change... how do we want to tackle adoption here?

I was thinking of kicking off a branch like tracing for now, so we can polish things up there and, once we're all confident, merge into mainline. We could also tag those tracing releases; they'd follow normal releases, e.g. 1.2.2-tracing.

I don't really expect anything breaking in the core APIs, but the OpenTelemetry support, which we may want to use here, could still fluctuate a little until it's final, hmm...

Sources/AsyncHTTPClient/HTTPHeadersInjector.swift (outdated, resolved)
Sources/AsyncHTTPClient/HTTPClient.swift (outdated, resolved)
Package.swift (outdated)

```diff
     ],
     targets: [
         .target(
             name: "AsyncHTTPClient",
             dependencies: ["NIO", "NIOHTTP1", "NIOSSL", "NIOConcurrencyHelpers", "NIOHTTPCompression",
-                           "NIOFoundationCompat", "NIOTransportServices", "Logging"]
+                           "NIOFoundationCompat", "NIOTransportServices", "Logging", "Instrumentation"]
```
@ktoso (Contributor)
Can we right away go with Tracing and do the full thing in a single PR?

@slashmo (Author)

That's my intention. I've added a checklist to the PR, including creating a Span. I wanted to get the instrumentation part down first and then continue with tracing, but all inside this PR.

@slashmo slashmo force-pushed the feature/tracing branch 2 times, most recently from 047fbb0 to 87085d9 on December 7, 2020 17:05
@Lukasa
Collaborator

Lukasa commented Dec 8, 2020

@swift-server-bot add to whitelist

@Lukasa
Collaborator

Lukasa commented Dec 8, 2020

I'd like to punt this to a side-branch for iterative development if we can.

@slashmo
Author

slashmo commented Dec 8, 2020

I'd like to punt this to a side-branch for iterative development if we can.
@Lukasa

Sure, sounds like a good approach. I can change the target branch once it's created.

@Lukasa
Collaborator

Lukasa commented Dec 8, 2020

I've opened up the tracing-development branch.

@slashmo slashmo changed the base branch from main to tracing-development December 8, 2020 09:21
@slashmo
Author

slashmo commented Dec 8, 2020

@ktoso The CI seems to fail because the Baggage repo cannot be cloned through the Git URL. Should we pin Tracing to 0.1.1 here in order to get the fix? (apple/swift-distributed-tracing/pull/25)

@ktoso
Contributor

ktoso commented Dec 8, 2020

No, we need to tag a 0.1.1, I'll do that in a moment.

@ktoso
Contributor

ktoso commented Dec 8, 2020

0.1.1 tagged, please depend on that.

Thanks Cory for the development branch, sounds good 👍

@ktoso
Contributor

ktoso commented Dec 8, 2020

@swift-server-bot test this please

@ktoso
Contributor

ktoso commented Dec 8, 2020

Can drafts get CI validation? 🤔

@Lukasa
Collaborator

Lukasa commented Dec 8, 2020

Yes, they can: I think the CI isn't targeting that branch at the moment.

0xpablo and others added 6 commits January 5, 2021 11:01
Motivation:

Currently when either we or the server send Connection: close, we
correctly do not return that connection to the pool. However, we rely on
the server actually performing the connection closure: we never call
close() ourselves. This is unnecessarily optimistic: a server may
absolutely fail to close this connection. To protect our own file
descriptors, we should make sure that any connection we do not return to
the pool is closed.

Modifications:

If we think a connection is closing when we release it, we now call
close() on it defensively.

Result:

We no longer leak connections when the server fails to close them.

Fixes swift-server#324.
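A minimal sketch of the rule, with a stand-in Connection type rather than AHC's actual internals:

```swift
import NIO

// Stand-in for AHC's internal connection wrapper.
struct Connection {
    let channel: Channel
    var isClosing: Bool // we or the server sent Connection: close
}

func release(_ connection: Connection, into pool: inout [Connection]) {
    if connection.isClosing {
        // Not returning it to the pool, so close defensively: don't rely on
        // the server to actually close and risk leaking a file descriptor.
        connection.channel.close(promise: nil)
    } else {
        pool.append(connection) // healthy connection goes back to the pool
    }
}
```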
Motivation:

Flaky tests are bad.

This test is flaky because the server closes the connection immediately
upon channelActive. In practice this can mean that the handshake never
even gets a chance to start: by the time the SSLHandler ends up
in the pipeline the connection is already dead. Heck, by the time we
attempt to complete the connection the connection might be dead.

Modifications:

- Change the shutdown to be on first read.
- Remove the disabled autoRead.
- Change the expected NIOTS failure mode to connectTimeout,
    which is how this manifests in NIOTS.

Result:

Test is no longer flaky.
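The server-side change might look like this as a plain NIO handler (a sketch, not the test's exact code):

```swift
import NIO

// Close on first read instead of in channelActive, so the TLS handshake at
// least gets a chance to start before the connection goes away.
final class CloseOnFirstReadHandler: ChannelInboundHandler {
    typealias InboundIn = ByteBuffer

    func channelRead(context: ChannelHandlerContext, data: NIOAny) {
        context.close(promise: nil)
    }
}
```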
Adding the product dependency to the target by name only produces an error in Xcode 12.4. Instead, the product dependency should be given as a `.product`. Updated the README with the new format, so that new users won't stumble over this.
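The README change amounts to declaring the dependency like this (the target name is illustrative):

```swift
.target(
    name: "MyApp",
    dependencies: [
        .product(name: "AsyncHTTPClient", package: "async-http-client"),
    ]
),
```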
artemredkin and others added 5 commits March 3, 2021 17:10
Motivation:
When we stream request body, current implementation expects that body
will finish streaming _before_ we start to receive response body parts.
This is not correct: response body parts can start to arrive before we
finish sending the request.

Modifications:
 - Simplifies the state machine: we only care about the request being fully
   sent, to prevent sending body parts after .end; the response state
   machine is mostly ignored and the correct flow is handled by the NIOHTTP1
   pipeline
 - Adds an HTTPEchoHandler that replies to each response body part
 - Adds a bi-directional streaming test

Result:
Closes swift-server#327
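A sketch of what such an echo handler might look like as a plain NIO server handler (illustrative, not necessarily the PR's exact HTTPEchoHandler):

```swift
import NIO
import NIOHTTP1

// Writes each received request body part straight back as a response body
// part, enabling a bi-directional streaming test.
final class HTTPEchoHandler: ChannelInboundHandler {
    typealias InboundIn = HTTPServerRequestPart
    typealias OutboundOut = HTTPServerResponsePart

    func channelRead(context: ChannelHandlerContext, data: NIOAny) {
        switch self.unwrapInboundIn(data) {
        case .head:
            let head = HTTPResponseHead(version: HTTPVersion(major: 1, minor: 1), status: .ok)
            context.writeAndFlush(self.wrapOutboundOut(.head(head)), promise: nil)
        case .body(let bytes):
            // Echo the body part back immediately, before .end arrives.
            context.writeAndFlush(self.wrapOutboundOut(.body(.byteBuffer(bytes))), promise: nil)
        case .end:
            context.writeAndFlush(self.wrapOutboundOut(.end(nil)), promise: nil)
        }
    }
}
```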
Motivation:

HTTPResponseAggregator attempts to build a single, complete response
object. This necessarily means it loads the entire response payload into
memory. It wants to provide this payload as a single contiguous buffer
of data, and it does so by aggregating the data into a single contiguous
buffer as it goes.

Because ByteBuffer does exponential reallocation, the cost of doing this
should be amortised constant-time, even though we do have to copy some
data sometimes. However, if this operation triggers a copy-on-write then
the operation will become quadratic. For large buffers this will rapidly
come to dominate the runtime.

Unfortunately, in at least Swift 5.3, Swift cannot safely see that the
state variable is dead during the body stanza. Swift is not necessarily
wrong about this: there's a cross-module call to ByteBuffer.writeBuffer
in place and Swift cannot easily prove that that call will not lead to a
re-entrant access of the `HTTPResponseAggregator` object. For this
reason, during the call to `didReceiveBodyPart` there will be two copies
of the body buffer alive, and so the write will CoW.

This quadratic behaviour is a nasty performance trap that can become
highly apparent even at quite small body sizes.

Modifications:

While Swift can't prove that the `self.state` variable is dead, we can!
To that end, we temporarily set it to a different value that does not
store the buffer in question. This will force Swift to drop the ref on
the buffer, making it uniquely owned and avoiding the CoW.

Sadly, it's extremely difficult to test for "does not CoW", so this
patch does not currently come with any tests. I have experimentally
verified the behaviour.

Result:

No copy-on-write in the HTTPResponseAggregator during body aggregation.
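The trick looks roughly like this, with a simplified state enum standing in for HTTPResponseAggregator's real state machine:

```swift
import NIO
import NIOHTTP1

// Simplified stand-in for HTTPResponseAggregator's state machine.
enum AggregationState {
    case idle
    case body(HTTPResponseHead, ByteBuffer)
}

struct Aggregator {
    var state: AggregationState = .idle

    mutating func didReceiveBodyPart(_ part: ByteBuffer) {
        guard case .body(let head, var buffer) = self.state else { return }
        var part = part
        // Park the state in a case that doesn't hold the buffer. This drops
        // the second reference, so the buffer is uniquely owned and the
        // write below appends in place instead of copying.
        self.state = .idle
        buffer.writeBuffer(&part)
        self.state = .body(head, buffer)
    }
}
```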
Motivation:

There is an awkward timing window in the TLSEventsHandler flow where it
is possible for the NIOSSLClientHandler to fail the handshake on
handlerAdded. If this happens, the TLSEventsHandler will not be in the
pipeline, and so the handshake failure error will be lost and we'll get
a generic one instead.

This window can be resolved without performance penalty if we use the
new synchronous pipeline operations view to add the two handlers
backwards. If this is done then we can ensure that the TLSEventsHandler
is always in the pipeline before the NIOSSLClientHandler, and so there
is no risk of event loss.

While I'm here, AHC does a lot of pipeline modification. This has led to
lengthy future chains with lots of event loop hops for no particularly
good reason. I've therefore replaced all pipeline operations with their
synchronous counterparts. All but one sequence was happening on the
correct event loop, and for the one that may not I've added a fast-path
dispatch that should tolerate being on the wrong one. The result is
cleaner, more linear code that also reduces the allocations and event
loop hops.

Modifications:

- Use synchronous pipeline operations everywhere
- Change the order of adding TLSEventsHandler and NIOSSLClientHandler

Result:

Faster, safer, fewer timing windows.
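A sketch of the reordered, synchronous handler insertion (the handler values are stand-ins, and this must run on the channel's event loop):

```swift
import NIO

// Add TLSEventsHandler first via the synchronous operations view, then slot
// the SSL handler in front of it, so the events handler is guaranteed to be
// in the pipeline before any handshake event can fire.
func addTLSHandlers(to channel: Channel,
                    ssl: ChannelHandler,
                    events: ChannelHandler) throws {
    let sync = channel.pipeline.syncOperations
    try sync.addHandler(events)
    try sync.addHandler(ssl, position: .before(events))
}
```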
Motivation:

AsyncHTTPClient attempts to avoid the problem of Happy Eyeballs making
it hard to know which Channel will be returned by only inserting the
TLSEventsHandler upon completion of the connect promise. Unfortunately,
as this may involve event loop hops, there are some awkward timing
windows in play where the connect may complete before this handler gets
added.

We should remove that timing window by ensuring that all channels always
have this handler in place, and instead of trying to wait until we know
which Channel will win, we can find the TLSEventsHandler that belongs to
the winning channel after the fact.

Modifications:

- TLSEventsHandler no longer removes itself from the pipeline or throws
  away its promise.
- makeHTTP1Channel now searches the newly created pipeline for the
  TLSEventsHandler and is also responsible for removing it.
- Better sanity checking that the proxy TLS case does not overlap with
  the connection-level TLS case.

Result:

Further shrinking windows for pipeline management issues.
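A sketch of the find-after-the-fact approach; TLSEventsHandler and its promise are stand-ins for AHC's internals:

```swift
import NIO

// Stand-in for AHC's internal TLSEventsHandler: it holds a promise that is
// completed when the TLS handshake succeeds or fails.
final class TLSEventsHandler: ChannelInboundHandler {
    typealias InboundIn = ByteBuffer
    let completionPromise: EventLoopPromise<Void>

    init(completionPromise: EventLoopPromise<Void>) {
        self.completionPromise = completionPromise
    }
}

// After Happy Eyeballs settles on a winning channel, locate its
// TLSEventsHandler in the pipeline instead of adding one after the fact.
func tlsEstablished(on channel: Channel) -> EventLoopFuture<Void> {
    channel.pipeline.handler(type: TLSEventsHandler.self).flatMap { handler in
        handler.completionPromise.futureResult
    }
}
```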
Motivation:

Users of the HTTPClientResponseDelegate expect that the event loop
futures returned from didReceiveHead and didReceiveBodyPart can be used
to exert backpressure. To be fair to them, they somewhat can. However,
the TaskHandler has a bit of a misunderstanding about how NIO
backpressure works, and does not correctly manage the buffer of inbound
data.

The result of this misunderstanding is that multiple calls to
didReceiveBodyPart and didReceiveHead can be outstanding at once. This
would likely lead to severe bugs in most delegates, as they do not
expect it.

We should make things work the way delegate implementers believe it
works.

Modifications:

- Added a buffer to the TaskHandler to avoid delivering data that the
   delegate is not ready for.
- Added a new "pending close" state that keeps track of a state where
   the TaskHandler has received .end but not yet delivered it to the
   delegate. This allows better error management.
- Added some more tests.
- Documented our backpressure commitments.

Result:

Better respect for backpressure.

Resolves swift-server#348
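A sketch of the buffering idea, assuming a delegate callback that returns an EventLoopFuture<Void> as its backpressure signal (as didReceiveBodyPart does); a real implementation would live on the event loop rather than being standalone:

```swift
import NIO

final class DelegateBackpressureBuffer {
    private var pending = CircularBuffer<ByteBuffer>()
    private var delegateBusy = false
    private let deliverToDelegate: (ByteBuffer) -> EventLoopFuture<Void>

    init(deliverToDelegate: @escaping (ByteBuffer) -> EventLoopFuture<Void>) {
        self.deliverToDelegate = deliverToDelegate
    }

    // Called for every inbound body part; at most one delegate call is
    // outstanding at any time.
    func receive(_ part: ByteBuffer) {
        guard !self.delegateBusy else {
            self.pending.append(part) // delegate not ready: hold the part
            return
        }
        self.delegateBusy = true
        self.deliverToDelegate(part).whenComplete { _ in
            self.delegateBusy = false
            if !self.pending.isEmpty {
                self.receive(self.pending.removeFirst())
            }
        }
    }
}
```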
Davidde94 and others added 10 commits April 27, 2021 15:43
motivation: 5.4 is out!

changes:
* update Dockerfile handling of rubygems
* add docker compose setup for ubuntu 20.04 and 5.4 toolchain
motivation: test with nightly toolchain

changes: add docker compose setup for ubuntu 20.04 and nightly toolchain
Adds support for request-specific TLS configuration:

```swift
Request(url: "https://webserver.com", tlsConfiguration: .forClient())
```
Motivation:

At the moment, AHC assumes that creating a `NIOSSLContext` is both cheap
and non-blocking.

Neither of these two assumptions are true.

To create a `NIOSSLContext`, BoringSSL will have to read a lot of
certificates in the trust store (on disk) which require a lot of ASN1
parsing and much much more.

On my Ubuntu test machine, creating one `NIOSSLContext` is about 27,000
allocations!!! To make it worse, AHC allocates a fresh `NIOSSLContext`
for _every single connection_, whether HTTP or HTTPS. Yes, correct.

Modification:

- Cache NIOSSLContexts per TLSConfiguration in an LRU cache
- Don't get an NIOSSLContext for HTTP (plain text) connections

Result:

New connections should be _much_ faster in general assuming that you're
not using a different TLSConfiguration for every connection.
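A sketch of the caching idea. TLSConfiguration is not Hashable, so a real implementation needs a best-effort hashable key; String(describing:) stands in for that here, and the LRU eviction is elided:

```swift
import NIOConcurrencyHelpers
import NIOSSL

final class SSLContextCache {
    private var cache: [String: NIOSSLContext] = [:]
    private let lock = Lock()

    func context(for configuration: TLSConfiguration) throws -> NIOSSLContext {
        let key = String(describing: configuration) // illustrative stand-in key
        return try self.lock.withLock {
            if let cached = self.cache[key] {
                return cached
            }
            // ~27k allocations on Linux: do this once, not per connection.
            let fresh = try NIOSSLContext(configuration: configuration)
            self.cache[key] = fresh // a real LRU would also evict old entries
            return fresh
        }
    }
}
```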
…ver#350)

This PR is the result of another one, swift-server#321.

In that PR I provided an alternative structure to TLSConfiguration for connecting with Transport Services.

In this one I construct the NWProtocolTLS.Options from TLSConfiguration. It does mean a little more work whenever we make a connection, but having spoken to @weissi, he doesn't seem to think that is an issue.

Also, there is no method to create a SecIdentity at the moment. We need to generate a PKCS#12 from the certificate chain and private key, which can then be used to create the SecIdentity.

This should resolve swift-server#292
…rver#368)

Motivation:

In the vast majority of cases, we'll only ever create one and only one
`NIOSSLContext`. It's therefore wasteful to keep around a whole thread
doing nothing just for that. A `DispatchQueue` is absolutely fine here.

Modification:

Run the `NIOSSLContext` creation on a `DispatchQueue` instead.

Result:

Fewer threads hanging around.
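A minimal sketch of the queue-based construction (the function name is illustrative):

```swift
import Dispatch
import NIO
import NIOSSL

// Build the (expensive, blocking) NIOSSLContext on a DispatchQueue and hand
// the result back to the caller as an EventLoopFuture.
func makeSSLContext(configuration: TLSConfiguration,
                    queue: DispatchQueue,
                    on eventLoop: EventLoop) -> EventLoopFuture<NIOSSLContext> {
    let promise = eventLoop.makePromise(of: NIOSSLContext.self)
    queue.async {
        do {
            promise.succeed(try NIOSSLContext(configuration: configuration))
        } catch {
            promise.fail(error)
        }
    }
    return promise.futureResult
}
```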
Motivation:

In order to instrument distributed systems, metadata such as trace ids
must be propagated across network boundaries.
As HTTPClient operates at one such boundary, it should take care of
injecting metadata into HTTP headers automatically using the configured
instrument.

Modifications:

HTTPClient gains new method overloads accepting LoggingContext.

Result:

- New HTTPClient method overloads accepting LoggingContext
- Existing overloads accepting Logger construct a DefaultLoggingContext
- Existing methods that neither take Logger nor LoggingContext construct
  a DefaultLoggingContext