-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
System.Diagnostics.ActivitySource support for observability #83
base: master
Are you sure you want to change the base?
Conversation
@@ -0,0 +1,70 @@ | |||
namespace FSharp.AWS.DynamoDB | |||
|
|||
module Diagnostics = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see also https://github.com/jet/equinox/blob/master/src/Equinox/Tracing.fs https://github.com/jet/equinox/blob/master/src/Equinox.MessageDb/Tracing.fs for other helper layout ideas
e.g. see how https://github.com/jet/equinox/blob/master/src/Equinox.MessageDb/MessageDb.fs#L251-L252 puts logic more in your face while still staying relatvely compact
let! ct = Async.CancellationToken | ||
let! response = client.UpdateTableAsync(request, ct) |> Async.AwaitTaskCorrect | ||
use activity = Diagnostics.startTableActivity request.TableName "UpdateTable" [] | ||
let! response = client.UpdateTableAsync(request, ct) |> Task.addActivityException activity |> Async.AwaitTaskCorrect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awaittaskcorrect will do similar unwrapping to what your helper does internally?
Hm, no idea what to do with that fact though - I guess. I'm more wondering why FsAwsDdb bears responsblity for trapping and logging exceptions. Is there some reason that this is falls on this layer (note I have not done one of these integrations personally and no useful research)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most applications will be adding the exception to the parent span as well, but it would usually be added to the span where it originated - the application may catch & do something else, but that doesn’t change that this span failed. We certainly want to intercept & set activity status to Error at the very least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough I guess. @nordfjord should Equinox.MessagDb etc be following suit if so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be catching exceptions and adding them to the spans in messagedb. But I also have been procrastinating homogenising the tracing between the js and f# versions
let req = TransactWriteItemsRequest(ReturnConsumedCapacity = returnConsumedCapacity, TransactItems = transactionItems) | ||
clientRequestToken |> Option.iter (fun x -> req.ClientRequestToken <- x) | ||
let! ct = Async.CancellationToken | ||
let! response = client.TransactWriteItemsAsync(req, ct) |> Async.AwaitTaskCorrect | ||
if response.HttpStatusCode <> HttpStatusCode.OK then | ||
failwithf "TransactWriteItems request returned error %O" response.HttpStatusCode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cant fail this early obv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not 100% sure this will ever be hit - I’ll have to dig into the SDK code and see if I can work out what it does. Even if it did return a response on a 500 or 503 I’d expect the entire response object would be invalid & we’d get an NRE trying to access ConsumedCapacity
|> Task.addActivityException activity | ||
|> Async.AwaitTaskCorrect | ||
if response.HttpStatusCode <> HttpStatusCode.OK then | ||
failwithf "BatchWriteItem deletion request returned error %O" response.HttpStatusCode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even if failed, want trace
|
||
let addActivityCapacity (capacity: ConsumedCapacity seq) (activity: Activity) = | ||
if notNull activity then | ||
let value = capacity |> Seq.choose (function | null -> None | c -> Some (JsonSerializer.Serialize c)) |> Seq.toArray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this fels wrong; surely there's a generic dictionary shape or something? or maybe it can be flattened into an entry per table, or... anything?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t like it either, but the semantic conventions specifically define it like this. It would make more sense to me to use dynamic attribute keys - aws.dynamodb.consumed_capacity.{index_name}.write_capacity
etc - other db traces use this for query params & the like. I can’t imagine many observability platforms go to the trouble of parsing JSON DynamoDB consumed capacity and doing anything useful with it.
I’m just as happy to leave it out, and maybe add metrics instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm - I think iterative tweaks make sense for this sort of work anyway - no subsittute for using it IRL to make you realise what works and doesnt
"FSharp.AWS.DynamoDB"
. This is roughly in line with the published semantic conventionsSystem.Diagnostics.DiagnosticSource
- have seen reports that this won’t build on somenetstandard2.0
platforms (Discussion: netstandard2.0 packages don't compile for out-of-support runtimes open-telemetry/opentelemetry-dotnet#3448). I’ve specified 6.0.0 which should work but may cause version conflicts(?)System.Text.Json
, purely for (the questionable utility of) tagging consumed capacity as an array of JSON strings as recommended. I’m unsure how to properly add this for netstandard libs.db.client.operation.duration
which will need a stopwatch around all the client calls. Recording consumed capacity as a metric is probably the more useful value.