Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System.Diagnostics.ActivitySource support for observability #83

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

samritchie
Copy link
Collaborator

  • Added tracing instrumentation on TableContext methods from ActivitySource "FSharp.AWS.DynamoDB". This is roughly in line with the published semantic conventions
  • Added a dependency on System.Diagnostics.DiagnosticSource - have seen reports that this won’t build on some netstandard2.0 platforms (Discussion: netstandard2.0 packages don't compile for out-of-support runtimes open-telemetry/opentelemetry-dotnet#3448). I’ve specified 6.0.0 which should work but may cause version conflicts(?)
  • Added a dependency on System.Text.Json, purely for (the questionable utility of) tagging consumed capacity as an array of JSON strings as recommended. I’m unsure how to properly add this for netstandard libs.
  • I’ve not attempted to log exceptions from within the TableContext, just the AWS client calls.
  • No Metric support - the only finalised DB metric is db.client.operation.duration which will need a stopwatch around all the client calls. Recording consumed capacity as a metric is probably the more useful value.
  • Started work on adding recorded activities to the test TableFixture but this needs actual assertions.

@samritchie samritchie requested a review from bartelink December 28, 2024 09:18
@@ -0,0 +1,70 @@
namespace FSharp.AWS.DynamoDB

module Diagnostics =
Copy link
Member

@bartelink bartelink Dec 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let! ct = Async.CancellationToken
let! response = client.UpdateTableAsync(request, ct) |> Async.AwaitTaskCorrect
use activity = Diagnostics.startTableActivity request.TableName "UpdateTable" []
let! response = client.UpdateTableAsync(request, ct) |> Task.addActivityException activity |> Async.AwaitTaskCorrect
Copy link
Member

@bartelink bartelink Dec 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awaittaskcorrect will do similar unwrapping to what your helper does internally?

Hm, no idea what to do with that fact though - I guess. I'm more wondering why FsAwsDdb bears responsblity for trapping and logging exceptions. Is there some reason that this is falls on this layer (note I have not done one of these integrations personally and no useful research)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most applications will be adding the exception to the parent span as well, but it would usually be added to the span where it originated - the application may catch & do something else, but that doesn’t change that this span failed. We certainly want to intercept & set activity status to Error at the very least.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough I guess. @nordfjord should Equinox.MessagDb etc be following suit if so?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be catching exceptions and adding them to the spans in messagedb. But I also have been procrastinating homogenising the tracing between the js and f# versions

let req = TransactWriteItemsRequest(ReturnConsumedCapacity = returnConsumedCapacity, TransactItems = transactionItems)
clientRequestToken |> Option.iter (fun x -> req.ClientRequestToken <- x)
let! ct = Async.CancellationToken
let! response = client.TransactWriteItemsAsync(req, ct) |> Async.AwaitTaskCorrect
if response.HttpStatusCode <> HttpStatusCode.OK then
failwithf "TransactWriteItems request returned error %O" response.HttpStatusCode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cant fail this early obv

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not 100% sure this will ever be hit - I’ll have to dig into the SDK code and see if I can work out what it does. Even if it did return a response on a 500 or 503 I’d expect the entire response object would be invalid & we’d get an NRE trying to access ConsumedCapacity

|> Task.addActivityException activity
|> Async.AwaitTaskCorrect
if response.HttpStatusCode <> HttpStatusCode.OK then
failwithf "BatchWriteItem deletion request returned error %O" response.HttpStatusCode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even if failed, want trace


let addActivityCapacity (capacity: ConsumedCapacity seq) (activity: Activity) =
if notNull activity then
let value = capacity |> Seq.choose (function | null -> None | c -> Some (JsonSerializer.Serialize c)) |> Seq.toArray
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this fels wrong; surely there's a generic dictionary shape or something? or maybe it can be flattened into an entry per table, or... anything?!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t like it either, but the semantic conventions specifically define it like this. It would make more sense to me to use dynamic attribute keys - aws.dynamodb.consumed_capacity.{index_name}.write_capacity etc - other db traces use this for query params & the like. I can’t imagine many observability platforms go to the trouble of parsing JSON DynamoDB consumed capacity and doing anything useful with it.

I’m just as happy to leave it out, and maybe add metrics instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm - I think iterative tweaks make sense for this sort of work anyway - no subsittute for using it IRL to make you realise what works and doesnt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants