Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(host_metrics source): modify the network collector to expose TCP stats #22057

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

aryan9600
Copy link

@aryan9600 aryan9600 commented Dec 18, 2024

Summary

Modify the network collector in the host_metrics source to expose information about the systems's TCP stack. It now exposes three metrics:

  • network_tcp_connections_total: The total number of TCP connections. It includes the state of the connection as a tag.
  • network_tcp_tx_queued_bytes_total: The sum of the number of bytes in the send queue across all connections.
  • network_tcp_rx_queued_bytes_total: The sum of the number of bytes in the receive queue across all connections.

These metrics are only available for Linux systems as it uses the netlink subsystem.

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

How did you test this PR?

  • Unit tests for the code interacting with netlink and for the code writing to the metrics buffer.
  • Manual testing by running the change locally with the below config file and comparing the output with ss -a -t on both Linux x86 and arm systems.
sources:
  host_metrics:
    type: host_metrics
    collectors:
      - network
sinks:
  console:
    type: console
    inputs:
      - host_metrics
    encoding:
      codec: text

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the "no-changelog" label to this PR.

Checklist

  • Please read our Vector contributor resources.
  • If this PR introduces changes Vector dependencies (modifies Cargo.lock), please
    run dd-rust-license-tool write to regenerate the license inventory and commit the changes (if any). More details here.

References

closes #21972

@aryan9600 aryan9600 requested review from a team as code owners December 18, 2024 17:43
@bits-bot
Copy link

bits-bot commented Dec 18, 2024

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added domain: sources Anything related to the Vector's sources domain: external docs Anything related to Vector's external, public documentation labels Dec 18, 2024
Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aryan9600 ! Exposing these new metrics seem useful. What do you think of putting them under the existing network collector though? Exporting as network_tcp_*.

@aryan9600
Copy link
Author

What do you think of putting them under the existing network collector though?

i wanted to keep it as a separate collector, as these metrics are only available for linux. also, to avoid any changes to the output of users when they upgrade to the newer version with these changes. if its a blocker, then i can move it to the network collector, no issue. let me know :)

@jszwedko
Copy link
Member

What do you think of putting them under the existing network collector though?

i wanted to keep it as a separate collector, as these metrics are only available for linux. also, to avoid any changes to the output of users when they upgrade to the newer version with these changes. if its a blocker, then i can move it to the network collector, no issue. let me know :)

I think we already have a precedent of collectors having OS-specific metrics (e.g. filesystem_used_ratio only exists for Linux) so I think they'd be a good fit under collector even if they only exist for Linux.

@aryan9600 aryan9600 changed the title feat(host_metrics): add a new tcp collector feat(host_metrics): modify the network collector to expose TCP stats Dec 20, 2024
@aryan9600 aryan9600 requested a review from jszwedko December 20, 2024 19:09
@aryan9600 aryan9600 changed the title feat(host_metrics): modify the network collector to expose TCP stats feat(host_metrics source): modify the network collector to expose TCP stats Dec 21, 2024
Copy link
Member

@pront pront left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aryan9600, thank you for this PR!

I noticed big change in Cargo.lock, I wonder if you can roll that file back and apply individual updates for the newly added crates only.

Add a new `tcp` collector  to the `host_metrics` source to expose
information about the systems's TCP stack. It exposes three metrics:
* `tcp_connections_total`: The total number of TCP connections. It
  includes the `state` of the connection as a tag.
* `tcp_tx_queued_bytes_total`: The sum of the number of bytes in the
   send queue across all connections.
* `tcp_rx_queued_bytes_total`: The sum of the number of bytes in the
  receive queue across all connections.

The collector is only enabled for Linux as it uses the netlink
subsystem.

Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
@aryan9600 aryan9600 requested a review from pront January 3, 2025 13:34
@pront pront self-assigned this Jan 3, 2025
@aryan9600
Copy link
Author

a problem with having these TCP metrics inside the network collector is that the latter allows to filter the output metrics using network devices, but the former doesn't support this filtering. this is due to the fact that netlink doesn't return the network interface in the response for SOCK_DIAG messages. we could modify the tests and docs to accommodate for this or create a new collector for these metrics. open to any other suggestions.
cc: @pront @jszwedko

@pront
Copy link
Member

pront commented Jan 6, 2025

Hi @aryan9600, one test is failing. I will take another look at this PR once that is fixed. You can run any test locally with cargo test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: external docs Anything related to Vector's external, public documentation domain: sources Anything related to the Vector's sources
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a collector to track TCP connections in the host_metrics source
5 participants