Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix snuba admin's query tracing to connect to right storage and query nodes #6268

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

onkar
Copy link
Member

@onkar onkar commented Sep 5, 2024

Snuba admin's tracing tool shows query trace output. In addition to that, it will now show the profile events data. This PR parses the trace output and builds a dict of node name to query id. Then system query is executed on those nodes to get the profile events for the corresponding query id.

Before connecting to the nodes, a socket connection is attempted to see if the hostname resolves. If it does not, then the node is assumed to be 127.0.0.1. This is required because while running locally, snuba admin process will connect to clickhouse container. But the container's hostname does not resolve, defaulting to 127.0.0.1 as the hostname.

To make sure hostnames in the CI jobs resolve, the CI docker-compose file is updated.

@onkar onkar requested a review from a team as a code owner September 5, 2024 02:10
@onkar onkar added do not merge WIP Work in progress labels Sep 5, 2024
@onkar onkar removed WIP Work in progress do not merge labels Sep 6, 2024
Copy link

codecov bot commented Sep 6, 2024

❌ 1 Tests Failed:

Tests completed Failed Passed Skipped
472 1 471 1
View the top 1 failed tests by shortest run time
tests.admin.test_api test_query_trace
Stack Traces | 0.192s run time
Traceback (most recent call last):
  File ".../tests/admin/test_api.py", line 258, in test_query_trace
    assert response.status_code == 200
AssertionError: assert 500 == 200
 +  where 500 = <WrapperTestResponse streamed [500 INTERNAL SERVER ERROR]>.status_code

To view individual test run time comparison to the main branch, go to the Test Analytics Dashboard

@onkar onkar requested a review from a team as a code owner September 7, 2024 01:20
@onkar onkar removed the do not merge label Sep 7, 2024
Copy link
Member

@asottile-sentry asottile-sentry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and actually this does not seem like a good idea at all. what is the problem you're solving? having everyone edit /etc/hosts on their mac is not a good solution and is going to cause endless pain for developers

@onkar
Copy link
Member Author

onkar commented Sep 9, 2024

and actually this does not seem like a good idea at all. what is the problem you're solving? having everyone edit /etc/hosts on their mac is not a good solution and is going to cause endless pain for developers

@asottile-sentry thanks for the review. Let's take step back to the larger problem. I will edit the description if this explanation makes sense.

At Sentry, snuba developers use a tool called snuba admin. It has a query tracing section that allows snuba developers to trace how a query is executed in clickhouse. We wanted to enhance that tool to also show clickhouse's profile events. Without this enhancement, snuba developers have to do these manual steps to get profile events:

  1. Manually copy query ids from the raw trace output.
  2. Navigate to system query dashboard.
  3. Type in a different query to get profile events and replace query-id (one at a time) with the strings gathered in step 1.
  4. As if this isn't enough, choose various hosts from the dropdown in systems query dashboard and repeat step 3.

You can see that these manual steps are error-prone and there is scope to automate them.

The query trace output has a filed called summarized_trace_output. Here is a sample of it:

{
    "module": "snuba.admin.views",
    "event": "summarized_trace_output = TracingSummary(query_summaries={'ebaf1a40d262': QuerySummary(node_name='ebaf1a40d262', is_distributed=True, query_id='fc3e8017-51ce-47c0-bb0f-209470fb70c8', execute_summaries=[ExecuteSummary(rows_read=1, memory_size='4.01 KiB', seconds=0.007365, rows_per_second=135.77732518669382, bytes_per_second='544.17 KiB')], select_summaries=None, index_summaries=None, stream_summaries=None, aggregation_summaries=None, sorting_summaries=None)})",
    "severity": "info",
    "user_ip": "127.0.0.1",
    "endpoint": "clickhouse_trace_query",
    "timestamp": "2024-09-05T03:47:13.227537Z"
}

The node_name='ebaf1a40d262' is the container id/hostname of the clickhouse container that snuba admin tool wants to connect to run system queries to get profile events. It has these ports forwarded:
127.0.0.1:8123->8123/tcp, 127.0.0.1:9000->9000/tcp, 127.0.0.1:9009->9009/tcp. In order to reliably connect to this host, we need it to have a proper name that we control (not a container id that docker runtime chooses). In order to resolve that hostname, we need entry in /etc/hosts file. Without name resolution, running system query gives an error like "Host clickhouse.dev.local and port 9000 are not valid."

To be clear, not every developer needs this entry in /etc/hosts. Only those that want to fix issues/enhance snuba admin while running it locally. In the CI pipeline, the configuration is different and so I made changes to the CI clickhouse hostname.

@asottile-sentry
Copy link
Member

wouldn't it be simpler, and a better experience, to write a tool which does all of the manual translation automatically (including replacing with the appropriate localhost / loopback addresses? -- we already utilize host.docker.internal for example)

@onkar
Copy link
Member Author

onkar commented Sep 11, 2024

wouldn't it be simpler, and a better experience, to write a tool which does all of the manual translation automatically (including replacing with the appropriate localhost / loopback addresses? -- we already utilize host.docker.internal for example)

I resolved the issues based on the feedback in our call. No network settings need to be changed to make this work now.

snuba/admin/views.py Outdated Show resolved Hide resolved
@@ -85,6 +92,9 @@ services:
depends_on:
- zookeeper
image: "${CLICKHOUSE_IMAGE:-ghcr.io/getsentry/image-mirror-altinity-clickhouse-server:23.3.19.33.altinitystable}"
hostname: clickhouse-query.local
extra_hosts:
- "clickhouse-query.local:127.0.0.1" # Add entry to /etc/hosts file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest rewording the comment to explain why this is desirable, rather than restating what the code does.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants