feat(profiling): add a functions summaries dataset #6293
base: master
Conversation
table_name=local_table_name,
columns=columns,
engine=table_engines.MergeTree(
    order_by="(project_id, end_timestamp)",
We should think about this ORDER BY a little more, as one of the primary use cases will be to find examples by function (probably by fingerprint), so we'd want to optimize for those cases. I'm not sure what other queries this will need to support yet, but ordering by timestamp is not optimizing for the main use case.
Yes, for searching examples by function I think it makes sense to have `fingerprint` in there. I'd still keep the timestamp as well, though, but after the fingerprint in that case. Something like: `order_by="(project_id, fingerprint, end_timestamp)"`.

For transaction-based profiling, `transaction_name` would probably also need to be considered, but `transaction_name` is optional (we won't have it for continuous profiling), so it can't be part of the primary key. If we think that for the transaction-based use case, for performance reasons, we really need `transaction_name`, I wonder if it'd make sense to define it as required and just set it to a default hard-coded value for continuous profiling.
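The "required `transaction_name` with a hard-coded default" idea could be sketched as follows. This is a hypothetical helper: the sentinel string and the function name are assumptions for illustration, not snuba's actual API.

```python
# Sketch of defaulting transaction_name for continuous profiling, so the
# column can be non-nullable and eligible for the primary key.
# The sentinel value and helper name are hypothetical.
from typing import Optional

TRANSACTION_SENTINEL = "<continuous>"  # hypothetical hard-coded default


def normalize_transaction_name(name: Optional[str], is_continuous: bool) -> str:
    # Continuous profiles carry no transaction, so substitute the sentinel.
    if is_continuous or name is None:
        return TRANSACTION_SENTINEL
    return name


print(normalize_transaction_name(None, True))              # <continuous>
print(normalize_transaction_name("/api/checkout", False))  # /api/checkout
```

With this, every row has a concrete `transaction_name`, and queries for continuous profiles would filter on the sentinel value rather than on NULL.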
snuba/snuba_migrations/functions_summaries/0001_functions_summaries_create_table.py
* add start_timestamp
* add fingerprint to the order by
* remove one of the ID columns and add is_continuous
This PR has a migration; here is the generated SQL -- start migrations
-- forward migration functions_examples : 0001_functions_examples_create_table
Local op:
CREATE TABLE IF NOT EXISTS functions_examples_local (
    project_id UInt64,
    profile_id UUID,
    is_continuous UInt8,
    transaction_name LowCardinality(Nullable(String)),
    fingerprint UInt64,
    name String,
    package String,
    thread_id String,
    min Float64,
    max Float64,
    sum Float64,
    count UInt64,
    start_timestamp DateTime,
    end_timestamp DateTime,
    platform LowCardinality(String),
    environment LowCardinality(Nullable(String)),
    release LowCardinality(Nullable(String)),
    retention_days UInt16
)
ENGINE ReplicatedMergeTree('/clickhouse/tables/functions_examples/{shard}/default/functions_examples_local', '{replica}')
ORDER BY (project_id, fingerprint, start_timestamp)
PARTITION BY (retention_days, toMonday(end_timestamp))
TTL end_timestamp + toIntervalDay(retention_days)
SETTINGS index_granularity=8192;

Distributed op:
CREATE TABLE IF NOT EXISTS functions_examples_dist (
    project_id UInt64,
    profile_id UUID,
    is_continuous UInt8,
    transaction_name LowCardinality(Nullable(String)),
    fingerprint UInt64,
    name String,
    package String,
    thread_id String,
    min Float64,
    max Float64,
    sum Float64,
    count UInt64,
    start_timestamp DateTime,
    end_timestamp DateTime,
    platform LowCardinality(String),
    environment LowCardinality(Nullable(String)),
    release LowCardinality(Nullable(String)),
    retention_days UInt16
)
ENGINE Distributed(`cluster_one_sh`, default, functions_examples_local);
-- end forward migration functions_examples : 0001_functions_examples_create_table
-- backward migration functions_examples : 0001_functions_examples_create_table
Local op: DROP TABLE IF EXISTS functions_examples_local;
Distributed op: DROP TABLE IF EXISTS functions_examples_dist;
-- end backward migration functions_examples : 0001_functions_examples_create_table
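As a sanity check on the chosen ORDER BY, a typical example-lookup query might look like the following. This is a hypothetical query with made-up literal values; it filters on the leading key columns so ClickHouse's primary index can skip most granules.

```sql
-- Hypothetical lookup: slowest example profiles for one function fingerprint.
-- project_id and fingerprint are the leading ORDER BY columns, so this prunes
-- efficiently; the end_timestamp filter also narrows the weekly partitions.
SELECT profile_id, max, count
FROM functions_examples_dist
WHERE project_id = 1
  AND fingerprint = 1234567890
  AND end_timestamp >= now() - INTERVAL 7 DAY
ORDER BY max DESC
LIMIT 10;
```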
For both transaction-based and continuous profiling, we'd like to add a way to provide examples of the profiles that generated some metrics. The idea is that, when visualizing profile function metrics (for example, while browsing the slowest functions), the user would be able to click on a function and see a list of (clickable) profiles associated with those values.

For transaction-based profiling we used to store the example along with the metrics directly in the `functions` dataset. Since we have deprecated the `functions` dataset and moved our metrics to the generic metrics dataset, we now need a solution for storing these examples, both for transaction-based and continuous profiling. This PR adds a new dataset called `functions_summaries` that serves this purpose.

Trying to keep the PR small; the processor and consumer will be added in separate PRs.