
Chat: Evaluate prompt changes for smart apply #16

Open: wants to merge 9 commits into main
Conversation


@umpox umpox commented Aug 20, 2024

Description

Evaluating the smart-apply prompt changes.

This prompt change adds the following prefix to chat messages:

> When generating fenced code blocks in Markdown, ensure you include the full file path in the tag. The structure should be \`\`\`language:path/to/file\n\`\`\`. You should only do this when generating a code block, the user does not need to be made aware of this in any other way.

The benefit of this is that it allows us to link code blocks with files in the user's file system.
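As a sketch of why the tag is useful: a client could parse the info string back out of the model's reply and map each fenced block to a workspace file. This is a hypothetical illustration only; the function name and regex are assumptions, not Cody's actual implementation:

```typescript
// Hypothetical sketch: extract language/path pairs from fenced code blocks
// whose info string follows the `language:path/to/file` convention.
// Not Cody's actual implementation.
interface TaggedCodeBlock {
  language: string;
  filePath: string;
  code: string;
}

function extractTaggedCodeBlocks(markdown: string): TaggedCodeBlock[] {
  const blocks: TaggedCodeBlock[] = [];
  // Match ```language:path ... ``` fences; the path may contain slashes and dots.
  const fenceRe = /```([\w+-]+):([^\n`]+)\n([\s\S]*?)```/g;
  let match: RegExpExecArray | null;
  while ((match = fenceRe.exec(markdown)) !== null) {
    blocks.push({
      language: match[1],
      filePath: match[2].trim(),
      code: match[3],
    });
  }
  return blocks;
}
```

Each extracted `filePath` could then be resolved against the workspace to drive the "apply" action.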

@umpox umpox marked this pull request as ready for review August 20, 2024 12:50

umpox commented Aug 20, 2024

cc @abeatrix @jtibshirani Here are the eval results from the smart apply prompt changes.

I made a small tweak to the prompt here: sourcegraph/cody#5250 (comment)

umpox added a commit to sourcegraph/cody that referenced this pull request Aug 20, 2024
## Description

Enables the "smart apply" feature by default, removing the feature flag.

Chat Eval: sourcegraph/cody-leaderboard#16

Looker Dashboard: https://sourcegraph.looker.com/looks/2016


## Test plan

Test Smart Apply on:
- [x] PLG
- [x] Enterprise (with different model variants)
- [X] JetBrains (with WebView enabled)
- Note: Further testing will be done by the JetBrains team when the
WebView is fully implemented


@abeatrix abeatrix left a comment


thanks for adding this!

@@ -2,9 +2,24 @@
# CHAT
# CHAT_QUESTION what is squirrel?

# CHAT_REPLY Based on the provided code context, Squirrel is an interface that represents a code intelligence API. It has a single method `symbolInformation` that takes a `file` string as input and returns an object of type `CodeIntelligence`, which contains an array of strings called `symbols`.
# CHAT_REPLY ```typescript:mocked/enhanced_context/squirrel.ts
Member

Did anything change here besides your prompt? Like how we pass context? Compared to before, it looks like we're much more likely to return quoted code snippets in the response. This response now contains code snippets, as well as several subsequent ones. (That's not necessarily bad, just trying to understand!)

Author

I think this is because we're including this at the end of a transcript, kinda similar to how we append "Answer positively without apologizing."

I'm experimenting with moving this to the system prompt here: sourcegraph/cody#5290

I'll run a few more evals and update this branch

Author

@umpox umpox Aug 22, 2024


So I've updated the prompt and eval here and, although this response is pretty similar, I think it's much better at providing code snippets when they are relevant.

There were a bunch of issues with the previous prompt where the LLM would prioritise returning a code snippet unnecessarily; it seems much better now (still going through the diff, though).

# CHAT_REPLY
# CHAT_REPLY ```regex:mocked/enhanced_context/regex.txt
Member

Here's another example where we're inventing a new file/code snippet. It's not super helpful in this case, and it makes the answer more verbose 🤔

# CHAT_REPLY
# CHAT_REPLY The eval.go file contains the core search functionality of Zoekt. It defines various data structures and functions for simplifying and evaluating search queries, gathering matches from indexed documents, scoring and ranking search results, and handling different types of queries (e.g., regular expressions, substring matches, branch filters).
# CHAT_REPLY 1. `mocked/enhanced_context/vscode/src/external-services.ts`: This file seems to be a TypeScript file, but the provided context is incomplete, so it's difficult to determine its purpose.
Author

I think the LLM is likely keener to mention files by their full file path with this change. I don't think it's necessarily a bad thing, as ideally we should link these files up in the chat response. (e.g. I know Copilot does this)


umpox commented Aug 22, 2024

@jtibshirani Thanks for looking through this! I've made a few more iterations on this PR.

I think it is much better with this new prompt: it still reliably produces file paths, but doesn't place as much priority on producing a code block as the previous one did.

That being said, I did notice some things:

  • The LLM seems to have a tendency to refer to files by the full file path now. This is slightly more verbose, but not necessarily a bad thing as we should aim to have this information so we can allow the user to click these paths as links in chat.
  • We do still seem to get slightly more code snippets and I think the LLM has a tendency to break up code into different snippets. I didn't notice any responses where this seemed out of place or worse though (apart from the next point)
  • I can see a regression in certain models (e.g. Gemini), where the LLM seems to want to regurgitate the file contents rather than immediately answer a question about them. I'm looking into whether I can tweak anything to improve this; I did not notice it in the more powerful models. Update: this seems improved now. I made some minor prompt improvements to make the instruction more concise now that it is in a system prompt.
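On the first bullet, a rough sketch of how full file paths mentioned in a reply could be turned into clickable links in the chat UI. The `vscode://file/` URI scheme, the regex heuristic, and the function name are all assumptions for illustration, not Cody's actual behavior:

```typescript
// Hypothetical sketch: wrap backtick-quoted, workspace-relative file paths in a
// chat reply as Markdown links that an editor UI could make clickable.
// The URI scheme and path heuristic are assumptions, not Cody's actual code.
function linkifyFilePaths(reply: string, workspaceRoot: string): string {
  // Heuristic: backtick-quoted tokens that look like relative paths with an extension.
  return reply.replace(/`([\w./-]+\/[\w.-]+\.\w+)`/g, (_full, path: string) => {
    return `[\`${path}\`](vscode://file/${workspaceRoot}/${path})`;
  });
}
```

A real implementation would also want to verify the path actually exists in the workspace before linking it, to avoid linking invented files like the ones noted in the review above.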

umpox added a commit to sourcegraph/cody that referenced this pull request Aug 23, 2024
…#5290)

## Description

This PR improves the smart apply prompt tweak so that:
- We use the system prompt when available
- We use the same desired Markdown syntax when _we_ include codebase
context, which helps steer the LLM onto the right path.

I have run an eval on this here:
sourcegraph/cody-leaderboard#16


## Test plan

- [x] Tested creating code blocks on all major models


---------

Co-authored-by: Beatrix <[email protected]>
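The second bullet in the commit description above (emitting codebase context with the same Markdown syntax the model is asked to produce) could look roughly like this sketch; the type, function name, and message wording are assumptions, not Cody's actual code:

```typescript
// Hypothetical sketch: render a context file into the transcript using the same
// `language:path` fence syntax the model is instructed to produce, so the
// prompt itself demonstrates the desired format. Not Cody's actual code.
interface ContextFile {
  path: string;      // workspace-relative path, e.g. 'src/chat/prompt.ts'
  language: string;  // Markdown language id, e.g. 'typescript'
  content: string;
}

function renderContextFile(file: ContextFile): string {
  const fence = '```';
  return [
    `Codebase context from file ${file.path}:`,
    `${fence}${file.language}:${file.path}`,
    file.content,
    fence,
  ].join('\n');
}
```

Showing the format in the context messages gives the model an in-transcript example to imitate, which is generally a stronger steer than an instruction alone.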

@jtibshirani jtibshirani left a comment


The latest results look good! I noticed a couple of places where we surprisingly regressed, but I'm not super concerned, because Claude Sonnet 3.5 still looks good on these questions.

# CHAT_REPLY The README also mentions that this is a fork of the original Google Zoekt repository (https://github.com/google/zoekt) by Sourcegraph (https://github.com/sourcegraph/zoekt). It outlines some of the differences between this fork and the original, such as the addition of a `zoekt-sourcegraph-indexserver` command, exposing the API via `keegancsmith/rpc`, support for searching across branches/repos, and returning symbol/ctag data in results.
# CHAT_REPLY
# CHAT_REPLY However, the actual source code files are not provided in the context. The README only gives an overview of the project and instructions for using the various components.
# CHAT_REPLY Based on the provided README file, I do not have direct access to any code files. The README provides an overview of the Zoekt code search engine project and instructions for downloading, indexing, searching, and running the Zoekt services. However, it does not include the actual source code files.
Member

This is a slight regression, but seems okay. (Weird that this would change in the first place 🤔)

# CHAT_REPLY This regex pattern would match strings like `"nb"`, `"nc"`, `"nd"`, `"nf"`, etc.
# CHAT_REPLY
# CHAT_REPLY Without more context about the intended use case, it's difficult to determine the exact meaning of the provided regex pattern.
# CHAT_REPLY Unfortunately, there is no regex present in the provided code context. The files contain TypeScript interface definitions, Go code related to rate limiting and concurrency handling, and some comments explaining the purpose of the code. Without a specific regex pattern provided, I cannot analyze its functionality.
Member

This is also a regression. This question tests whether we can answer general programming questions even if there is irrelevant context. Now, we refuse to answer the question. I wonder why this changed!

# CHAT_REPLY
# CHAT_REPLY The README.md file provides an overview of Zoekt, including instructions for downloading, indexing, and searching code repositories using Zoekt. It also mentions that Zoekt supports symbol search and can be integrated with a web interface or JSON API.
# CHAT_REPLY 1. `mocked/enhanced_context/vscode/src/external-services.ts`: This appears to be a TypeScript file, but the provided context doesn't give enough information to determine its purpose.
Member

Looks like we're including code blocks more often, but the examples I'm seeing are totally sensible and feel like an improvement.
