
Chat: Evaluate prompt changes for smart apply #16

Open: wants to merge 9 commits into main
Conversation


@umpox umpox commented Aug 20, 2024

Description

Evaluating the smart-apply prompt changes.

This prompt change adds the following prefix to chat messages:

> When generating fenced code blocks in Markdown, ensure you include the full file path in the tag. The structure should be \`\`\`language:path/to/file\n\`\`\`. You should only do this when generating a code block, the user does not need to be made aware of this in any other way.

The benefit of this is that it allows us to link code blocks with files in the user's file system.
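As a sketch of why the tag is useful: a client could parse the info string back out of the model's reply and map each fenced block to a workspace file. This is a hypothetical illustration only; the function name and regex are assumptions, not Cody's actual implementation:

```typescript
// Hypothetical sketch: extract language/path pairs from fenced code blocks
// whose info string follows the `language:path/to/file` convention.
// Not Cody's actual implementation.
interface TaggedCodeBlock {
  language: string;
  filePath: string;
  code: string;
}

function extractTaggedCodeBlocks(markdown: string): TaggedCodeBlock[] {
  const blocks: TaggedCodeBlock[] = [];
  // Match ```language:path ... ``` fences; the path may contain slashes and dots.
  const fenceRe = /```([\w+-]+):([^\n`]+)\n([\s\S]*?)```/g;
  let match: RegExpExecArray | null;
  while ((match = fenceRe.exec(markdown)) !== null) {
    blocks.push({
      language: match[1],
      filePath: match[2].trim(),
      code: match[3],
    });
  }
  return blocks;
}
```

Each extracted `filePath` could then be resolved against the workspace to drive the "apply" action.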

@umpox umpox marked this pull request as ready for review August 20, 2024 12:50

umpox commented Aug 20, 2024

cc @abeatrix @jtibshirani Here are the eval results from the smart apply prompt changes.

I made a small tweak to the prompt here: sourcegraph/cody#5250 (comment)

umpox added a commit to sourcegraph/cody that referenced this pull request Aug 20, 2024
## Description

Enables the "smart apply" feature by default, removing the feature flag.

Chat Eval: sourcegraph/cody-leaderboard#16

Looker Dashboard: https://sourcegraph.looker.com/looks/2016


## Test plan

Test Smart Apply on:
- [x] PLG
- [x] Enterprise (with different model variants)
- [X] JetBrains (with WebView enabled)
- Note: Further testing will be done by the JetBrains team when the
WebView is fully implemented


@abeatrix abeatrix left a comment


thanks for adding this!

@@ -2,9 +2,24 @@
# CHAT
# CHAT_QUESTION what is squirrel?

# CHAT_REPLY Based on the provided code context, Squirrel is an interface that represents a code intelligence API. It has a single method `symbolInformation` that takes a `file` string as input and returns an object of type `CodeIntelligence`, which contains an array of strings called `symbols`.
# CHAT_REPLY ```typescript:mocked/enhanced_context/squirrel.ts
Member

Did anything change here besides your prompt? Like how we pass context? Compared to before, it looks like we're much more likely to return quoted code snippets in the response. This response now contains code snippets, as well as several subsequent ones. (That's not necessarily bad, just trying to understand!)

Author

I think this is because we're including this at the end of a transcript, kinda similar to how we append "Answer positively without apologizing."

I'm experimenting with moving this to the system prompt here: sourcegraph/cody#5290

I'll run a few more evals and update this branch

Author

@umpox umpox Aug 22, 2024


So I've updated the prompt and eval here and, although this response is pretty similar, I think it's much better at providing code snippets when they are relevant.

There were a bunch of issues with the previous prompt where the LLM would prioritise returning a code snippet unnecessarily; it seems much better now (still going through the diff, though).

# CHAT_REPLY
# CHAT_REPLY ```regex:mocked/enhanced_context/regex.txt
Member

Here's another example where we're inventing a new file/code snippet. It's not super helpful in this case, and it makes the answer more verbose 🤔

# CHAT_REPLY
# CHAT_REPLY The eval.go file contains the core search functionality of Zoekt. It defines various data structures and functions for simplifying and evaluating search queries, gathering matches from indexed documents, scoring and ranking search results, and handling different types of queries (e.g., regular expressions, substring matches, branch filters).
# CHAT_REPLY 1. `mocked/enhanced_context/vscode/src/external-services.ts`: This file seems to be a TypeScript file, but the provided context is incomplete, so it's difficult to determine its purpose.
Author

I think the LLM is likely keener to mention files by their full file path with this change. I don't think it's necessarily a bad thing, as ideally we should link these files up in the chat response. (e.g. I know Copilot does this)


umpox commented Aug 22, 2024

@jtibshirani Thanks for looking through this! I've made a few more iterations on this PR.

I think it is much better with this new prompt: it still reliably produces file paths, but doesn't place as much priority on producing a code block as the previous one did.

That being said, I did notice some things:

  • The LLM seems to have a tendency to refer to files by the full file path now. This is slightly more verbose, but not necessarily a bad thing as we should aim to have this information so we can allow the user to click these paths as links in chat.
  • We do still seem to get slightly more code snippets and I think the LLM has a tendency to break up code into different snippets. I didn't notice any responses where this seemed out of place or worse though (apart from the next point)
  • I can see a regression in certain models (e.g. Gemini), where the LLM seems to want to regurgitate the file contents rather than immediately answer a question about them. I'm looking into whether I can tweak anything to improve this; I did not notice it in the more powerful models. Update: this seems improved now. I made some minor prompt improvements to make the instruction more concise now that it is in a system prompt.
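On the first bullet, a rough sketch of how full file paths mentioned in a reply could be turned into clickable links in the chat UI. The `vscode://file/` URI scheme, the regex heuristic, and the function name are all assumptions for illustration, not Cody's actual behavior:

```typescript
// Hypothetical sketch: wrap backtick-quoted, workspace-relative file paths in a
// chat reply as Markdown links that an editor UI could make clickable.
// The URI scheme and path heuristic are assumptions, not Cody's actual code.
function linkifyFilePaths(reply: string, workspaceRoot: string): string {
  // Heuristic: backtick-quoted tokens that look like relative paths with an extension.
  return reply.replace(/`([\w./-]+\/[\w.-]+\.\w+)`/g, (_full, path: string) => {
    return `[\`${path}\`](vscode://file/${workspaceRoot}/${path})`;
  });
}
```

A real implementation would also want to verify the path actually exists in the workspace before linking it, to avoid linking invented files like the ones noted in the review above.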

umpox added a commit to sourcegraph/cody that referenced this pull request Aug 23, 2024
…#5290)

## Description

This PR improves the smart apply prompt tweak so that:
- We use the system prompt when available
- We use the same desired Markdown syntax when _we_ include codebase
context, which helps steer the LLM onto the right path.

I have run an eval on this here:
sourcegraph/cody-leaderboard#16


## Test plan

- [x] Tested creating code blocks on all major models


---------

Co-authored-by: Beatrix <[email protected]>
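The second bullet in the commit description above (emitting codebase context with the same Markdown syntax the model is asked to produce) could look roughly like this sketch; the type, function name, and message wording are assumptions, not Cody's actual code:

```typescript
// Hypothetical sketch: render a context file into the transcript using the same
// `language:path` fence syntax the model is instructed to produce, so the
// prompt itself demonstrates the desired format. Not Cody's actual code.
interface ContextFile {
  path: string;      // workspace-relative path, e.g. 'src/chat/prompt.ts'
  language: string;  // Markdown language id, e.g. 'typescript'
  content: string;
}

function renderContextFile(file: ContextFile): string {
  const fence = '```';
  return [
    `Codebase context from file ${file.path}:`,
    `${fence}${file.language}:${file.path}`,
    file.content,
    fence,
  ].join('\n');
}
```

Showing the format in the context messages gives the model an in-transcript example to imitate, which is generally a stronger steer than an instruction alone.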

@jtibshirani jtibshirani left a comment


The latest results look good! I noticed a couple of places where we surprisingly regressed, but I'm not super concerned, because Claude Sonnet 3.5 still looks good on these questions.

# CHAT_REPLY The README also mentions that this is a fork of the original Google Zoekt repository (https://github.com/google/zoekt) by Sourcegraph (https://github.com/sourcegraph/zoekt). It outlines some of the differences between this fork and the original, such as the addition of a `zoekt-sourcegraph-indexserver` command, exposing the API via `keegancsmith/rpc`, support for searching across branches/repos, and returning symbol/ctag data in results.
# CHAT_REPLY
# CHAT_REPLY However, the actual source code files are not provided in the context. The README only gives an overview of the project and instructions for using the various components.
# CHAT_REPLY Based on the provided README file, I do not have direct access to any code files. The README provides an overview of the Zoekt code search engine project and instructions for downloading, indexing, searching, and running the Zoekt services. However, it does not include the actual source code files.
Member

This is a slight regression, but seems okay. (Weird that this would change in the first place 🤔)

# CHAT_REPLY This regex pattern would match strings like `"nb"`, `"nc"`, `"nd"`, `"nf"`, etc.
# CHAT_REPLY
# CHAT_REPLY Without more context about the intended use case, it's difficult to determine the exact meaning of the provided regex pattern.
# CHAT_REPLY Unfortunately, there is no regex present in the provided code context. The files contain TypeScript interface definitions, Go code related to rate limiting and concurrency handling, and some comments explaining the purpose of the code. Without a specific regex pattern provided, I cannot analyze its functionality.
Member

This is also a regression. This question tests whether we can answer general programming questions even if there is irrelevant context. Now, we refuse to answer the question. I wonder why this changed!

# CHAT_REPLY
# CHAT_REPLY The README.md file provides an overview of Zoekt, including instructions for downloading, indexing, and searching code repositories using Zoekt. It also mentions that Zoekt supports symbol search and can be integrated with a web interface or JSON API.
# CHAT_REPLY 1. `mocked/enhanced_context/vscode/src/external-services.ts`: This appears to be a TypeScript file, but the provided context doesn't give enough information to determine its purpose.
Member

Looks like we're including code blocks more often, but the examples I'm seeing are totally sensible and feel like an improvement.
