
Fix extended thinking and allow per-repo context files (AGENTS.md, CLAUDE.md, etc.) to be included as context#2387

Open
avidspartan1 wants to merge 4 commits into The-PR-Agent:main from avidspartan1:feat/improvements

Conversation


@avidspartan1 avidspartan1 commented May 11, 2026

Resolves #2357.

I backed out my other PR because it bundled too many changes into a single PR (it previously also included the "de-dupe inline comments" change).

@qodo-free-for-open-source-projects

Review Summary by Qodo

Support Claude model overrides and repository context files in AI prompts

✨ Enhancement


Walkthroughs

Description
• Add support for overriding Claude extended thinking models via config
• Include repository context files (AGENTS.md, CLAUDE.md, etc.) in AI prompts
• Extend git provider interface with repo file content retrieval capability
• Add comprehensive tests for new features and fix minor code issues
Diagram
flowchart LR
  A["Config Settings"] -->|claude_extended_thinking_models_override| B["LiteLLM Handler"]
  C["Repo Context Files"] -->|repo_context_files config| D["build_repo_context"]
  D -->|fetches files| E["Git Provider"]
  D -->|renders context| F["Instruction Files"]
  F -->|injected into| G["AI Prompts"]
  B -->|uses models| G


File Changes

1. pr_agent/algo/__init__.py ✨ Enhancement +10/-1

Expand Claude extended thinking model list

pr_agent/algo/__init__.py


2. pr_agent/algo/ai_handlers/litellm_ai_handler.py ✨ Enhancement +16/-2

Add config override for Claude extended thinking models

pr_agent/algo/ai_handlers/litellm_ai_handler.py


3. pr_agent/algo/repo_context.py ✨ Enhancement +64/-0

New module for repository context file handling

pr_agent/algo/repo_context.py


4. pr_agent/git_providers/git_provider.py ✨ Enhancement +3/-0

Add abstract method for repo file content retrieval

pr_agent/git_providers/git_provider.py


5. pr_agent/git_providers/github_provider.py ✨ Enhancement +12/-1

Implement repo file content retrieval for GitHub

pr_agent/git_providers/github_provider.py


6. pr_agent/git_providers/gitlab_provider.py 🐞 Bug fix +2/-3

Remove duplicate file lookup condition

pr_agent/git_providers/gitlab_provider.py


7. pr_agent/tools/pr_code_suggestions.py ✨ Enhancement +3/-1

Integrate repository context into code suggestions

pr_agent/tools/pr_code_suggestions.py


8. pr_agent/tools/pr_description.py ✨ Enhancement +2/-0

Integrate repository context into PR descriptions

pr_agent/tools/pr_description.py


9. pr_agent/tools/pr_reviewer.py ✨ Enhancement +2/-0

Integrate repository context into PR reviews

pr_agent/tools/pr_reviewer.py


10. pr_agent/settings/code_suggestions/pr_code_suggestions_prompts.toml ✨ Enhancement +9/-0

Add repository context section to code suggestions prompt

pr_agent/settings/code_suggestions/pr_code_suggestions_prompts.toml


11. pr_agent/settings/code_suggestions/pr_code_suggestions_prompts_not_decoupled.toml ✨ Enhancement +9/-0

Add repository context section to non-decoupled prompt

pr_agent/settings/code_suggestions/pr_code_suggestions_prompts_not_decoupled.toml


12. pr_agent/settings/configuration.toml ⚙️ Configuration changes +6/-0

Add repo context and Claude override configuration options

pr_agent/settings/configuration.toml


13. pr_agent/settings/pr_description_prompts.toml ✨ Enhancement +7/-0

Add repository context section to description prompt

pr_agent/settings/pr_description_prompts.toml


14. pr_agent/settings/pr_reviewer_prompts.toml ✨ Enhancement +9/-0

Add repository context section to review prompt

pr_agent/settings/pr_reviewer_prompts.toml


15. tests/unittest/test_litellm_claude_extended_thinking.py 🧪 Tests +61/-0

Add tests for Claude extended thinking model override

tests/unittest/test_litellm_claude_extended_thinking.py


16. tests/unittest/test_repo_context.py 🧪 Tests +192/-0

Add comprehensive tests for repository context functionality

tests/unittest/test_repo_context.py



qodo-free-for-open-source-projects Bot commented May 11, 2026

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (2) 📎 Requirement gaps (0)



Action required

1. Unbounded context process cache 🐞 Bug ☼ Reliability ⭐ New
Description
build_repo_context() stores repo context in a module-level _repo_context_process_cache keyed by PR
URL without TTL/eviction, so long-lived servers can accumulate entries indefinitely and can keep
serving stale instruction files for a PR until process restart.
Code

pr_agent/algo/repo_context.py[R136-140]

+    cache_key = _get_repo_context_cache_key(context_files, max_lines)
+    process_cache_key = _get_repo_context_process_cache_key(git_provider, context_files, max_lines)
+    if process_cache_key is not None and process_cache_key in _repo_context_process_cache:
+        return _repo_context_process_cache[process_cache_key]
+
Evidence
The repo context is cached in a process-global dict and returned early on subsequent calls; since
webhook servers run continuously, this cache will grow with each PR URL and can keep returning old
content for the same PR URL until restart.

pr_agent/algo/repo_context.py[13-14]
pr_agent/algo/repo_context.py[136-140]
pr_agent/algo/repo_context.py[175-185]
pr_agent/servers/github_app.py[423-434]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`build_repo_context()` uses a module-level dict (`_repo_context_process_cache`) as a process-wide cache keyed by PR URL. Because it has no TTL/max-size eviction, long-lived servers will:
1) grow memory unbounded as more PR URLs are processed, and
2) potentially serve stale instruction-file contents for the same PR URL even if the repo default-branch instruction files change.

## Issue Context
This cache is independent of any single provider instance and persists for the lifetime of the Python process. The repo runs as a long-lived FastAPI/uvicorn app in webhook deployments, so the cache lifetime is effectively “forever”.

## Fix Focus Areas
- pr_agent/algo/repo_context.py[13-14]
- pr_agent/algo/repo_context.py[136-140]
- pr_agent/algo/repo_context.py[175-185]

## How to fix
- Replace `_repo_context_process_cache` with a bounded/evicting structure (e.g., `functools.lru_cache(maxsize=...)`, or a TTL cache).
- If keeping cross-instance caching, include an invalidation strategy (TTL is simplest) so updated repo instruction files eventually refresh.
- Ensure empty-string entries also follow the eviction policy to avoid permanently pinning “missing” results for long-running processes.
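A bounded, TTL-based structure along these lines would address both the unbounded growth and the staleness concern (a minimal sketch; the class name, sizes, and TTL are illustrative, not from the PR):

```python
import time
from collections import OrderedDict

class TTLCache:
    """Bounded cache: evicts least-recently-used entries past max_size
    and expires entries older than ttl_seconds, so stale repo context
    eventually refreshes on long-lived servers."""

    def __init__(self, max_size=256, ttl_seconds=600):
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self._store = OrderedDict()  # key -> (timestamp, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        ts, value = entry
        if time.monotonic() - ts > self.ttl_seconds:
            del self._store[key]  # expired: drop and force a refresh
            return None
        self._store.move_to_end(key)  # LRU touch
        return value

    def set(self, key, value):
        self._store[key] = (time.monotonic(), value)
        self._store.move_to_end(key)
        while len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict least recently used
```

Because empty-string results go through the same `set()` path, they follow the same eviction policy instead of being pinned forever.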



2. repo_context_files type unchecked ✓ Resolved 📘 Rule violation ☼ Reliability
Description
build_repo_context() assumes repo_context_files is an iterable of file paths without
validating/normalizing its container type. If it is misconfigured as a string (common via
env/Dynaconf), it will iterate characters and attempt to fetch bogus paths, causing many useless
provider calls and missing context instead of failing safely with a targeted warning.
Code

pr_agent/algo/repo_context.py[R26-41]

+def build_repo_context(git_provider) -> str:
+    context_files = get_settings().config.get("repo_context_files", [])
+    if not context_files:
+        return ""
+
+    max_lines = get_settings().config.get("repo_context_max_lines", 500)
+    try:
+        max_lines = max(0, int(max_lines))
+    except (TypeError, ValueError):
+        max_lines = 500
+
+    files = {}
+    for file_path in context_files:
+        if not isinstance(file_path, str) or not file_path.strip():
+            get_logger().warning("Skipping invalid repo context file path", artifact={"file_path": file_path})
+            continue
Evidence
PR Compliance ID 20 requires boundary validation/normalization of configuration inputs, but in
build_repo_context() the code pulls repo_context_files from config and iterates it directly,
only validating that each iterated element is a non-empty string. This means a mis-typed
configuration such as a single string will be treated as an iterable of characters, each character
will pass the per-element string validation, and get_repo_file_content() will be called repeatedly
with incorrect one-character “paths” rather than rejecting the configuration or applying a safe
default.

pr_agent/algo/repo_context.py[26-41]
pr_agent/algo/repo_context.py[26-45]
pr_agent/settings/configuration.toml[5-30]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`build_repo_context()` reads `repo_context_files` from configuration and immediately iterates it, but does not validate/normalize the container type to ensure it is a list of file paths. When misconfigured as a single string (a common outcome of env/Dynaconf overrides), the loop iterates characters and triggers many bogus `get_repo_file_content()` calls instead of fetching the intended file once or failing safely with a targeted warning.
## Issue Context
Compliance requires validating and normalizing configuration inputs at boundaries, with safe defaults and targeted warnings. Although `repo_context_files` is documented as a list in the default configuration, overrides can easily supply a string; the current implementation only checks each iterated element is a non-empty string, which allows the string-misconfiguration to slip through per-character.
## Fix Focus Areas
- pr_agent/algo/repo_context.py[26-45]
- pr_agent/settings/configuration.toml[5-30]
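The container-type check could be a small normalizer applied before the loop (a sketch; `normalize_context_files` is a hypothetical helper, and treating a bare string as a single path is an assumed policy):

```python
def normalize_context_files(raw):
    """Accept a list/tuple of paths; treat a bare string as one path
    rather than iterating its characters (a common env/Dynaconf
    misconfiguration). Unknown types fail safe to an empty list."""
    if raw is None:
        return []
    if isinstance(raw, str):
        return [raw.strip()] if raw.strip() else []
    if isinstance(raw, (list, tuple)):
        return [p.strip() for p in raw if isinstance(p, str) and p.strip()]
    return []
```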



3. Repo context truncates fences ✓ Resolved 🐞 Bug ≡ Correctness
Description
build_repo_context() enforces repo_context_max_lines by slicing the already-rendered lines, which
can cut off mid-<file> / mid-`````markdown fence and make the remainder of the system prompt parse
as a code block or malformed markup.
Code

pr_agent/algo/repo_context.py[R31-64]

+    max_lines = get_settings().config.get("repo_context_max_lines", 500)
+    try:
+        max_lines = max(0, int(max_lines))
+    except (TypeError, ValueError):
+        max_lines = 500
+
+    files = {}
+    for file_path in context_files:
+        if not isinstance(file_path, str) or not file_path.strip():
+            get_logger().warning("Skipping invalid repo context file path", artifact={"file_path": file_path})
+            continue
+
+        file_path = file_path.strip()
+        try:
+            content = git_provider.get_repo_file_content(file_path)
+        except Exception as e:
+            get_logger().warning(f"Failed to load repo context file: {file_path}", artifact={"error": str(e)})
+            continue
+
+        if not content:
+            get_logger().debug(f"Repo context file is empty or missing: {file_path}")
+            continue
+
+        if isinstance(content, bytes):
+            content = content.decode("utf-8", errors="replace")
+
+        files[file_path] = str(content).rstrip()
+
+    if not files:
+        return ""
+
+    rendered_lines = render_instruction_files(files).splitlines()
+
+    return "\n".join(rendered_lines[:max_lines]).strip()
Evidence
The implementation slices rendered lines without ensuring structural closure, and the unit test
demonstrates truncation ending at an opening markdown fence with no closing fence or closing tags.

pr_agent/algo/repo_context.py[26-64]
tests/unittest/test_repo_context.py[96-110]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`build_repo_context()` truncates the rendered instruction-files payload by line count after rendering, which can leave unclosed `<instruction_files>`/`<file>` tags and unclosed `````markdown fences. This can cause the rest of the prompt to be interpreted as part of a code block and materially change model behavior.
### Issue Context
Current behavior is explicitly tested to allow truncation mid-fence, which is unsafe for prompt formatting.
### Fix Focus Areas
- pr_agent/algo/repo_context.py[26-64]
- tests/unittest/test_repo_context.py[96-110]
### Implementation notes
- Build the output incrementally while tracking remaining line budget, and **always** append closing fence + closing tags (even when truncating file content).
- If truncation occurs, add an explicit marker (e.g., `…(truncated)…`) inside the fenced block, then close the fence and tags.
- Update/replace `test_build_repo_context_enforces_total_line_cap` to assert the truncated output still ends with a closed fence and `</instruction_files>`.
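A budget-aware renderer along the lines of these notes might look like this (a sketch with hypothetical names; the real module uses a 5-backtick fence, shortened here to 3):

```python
FENCE = "`" * 3
TRUNCATION_MARKER = "...(truncated)..."

def truncate_keeping_closed(rendered_files, max_lines):
    """Build output incrementally against a line budget; when the budget
    runs out mid-file, emit a truncation marker and still append the
    closing fence and tags so the structure stays well-formed."""
    lines = ["<instruction_files>"]
    footer = ["</instruction_files>"]
    budget = max_lines - len(lines) - len(footer)
    for path, content in rendered_files.items():
        header = [f'<file path="{path}">', FENCE + "markdown"]
        closing = [FENCE, "</file>"]
        body = content.splitlines()
        need = len(header) + len(body) + len(closing)
        if need <= budget:
            lines += header + body + closing
            budget -= need
        else:
            room = budget - len(header) - len(closing) - 1  # 1 for marker
            if room > 0:
                lines += header + body[:room] + [TRUNCATION_MARKER] + closing
            break
    return "\n".join(lines + footer)
```

Even under a tight cap, the output always ends with a closed fence and `</instruction_files>`, which is exactly what the updated test should assert.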




Remediation recommended

4. Invalid repo_context_max_lines silent default 📘 Rule violation ☼ Reliability ⭐ New
Description
repo_context_max_lines is coerced to int, but invalid values fall back to 500 without a
warning, so misconfiguration can silently change prompt size behavior. This violates the requirement
to validate/normalize config inputs at boundaries with targeted warnings/errors.
Code

pr_agent/algo/repo_context.py[R130-134]

+    max_lines = get_settings().config.get("repo_context_max_lines", 500)
+    try:
+        max_lines = max(0, int(max_lines))
+    except (TypeError, ValueError):
+        max_lines = 500
Evidence
PR Compliance ID 18 requires invalid configuration values to be validated and to emit targeted
warnings/errors rather than silently changing behavior. The new code catches invalid
repo_context_max_lines values and silently resets to 500 without any logging, making the
misconfiguration hard to detect.

pr_agent/algo/repo_context.py[130-134]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`repo_context_max_lines` falls back to `500` on invalid values without emitting a warning, which can hide misconfiguration and silently alter runtime behavior.

## Issue Context
Compliance requires configuration inputs to be validated/normalized at boundaries and to emit targeted warnings/errors when invalid while using safe defaults.

## Fix Focus Areas
- pr_agent/algo/repo_context.py[130-134]
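The coercion with a targeted warning could be as simple as the following (a sketch; `parse_max_lines` is a hypothetical helper and the logger setup is illustrative):

```python
import logging

logger = logging.getLogger(__name__)
DEFAULT_MAX_LINES = 500

def parse_max_lines(raw):
    """Coerce to a non-negative int; warn on invalid values instead of
    silently defaulting, so misconfiguration is visible in logs."""
    try:
        return max(0, int(raw))
    except (TypeError, ValueError):
        logger.warning(
            "Invalid repo_context_max_lines %r; falling back to %d",
            raw, DEFAULT_MAX_LINES,
        )
        return DEFAULT_MAX_LINES
```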



5. Override entries not stripped 📘 Rule violation ☼ Reliability
Description
claude_extended_thinking_models_override values are validated but not normalized (e.g., whitespace
is not stripped), so a config like " claude-sonnet-4-6 " will silently fail the `model in
self.claude_extended_thinking_models` eligibility check. This violates the requirement to normalize
configuration at the boundary and can prevent extended-thinking from enabling despite being
configured.
Code

pr_agent/algo/ai_handlers/litellm_ai_handler.py[R235-250]

+        # Models that support extended thinking (config override replaces the built-in list when non-empty)
+        override = get_settings().config.get("claude_extended_thinking_models_override", []) or []
+        if override and not isinstance(override, list):
+            get_logger().warning(
+                "Invalid claude_extended_thinking_models_override in config; expected a list of model names. "
+                "Falling back to the built-in Claude extended-thinking model list."
+            )
+            override = []
+        elif override and not all(isinstance(model, str) and model.strip() for model in override):
+            get_logger().warning(
+                "Invalid claude_extended_thinking_models_override in config; "
+                "expected a list of model name strings. "
+                "Falling back to the built-in Claude extended-thinking model list."
+            )
+            override = []
+        self.claude_extended_thinking_models = list(override) if override else CLAUDE_EXTENDED_THINKING_MODELS
Evidence
Rule 21 requires configuration inputs to be validated and normalized at the boundary. The new
override handling validates list/type and checks model.strip() only for non-empty validation, but
then assigns self.claude_extended_thinking_models = list(override) without stripping/normalizing
values, while later eligibility relies on exact string membership.

pr_agent/algo/ai_handlers/litellm_ai_handler.py[235-250]
pr_agent/algo/ai_handlers/litellm_ai_handler.py[517-520]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`claude_extended_thinking_models_override` is validated but not normalized before use. Because eligibility uses exact string membership (`model in self.claude_extended_thinking_models`), leading/trailing whitespace (and other non-normalized forms) can cause the override to appear “enabled” but not actually match any runtime model IDs.
## Issue Context
This is a configuration boundary; per compliance, inputs should be normalized (at least `.strip()`), so operators don’t get surprising behavior from common TOML/env formatting mistakes.
## Fix Focus Areas
- pr_agent/algo/ai_handlers/litellm_ai_handler.py[235-250]
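Normalization at this boundary can be a small helper applied before the assignment (a sketch; the helper name is hypothetical):

```python
def normalize_model_override(override):
    """Strip whitespace from each entry so values like
    ' claude-sonnet-4-6 ' still satisfy the exact string-membership
    check used for extended-thinking eligibility. Non-list or
    non-string inputs fail safe to an empty list."""
    if not isinstance(override, list):
        return []
    return [m.strip() for m in override if isinstance(m, str) and m.strip()]
```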



6. Repo context line cap wrong ✓ Resolved 🐞 Bug ≡ Correctness
Description
repo_context_max_lines is documented as a maximum total rendered lines, but
render_instruction_files_with_line_budget() can still output more than that due to unconditional
wrapper/header/footer lines. This can cause prompts to exceed the configured budget and break
operators’ expectations when setting small caps.
Code

pr_agent/algo/repo_context.py[R47-82]

+def render_instruction_files_with_line_budget(files: dict[str, str], max_lines: int) -> str:
+    parts = [
+        INSTRUCTION_FILES_INTRO,
+        "<instruction_files>",
+    ]
+
+    for path, content in files.items():
+        scope = path.rsplit("/", 1)[0] if "/" in path else "repo-root"
+        fence = _get_markdown_fence(content)
+        file_header = [
+            f'<file path="{escape(path, quote=True)}" scope="{escape(scope, quote=True)}">',
+            f"{fence}markdown",
+        ]
+        file_footer = [
+            fence,
+            "</file>",
+            "",
+        ]
+        content_lines = content.rstrip().splitlines()
+        reserved_closing_lines = len(file_header) + len(file_footer) + 1
+        available_content_lines = max_lines - len(parts) - reserved_closing_lines
+
+        parts.extend(file_header)
+        if available_content_lines >= len(content_lines):
+            parts.extend(content_lines)
+        else:
+            if available_content_lines > 1:
+                parts.extend(content_lines[: available_content_lines - 1])
+            parts.append(TRUNCATION_MARKER)
+            parts.extend(file_footer)
+            break
+
+        parts.extend(file_footer)
+
+    parts.append("</instruction_files>")
+    return "\n".join(parts).strip()
Evidence
The config explicitly claims a maximum total line budget, but the renderer always emits wrapper
lines and the unit test demonstrates output that exceeds a max_lines=4 setting, proving the cap is
not enforced as documented.

pr_agent/settings/configuration.toml[25-33]
pr_agent/algo/repo_context.py[47-83]
tests/unittest/test_repo_context.py[191-212]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`repo_context_max_lines` is described as a maximum total number of rendered lines, but the current implementation only limits the *file content* lines and can exceed the cap by a fixed wrapper overhead (intro + `<instruction_files>` + `<file>` header/footer + closing tag). This makes the configured budget unreliable.
### Issue Context
- `repo_context_max_lines` is defined/documented as a total rendered-line cap in configuration defaults.
- `render_instruction_files_with_line_budget()` always emits wrapper lines even when `max_lines` is very small.
### Fix Focus Areas
- pr_agent/algo/repo_context.py[47-82]
- pr_agent/settings/configuration.toml[28-29]
### Implementation notes
- Decide on intended semantics (strict cap vs. best-effort content cap).
- If strict: before adding a file header/footer, check whether the remaining budget can accommodate them and the closing `</instruction_files>`; if not, return an empty string (or a minimal, well-formed wrapper) rather than exceeding the cap.
- Update the configuration comment if semantics remain best-effort (so operators aren’t misled).
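Strict-cap semantics reduce to a budget check before emitting any wrapper lines (a sketch; the overhead constants assume the wrapper layout described above and the function name is hypothetical):

```python
WRAPPER_OVERHEAD = 2  # "<instruction_files>" + "</instruction_files>"
FILE_OVERHEAD = 4     # <file> header, opening fence, closing fence, </file>

def fits_strict_budget(num_content_lines, max_lines):
    """Under strict semantics, a file is emitted only if its content
    plus all wrapper lines fit within max_lines; otherwise the caller
    should return an empty string rather than exceed the cap."""
    return WRAPPER_OVERHEAD + FILE_OVERHEAD + num_content_lines <= max_lines
```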



7. Empty context cached on errors 🐞 Bug ☼ Reliability
Description
build_repo_context() caches an empty string when no files are successfully loaded, including cases
where all fetches fail due to exceptions. If the first attempt fails transiently, subsequent calls
with the same config on the same provider instance will never retry and will silently omit repo
context for the rest of that run.
Code

pr_agent/algo/repo_context.py[R118-150]

+    cache_key = _get_repo_context_cache_key(context_files, max_lines)
+    repo_context_cache = getattr(git_provider, REPO_CONTEXT_CACHE_ATTRIBUTE, None)
+    if repo_context_cache is None:
+        repo_context_cache = {}
+        setattr(git_provider, REPO_CONTEXT_CACHE_ATTRIBUTE, repo_context_cache)
+    if cache_key in repo_context_cache:
+        return repo_context_cache[cache_key]
+
+    files = {}
+    for file_path in context_files:
+        if not isinstance(file_path, str) or not file_path.strip():
+            get_logger().warning("Skipping invalid repo context file path", artifact={"file_path": file_path})
+            continue
+
+        file_path = file_path.strip()
+        try:
+            content = git_provider.get_repo_file_content(file_path)
+        except Exception as e:
+            get_logger().warning(f"Failed to load repo context file: {file_path}", artifact={"error": str(e)})
+            continue
+
+        if not content:
+            get_logger().debug(f"Repo context file is empty or missing: {file_path}")
+            continue
+
+        if isinstance(content, bytes):
+            content = content.decode("utf-8", errors="replace")
+
+        files[file_path] = str(content).rstrip()
+
+    if not files:
+        repo_context_cache[cache_key] = ""
+        return ""
Evidence
The function short-circuits on cache hits and stores an empty string when files is empty, while
treating exceptions as skip-and-continue; together this proves a transient all-files failure becomes
a persistent empty result for the provider instance.

pr_agent/algo/repo_context.py[118-154]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`build_repo_context()` writes `""` into the provider cache when no files were loaded. Because exceptions are handled by skipping the file and continuing, a transient provider/API failure that prevents *all* files from loading results in an empty cached value; later calls won’t retry and will return the cached empty context.
### Issue Context
This matters specifically because the code *intentionally* supports multiple calls (it has a cache) and because provider calls can fail transiently.
### Fix Focus Areas
- pr_agent/algo/repo_context.py[118-154]
### Implementation notes
- Track whether any exception occurred during the fetch loop (e.g., `had_fetch_error = True` in the `except Exception` path).
- If `not files` and `had_fetch_error` is true, return `""` **without caching**, so a later call can retry.
- Alternatively, cache negative results only for clear “missing file” outcomes (requires providers to distinguish 404 vs transient errors), or cache empty results with a short TTL.
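Tracking fetch errors as suggested might look like the following (a sketch; `fetch_context_files` is a hypothetical refactor of the fetch loop, with the cache passed in for clarity):

```python
def fetch_context_files(fetch, paths, cache, cache_key):
    """Fetch each configured file; if *all* files failed to load and at
    least one fetch raised, skip caching so a later call can retry
    after a transient provider/API failure."""
    files = {}
    had_fetch_error = False
    for path in paths:
        try:
            content = fetch(path)
        except Exception:
            had_fetch_error = True  # remember transient failures
            continue
        if content:
            files[path] = content
    if not files:
        if had_fetch_error:
            return {}  # do NOT cache: allow a retry on the next call
        cache[cache_key] = {}  # genuinely missing: safe to cache
        return {}
    cache[cache_key] = files
    return files
```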



8. Fixed fence can break ✓ Resolved 🐞 Bug ☼ Reliability
Description
repo_context renders instruction files using a constant 5-backtick markdown fence and inserts file
content unescaped, so any instruction file containing the same fence sequence can terminate the
block early and corrupt the intended <instruction_files>/<file> structure in the prompt. This can
make downstream prompt parsing/formatting unreliable and can change how the model interprets
subsequent prompt text.
Code

pr_agent/algo/repo_context.py[R11-27]

+MARKDOWN_FENCE = "`````"
+_unsupported_repo_context_provider_classes = set()
+
+
+def render_instruction_files(files: dict[str, str]) -> str:
+    parts = [
+        INSTRUCTION_FILES_INTRO,
+        "<instruction_files>",
+    ]
+
+    for path, content in files.items():
+        scope = path.rsplit("/", 1)[0] if "/" in path else "repo-root"
+        parts.append(f'<file path="{escape(path, quote=True)}" scope="{escape(scope, quote=True)}">')
+        parts.append(f"{MARKDOWN_FENCE}markdown")
+        parts.append(content.rstrip())
+        parts.append(MARKDOWN_FENCE)
+        parts.append("</file>")
Evidence
repo_context.py defines a fixed fence and inserts raw file content between the opening/closing
fence; prompt templates then include repo_context verbatim, so a fence collision directly affects
the system prompt text structure.

pr_agent/algo/repo_context.py[11-27]
pr_agent/algo/repo_context.py[34-66]
pr_agent/settings/pr_reviewer_prompts.toml[72-79]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`render_instruction_files()` / `render_instruction_files_with_line_budget()` wrap file content with a constant fence (`MARKDOWN_FENCE = "`````"`). If any included repo file contains that same sequence, the fence can be closed early, producing malformed prompt structure.
### Issue Context
Repo-context content is injected verbatim into system prompts via `{{ repo_context }}`. The current formatting relies on fences to keep `<file ...>` wrappers and surrounding text structurally consistent.
### Fix Focus Areas
- pr_agent/algo/repo_context.py[11-68]
### Implementation notes
- Compute a per-file fence that is guaranteed not to appear in the content (e.g., start with 3 backticks and keep increasing length until it’s not a substring of the content).
- Use that computed fence for both the opening and closing fence for that file.
- Apply the same logic in both rendering functions.
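Computing a per-file fence as described is straightforward (a sketch; the function name is hypothetical):

```python
def pick_fence(content, char="`", start=3):
    """Grow the fence until it does not appear in the content, so a
    file containing backtick runs cannot close the block early."""
    fence = char * start
    while fence in content:
        fence += char
    return fence
```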



9. Repo context fetched repeatedly 🐞 Bug ➹ Performance
Description
build_repo_context() is called in the constructors of pr_reviewer, pr_description, and
pr_code_suggestions, so the same configured repo files can be fetched multiple times from the git
provider in a single run. This adds avoidable latency and increases API/rate-limit pressure for
GitHub/GitLab providers.
Code

pr_agent/tools/pr_reviewer.py[R95-97]

        "extra_instructions": get_settings().pr_reviewer.extra_instructions,
+            "repo_context": build_repo_context(self.git_provider),
        "commit_messages_str": self.git_provider.get_commit_messages(),
Evidence
Each tool constructor populates vars["repo_context"] by calling build_repo_context(), and
build_repo_context() loops over configured files and calls get_repo_file_content per file, so
repeating this across tools repeats the remote fetch work.

pr_agent/tools/pr_reviewer.py[81-104]
pr_agent/tools/pr_description.py[62-72]
pr_agent/tools/pr_code_suggestions.py[61-77]
pr_agent/algo/repo_context.py[71-125]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Multiple tools call `build_repo_context(self.git_provider)` during initialization. Since `build_repo_context()` performs remote fetches (`git_provider.get_repo_file_content(...)` per configured file), running multiple commands in one process can redundantly fetch the same files.
### Issue Context
This PR wires `repo_context` into several tools’ prompt variable dictionaries.
### Fix Focus Areas
- pr_agent/tools/pr_reviewer.py[92-100]
- pr_agent/tools/pr_description.py[64-73]
- pr_agent/tools/pr_code_suggestions.py[61-72]
- pr_agent/algo/repo_context.py[71-129]
### Implementation notes
- Add a simple memoization layer (e.g., attach a `_repo_context_cache` attribute to the `git_provider` instance keyed by `(repo_context_files, repo_context_max_lines)`), or cache inside `build_repo_context()` with an LRU keyed by provider identity + config.
- Ensure cache invalidates when config changes between calls.
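A provider-instance memoization layer could be sketched as follows (attribute and helper names are illustrative; keying by the config tuple means a config change between calls produces a fresh build):

```python
CACHE_ATTR = "_repo_context_cache"

def cached_repo_context(git_provider, key, build):
    """Memoize on the provider instance, keyed by the config tuple, so
    the reviewer/description/suggestions tools share one fetch per run."""
    cache = getattr(git_provider, CACHE_ATTR, None)
    if cache is None:
        cache = {}
        setattr(git_provider, CACHE_ATTR, cache)
    if key not in cache:
        cache[key] = build()  # only the first caller pays for the fetch
    return cache[key]
```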



10. Non-GitHub context unsupported ✓ Resolved 🐞 Bug ⚙ Maintainability
Description
Only GithubProvider overrides get_repo_file_content(); other providers (e.g., GitLab) inherit
GitProvider.get_repo_file_content() which returns an empty string, so repo_context_files is silently
ignored outside GitHub.
Code

pr_agent/git_providers/git_provider.py[R277-278]

+    def get_repo_file_content(self, file_path: str):
+        return ""
Evidence
The base provider supplies a default empty implementation, GitHub has a concrete implementation, and
GitLab lacks an override near its other repo-level methods—so it will use the empty default and
never load configured context files.

pr_agent/git_providers/git_provider.py[269-282]
pr_agent/git_providers/github_provider.py[733-753]
pr_agent/git_providers/gitlab_provider.py[791-801]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`build_repo_context()` is wired into multiple tools, but repository file fetching is only implemented in `GithubProvider`. For other providers, the base `GitProvider.get_repo_file_content()` returns `""`, so configured repo context never appears and the feature fails silently.
### Issue Context
This is especially confusing when users configure `repo_context_files` on GitLab/Bitbucket/Azure and see no effect.
### Fix Focus Areas
- pr_agent/git_providers/git_provider.py[273-282]
- pr_agent/git_providers/gitlab_provider.py[791-801]
- pr_agent/algo/repo_context.py[26-64]
### Implementation notes
Choose one (or both):
1) Implement `get_repo_file_content()` for other providers (at least GitLab, Bitbucket, AzureDevOps, Local).
2) Detect unsupported providers and log a **warning once** when `repo_context_files` is configured (e.g., if `type(git_provider).get_repo_file_content is GitProvider.get_repo_file_content`).




Previous review results

Review updated until commit e93cad2

Results up to commit N/A


🐞 Bugs (2) 📘 Rule violations (1) 📎 Requirement gaps (0)


Action required
1. repo_context_files type unchecked ✓ Resolved 📘 Rule violation ☼ Reliability
Description
build_repo_context() assumes repo_context_files is an iterable of file paths without
validating/normalizing its container type. If it is misconfigured as a string (common via
env/Dynaconf), it will iterate characters and attempt to fetch bogus paths, causing many useless
provider calls and missing context instead of failing safely with a targeted warning.
Code

pr_agent/algo/repo_context.py[R26-41]

+def build_repo_context(git_provider) -> str:
+    context_files = get_settings().config.get("repo_context_files", [])
+    if not context_files:
+        return ""
+
+    max_lines = get_settings().config.get("repo_context_max_lines", 500)
+    try:
+        max_lines = max(0, int(max_lines))
+    except (TypeError, ValueError):
+        max_lines = 500
+
+    files = {}
+    for file_path in context_files:
+        if not isinstance(file_path, str) or not file_path.strip():
+            get_logger().warning("Skipping invalid repo context file path", artifact={"file_path": file_path})
+            continue
Evidence
PR Compliance ID 20 requires boundary validation/normalization of configuration inputs, but in
build_repo_context() the code pulls repo_context_files from config and iterates it directly,
only validating that each iterated element is a non-empty string. This means a mis-typed
configuration such as a single string will be treated as an iterable of characters, each character
will pass the per-element string validation, and get_repo_file_content() will be called repeatedly
with incorrect one-character “paths” rather than rejecting the configuration or applying a safe
default.

pr_agent/algo/repo_context.py[26-41]
pr_agent/algo/repo_context.py[26-45]
pr_agent/settings/configuration.toml[5-30]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`build_repo_context()` reads `repo_context_files` from configuration and immediately iterates it, but does not validate/normalize the container type to ensure it is a list of file paths. When misconfigured as a single string (a common outcome of env/Dynaconf overrides), the loop iterates characters and triggers many bogus `get_repo_file_content()` calls instead of fetching the intended file once or failing safely with a targeted warning.
## Issue Context
Compliance requires validating and normalizing configuration inputs at boundaries, with safe defaults and targeted warnings. Although `repo_context_files` is documented as a list in the default configuration, overrides can easily supply a string; the current implementation only checks each iterated element is a non-empty string, which allows the string-misconfiguration to slip through per-character.
## Fix Focus Areas
- pr_agent/algo/repo_context.py[26-45]
- pr_agent/settings/configuration.toml[5-30]
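
One way to normalize the container at the boundary, sketched below. `normalize_context_files` is a hypothetical helper name; the comma-splitting behavior for bare strings is an assumption about how env/Dynaconf overrides typically arrive:

```python
import logging

logger = logging.getLogger(__name__)


def normalize_context_files(raw) -> list[str]:
    """Coerce the configured value into a clean list of path strings.

    Accepts a list/tuple of paths or a single (optionally comma-separated)
    string; anything else yields the safe default of no context files.
    """
    if isinstance(raw, str):
        # A bare string from an env override: split on commas, not characters.
        raw = raw.split(",")
    if not isinstance(raw, (list, tuple)):
        logger.warning(
            "repo_context_files must be a list of paths, got %r; ignoring it",
            type(raw).__name__,
        )
        return []
    # Per-element validation: keep only non-empty strings, stripped.
    return [p.strip() for p in raw if isinstance(p, str) and p.strip()]
```

With this in place, `build_repo_context()` would iterate the normalized list instead of whatever raw value the settings layer handed back.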



2. Repo context truncates fences ✓ Resolved 🐞 Bug ≡ Correctness
Description
build_repo_context() enforces repo_context_max_lines by slicing the already-rendered lines, which
can cut the output off in the middle of a <file> block or markdown fence and make the remainder of
the system prompt parse as a code block or malformed markup.
Code

pr_agent/algo/repo_context.py[R31-64]

+    max_lines = get_settings().config.get("repo_context_max_lines", 500)
+    try:
+        max_lines = max(0, int(max_lines))
+    except (TypeError, ValueError):
+        max_lines = 500
+
+    files = {}
+    for file_path in context_files:
+        if not isinstance(file_path, str) or not file_path.strip():
+            get_logger().warning("Skipping invalid repo context file path", artifact={"file_path": file_path})
+            continue
+
+        file_path = file_path.strip()
+        try:
+            content = git_provider.get_repo_file_content(file_path)
+        except Exception as e:
+            get_logger().warning(f"Failed to load repo context file: {file_path}", artifact={"error": str(e)})
+            continue
+
+        if not content:
+            get_logger().debug(f"Repo context file is empty or missing: {file_path}")
+            continue
+
+        if isinstance(content, bytes):
+            content = content.decode("utf-8", errors="replace")
+
+        files[file_path] = str(content).rstrip()
+
+    if not files:
+        return ""
+
+    rendered_lines = render_instruction_files(files).splitlines()
+
+    return "\n".join(rendered_lines[:max_lines]).strip()
Evidence
The implementation slices rendered lines without ensuring structural closure, and the unit test
demonstrates truncation ending at an opening markdown fence with no closing fence or closing tags.

pr_agent/algo/repo_context.py[26-64]
tests/unittest/test_repo_context.py[96-110]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`build_repo_context()` truncates the rendered instruction-files payload by line count after rendering, which can leave unclosed `<instruction_files>`/`<file>` tags and unclosed ` ```markdown ` fences. This can cause the rest of the prompt to be interpreted as part of a code block and materially change model behavior.
### Issue Context
Current behavior is explicitly tested to allow truncation mid-fence, which is unsafe for prompt formatting.
### Fix Focus Areas
- pr_agent/algo/repo_context.py[26-64]
- tests/unittest/test_repo_context.py[96-110]
### Implementation notes
- Build the output incrementally while tracking remaining line budget, and **always** append closing fence + closing tags (even when truncating file content).
- If truncation occurs, add an explicit marker (e.g., `…(truncated)…`) inside the fenced block, then close the fence and tags.
- Update/replace `test_build_repo_context_enforces_total_line_cap` to assert the truncated output still ends with a closed fence and `</instruction_files>`.
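
The first implementation note can be sketched as follows. `render_file_block` is a hypothetical simplified helper (one file, plain attributes, three-backtick fence) rather than the repository's renderer, but it shows the invariant: the footer is always emitted, and truncation spends one line on an explicit marker inside the fence:

```python
TRUNCATION_MARKER = "...(truncated)..."
FENCE = "```"


def render_file_block(path: str, content: str, budget: int) -> list[str]:
    """Render one <file> block within a total line budget, always closing
    the fence and the tag even when the content is truncated."""
    header = [f'<file path="{path}">', f"{FENCE}markdown"]
    footer = [FENCE, "</file>"]
    body = content.rstrip().splitlines()
    available = budget - len(header) - len(footer)
    if available < len(body):
        # Reserve one line inside the fence for the explicit truncation marker.
        body = body[: max(available - 1, 0)] + [TRUNCATION_MARKER]
    return header + body + footer
```

Because the footer lines are appended unconditionally, a truncated block still ends with a closing fence followed by `</file>`, so the surrounding prompt structure stays well-formed.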




Remediation recommended
3. Override entries not stripped 📘 Rule violation ☼ Reliability
Description
claude_extended_thinking_models_override values are validated but not normalized (e.g., whitespace
is not stripped), so a config like " claude-sonnet-4-6 " will silently fail the `model in
self.claude_extended_thinking_models` eligibility check. This violates the requirement to normalize
configuration at the boundary and can prevent extended-thinking from enabling despite being
configured.
Code

pr_agent/algo/ai_handlers/litellm_ai_handler.py[R235-250]

+        # Models that support extended thinking (config override replaces the built-in list when non-empty)
+        override = get_settings().config.get("claude_extended_thinking_models_override", []) or []
+        if override and not isinstance(override, list):
+            get_logger().warning(
+                "Invalid claude_extended_thinking_models_override in config; expected a list of model names. "
+                "Falling back to the built-in Claude extended-thinking model list."
+            )
+            override = []
+        elif override and not all(isinstance(model, str) and model.strip() for model in override):
+            get_logger().warning(
+                "Invalid claude_extended_thinking_models_override in config; "
+                "expected a list of model name strings. "
+                "Falling back to the built-in Claude extended-thinking model list."
+            )
+            override = []
+        self.claude_extended_thinking_models = list(override) if override else CLAUDE_EXTENDED_THINKING_MODELS
Evidence
Rule 21 requires configuration inputs to be validated and normalized at the boundary. The new
override handling validates list/type and checks model.strip() only for non-empty validation, but
then assigns self.claude_extended_thinking_models = list(override) without stripping/normalizing
values, while later eligibility relies on exact string membership.

pr_agent/algo/ai_handlers/litellm_ai_handler.py[235-250]
pr_agent/algo/ai_handlers/litellm_ai_handler.py[517-520]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`claude_extended_thinking_models_override` is validated but not normalized before use. Because eligibility uses exact string membership (`model in self.claude_extended_thinking_models`), leading/trailing whitespace (and other non-normalized forms) can cause the override to appear “enabled” but not actually match any runtime model IDs.
## Issue Context
This is a configuration boundary; per compliance, inputs should be normalized (at least `.strip()`), so operators don’t get surprising behavior from common TOML/env formatting mistakes.
## Fix Focus Areas
- pr_agent/algo/ai_handlers/litellm_ai_handler.py[235-250]
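
A compact sketch of boundary normalization for the override list. `normalize_override` is a hypothetical helper name, and the choice to reject a partially invalid list (rather than silently drop bad entries) is an assumption consistent with the PR's existing fallback-to-built-in behavior:

```python
def normalize_override(override) -> list[str]:
    """Validate and normalize the override list at the config boundary.

    Returns a list of stripped, non-empty model names, or [] to signal
    'fall back to the built-in Claude extended-thinking model list'.
    """
    if not isinstance(override, list):
        return []
    cleaned = [m.strip() for m in override if isinstance(m, str) and m.strip()]
    # Reject partially invalid lists rather than silently dropping entries.
    return cleaned if len(cleaned) == len(override) else []
```

Stripping at assignment time means the later exact-membership check (`model in self.claude_extended_thinking_models`) matches runtime model IDs even when the TOML or env value carries stray whitespace.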



4. Repo context line cap wrong ✓ Resolved 🐞 Bug ≡ Correctness
Description
repo_context_max_lines is documented as a maximum total rendered lines, but
render_instruction_files_with_line_budget() can still output more than that due to unconditional
wrapper/header/footer lines. This can cause prompts to exceed the configured budget and break
operators’ expectations when setting small caps.
Code

pr_agent/algo/repo_context.py[R47-82]

+def render_instruction_files_with_line_budget(files: dict[str, str], max_lines: int) -> str:
+    parts = [
+        INSTRUCTION_FILES_INTRO,
+        "<instruction_files>",
+    ]
+
+    for path, content in files.items():
+        scope = path.rsplit("/", 1)[0] if "/" in path else "repo-root"
+        fence = _get_markdown_fence(content)
+        file_header = [
+            f'<file path="{escape(path, quote=True)}" scope="{escape(scope, quote=True)}">',
+            f"{fence}markdown",
+        ]
+        file_footer = [
+            fence,
+            "</file>",
+            "",
+        ]
+        content_lines = content.rstrip().splitlines()
+        reserved_closing_lines = len(file_header) + len(file_footer) + 1
+        available_content_lines = max_lines - len(parts) - reserved_closing_lines
+
+        parts.extend(file_header)
+        if available_content_lines >= len(content_lines):
+            parts.extend(content_lines)
+        else:
+            if available_content_lines > 1:
+                parts.extend(content_lines[: available_content_lines - 1])
+            parts.append(TRUNCATION_MARKER)
+            parts.extend(file_footer)
+            break
+
+        parts.extend(file_footer)
+
+    parts.append("</instruction_files>")
+    return "\n".join(parts).strip()
Evidence
The config explicitly claims a maximum total line budget, but the renderer always emits wrapper
lines and the unit test demonstrates output that exceeds a max_lines=4 setting, proving the cap is
not enforced as documented.

pr_agent/settings/configuration.toml[25-33]
pr_agent/algo/repo_context.py[47-83]
tests/unittest/test_repo_context.py[191-212]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`repo_context_max_lines` is described as a maximum total number of rendered lines, but the current implementation only limits the *file content* lines and can exceed the cap by a fixed wrapper overhead (intro + `<instruction_files>` + `<file>` header/footer + closing tag). This makes the configured budget unreliable.
### Issue Context
- `repo_context_max_lines` is defined/documented as a total rendered-line cap in configuration defaults.
- `render_instruction_files_with_line_budget()` always emits wrapper lines even when `max_lines` is very small.
### Fix Focus Areas
- pr_agent/algo/repo_context.py[47-82]
- pr_agent/settings/configuration.toml[28-29]
### Implementation notes
- Decide on intended semantics (strict cap vs. best-effort content cap).
- If strict: before adding a file header/footer, check whether the remaining budget can accommodate them and the closing `</instruction_files>`; if not, return an empty string (or a minimal, well-formed wrapper) rather than exceeding the cap.
- Update the configuration comment if semantics remain best-effort (so operators aren’t misled).
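
The strict-cap option can be sketched like this. `render_with_strict_cap` is a hypothetical, simplified variant (no intro line, plain three-backtick fences) meant only to show the budget accounting, not the repository's actual renderer:

```python
def render_with_strict_cap(files: dict[str, str], max_lines: int) -> str:
    """Strict-cap variant: never emit more than max_lines rendered lines.

    If not even one file's wrapper plus one content line fits, emit
    nothing at all rather than exceeding the budget with bare wrappers.
    """
    FENCE = "```"
    lines = ["<instruction_files>"]
    remaining = max_lines - 2  # reserve the opening and closing wrapper tags
    for path, content in files.items():
        header = [f'<file path="{path}">', f"{FENCE}markdown"]
        footer = [FENCE, "</file>"]
        overhead = len(header) + len(footer)
        if remaining < overhead + 1:  # not even one content line fits
            break
        body = content.rstrip().splitlines()[: remaining - overhead]
        lines += header + body + footer
        remaining -= overhead + len(body)
    if len(lines) == 1:  # nothing fit: return empty instead of bare wrappers
        return ""
    lines.append("</instruction_files>")
    return "\n".join(lines)
```

Under these semantics a tiny cap like `max_lines = 4` yields an empty string, which is at least honest about the documented "maximum total lines" contract.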



5. Empty context cached on errors 🐞 Bug ☼ Reliability
Description
build_repo_context() caches an empty string when no files are successfully loaded, including cases
where all fetches fail due to exceptions. If the first attempt fails transiently, subsequent calls
with the same config on the same provider instance will never retry and will silently omit repo
context for the rest of that run.
Code

pr_agent/algo/repo_context.py[R118-150]

+    cache_key = _get_repo_context_cache_key(context_files, max_lines)
+    repo_context_cache = getattr(git_provider, REPO_CONTEXT_CACHE_ATTRIBUTE, None)
+    if repo_context_cache is None:
+        repo_context_cache = {}
+        setattr(git_provider, REPO_CONTEXT_CACHE_ATTRIBUTE, repo_context_cache)
+    if cache_key in repo_context_cache:
+        return repo_context_cache[cache_key]
+
+    files = {}
+    for file_path in context_files:
+        if not isinstance(file_path, str) or not file_path.strip():
+            get_logger().warning("Skipping invalid repo context file path", artifact={"file_path": file_path})
+            continue
+
+        file_path = file_path.strip()
+        try:
+            content = git_provider.get_repo_file_content(file_path)
+        except Exception as e:
+            get_logger().warning(f"Failed to load repo context file: {file_path}", artifact={"error": str(e)})
+            continue
+
+        if not content:
+            get_logger().debug(f"Repo context file is empty or missing: {file_path}")
+            continue
+
+        if isinstance(content, bytes):
+            content = content.decode("utf-8", errors="replace")
+
+        files[file_path] = str(content).rstrip()
+
+    if not files:
+        repo_context_cache[cache_key] = ""
+        return ""
Evidence
The function short-circuits on cache hits and stores an empty string when files is empty, while
treating exceptions as skip-and-continue; together this proves a transient all-files failure becomes
a persistent empty result for the provider instance.

pr_agent/algo/repo_context.py[118-154]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`build_repo_context()` writes `""` into the provider cache when no files were loaded. Because exceptions are handled by skipping the file and continuing, a transient provider/API failure that prevents *all* files from loading results in an empty cached value; later calls won’t retry and will return the cached empty context.
### Issue Context
This matters specifically because the code *intentionally* supports multiple calls (it has a cache) and because provider calls can fail transiently.
### Fix Focus Areas
- pr_agent/algo/repo_context.py[118-154]
### Implementation notes
- Track whether any exception occurred during the fetch loop (e.g., `had_fetch_error = True` in the `except Exception` path).
- If `not files` and `had_fetch_error` is true, return `""` **without caching**, so a later call can retry.
- Alternatively, cache negative results only for clear “missing file” outcomes (requires providers to distinguish 404 vs transient errors), or cache empty results with a short TTL.



6. Fixed fence can break ✓ Resolved 🐞 Bug ☼ Reliability
Description
repo_context renders instruction files using a constant 5-backtick markdown fence and inserts file
content unescaped, so any instruction file containing the same fence sequence can terminate the
block early and corrupt the intended <instruction_files>/<file> structure in the prompt. This can
make downstream prompt parsing/formatting unreliable and can change how the model interprets
subsequent prompt text.
Code

pr_agent/algo/repo_context.py[R11-27]

+MARKDOWN_FENCE = "`````"
+_unsupported_repo_context_provider_classes = set()
+
+
+def render_instruction_files(files: dict[str, str]) -> str:
+    parts = [
+        INSTRUCTION_FILES_INTRO,
+        "<instruction_files>",
+    ]
+
+    for path, content in files.items():
+        scope = path.rsplit("/", 1)[0] if "/" in path else "repo-root"
+        parts.append(f'<file path="{escape(path, quote=True)}" scope="{escape(scope, quote=True)}">')
+        parts.append(f"{MARKDOWN_FENCE}markdown")
+        parts.append(content.rstrip())
+        parts.append(MARKDOWN_FENCE)
+        parts.append("</file>")
Evidence
repo_context.py defines a fixed fence and inserts raw file content between the opening/closing
fence; prompt templates then include repo_context verbatim, so a fence collision directly affects
the system prompt text structure.

pr_agent/algo/repo_context.py[11-27]
pr_agent/algo/repo_context.py[34-66]
pr_agent/settings/pr_reviewer_prompts.toml[72-79]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`render_instruction_files()` / `render_instruction_files_with_line_budget()` wrap file content with a constant fence (`MARKDOWN_FENCE = "`````"`). If any included repo file contains that same sequence, the fence can be closed early, producing malformed prompt structure.
### Issue Context
Repo-context content is injected verbatim into system prompts via `{{ repo_context }}`. The current formatting relies on fences to keep `<file ...>` wrappers and surrounding text structurally consistent.
### Fix Focus Areas
- pr_agent/algo/repo_context.py[11-68]
### Implementation notes
- Compute a per-file fence that is guaranteed not to appear in the content (e.g., start with 3 backticks and keep increasing length until it’s not a substring of the content).
- Use that computed fence for both the opening and closing fence for that file.
- Apply the same logic in both rendering functions.
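
The grow-until-no-collision idea is small enough to show directly; `pick_fence` is a hypothetical helper name:

```python
def pick_fence(content: str, min_len: int = 3) -> str:
    """Return a backtick fence guaranteed not to appear in the content:
    start at min_len backticks and grow until there is no collision."""
    fence = "`" * min_len
    while fence in content:
        fence += "`"
    return fence
```

Using the same computed fence for a file's opening and closing line means no run of backticks inside that file can terminate the block early.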



7. Repo context fetched repeatedly 🐞 Bug ➹ Performance
Description
build_repo_context() is called in the constructors of pr_reviewer, pr_description, and
pr_code_suggestions, so the same configured repo files can be fetched multiple times from the git
provider in a single run. This adds avoidable latency and increases API/rate-limit pressure for
GitHub/GitLab providers.
Code

pr_agent/tools/pr_reviewer.py[R95-97]

         "extra_instructions": get_settings().pr_reviewer.extra_instructions,
+            "repo_context": build_repo_context(self.git_provider),
         "commit_messages_str": self.git_provider.get_commit_messages(),
Evidence
Each tool constructor populates vars["repo_context"] by calling build_repo_context(), and
build_repo_context() loops over configured files and calls get_repo_file_content per file, so
repeating this across tools repeats the remote fetch work.

pr_agent/tools/pr_reviewer.py[81-104]
[pr_agent/tools/pr_description.py[62-72]](https://github.com/The-PR-Agent/pr-agent/blob/e93cad2ec82ba424363b05c390150a2ef59d8e15/pr_agent/tools/pr_descri...

Comment thread pr_agent/algo/repo_context.py
Comment thread pr_agent/algo/repo_context.py Outdated
@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented May 11, 2026

Persistent review updated to latest commit 2b313ad

@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented May 11, 2026

Persistent review updated to latest commit 50de2c6

@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented May 11, 2026

Persistent review updated to latest commit e93cad2

Comment on lines +136 to +140
cache_key = _get_repo_context_cache_key(context_files, max_lines)
process_cache_key = _get_repo_context_process_cache_key(git_provider, context_files, max_lines)
if process_cache_key is not None and process_cache_key in _repo_context_process_cache:
return _repo_context_process_cache[process_cache_key]


Action required

1. Unbounded context process cache 🐞 Bug ☼ Reliability

build_repo_context() stores repo context in a module-level _repo_context_process_cache keyed by PR
URL without TTL/eviction, so long-lived servers can accumulate entries indefinitely and can keep
serving stale instruction files for a PR until process restart.
Agent Prompt
## Issue description
`build_repo_context()` uses a module-level dict (`_repo_context_process_cache`) as a process-wide cache keyed by PR URL. Because it has no TTL/max-size eviction, long-lived servers will:
1) grow memory unbounded as more PR URLs are processed, and
2) potentially serve stale instruction-file contents for the same PR URL even if the repo default-branch instruction files change.

## Issue Context
This cache is independent of any single provider instance and persists for the lifetime of the Python process. The repo runs as a long-lived FastAPI/uvicorn app in webhook deployments, so the cache lifetime is effectively “forever”.

## Fix Focus Areas
- pr_agent/algo/repo_context.py[13-14]
- pr_agent/algo/repo_context.py[136-140]
- pr_agent/algo/repo_context.py[175-185]

## How to fix
- Replace `_repo_context_process_cache` with a bounded/evicting structure (e.g., `functools.lru_cache(maxsize=...)`, or a TTL cache).
- If keeping cross-instance caching, include an invalidation strategy (TTL is simplest) so updated repo instruction files eventually refresh.
- Ensure empty-string entries also follow the eviction policy to avoid permanently pinning “missing” results for long-running processes.
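
A minimal TTL-plus-max-size cache along the lines suggested above; the class, sizes, and TTL are illustrative assumptions, and production code might prefer an off-the-shelf TTL cache:

```python
import time


class TTLCache:
    """Minimal TTL + max-size cache for repo-context strings. Entries
    expire, so updated instruction files eventually refresh, and the size
    bound keeps memory flat on long-lived webhook servers."""

    def __init__(self, max_size: int = 256, ttl_seconds: float = 600.0):
        self.max_size, self.ttl = max_size, ttl_seconds
        self._data: dict[str, tuple[float, str]] = {}

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None:
            return None
        ts, value = entry
        if time.monotonic() - ts > self.ttl:
            del self._data[key]  # expired: force a refresh on next build
            return None
        return value

    def set(self, key: str, value: str) -> None:
        if len(self._data) >= self.max_size:
            # Evict the oldest entry to bound memory; empty-string entries
            # follow the same policy, so "missing" results are not pinned.
            oldest = min(self._data, key=lambda k: self._data[k][0])
            del self._data[oldest]
        self._data[key] = (time.monotonic(), value)
```

Swapping the module-level dict for an instance of this class keeps the cross-tool reuse benefit while capping both staleness and memory growth.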


Development

Successfully merging this pull request may close these issues.

Support Claude Sonnet extended thinking models

1 participant