Skip to content

Bug + feature: -i (incremental review) crashes on Azure DevOps; needs full incremental support in AzureDevopsProvider #2379

@agonzalesipcoop-cmyk

Description

@agonzalesipcoop-cmyk

Bug + feature request: incremental review (-i) crashes on Azure DevOps; needs full incremental support in AzureDevopsProvider

Repo: https://github.com/the-pr-agent/pr-agent
Affected version: pr-agent==0.34.3 (latest at time of writing)
Provider: Azure DevOps (AzureDevopsProvider)


TL;DR

Running pr-agent review -i against an Azure DevOps PR raises:

ERROR | pr_agent.tools.pr_reviewer:run:184 - Failed to review PR: object of type 'NoneType' has no len()

The crash is silent (only the message, no traceback) because pr_reviewer.py:184 catches Exception and logs only str(e). The underlying call site is pr_reviewer.py:341:

num_new_commits = len(self.incremental.commits_range)  # commits_range is None

AzureDevopsProvider does not implement get_incremental_commits. It inherits the no-op stub from GitProvider.get_incremental_commits (just pass). The hasattr(provider, "get_incremental_commits") guard at pr_reviewer.py:337 does not protect against this — the inherited stub satisfies hasattr — so commits_range remains None and the next line blows up.

Only GitHubProvider ships a real implementation.


Steps to reproduce

pip install pr-agent==0.34.3

# Configure azure_devops.org and credentials, then:
pr-agent --pr_url 'https://dev.azure.com/<org>/<project>/_git/<repo>/pullrequest/<id>' review -i

Expected: incremental review (only changes since the last review).
Actual: Failed to review PR: object of type 'NoneType' has no len(). No traceback.


Two requests

  1. Bug: make _can_run_incremental_review resilient when the provider has not populated commits_range. The hasattr check at line 337 is the wrong heuristic — should also confirm self.incremental.commits_range is not None. Suggested:

    if self.incremental.commits_range is None:
        get_logger().info(
            f"Incremental review not supported by {get_settings().config.git_provider}; skipping -i."
        )
        return False
  2. Feature: implement get_incremental_commits on AzureDevopsProvider, mirroring the GitHub implementation, so -i actually works on Azure DevOps.


Implementation findings (for the feature)

I prototyped the port in our internal monkey-patch layer (happy to upstream as a PR if welcome). Below are the differences vs GitHubProvider that matter for a clean port.

1. Commit list ordering is reversed

  • PyGithub.PullRequest.get_commits() returns commits oldest-first. github_provider.get_commit_range iterates with range(len-1, -1, -1) (newest to oldest) and breaks on the first commit older than previous_review.created_at.
  • azure_devops_client.get_pull_request_commits() returns commits newest-first. Confirmed by AzureDevopsProvider.get_latest_commit_url which treats commits[0] as latest.
  • Naive port → loop direction is inverted → breaks immediately on the oldest commit → commits_range = [] → review reports "no new files" even when there are.
  • Fix: raw_commits.reverse() after fetch, or invert the loop.

2. Commit object shape

GitHub Commit exposes commit.commit.author.date, commit.commit.message, commit.sha, and commit.files.

Azure SDK GitCommitRef exposes commit.author.date (one level shallower), commit.comment (not .message), commit.commit_id (not .sha), and no .files attribute (you must call get_changes(commit_id=...) separately).

The shared abstraction layer (IncrementalPR.first_new_commit_sha and pr_reviewer.py:349) assumes the GitHub shape — last_seen_commit.sha and last_seen_commit.commit.author.date. Recommend either:

  • (A) Rename to provider-neutral attributes (commit_id, author_date) and update both providers, or
  • (B) Wrap Azure commits in a small adapter class that exposes the GitHub-shape attributes.

We used approach B as a runtime patch.

3. Datetime tz-awareness mismatch

  • Azure SDK returns Comment.published_date and commit.author.date as tz-aware UTC datetimes.
  • pr_reviewer.py:345-352 compares to datetime.now() - timedelta(...) which is naive.
  • Result: TypeError: can't compare offset-naive and offset-aware datetimes.

PyGithub returns naive UTC for created_at, so GitHub side is fine.

Fix: either make pr_reviewer.py use datetime.now(timezone.utc) everywhere, or have the provider normalize Azure datetimes to naive UTC at the seam.

4. Comment object has no html_url

pr_reviewer.py:146 reads self.git_provider.previous_review.html_url when unreviewed_files_set is empty. PyGithub IssueComment.html_url exists; Azure Comment has no equivalent attribute.

AzureDevopsProvider already defines get_comment_url(comment) that builds a discussion URL. The Azure get_previous_review should set comment.html_url = self.get_comment_url(comment) before returning, or pr_reviewer.run should use git_provider.get_comment_url(...) instead of touching .html_url directly.

5. get_changes change-entry shape

The commit-level azure_devops_client.get_changes(commit_id=...) returns objects whose .changes items behave like dicts (change["item"]["path"]) — see existing AzureDevopsProvider.get_files.

The PR-iteration-level get_pull_request_iteration_changes instead returns change_entries whose items use change.additional_properties["item"] — see get_diff_files. A robust port should handle both shapes.

6. get_diff_files does not honor incremental mode

Even after unreviewed_files_set is populated, AzureDevopsProvider.get_diff_files always builds patches against pr.last_merge_target_commit → pr.last_merge_commit (full base→head). For incremental review the correct base is last_seen_commit.commit_id, otherwise the LLM still sees the entire PR diff and -i provides no token savings (only file-list filtering).

GitHub's get_diff_files already handles this branch:

if self.incremental.is_incremental and self.unreviewed_files_set:
    original_file_content_str = self._get_pr_file_content(file, self.incremental.last_seen_commit_sha)
    patch = load_large_diff(file.filename, new_file_content_str, original_file_content_str)
    self.unreviewed_files_set[file.filename] = patch

Equivalent Azure path needs to call get_item(version=last_seen_commit_id, version_type="commit") for the original file content.

7. Pre-existing nuisance bug surfaced by -i

When last_merge_commit is None (some PR states), AzureDevopsProvider.get_diff_files raises:

AttributeError: 'NoneType' object has no attribute 'commit_id'

at line 273. Already swallowed by the function's own try/except returning [], but worth tightening: check self.pr.last_merge_commit is not None upfront and return early with a useful log if not.


Suggested patch outline

# pr_agent/git_providers/azuredevops_provider.py

def get_incremental_commits(self, incremental=IncrementalPR(False)):
    self.incremental = incremental
    if self.incremental.is_incremental:
        self.unreviewed_files_set = {}
        self._get_incremental_commits()

def _get_incremental_commits(self):
    if not getattr(self, "pr_commits", None):
        raw = list(self.azure_devops_client.get_pull_request_commits(
            project=self.workspace_slug,
            repository_id=self.repo_slug,
            pull_request_id=self.pr_num,
        ))
        raw.reverse()  # Azure returns newest-first; normalize to oldest-first.
        self.pr_commits = raw

    self.previous_review = self.get_previous_review(full=True, incremental=True)
    if not self.previous_review:
        get_logger().info("No previous review found, will review the entire PR")
        self.incremental.is_incremental = False
        return

    self.incremental.commits_range = self._get_commit_range()
    for commit in self.incremental.commits_range:
        if commit.parents and len(commit.parents) > 1:
            continue  # Skip merge commits (multiple parents).
        try:
            changes = self.azure_devops_client.get_changes(
                project=self.workspace_slug,
                repository_id=self.repo_slug,
                commit_id=commit.commit_id,
            )
        except Exception as e:
            get_logger().warning(f"Failed to fetch changes for {commit.commit_id}: {e}")
            continue
        for change in (changes.changes or []):
            try:
                item = change["item"]
            except (KeyError, TypeError):
                item = getattr(change, "additional_properties", {}).get("item", {}) or {}
            if not isinstance(item, dict) or item.get("gitObjectType") == "tree":
                continue
            path = item.get("path")
            if path:
                self.unreviewed_files_set[path] = path

def _get_commit_range(self):
    last_review_time = _normalize_naive_utc(self.previous_review.published_date)
    first_new_commit_index = None
    for i in range(len(self.pr_commits) - 1, -1, -1):
        commit_date = _normalize_naive_utc(self.pr_commits[i].author.date) if self.pr_commits[i].author else None
        if commit_date is None:
            continue
        if commit_date > last_review_time:
            self.incremental.first_new_commit = self.pr_commits[i]
            first_new_commit_index = i
        else:
            self.incremental.last_seen_commit = self.pr_commits[i]
            break
    return self.pr_commits[first_new_commit_index:] if first_new_commit_index is not None else []

def get_previous_review(self, *, full: bool, incremental: bool):
    if not (full or incremental):
        raise ValueError("At least one of full or incremental must be True")
    prefixes = []
    if full:        prefixes.append(PRReviewHeader.REGULAR.value)
    if incremental: prefixes.append(PRReviewHeader.INCREMENTAL.value)
    for comment in self.get_issue_comments():
        body = getattr(comment, "body", None)
        if body and any(body.startswith(p) for p in prefixes):
            comment.html_url = self.get_comment_url(comment)
            return comment
    return None

Plus the commits_range is None guard in _can_run_incremental_review (defense in depth, regardless of provider), and incremental support in get_diff_files.

I am happy to open a PR if there's interest.


Workaround (for anyone hitting this today)

Drop -i from your invocation. Full review on every iteration costs more tokens but is functional. Our team has done this until upstream incremental support exists.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions