Skip to content

fix(iobuf): translate index by diskBufferSize in BufferedReadSeeker.Read#4939

Open
SAY-5 wants to merge 3 commits into
trufflesecurity:mainfrom
SAY-5:fix/bufferedreader-branch1-index-translation
Open

fix(iobuf): translate index by diskBufferSize in BufferedReadSeeker.Read#4939
SAY-5 wants to merge 3 commits into
trufflesecurity:mainfrom
SAY-5:fix/bufferedreader-branch1-index-translation

Conversation

@SAY-5
Copy link
Copy Markdown

@SAY-5 SAY-5 commented May 3, 2026

Description:

BufferedReadSeeker.Read mistranslates the virtual stream index when the in-memory buffer has been flushed to disk and then refilled with post-flush data:

  • Buffer branch (bufferedreaderseeker.go line 146 in main):

    if br.index < int64(br.buf.Len()) {
        totalBytesRead = copy(out, br.buf.Bytes()[br.index:])

    br.index is the logical stream position, not a buffer offset. After a flush + refill (e.g. diskBufferSize=32, buf.Len()=16), a backward read at index 0 satisfies 0 < 16 and slices buf.Bytes()[0:], returning the post-flush tail (bytes 32..47) instead of the bytes that were flushed to disk.

  • Temp-file branch (line 156 in main):

    if _, err := br.tempFile.Seek(br.index-int64(br.buf.Len()), io.SeekStart); err != nil {

    Subtracting buf.Len() is only correct when the buffer is empty. Once the buffer carries any post-flush data, this produces a negative offset and a seek error (or a wrong-region read).

Both branches now use the same anchor diskBufferSize. The invariant is:

  • Bytes [0, diskBufferSize) live in tempFile.
  • Bytes [diskBufferSize, diskBufferSize + buf.Len()) live in buf.

Reproducer (now a regression test in bufferedreaderseeker_test.go):

  1. Wrap a non-seekable reader of 48 bytes, set threshold = 32.
  2. Read 32 bytes (triggers flush to disk).
  3. Read 16 more (refills the buffer).
  4. Seek(0, io.SeekStart) then Read(8), previously returned bytes [32..39] instead of [0..7].

Distinct from #4914, which fixed a per-phase counter inside the same Read method; that change did not touch the index translation.

Checklist:

  • Tests passing (go test ./pkg/iobuf/... -race)?
  • Lint passing (make lint this requires golangci-lint)?

Documentation:

N/A. Internal package, behaviour-only fix.


Note

Medium Risk
Touches core buffered I/O/seek behavior for non-seekable readers; incorrect boundary handling could cause silent data corruption or EOF behavior regressions if edge cases remain.

Overview
Fixes BufferedReadSeeker.Read for non-seekable readers after the in-memory buffer has been flushed to disk and then refilled.

The read path now treats the virtual stream as disk region [0,diskBufferSize) plus post-flush in-memory tail, seeking into the temp file using br.index, capping disk reads to diskBufferSize, and translating buffer reads by diskBufferSize so reads/rewinds return the correct bytes.

Adds regression tests covering backward seeks after a flush and a single read spanning the disk→buffer boundary, ensuring buffered tail bytes remain accessible and aren’t orphaned by an early underlying-reader EOF.

Reviewed by Cursor Bugbot for commit ce21fb8. Bugbot is set up for automated code reviews on this repo. Configure here.

After the in-memory buffer is flushed to disk and then refilled with
post-flush data, BufferedReadSeeker.Read mistranslated the virtual
stream index in two places:

- The buffer branch sliced buf.Bytes()[br.index:], treating br.index as
  a buffer-relative offset; a backward seek into the on-disk region
  satisfied br.index < buf.Len() and silently returned the wrong bytes.
- The temp-file branch seeked to br.index - buf.Len(), producing a
  negative offset (and seek error) once the buffer carried any data.

Both branches now use the diskBufferSize anchor: the buffer holds
bytes [diskBufferSize, diskBufferSize+buf.Len()) and the temp file
holds bytes [0, diskBufferSize). Adds a regression test covering a
backward seek and a buffered-tail read after a flush.

Signed-off-by: SAY-5 <saiasish.cnp@gmail.com>
@SAY-5 SAY-5 requested a review from a team May 3, 2026 06:45
@SAY-5 SAY-5 requested review from a team as code owners May 3, 2026 06:45
@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


SAY-5 seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Reviewed by Cursor Bugbot for commit caa5723. Configure here.

Comment thread pkg/iobuf/bufferedreaderseeker.go
A single Read that begins in the on-disk region and extends into the
post-flush in-memory buffer used to skip the buffer entirely. The
buffer branch ran first and was skipped because index < diskBufferSize;
the disk branch then filled only the disk portion and the call fell
through to the underlying reader. Once the reader returned io.EOF the
code set sizeKnown=true without flushing buf to disk, and the fast
path at the top of Read started serving exclusively from tempFile —
permanently orphaning the buffered tail.

Run the disk branch first so a spanning read leaves index aligned
with the buffer region (index == diskBufferSize), then let the
buffer branch finish the request in the same call. Cap the disk
read at diskBufferSize-index so it never reaches into bytes that
will later be appended/flushed. Adds a regression test that seeks
into the disk tail, reads across the boundary, and then re-reads
the buffered tail to confirm it stays accessible.

Signed-off-by: SAY-5 <say.apm35@gmail.com>
@SAY-5
Copy link
Copy Markdown
Author

SAY-5 commented May 3, 2026

Addressed Cursor Bugbot finding in a5f19c3.

The bot was right: this is real data loss, not a benign partial read. With diskBufferSize=32, buf.Len()=16, then Seek(28) + Read(12), the buffer branch fell through (28 < 32), the disk branch filled only the 4 bytes left in tempFile and advanced index to 32, and the call then hit the underlying reader. EOF flipped sizeKnown=true without flushing the 16 buffered bytes to disk, and the sizeKnown fast path on subsequent Reads served only the 32-byte tempFile, orphaning bytes [32,48) for the lifetime of the BufferedReadSeeker.

Fix runs the disk branch first so a spanning read leaves index == diskBufferSize and the buffer branch can finish the same call. The disk read is also capped at diskBufferSize - index so it can never reach into bytes that writeData will later append or flush. Added a regression test (TestBufferedReaderSeekerReadSpansDiskAndBuffer) that reproduces the exact scenario above; verified it fails on the prior commit (returns zeros for the orphaned tail) and passes after the fix. go test ./pkg/iobuf/... -race -count=2 clean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants