fix(iobuf): translate index by diskBufferSize in BufferedReadSeeker.Read #4939
SAY-5 wants to merge 3 commits into
After the in-memory buffer is flushed to disk and then refilled with post-flush data, BufferedReadSeeker.Read mistranslated the virtual stream index in two places:

- The buffer branch sliced buf.Bytes()[br.index:], treating br.index as a buffer-relative offset; a backward seek into the on-disk region satisfied br.index < buf.Len() and silently returned the wrong bytes.
- The temp-file branch seeked to br.index - buf.Len(), producing a negative offset (and seek error) once the buffer carried any data.

Both branches now use the diskBufferSize anchor: the buffer holds bytes [diskBufferSize, diskBufferSize+buf.Len()) and the temp file holds bytes [0, diskBufferSize).

Adds a regression test covering a backward seek and a buffered-tail read after a flush.

Signed-off-by: SAY-5 <saiasish.cnp@gmail.com>
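The invariant stated in this commit message can be sketched as a small lookup from a logical stream index to a region and a region-relative offset. This is an illustrative sketch only: the `region` type and the `locate` function are hypothetical names that do not appear in the PR, and the real `Read` method works on a temp file and a buffer rather than returning a tag.

```go
package main

import "fmt"

// region says where a logical stream index currently lives.
// Hypothetical type, introduced only for this illustration.
type region int

const (
	onDisk   region = iota // bytes [0, diskBufferSize)
	inBuffer               // bytes [diskBufferSize, diskBufferSize+bufLen)
	unread                 // not buffered yet; must come from the underlying reader
)

func (r region) String() string {
	return [...]string{"disk", "buffer", "unread"}[r]
}

// locate maps a logical index to its region and the offset inside that region.
// Disk offsets equal the logical index; buffer offsets subtract diskBufferSize.
func locate(index, diskBufferSize, bufLen int64) (region, int64) {
	switch {
	case index < diskBufferSize:
		return onDisk, index
	case index < diskBufferSize+bufLen:
		return inBuffer, index - diskBufferSize
	default:
		return unread, 0
	}
}

func main() {
	// Same sizes as the example in the PR description: 32 bytes flushed, 16 buffered.
	for _, idx := range []int64{0, 31, 32, 40, 48} {
		r, off := locate(idx, 32, 16)
		fmt.Printf("index %d -> %s, offset %d\n", idx, r, off)
	}
}
```

Note that neither offset depends on `buf.Len()`; the buffer length only bounds the upper end of the in-memory region.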
SAY-5 seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit caa5723.
A single Read that begins in the on-disk region and extends into the post-flush in-memory buffer used to skip the buffer entirely. The buffer branch ran first and was skipped because index < diskBufferSize; the disk branch then filled only the disk portion and the call fell through to the underlying reader. Once the reader returned io.EOF the code set sizeKnown=true without flushing buf to disk, and the fast path at the top of Read started serving exclusively from tempFile, permanently orphaning the buffered tail.

Run the disk branch first so a spanning read leaves index aligned with the buffer region (index == diskBufferSize), then let the buffer branch finish the request in the same call. Cap the disk read at diskBufferSize-index so it never reaches into bytes that will later be appended/flushed.

Adds a regression test that seeks into the disk tail, reads across the boundary, and then re-reads the buffered tail to confirm it stays accessible.

Signed-off-by: SAY-5 <say.apm35@gmail.com>
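The branch ordering described in this commit can be sketched with plain byte slices. This is a minimal illustration under stated assumptions, not the PR's code: the `layout` type, the `readBuffered` method and the `seq` helper are hypothetical, `disk` stands in for the temp file, and the fall-through to the underlying reader is left out.

```go
package main

import "fmt"

// layout is a hypothetical stand-in for the reader's state.
type layout struct {
	disk  []byte // logical bytes [0, len(disk)), playing the role of the temp file
	buf   []byte // logical bytes [len(disk), len(disk)+len(buf)), the post-flush tail
	index int    // logical stream position
}

// readBuffered fills p from already-buffered data only and returns the byte count.
func (l *layout) readBuffered(p []byte) int {
	total := 0
	diskSize := len(l.disk)

	// Disk branch first. Slicing up to diskSize caps the read at diskSize-index.
	if l.index < diskSize && total < len(p) {
		n := copy(p[total:], l.disk[l.index:diskSize])
		l.index += n
		total += n
	}

	// Buffer branch second. After a spanning read index == diskSize, so the
	// same call continues here; the offset is translated by diskSize.
	if l.index >= diskSize && l.index < diskSize+len(l.buf) && total < len(p) {
		n := copy(p[total:], l.buf[l.index-diskSize:])
		l.index += n
		total += n
	}
	return total
}

// seq returns the bytes from, from+1, ..., to-1.
func seq(from, to int) []byte {
	out := make([]byte, 0, to-from)
	for i := from; i < to; i++ {
		out = append(out, byte(i))
	}
	return out
}

func main() {
	l := &layout{disk: seq(0, 32), buf: seq(32, 48), index: 28}
	p := make([]byte, 8)
	n := l.readBuffered(p) // spans the disk/buffer boundary at 32
	fmt.Println(n, p, l.index)
}
```

With the branches in this order, a read starting at index 28 takes four bytes from the disk region and four from the buffer in one call, so nothing falls through to the underlying reader while buffered bytes remain.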
Addressed Cursor Bugbot finding in a5f19c3. The bot was right: this is real data loss, not a benign partial read. With Fix runs the disk branch first so a spanning read leaves

Description:

`BufferedReadSeeker.Read` mistranslates the virtual stream index when the in-memory buffer has been flushed to disk and then refilled with post-flush data:

Buffer branch (`bufferedreaderseeker.go` line 146 in main):

`br.index` is the logical stream position, not a buffer offset. After a flush + refill (e.g. `diskBufferSize=32`, `buf.Len()=16`), a backward read at index 0 satisfies `0 < 16` and slices `buf.Bytes()[0:]`, returning the post-flush tail (bytes 32..47) instead of the bytes that were flushed to disk.

Temp-file branch (line 156 in main):

Subtracting `buf.Len()` is only correct when the buffer is empty. Once the buffer carries any post-flush data, this produces a negative offset and a seek error (or a wrong-region read).

Both branches now use the same anchor, `diskBufferSize`. The invariant is:

- `[0, diskBufferSize)` live in `tempFile`.
- `[diskBufferSize, diskBufferSize + buf.Len())` live in `buf`.

Reproducer (now a regression test in `bufferedreaderseeker_test.go`):

- `threshold = 32`.
- `Seek(0, io.SeekStart)` then `Read(8)`, previously returned bytes `[32..39]` instead of `[0..7]`.

Distinct from #4914, which fixed a per-phase counter inside the same `Read` method; that change did not touch the index translation.

Checklist:

- `go test ./pkg/iobuf/... -race`)?
- `make lint` this requires golangci-lint)?

Documentation:

N/A. Internal package, behaviour-only fix.
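The reproducer in the description can be simulated with plain slices to show why the old check returned bytes `[32..39]`. This is a hedged sketch: `oldBufferBranch` paraphrases the behaviour the description attributes to the pre-fix buffer branch, `fixedRead` applies the `diskBufferSize` anchor, and neither name nor body is taken from `bufferedreaderseeker.go`.

```go
package main

import "fmt"

// oldBufferBranch paraphrases the pre-fix check: it treats the logical index
// as a buffer offset, so any index below len(buf) is served from buf.
func oldBufferBranch(buf []byte, index, n int) ([]byte, bool) {
	if index < len(buf) {
		end := index + n
		if end > len(buf) {
			end = len(buf)
		}
		return buf[index:end], true
	}
	return nil, false
}

// fixedRead resolves each logical index against the invariant:
// disk holds [0, len(disk)), buf holds [len(disk), len(disk)+len(buf)).
func fixedRead(disk, buf []byte, index, n int) []byte {
	out := make([]byte, 0, n)
	for len(out) < n {
		switch {
		case index < len(disk):
			out = append(out, disk[index])
		case index < len(disk)+len(buf):
			out = append(out, buf[index-len(disk)])
		default:
			return out // nothing buffered past this point
		}
		index++
	}
	return out
}

func main() {
	// A 48-byte stream whose byte values equal their logical positions.
	stream := make([]byte, 48)
	for i := range stream {
		stream[i] = byte(i)
	}
	disk, buf := stream[:32], stream[32:] // diskBufferSize = 32, buf.Len() = 16

	// Seek(0, io.SeekStart) followed by Read(8).
	old, _ := oldBufferBranch(buf, 0, 8)
	fmt.Println(old)                        // [32 33 34 35 36 37 38 39]
	fmt.Println(fixedRead(disk, buf, 0, 8)) // [0 1 2 3 4 5 6 7]
}
```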
Note
Medium Risk
Touches core buffered I/O/seek behavior for non-seekable readers; incorrect boundary handling could cause silent data corruption or EOF behavior regressions if edge cases remain.
Overview
Fixes `BufferedReadSeeker.Read` for non-seekable readers after the in-memory buffer has been flushed to disk and then refilled.

The read path now treats the virtual stream as disk region `[0,diskBufferSize)` plus post-flush in-memory tail, seeking into the temp file using `br.index`, capping disk reads to `diskBufferSize`, and translating buffer reads by `diskBufferSize` so reads/rewinds return the correct bytes.

Adds regression tests covering backward seeks after a flush and a single read spanning the disk→buffer boundary, ensuring buffered tail bytes remain accessible and aren’t orphaned by an early underlying-reader EOF.
Reviewed by Cursor Bugbot for commit ce21fb8.