Contributor

@christian-roggia christian-roggia commented Aug 9, 2025

This pull request introduces full support for the TREE, REUC, LINK, UNTR, FSMN, IEOT, and EOIE index extensions. Partial decoding support for the TREE, EOIE and REUC extensions already existed, but encoding was missing. There are a few other official extensions not yet implemented, which can be added in future updates.

I would appreciate an initial review of these changes as I continue testing and validating support for the new index extensions in our environment.

Contributor Author

christian-roggia commented Aug 9, 2025

NOTE: The TREE extension decoder has been updated to match the behavior of Git's C implementation. The decoder now keeps reading the entry value until it reaches a newline, so the buffer always advances correctly to the next entry. Invalidated TREE entries are now preserved; the original logic discarded them. Preserving these entries retains valuable information and allows the index to be re-encoded byte for byte.
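
A minimal sketch of the decoding behavior described in this note, not the code in this PR. It assumes the entry layout from Git's index-format documentation: a NUL-terminated path, an ASCII entry count, a space, an ASCII subtree count, a newline, and then an object ID only when the entry count is non-negative. `treeEntry`, `readTreeEntry`, `decodeTree`, and the 20-byte SHA-1 size are hypothetical names and assumptions chosen for illustration.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// hashSize assumes SHA-1 object IDs; SHA-256 repositories use 32 bytes.
const hashSize = 20

// treeEntry is a hypothetical stand-in for one decoded TREE entry.
type treeEntry struct {
	Path    string
	Entries int    // -1 marks an invalidated entry
	Trees   int    // number of subtrees
	Hash    []byte // nil when the entry is invalidated
}

// readTreeEntry decodes one entry and leaves r at the start of the next.
func readTreeEntry(r *bufio.Reader) (*treeEntry, error) {
	path, err := r.ReadString(0)
	if err == io.EOF && path != "" {
		return nil, io.ErrUnexpectedEOF
	}
	if err != nil {
		return nil, err // a clean io.EOF means no entries are left
	}
	// Consume the whole "<entries> <trees>\n" line so the reader always
	// moves past the newline, including for invalidated entries.
	line, err := r.ReadString('\n')
	if err == io.EOF {
		err = io.ErrUnexpectedEOF
	}
	if err != nil {
		return nil, err
	}
	fields := strings.Fields(line)
	if len(fields) != 2 {
		return nil, fmt.Errorf("malformed TREE entry line %q", line)
	}
	entries, err := strconv.Atoi(fields[0])
	if err != nil {
		return nil, err
	}
	trees, err := strconv.Atoi(fields[1])
	if err != nil {
		return nil, err
	}
	e := &treeEntry{Path: strings.TrimSuffix(path, "\x00"), Entries: entries, Trees: trees}
	if entries < 0 {
		return e, nil // invalidated: no object ID follows, but keep the entry
	}
	e.Hash = make([]byte, hashSize)
	if _, err := io.ReadFull(r, e.Hash); err != nil {
		return nil, err
	}
	return e, nil
}

// decodeTree decodes every entry in data, keeping invalidated ones.
func decodeTree(data []byte) ([]*treeEntry, error) {
	r := bufio.NewReader(bytes.NewReader(data))
	var out []*treeEntry
	for {
		e, err := readTreeEntry(r)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		out = append(out, e)
	}
}

func main() {
	// An invalidated root entry followed by a valid "sub" entry.
	data := append([]byte("\x00-1 1\nsub\x002 0\n"), make([]byte, hashSize)...)
	entries, err := decodeTree(data)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Printf("%q entries=%d trees=%d\n", e.Path, e.Entries, e.Trees)
	}
}
```

Because the invalidated entry stays in the result with an entry count of -1 and no hash, an encoder can write it back exactly as it was read.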

@christian-roggia
Contributor Author

NOTE: The REUC decoder has been updated to decode stages in the intended order. Previously, the decoder iterated over a map, and because Go does not guarantee map iteration order, the stages were decoded in a random order.
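
A minimal sketch of the ordering fix described in this note, assuming the stages are kept in a map keyed by stage number. The `stage` type and `orderedStages` helper are hypothetical names, not the PR's API. The idea is to walk the stage numbers 1 to 3 explicitly instead of ranging over the map.

```go
package main

import "fmt"

// stage is a hypothetical merge-stage number:
// 1 = common ancestor, 2 = ours, 3 = theirs.
type stage int

// orderedStages returns the stages present in m in ascending order.
// Ranging over m directly would visit them in an unspecified order.
func orderedStages(m map[stage]string) []stage {
	out := make([]stage, 0, len(m))
	for s := stage(1); s <= 3; s++ {
		if _, ok := m[s]; ok {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	hashes := map[stage]string{3: "theirs", 1: "ancestor", 2: "ours"}
	for _, s := range orderedStages(hashes) {
		fmt.Println(s, hashes[s])
	}
}
```

Looping over a fixed range keeps both decoding and encoding deterministic without sorting keys on every call.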

@christian-roggia christian-roggia changed the title from "plumbing: fully support TREE, REUC, LINK, UNTR, EOIE index extensions" to "[DRAFT] plumbing: fully support TREE, REUC, LINK, UNTR, EOIE, FSMN, IEOT index extensions" on Aug 27, 2025
Member

@pjbgf pjbgf left a comment

@christian-roggia thanks for looking into this. The changes are looking good, although I need to take a closer look around the extensions on a follow-up review.

Please add some tests around the ewah code and rebase the PR.

)

func ReadFrom(r io.Reader) (*Bitmap, error) {
	var bits uint32
Member

Missing a nil check for r.
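
A minimal sketch of the requested guard, under stated assumptions: `Bitmap` is stripped down to a single field, `ErrNilReader` is a hypothetical sentinel error, and only the leading bit count is read (big-endian, as Git's on-disk formats use network byte order). It is not the PR's full `ReadFrom`.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// Bitmap is a hypothetical, stripped-down stand-in for the PR's type.
type Bitmap struct {
	bits uint32
}

// ErrNilReader is a hypothetical sentinel error for this sketch.
var ErrNilReader = errors.New("ewah: nil reader")

// ReadFrom reads the leading bit count of a serialized bitmap.
// It rejects a nil reader up front instead of panicking on first use.
func ReadFrom(r io.Reader) (*Bitmap, error) {
	if r == nil {
		return nil, ErrNilReader
	}
	var bits uint32
	if err := binary.Read(r, binary.BigEndian, &bits); err != nil {
		return nil, fmt.Errorf("ewah: read bit count: %w", err)
	}
	return &Bitmap{bits: bits}, nil
}

func main() {
	_, err := ReadFrom(nil)
	fmt.Println(err)
}
```

Note that `r == nil` only catches a nil interface value; a typed nil pointer wrapped in an `io.Reader` would still pass this check.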

RLWLargestLiteralCount = (1 << RLWLiteralBits) - 1
)

func GetRunBit(rlw uint64) bool {
Member

Suggested change
- func GetRunBit(rlw uint64) bool {
+ // RunBit returns whether the run bit in rlw is set.
+ func RunBit(rlw uint64) bool {

	return rlw&1 != 0
}

func GetRunningLen(rlw uint64) uint64 {
Member

Suggested change
- func GetRunningLen(rlw uint64) uint64 {
+ // RunningLen extracts rlw's running length.
+ func RunningLen(rlw uint64) uint64 {

	return uint64((rlw >> 1) & RLWLargestRunningCount)
}

func GetLiteralWords(rlw uint64) uint64 {
Member

Suggested change
- func GetLiteralWords(rlw uint64) uint64 {
+ // LiteralWords extracts the number of literal words in rlw.
+ func LiteralWords(rlw uint64) uint64 {
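
For reference, a sketch of how the three renamed accessors could read together once documented. The bit widths are assumptions modelled on Git's EWAH layout (1 run bit, 32 running-length bits, 31 literal-count bits); the PR's actual constant values are not shown in this thread.

```go
package main

import "fmt"

// Assumed layout of a run-length word (RLW), lowest bit first:
// 1 run bit, 32 running-length bits, 31 literal-count bits.
const (
	RLWRunningBits         = 32
	RLWLiteralBits         = 64 - 1 - RLWRunningBits
	RLWLargestRunningCount = (uint64(1) << RLWRunningBits) - 1
	RLWLargestLiteralCount = (uint64(1) << RLWLiteralBits) - 1
)

// RunBit returns whether the run bit in rlw is set.
func RunBit(rlw uint64) bool { return rlw&1 != 0 }

// RunningLen extracts rlw's running length.
func RunningLen(rlw uint64) uint64 { return (rlw >> 1) & RLWLargestRunningCount }

// LiteralWords extracts the number of literal words in rlw.
func LiteralWords(rlw uint64) uint64 { return rlw >> (1 + RLWRunningBits) }

func main() {
	// Run bit set, running length 5, 3 literal words.
	rlw := uint64(1) | 5<<1 | 3<<(1+RLWRunningBits)
	fmt.Println(RunBit(rlw), RunningLen(rlw), LiteralWords(rlw)) // true 5 3
}
```

Dropping the `Get` prefix follows the usual Go convention for getters, which is what the suggestions above are asking for.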

	return false
}

// ForEach calls fn() for each set bit.
Member

Suggested change
- // ForEach calls fn() for each set bit.
+ // ForEach calls fn() for each set bit.
+ // The returning bool from fn defines whether iteration should continue.
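
To illustrate the callback contract that this doc comment describes, here is a sketch on a plain uncompressed bitmap. The `Bitmap` type below is a hypothetical stand-in used only to show the early-exit behavior; the PR's type stores EWAH-compressed words.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Bitmap is a hypothetical uncompressed bitmap, one bit per position.
type Bitmap struct {
	words []uint64
}

// ForEach calls fn for each set bit in ascending order.
// Iteration stops as soon as fn returns false.
func (b *Bitmap) ForEach(fn func(pos uint64) bool) {
	for i, w := range b.words {
		for w != 0 {
			tz := bits.TrailingZeros64(w)
			if !fn(uint64(i)*64 + uint64(tz)) {
				return
			}
			w &= w - 1 // clear the lowest set bit
		}
	}
}

func main() {
	b := &Bitmap{words: []uint64{0b1010, 0b1}}
	b.ForEach(func(pos uint64) bool {
		fmt.Println(pos)
		return pos < 3 // stop once a position of 3 or more is seen
	})
}
```

With bits 1, 3, and 64 set, the callback in `main` sees 1 and 3 and then stops, so 64 is never visited.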

}
}

func (b *Bitmap) NumBits() uint64 {
Member

What's the core difference between NumBits and Bits? Please document both funcs.

	return uint64(rlw >> (1 + RLWRunningBits))
}

func (b *Bitmap) Get(pos uint64) bool {
Member

Wouldn't At be a better name here? I'm assuming this checks whether a bit is set at a given position. Is that right?

Suggested change
- func (b *Bitmap) Get(pos uint64) bool {
+ func (b *Bitmap) At(pos uint64) bool {

Please document this func.
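
A sketch of what a documented `At` might look like, assuming it does report whether the bit at a position is set. The `Bitmap` type, the `runningBits` constant, and the RLW layout (run bit, then 32 running-length bits, then 31 literal-count bits, as in Git's EWAH) are assumptions for illustration and may differ from the PR.

```go
package main

import "fmt"

// Bitmap is a hypothetical stand-in holding EWAH words: each run-length
// word (RLW) is followed by the literal words it announces.
type Bitmap struct {
	words []uint64
}

// Assumed RLW layout: bit 0 is the run bit, the next 32 bits hold the
// running length, and the remaining 31 bits hold the literal count.
const runningBits = 32

// At reports whether the bit at position pos is set.
// Positions beyond the end of the bitmap report false.
func (b *Bitmap) At(pos uint64) bool {
	for i := 0; i < len(b.words); {
		rlw := b.words[i]
		i++
		runBit := rlw&1 != 0
		runLen := (rlw >> 1) & (1<<runningBits - 1)
		literals := rlw >> (1 + runningBits)

		// The run covers runLen words that are all zeros or all ones.
		if pos < runLen*64 {
			return runBit
		}
		pos -= runLen * 64

		// The literal words follow the RLW verbatim.
		if pos < literals*64 {
			idx := i + int(pos/64)
			if idx >= len(b.words) {
				return false // truncated buffer
			}
			return b.words[idx]&(1<<(pos%64)) != 0
		}
		pos -= literals * 64
		i += int(literals)
	}
	return false
}

func main() {
	// One RLW: run bit 1, running length 1, one literal word (0b101).
	rlw := uint64(1) | 1<<1 | 1<<(1+runningBits)
	b := &Bitmap{words: []uint64{rlw, 0b101}}
	fmt.Println(b.At(0), b.At(64), b.At(65), b.At(66), b.At(500))
	// true true false true false
}
```

Positions 0 to 63 fall in the run of ones, positions 64 to 127 map onto the literal word, and anything past that is outside the bitmap.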

Comment on lines +239 to +244
	if idx.ResolveUndo != nil {
		if err := e.encodeREUC(idx.ResolveUndo); err != nil {
			return err
		}
	}

Member

Isn't this a duplicate of L233-L237?

Suggested change
- 	if idx.ResolveUndo != nil {
- 		if err := e.encodeREUC(idx.ResolveUndo); err != nil {
- 			return err
- 		}
- 	}
