[DRAFT] plumbing: fully support TREE, REUC, LINK, UNTR, EOIE, FSMN, IEOT index extensions #1622
base: main
NOTE: The TREE extension decoder has been updated to match the behavior of the C implementation. It now reads each entry's counts up to the terminating newline, so the buffer always advances to the start of the next entry. Invalidated TREE entries are also preserved instead of being discarded as in the original logic. Keeping them retains useful information and allows the index to be re-encoded byte-for-byte.
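For reference, here is a minimal sketch of the decoding rule described in this note, following the cache-tree layout in Git's index-format documentation (NUL-terminated path, ASCII entry count, space, ASCII subtree count, newline, then an object name only when the entry count is non-negative). The `treeEntry` type, the `decodeTree` function and the 20-byte SHA-1 hash size are assumptions made for illustration; this is not the code in this PR.

```go
package main

import (
	"bufio"
	"bytes"
	"errors"
	"io"
	"strconv"
	"strings"
)

// hashSize assumes SHA-1 object names (an assumption of this sketch).
const hashSize = 20

// treeEntry is a hypothetical type for this sketch, not go-git's.
type treeEntry struct {
	Path    string
	Entries int    // negative means the entry is invalidated
	Trees   int    // number of subtrees
	Hash    []byte // nil for invalidated entries
}

// decodeTree parses the payload of a TREE extension and keeps
// invalidated entries instead of dropping them.
func decodeTree(data []byte) ([]treeEntry, error) {
	r := bufio.NewReader(bytes.NewReader(data))
	var out []treeEntry
	for {
		raw, err := r.ReadString(0)
		if err == io.EOF && raw == "" {
			return out, nil // clean end of the extension
		}
		if err != nil {
			return nil, err
		}
		path := strings.TrimSuffix(raw, "\x00")

		// Always read up to the newline so the reader lands on the
		// start of the hash (or of the next entry when invalidated).
		line, err := r.ReadString('\n')
		if err != nil {
			return nil, err
		}
		fields := strings.Fields(line)
		if len(fields) != 2 {
			return nil, errors.New("malformed TREE entry")
		}
		entries, err := strconv.Atoi(fields[0])
		if err != nil {
			return nil, err
		}
		trees, err := strconv.Atoi(fields[1])
		if err != nil {
			return nil, err
		}

		e := treeEntry{Path: path, Entries: entries, Trees: trees}
		if entries >= 0 {
			// Only valid entries are followed by an object name.
			e.Hash = make([]byte, hashSize)
			if _, err := io.ReadFull(r, e.Hash); err != nil {
				return nil, err
			}
		}
		out = append(out, e)
	}
}
```

Because invalidated entries stay in the slice with a negative `Entries` value and a nil `Hash`, an encoder can write them back in their original position.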
NOTE: The REUC decoder has been updated to decode stages in a fixed order. Previously it iterated over a map, and because Go does not guarantee map iteration order, the stages came out in an arbitrary order.
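To illustrate the ordering issue, here is a small sketch that walks the three merge stages explicitly instead of ranging over the map. The `stage` type, its constants and `stagesInOrder` are hypothetical names for this example and may not match the identifiers used in the PR.

```go
package main

// stage is a hypothetical stand-in for the index stage type.
type stage int

const (
	ancestorStage stage = 1
	ourStage      stage = 2
	theirStage    stage = 3
)

// stagesInOrder returns the values stored in m for stages 1 to 3,
// always in ascending stage order. Ranging over m directly would
// visit the keys in an unspecified order.
func stagesInOrder(m map[stage]string) []string {
	var out []string
	for s := ancestorStage; s <= theirStage; s++ {
		if v, ok := m[s]; ok {
			out = append(out, v)
		}
	}
	return out
}
```

A fixed loop over the known stages avoids sorting and makes the skipped (missing) stages explicit.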
pjbgf left a comment
@christian-roggia thanks for looking into this. The changes are looking good, although I need to take a closer look around the extensions on a follow-up review.
Please add some tests around the ewah code and rebase the PR.
)

func ReadFrom(r io.Reader) (*Bitmap, error) {
	var bits uint32
Missing a nil check for r.
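A guard along these lines would probably be enough. This is only a sketch: the `bitmap` type, the `errNilReader` value and the decision to return an error rather than panic are assumptions, and only the leading 32-bit big-endian bit count is read here.

```go
package main

import (
	"encoding/binary"
	"errors"
	"io"
)

// bitmap is a hypothetical, stripped-down stand-in for the PR's Bitmap.
type bitmap struct {
	bits uint32
}

// errNilReader is a hypothetical sentinel error for this sketch.
var errNilReader = errors.New("ewah: nil reader")

// readFrom rejects a nil reader before touching it, then reads the
// leading bit count. The remaining fields are omitted from this sketch.
func readFrom(r io.Reader) (*bitmap, error) {
	if r == nil {
		return nil, errNilReader
	}
	var bits uint32
	if err := binary.Read(r, binary.BigEndian, &bits); err != nil {
		return nil, err
	}
	return &bitmap{bits: bits}, nil
}
```

Note that `r == nil` only catches an untyped nil interface; a typed nil pointer wrapped in an `io.Reader` would still get through.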
	RLWLargestLiteralCount = (1 << RLWLiteralBits) - 1
)

func GetRunBit(rlw uint64) bool {
Suggested change:
- func GetRunBit(rlw uint64) bool {
+ // RunBit returns whether the run bit in rlw is set.
+ func RunBit(rlw uint64) bool {
	return rlw&1 != 0
}

func GetRunningLen(rlw uint64) uint64 {
Suggested change:
- func GetRunningLen(rlw uint64) uint64 {
+ // RunningLen extracts rlw's running length.
+ func RunningLen(rlw uint64) uint64 {
	return uint64((rlw >> 1) & RLWLargestRunningCount)
}

func GetLiteralWords(rlw uint64) uint64 {
Suggested change:
- func GetLiteralWords(rlw uint64) uint64 {
+ // LiteralWords extracts the number of literal words in rlw.
+ func LiteralWords(rlw uint64) uint64 {
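For anyone documenting these accessors, the sketch below shows how the three fields share one 64-bit running length word: bit 0 is the run bit, the next `rlwRunningBits` bits hold the running length, and the remaining high bits hold the literal word count. The value 32 for `rlwRunningBits` is an assumption taken from Git's C implementation for 64-bit words, and `packRLW` is a hypothetical helper added only to exercise the accessors.

```go
package main

const (
	// Assumption: half of the 64-bit word, as in Git's C ewok code.
	rlwRunningBits = 32
	rlwLiteralBits = 64 - 1 - rlwRunningBits

	rlwLargestRunningCount = (uint64(1) << rlwRunningBits) - 1
	rlwLargestLiteralCount = (uint64(1) << rlwLiteralBits) - 1
)

// runBit reports whether the run bit (bit 0) of rlw is set.
func runBit(rlw uint64) bool {
	return rlw&1 != 0
}

// runningLen extracts the number of repeated words in the run.
func runningLen(rlw uint64) uint64 {
	return (rlw >> 1) & rlwLargestRunningCount
}

// literalWords extracts the number of literal words that follow rlw.
func literalWords(rlw uint64) uint64 {
	return rlw >> (1 + rlwRunningBits)
}

// packRLW is a hypothetical helper that builds a running length word
// from its three fields, masking each one to its bit width.
func packRLW(run bool, running, literals uint64) uint64 {
	var w uint64
	if run {
		w = 1
	}
	w |= (running & rlwLargestRunningCount) << 1
	w |= (literals & rlwLargestLiteralCount) << (1 + rlwRunningBits)
	return w
}
```

A round trip such as `packRLW(true, 5, 3)` followed by the three accessors is a cheap table-driven test case for the ewah package.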
	return false
}

// ForEach calls fn() for each set bit.
Suggested change:
- // ForEach calls fn() for each set bit.
+ // ForEach calls fn() for each set bit.
+ // The returning bool from fn defines whether iteration should continue.
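The callback contract in the suggested comment can be sketched on a plain, uncompressed word slice. This is not the EWAH-compressed implementation from the PR; `wordSet` is a hypothetical type used only to show how returning false from `fn` stops the iteration early.

```go
package main

import "math/bits"

// wordSet is a hypothetical uncompressed bitmap: bit i of word w
// represents position w*64 + i.
type wordSet []uint64

// ForEach calls fn for each set bit in ascending position order.
// Iteration stops as soon as fn returns false.
func (s wordSet) ForEach(fn func(pos uint64) bool) {
	for i, w := range s {
		for w != 0 {
			tz := bits.TrailingZeros64(w)
			if !fn(uint64(i)*64 + uint64(tz)) {
				return
			}
			w &= w - 1 // clear the lowest set bit
		}
	}
}
```

A caller that only needs the first set bit can return false from its first invocation and skip the rest of the bitmap.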
	}
}

func (b *Bitmap) NumBits() uint64 {
What's the core difference between NumBits and Bits? Please document both funcs.
	return uint64(rlw >> (1 + RLWRunningBits))
}

func (b *Bitmap) Get(pos uint64) bool {
Wouldn't At be a better name here? I'm assuming this checks whether a bit is set at a given position. Is that right?
Suggested change:
- func (b *Bitmap) Get(pos uint64) bool {
+ func (b *Bitmap) At(pos uint64) bool {
Please document this func.
	if idx.ResolveUndo != nil {
		if err := e.encodeREUC(idx.ResolveUndo); err != nil {
			return err
		}
	}
Is this not duplicated with L233-L237?
Suggested change:
- 	if idx.ResolveUndo != nil {
- 		if err := e.encodeREUC(idx.ResolveUndo); err != nil {
- 			return err
- 		}
- 	}
This pull request introduces full support for the TREE, REUC, LINK, UNTR, FSMN, IEOT, and EOIE index extensions. Partial decoding support for the TREE, EOIE and REUC extensions already existed, but encoding was missing. There are a few other official extensions not yet implemented, which can be added in future updates.
I would appreciate an initial review of these changes as I continue testing and validating support for the new index extensions in our environment.