Eradicate the IdList bottleneck #992
Edit: seems like this issue.
:-D Everyone comes to more or less the same idea :-) I'll push mine tomorrow, just for comparison... But I haven't finished the iterator.
Maintain an array that points to IdList elements in the order of their ID; it is not used for anything so far.
Use an iterator class internally to index elements using the array established in the previous commit. Remove a few unused methods.
Rather than keeping `elems` in ID order, add new elements to the end and just replace removed elements with the last one. The complexity is shifted to the `indexes` permuting array, which is typically faster. While using no more additional memory, this commit slightly slows down RemoveTagged().
Allocate a bit of memory to make it O(n).
Resolves another performance regression of iteration being O(n * log n) rather than O(n).
Here is mine, working and with tests passing: https://github.com/ruevs/solvespace/tree/FixIdList_TheNextAttempt. Very similar to yours, with some differences in what I used. I do not have the last huge "Remove …" commit. Without that commit it is almost as fast (or barely slower) on this https://github.com/solvespace/solvespace/files/5962704/case-vents.slvs.gz from here #932 (comment) compared to current master. But it is drastically faster on setting the last translate to 20 copies in this https://github.com/solvespace/solvespace/files/5974674/LeliaManya2.zip. It takes about 2s, while setting it to 6 copies takes 15s on master (I did not wait for 20 to finish)! Next I will try yours on Windows with Visual Studio 2019.
There should be a test for the possible corner cases. I see how the freelist simplifies the algorithm in that method (at the cost of some more memory). Like you, I also thought about putting some assertions in the iterator, but... if they weren't there before, maybe it's not important. I tried to change as little as possible. Doing benchmarks was on my list, though the improvement was so immense that I just declared victory. The next thing that irritates me in SolveSpace is the omnipresent NURBS failures, and I can't really do math. How much do you and @phkahler charge? :) Triangle meshes also take a long time to compute.
@pjanx There is a NURBS tag for issues. The failures have been getting better and most are tracked in #738. It's an area of the code that is a bit outside most people's experience, but @ruevs and I have been chipping away at it for a while now. A few other people have had a look too, which has been very helpful even when we didn't take their changes directly. Triangle meshes are IMHO not to be used except when NURBS fail. The code there has not had the optimizations or OMP love applied, so it could be slower now than doing the same with NURBS. When STL linking lands I suppose we'll hear more reports of mesh bugs. SolveSpace usage has been increasing, as have the number of contributors, so there is hope ;-)
Yeah, I'm having some seemingly randomized mesh issues with my most complex model already (it NURBS-fails very early). Luckily Slicer mostly takes care of it on import, or the issues aren't too severe in 3D print.
@pjanx The Evil-Spirit patch replaces the list of things with a list of pointers to things. That made all the moving faster and probably reduced cache misses while searching.
Replicating IDs in the index like he seems to be doing would eliminate some cache misses during search-by-ID, at the cost of even more memory. I'm not sure how much it is okay to waste on this; it'd be nice to have a memory profile. Binary search isn't particularly cache-friendly in any case, as it jumps around and prefetching fails to work optimally. In that sense, this general approach will always be slower during whole-list iteration than it would be to always keep the list in ID order. A lot of theorising here, besides the general observation of "large models are fairly snappy now".
Unfortunately the test suite is far, far from complete/exhaustive, from achieving good code coverage, or from being able to catch regressions :-( Still, it's better than the "nothing" we had before Whitequark wrote it :-) This is one reason it would have been extremely hard to merge your last commit here when she was the maintainer. One reason SolveSpace is so stable is that she was (quite rightly) very careful about merging such all-encompassing changes that are hard to regression test and also "dirty up" git blame, thus making regressions harder to track down. That said, I totally agree. I broke RemoveTagged as well until my last commit (the 0xDEADBEEF caught it ;-) ). To debug the iterator I wrote a quick and dirty "unit test" of sorts for IdList with a simple (int, char) struct. But there is no good place to commit this.
For me it's just for fun ;-) Not to mention that I don't really feel qualified when hacking on the NURBS backend; I've forgotten all the linear algebra and calculus so long ago that I constantly have to consult references :-) But having another mind to bounce ideas off of is extremely helpful. Thank you for that. For example, you starting to work on the IdList: it has been in the back of my mind since the Whitequark commit and Evil-Spirit's patch (so at least four years), but you stirred things up, which led to this pull request and my similar one. Another one is Eigen; search the issues and pull requests ;-) I'll make a tracking issue for performance, just as I did for NURBS in #738.
As far as my experience goes, 2.3 was much less stable, and a performance minefield that made me re-learn to save often. Converting models to master is a pain, but the software is usable. I'm not sure whom to blame for this newfound stability. :) I don't really like the "git blame" argument, it often seems like a universal excuse to never change anything. Bisection should take care of tracking regressions. What I've noticed is that the codebase is in the process of losing some sense of coherence, which this PR doesn't particularly help. And cleaning that up is certainly an offensive against "git blame".
I just read the source code one more time and found it. I've spent quite a few hours trying to visualise that function and put it to code, mostly because I said I would do it. Sadly I don't have any good methods for making things simpler.
Whitequark ;-)
In general I agree, but it's contingent upon available maintainer time vs. desire for stability. Anyway, I'm pretty sure either this/yours or mine, or a cleaned-up amalgamation of the two (they are pretty much the same), will get merged after 3.0. (Both @phkahler and I have been known to merge "stupid" stuff in the past ;-) )
OK so overall we did roughly the same. I'll steal your "remove NextAfter" commit and try mine with it tomorrow.
I think we should try to "converge" on a single solution. If you have time I would love to hear your comments/critique on my version.
```cpp
int offset;

Iterator(IdList<T, H> &list, int offset) : list(&list), offset(offset) {}
T &operator*() { return list->elem[list->indexes[offset]]; }
```
On mine I kept a local `T *elem` updated when the iterator is changed so that `*` and `->` can be fast. They may be called many times per loop iteration, and `++` usually only once.
That's one thing optimizers should be good at, unless some aliasing rules break it. I'd be somewhat surprised if it'd be measurable for the added complexity. Making offset point directly into the indexes array is more obvious. And both of these suffer if the array is resized during iteration, which STL just declares as undefined behaviour in such situations.
```cpp
Iterator(IdList<T, H> &list, int offset) : list(&list), offset(offset) {}
T &operator*() { return list->elem[list->indexes[offset]]; }
T *operator->() { return &list->elem[list->indexes[offset]]; }
const T &operator*() const { return list->elem[list->indexes[offset]]; }
```
We should carefully review the need for `const` methods, and in which of the former `NextAfter` loops we can add `const`. Luckily this is just a lot of clicking and testing if the compiler will agree ;-)
What do you think - should we remove the unused overloads in the end? I'm split between having a short iterator and a "complete" one with mostly unused methods. @phkahler ? @rpavlik ?
MSVC required me to add `const` versions here, for `std::lower_bound`. See the first comment of the PR.
I'm inclined to just implement the code that's needed, on the YAGNI principle. Unused methods are untested and possible traps for the future.
Removed all unused methods from mine.
```cpp
Iterator &operator++() { ++offset; return *this; }
Iterator &operator+=(int diff) { offset += diff; return *this; }
int operator-(const Iterator &i) const { return offset - i.offset; }
bool operator==(const Iterator &i) { return list == i.list && offset == i.offset; }
```
I should fix this - I'm not comparing the lists for equality after I removed the bounds checking in my last commit ruevs@6bc1aa9
It should be an assertion rather than a direct test like this.
It was an assertion before I removed it ;-) Maybe I'll put them back in.
```cpp
Iterator LowerBound(H const& h) {
    return std::lower_bound(begin(), end(), h, Compare());
}
```
I opted to use iterators on the index and complicated the predicate on the theory that the vector<int> iterator is simpler than my IdList one and thus hoping it would be faster. But I have not profiled to confirm.
I wouldn't expect a meaningful difference, it boils down to a few pointer operations. Didn't you have to add a forward declaration and complicate CompareId because of that?
Yes. I called the predicate `CompareId`.
```cpp
void ReserveMore(int howMuch) {
    if(n + howMuch > elemsAllocated) {
        elemsAllocated = n + howMuch;
        int *newIndexes = (int *)::operator new[]((size_t)elemsAllocated*sizeof(int));
```
Using vector for the storage gets rid of the manual memory management, and hopefully at least in release mode should be as fast.
That's one of the things that should be unified throughout the codebase, I'm just using what's already there.
I don't even know how the custom allocator library is hooked up.
@Evil-Spirit wanted to use realloc, a trick that std::vector could in principle do as well, though I'm dubious whether it's useful with geometric growth steps.
The important bit is that IdList has reference semantics, not value semantics, so you can't embed a `std::vector`, but rather a `std::shared_ptr<std::vector>`; a lesson I learned painfully.
@rpavlik Do you mean to say that once a pointer to an element is "taken out" of an IdList, it may be kept around and dereferenced after the list has been modified? And thus the pointer may be invalid?
If this is the case, can you point to an example?
But this does not make sense: the original implementation of IdList did not guarantee that the elements will not move; in fact it moved them around all the time!
In fact my implementation is "better" in this respect, since it never moves elements once they are added, because the `elem` store is not rearranged when elements are removed:
https://github.com/ruevs/solvespace/blob/3c6e2be4b920d706d8231030467c5271a6d87d85/src/dsc.h#L570
The removed elements are simply added to the freelist.
```cpp
// Find out where the added element should be.
// Shift indexes to the end of the array, if needed.
int pos = LowerBoundIndex(*t);
memmove(indexes + pos + 1, indexes + pos, sizeof *indexes * (n - pos));
```
I wonder if `insert` in a `vector<int>` is as fast. I'll debug it tomorrow to see what it does.
It should optimize that way.
```cpp
T *begin() { return IsEmpty() ? nullptr : &elem[0]; }
T *end() { return IsEmpty() ? nullptr : &elem[0] + n; }
const T *begin() const { return IsEmpty() ? nullptr : &elem[0]; }
```
As above - the const iterators may be useful for compiler optimisations, if we can add const in some places where IdLists are used.
I'd be interested in seeing evidence of any difference in this case.
```cpp
elem[dest] = elem[src];
}
dest++;
std::vector<int> elem2indexes(n);
```
I need to trace through this with the debugger to really get it. Tomorrow.
By the way, why is this? Models from 2.0-2.3 should "just open™". If you have some that don't (and are not this #120), open an issue.
Mostly getting rid of all those extra degrees of freedom; rotation takes a while to figure out. I don't remember much else. Edit: and some constraint failures, so they had to be redone. I didn't keep the old models.
@pjanx I created a pull request with my version with a cleaned-up history: #1002. Next I will try to see if I can measure speed differences between our implementations, but I highly doubt that there will be any. After that it boils down to style/aesthetics etc.
I think we need an arbiter. It'd be nice to have a memory profile as well; Valgrind can also do that, IIRC.
Hi there! Things I remember:
A bit of my input, with concerns from #932 in mind. I didn't check the memory usage, but did a bit of speed testing for changing constraints with these PRs: just dragging an unconstrained extrusion length by mouse, and looking at when the UI is fully interactive again. I used gcc 10.2.1 on Linux, amd64.

master, -O3 -flto: ~6.6s, 2.78s for Generate::DIRTY

That doesn't sound like a lot to me, and I don't think it is worth merging any of them at the current state. Don't get me wrong, this is good code, and a move in the right direction, but it might require more profiling, tuning and base data structure selection. (No OpenMP used.)

UPDATE: LeilaManya2, changing the last translate group from 3 to 20 repeats. master: ~138s, 126119ms for Generate::DIRTY

Nice! OK. I might be convinced.
I was just about to mention https://github.com/solvespace/solvespace/files/5974674/LeliaManya2.zip when I noticed you updated your comment ;-) Thanks for running it to 20 repeats on master - I did not have the patience :-) Would you please do the same timing comparison when modifying some dimension in the first group of this model? #841 (comment) How do you do the measurements on Linux? |
Off topic, but I know some Lathe groups can have issues. Lathe used to create 2 copies of the sketch entities like at the start and end of an extrude or revolve, even though both copies were exactly the same. I removed those redundant entities so any sketch that constrained against them could break in 3.0 depending which copy the constraint used. |
Going with the @ruevs version unless something bad happens.
Should fix #939 and fix #932 almost entirely, shifting the bottleneck for large models. Passes tests and memcheck.
Properties of this changeset:

- The index costs `n * sizeof(int)` bytes, which is about as low as you can go here
- `IdList::Add` now relies on a single `memmove` of PODs, which is faster than `std::move`-ing singular `elems`
- `IdList::RemoveTagged` uses a bit of additional temporary memory to keep its complexity O(n)

Overall, I'm happy with this now, and I wouldn't be afraid of merging it. Commits should be reasonably atomic.
I think I inadvertently replicated half of Evil-Spirit's change. I haven't understood yet what he's doing there, though, even if it looks vaguely similar.