
Conversation

@pjanx
Contributor

@pjanx pjanx commented Apr 3, 2021

Should fix #939 and fix #932 almost entirely, shifting the bottleneck for large models. Passes tests and memcheck.

Properties of this changeset:

  • memory complexity of IdLists increases by n * sizeof(int) bytes, which is about as low as you can go here
  • IdList::Add now relies on a single memmove of PODs, which is faster than std::move-ing individual elements
  • IdList::RemoveTagged uses a bit of additional temporary memory to keep its complexity
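For illustration, the Add scheme above might be sketched like this (a simplified sketch with plain ints standing in for elements; the names are hypothetical, not the PR's actual code):

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Sketch, not the actual SolveSpace code: `elems` grows at the end, while
// `indexes` is a permutation of positions kept sorted by ID, costing
// exactly n * sizeof(int) extra memory.
struct IntIdList {
    std::vector<int> elems;   // insertion order
    std::vector<int> indexes; // positions into elems, sorted by ID

    void Add(int id) {
        // Binary search for the insertion point in the sorted index.
        int lo = 0, hi = (int)indexes.size();
        while(lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if(elems[indexes[mid]] < id) lo = mid + 1; else hi = mid;
        }
        elems.push_back(id);
        indexes.push_back(0);
        // One memmove of POD ints shifts the tail of the index array.
        size_t tail = indexes.size() - 1 - lo;
        if(tail) memmove(&indexes[lo + 1], &indexes[lo], tail * sizeof(int));
        indexes[lo] = (int)elems.size() - 1;
    }
};
```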

Overall, I'm happy with this now, and I wouldn't be afraid of merging it. Commits should be reasonably atomic.

I think I inadvertently replicated half of Evil-Spirit's change. I haven't understood yet what he's doing there, though, even if it looks vaguely similar.

@pjanx
Contributor Author

pjanx commented Apr 3, 2021

I would appreciate it if someone tried to resolve the Windows build failure, I don't speak MSVC++ error messages.

Edit: seems like this issue.

@ruevs
Member

ruevs commented Apr 3, 2021

:-D everyone comes to more or less the same idea :-) I'll push mine tomorrow - just for comparison... But I haven't finished the iterator.

pjanx added 6 commits April 4, 2021 02:38
Maintain an array that points to IdList elements in the order
of their ID, not used for anything so far.
Use an iterator class internally to index elements using the array
established in the previous commit.

Remove a few unused methods.
Rather than keeping `elems` in ID order, add new elements to the end
and just replace removed elements with the last one.

The complexity is shifted to the `indexes` permutation array,
which is typically faster.

While using no more additional memory, RemoveTagged() is slightly
slowed down in this commit.
Allocate a bit of memory to make it O(n).
Resolves another performance regression of iteration being
O(n * log n) rather than O(n).
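The RemoveTagged idea from the commits above might look roughly like this (a hypothetical, simplified sketch, not the PR's actual code): tagged elements are compacted out of the storage array, and the sorted index is then rebuilt in one pass using a small temporary old-to-new position map, which avoids an O(n log n) re-sort.

```cpp
#include <cassert>
#include <vector>

struct Elem { int id; bool tagged; };

// Sketch: remove tagged elements in O(n) while keeping the ID-sorted
// `indexes` permutation valid, at the cost of one temporary array.
static void RemoveTagged(std::vector<Elem> &elems, std::vector<int> &indexes) {
    int n = (int)elems.size();
    std::vector<int> oldToNew(n, -1);
    int dest = 0;
    for(int src = 0; src < n; src++) {
        if(elems[src].tagged) continue;
        elems[dest] = elems[src];
        oldToNew[src] = dest++;
    }
    elems.resize(dest);
    // Walk the old sorted order once, keeping survivors; the result
    // is still sorted by ID.
    int out = 0;
    for(int i = 0; i < n; i++) {
        int np = oldToNew[indexes[i]];
        if(np >= 0) indexes[out++] = np;
    }
    indexes.resize(out);
}
```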
@ruevs
Member

ruevs commented Apr 4, 2021

Here is mine https://github.com/ruevs/solvespace/tree/FixIdList_TheNextAttempt working and tests passing.

Very similar to yours with some differences - I used vectors as containers and I have a freelist to reduce reallocations in some cases.

I do not have the last huge "Remove NextAfter" (3b395bb) commit in the branch, but it is critical for some workloads.

Without this commit it is almost as fast (or barely slower) on this https://github.com/solvespace/solvespace/files/5962704/case-vents.slvs.gz from here #932 (comment) compared to current master.

But it is drastically faster on setting the last translate to 20 copies in this https://github.com/solvespace/solvespace/files/5974674/LeliaManya2.zip. It takes about 2s while setting it to 6 copies takes 15s on master (I did not wait for 20 to finish)!

Next I will try yours on Windows with Visual Studio 2019.

@pjanx
Contributor Author

pjanx commented Apr 4, 2021

There should be a test for possible corner cases in RemoveTagged; I made a mistake there that passed the tests but crashed on models.

I see how the freelist simplifies the algorithm in that method (at the cost of some more memory).

Like you, I also thought about putting some assertions in the iterator but... if they weren't there before... maybe it's not important.

I tried to change as little as possible but NextAfter just has to go.

Doing benchmarks was on my list, though the improvement was so immense that I just declared victory.

The next thing that irritates me about SolveSpace is the omnipresent NURBS failures, and I can't really do the math. How much do you and @phkahler charge? :) Triangle meshes also take a long time to compute.

@phkahler
Member

phkahler commented Apr 4, 2021

The next thing that irritates me about SolveSpace is the omnipresent NURBS failures, and I can't really do the math. How much do you and @phkahler charge? :) Triangle meshes also take a long time to compute.

@pjanx There is a NURBS tag for issues. The failures have been getting better, and most are tracked in #738. It's an area of the code that is a bit outside most people's experience, but @ruevs and I have been chipping away at it for a while now. A few other people have had a look too, which has been very helpful even when we didn't take their changes directly.

Triangle meshes are IMHO not to be used except when NURBS fail. The code there has not had the optimizations or OMP love applied, so it could be slower now than doing the same with NURBS. When STL linking lands, I suppose we'll hear more reports of mesh bugs.

SolveSpace usage has been increasing as have the number of contributors, so there is hope ;-)

@pjanx
Contributor Author

pjanx commented Apr 4, 2021

Yeah, I'm having some seemingly randomized mesh issues with my most complex model already (it NURBS-fails very early). Luckily Slicer mostly takes care of it on import, or the issues aren't too severe in 3D print.

@phkahler
Member

phkahler commented Apr 4, 2021

@pjanx The Evil-Spirit patch replaces the list of things with a list of pointers to things. That made all the moving faster and probably reduced cache misses while searching.

@pjanx
Contributor Author

pjanx commented Apr 4, 2021

Replicating IDs in the index like he seems to be doing would eliminate some cache misses during search-by-ID, at the cost of even more memory. I'm not sure how much it is okay to waste on this; it'd be nice to have a memory profile. Binary search isn't particularly cache-friendly in any case, as it jumps around and prefetching fails to work optimally. In that sense, this general approach will always be slower during whole-list iteration than keeping the elements themselves in ID order. A lot of theorising here, besides the general observation that "large models are fairly snappy now".
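For concreteness, the search-by-ID path under discussion boils down to something like this (a sketch with plain ints standing in for IDs; each binary-search probe dereferences through the index into the element array, which is where the cache misses come from, and which replicating IDs in the index would avoid at a memory cost):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sketch: binary search by ID through the permutation array. The
// comparator chases indexes[k] into elems on every probe.
int FindById(const std::vector<int> &elems,
             const std::vector<int> &indexes, int id) {
    auto it = std::lower_bound(indexes.begin(), indexes.end(), id,
        [&](int pos, int h) { return elems[pos] < h; });
    if(it != indexes.end() && elems[*it] == id)
        return *it; // position in elems
    return -1;    // not found
}
```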

@ruevs
Member

ruevs commented Apr 4, 2021

There should be a test for possible corner cases in RemoveTagged; I made a mistake there that passed the tests but crashed on models.

Unfortunately the test suite is far, far from complete/exhaustive/achieving good code coverage, or being able to catch regressions :-( Still, it's better than the "nothing" we had before Whitequark wrote it :-)

This is one reason it would have been extremely hard to merge your last commit here when she was the maintainer. One reason SolveSpace is so stable is that she was (quite rightly) very careful about merging such all-encompassing changes that are hard to regression-test and also "dirty up" git blame, thus making regressions harder to track down. That said, I totally agree that NextAfter absolutely has to go - it simply makes it impossible to make IdList efficient for all cases.

I broke RemoveTagged as well until my last commit (the 0xDEADBEEF caught it ;-) )
elemidx.resize(n); // Clear left over elements at the end.

To debug the iterator I wrote a quick and dirty "unit test" of sorts for IdList with a simple (int, char) struct. But there is no good place to commit this.

@ruevs
Member

ruevs commented Apr 4, 2021

How much do you and @phkahler charge? :)

For me it's just for fun ;-) Not to mention that I don't really feel qualified when hacking on the NURBS backend; I forgot all the linear algebra and calculus so long ago that I constantly have to consult references :-)

But having another mind to bounce ideas off of is extremely helpful. Thank you for that. For example, your starting to work on the IdList: it has been in the back of my mind since the Whitequark commit and Evil-Spirit's patch (so at least four years), but you stirred things up, which led to this pull request and my similar one.

Another one is Eigen - search the issues and pull requests ;-)

I'll make a tracking issue for performance - just as I did for NURBS in #738 .

@ruevs ruevs mentioned this pull request Apr 4, 2021
@pjanx
Contributor Author

pjanx commented Apr 4, 2021

This is one reason it would have been extremely hard to merge your last commit here when she was the maintainer. One reason SolveSpace is so stable is that she was (quite rightly) very careful about merging such all-encompassing changes that are hard to regression-test and also "dirty up" git blame, thus making regressions harder to track down. That said, I totally agree that NextAfter absolutely has to go - it simply makes it impossible to make IdList efficient for all cases.

As far as my experience goes, 2.3 was much less stable, and a performance minefield that made me re-learn to save often. Converting models to master is a pain, but the software is usable. I'm not sure whom to blame for this newfound stability. :)

I don't really like the "git blame" argument, it often seems like a universal excuse to never change anything. Bisection should take care of tracking regressions.

What I've noticed is that the codebase is in the process of losing some sense of coherence, which this PR doesn't particularly help. And cleaning that up is certainly an offensive against "git blame".

I broke RemoveTagged as well until my last commit (the 0xDEADBEEF caught it ;-) )
elemidx.resize(n); // Clear left over elements at the end.

I just read the source code one more time and found it. I've spent quite a few hours trying to visualise that function and put it into code, mostly because I said I would do it. Sadly, I don't have any good methods for making things simpler.

@ruevs
Member

ruevs commented Apr 4, 2021

Converting models to master is a pain, but the software is usable. I'm not sure whom to blame for this newfound stability. :)

Whitequark ;-)

I don't really like the "git blame" argument, it often seems like a universal excuse to never change anything. Bisection should take care of tracking regressions.

In general I agree, but contingent upon available maintainer time vs. desire for stability.

Anyway, I'm pretty sure either this/yours or mine, or a cleaned up amalgamation of the two (they are pretty much the same) will get merged after 3.0.

(Both @phkahler and I have been known to merge "stupid" stuff in the past ;-) )

Member

@ruevs ruevs left a comment

OK so overall we did roughly the same. I'll steal your "remove NextAfter" commit and try mine with it tomorrow.

I think we should try to "converge" on a single solution. If you have time I would love to hear your comments/critique on my version.

int offset;

Iterator(IdList<T, H> &list, int offset) : list(&list), offset(offset) {}
T &operator*() { return list->elem[list->indexes[offset]]; }
Member

On mine I kept a local T *elem updated when the iterator is changed so that * and -> can be fast. They may be called many times per loop iteration, and ++ usually only once.
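A minimal sketch of that cached-pointer idea (hypothetical names, plain ints for elements; the cached pointer is refreshed on ++, so * is a single dereference in the loop body):

```cpp
#include <cassert>
#include <vector>

// Sketch: * and -> may run many times per loop iteration while ++
// typically runs once, so the element pointer is cached on advance.
struct CachedIter {
    const std::vector<int> *elems;
    const std::vector<int> *indexes;
    int offset;
    const int *cur; // cached so operator* is one dereference

    CachedIter(const std::vector<int> &e, const std::vector<int> &ix, int off)
        : elems(&e), indexes(&ix), offset(off) { Refresh(); }
    void Refresh() {
        cur = (offset < (int)indexes->size())
                  ? &(*elems)[(*indexes)[offset]] : nullptr;
    }
    const int &operator*() const { return *cur; }
    CachedIter &operator++() { ++offset; Refresh(); return *this; }
    bool operator!=(const CachedIter &o) const { return offset != o.offset; }
};
```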

Contributor Author

@pjanx pjanx Apr 4, 2021

That's one thing optimizers should be good at, unless some aliasing rules break it. I'd be somewhat surprised if it were measurable, given the added complexity. Making the offset point directly into the indexes array is more obvious. And both of these suffer if the array is resized during iteration, which the STL simply declares undefined behaviour in such situations.

Iterator(IdList<T, H> &list, int offset) : list(&list), offset(offset) {}
T &operator*() { return list->elem[list->indexes[offset]]; }
T *operator->() { return &list->elem[list->indexes[offset]]; }
const T &operator*() const { return list->elem[list->indexes[offset]]; }
Member

@ruevs ruevs Apr 4, 2021

We should carefully review the need for const methods and in which of the former NextAfter loops we can add const. Luckily this is just a lot of clicking and testing whether the compiler agrees ;-)

What do you think - should we remove the unused overloads in the end? I'm split between having a short iterator and a "complete" one with mostly unused methods. @phkahler ? @rpavlik ?

Contributor Author

MSVC required me to add const versions here, for std::lower_bound. See the first comment of the PR.

Contributor

I'm inclined to think we should just implement the code that's needed, on the YAGNI principle. Unused methods are untested and possible traps for the future.

Member

Removed all unused methods from mine.

Iterator &operator++() { ++offset; return *this; }
Iterator &operator+=(int diff) { offset += diff; return *this; }
int operator-(const Iterator &i) const { return offset - i.offset; }
bool operator==(const Iterator &i) { return list == i.list && offset == i.offset; }
Member

I should fix this - I'm not comparing the lists for equality after I removed the bounds checking in my last commit ruevs@6bc1aa9

Contributor Author

It should be an assertion rather than a direct test like this.

Member

@ruevs ruevs Apr 4, 2021

It was an assertion before I removed it ;-) Maybe I'll put them back in.

auto it = std::lower_bound(begin(), end(), h, Compare());
return it;
Iterator LowerBound(H const& h) {
return std::lower_bound(begin(), end(), h, Compare());
Member

@ruevs ruevs Apr 4, 2021

I opted to use iterators on the index and complicated the predicate on the theory that the vector<int> iterator is simpler than my IdList one and thus hoping it would be faster. But I have not profiled to confirm.

Contributor Author

I wouldn't expect a meaningful difference, it boils down to a few pointer operations. Didn't you have to add a forward declaration and complicate CompareId because of that?

Member

Yes. I called the predicate `CompareId`.

void ReserveMore(int howMuch) {
    if(n + howMuch > elemsAllocated) {
        elemsAllocated = n + howMuch;
        int *newIndexes = (int *)::operator new[]((size_t)elemsAllocated*sizeof(int));
Member

Using vector for the storage gets rid of the manual memory management, and hopefully at least in release mode should be as fast.

Contributor Author

That's one of the things that should be unified throughout the codebase, I'm just using what's already there.

I don't even know how the custom allocator library is hooked up.

@Evil-Spirit wanted to use realloc, a trick that std::vector should in principle be able to pull off as well, though I'm dubious whether it helps much given geometric growth steps.

Contributor

The important bit is that IdList has reference semantics, not value semantics, so you can't embed a std::vector directly, but rather a std::shared_ptr<std::vector<T>> - a lesson I learned painfully.
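The distinction can be illustrated with a toy example (hypothetical types, not SolveSpace code): copying an object that embeds a std::vector duplicates the storage, while embedding a shared_ptr to the vector keeps all copies aliasing the same data.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Value semantics: each copy owns its own storage.
struct ValueList { std::vector<int> v; };

// Reference semantics: copies alias the same underlying storage,
// which is what callers holding a list copy may be relying on.
struct RefList {
    std::shared_ptr<std::vector<int>> v =
        std::make_shared<std::vector<int>>();
};
```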

Member

@rpavlik Do you mean to say that once a pointer to an element is "taken out" of an IdList, it may be kept around and dereferenced after the list has been modified? And thus the pointer may be invalid?

If this is the case can you point to an example?

But this does not make sense - the original implementation of IdList did not guarantee that the elements will not move - in fact, it moved them around all the time!

In fact my implementation is "better" in this respect since it never moves elements once they are added because the elemstore is not rearranged when elements are removed:
https://github.com/ruevs/solvespace/blob/3c6e2be4b920d706d8231030467c5271a6d87d85/src/dsc.h#L570
the removed elements are simply added to the freelist.

// Find out where the added element should be.
// Shift indexes to the end of the array, if needed.
int pos = LowerBoundIndex(*t);
memmove(indexes + pos + 1, indexes + pos, sizeof *indexes * (n - pos));
Member

I wonder if insert in a vector<int> is as fast. I'll debug it tomorrow to see what it does.

Contributor Author

It should optimize that way.


T *begin() { return IsEmpty() ? nullptr : &elem[0]; }
T *end() { return IsEmpty() ? nullptr : &elem[0] + n; }
const T *begin() const { return IsEmpty() ? nullptr : &elem[0]; }
Member

As above - the const iterators may be useful for compiler optimisations, if we can add const in some places where IdLists are used.

Contributor Author

I'd be interested in seeing evidence of any difference in this case.

elem[dest] = elem[src];
}
dest++;
std::vector<int> elem2indexes(n);
Member

I need to trace through this with the debugger to really get it. Tomorrow.

@ruevs
Member

ruevs commented Apr 4, 2021

Converting models to master is a pain

By the way, why is this? Models from 2.0-2.3 should "just open™". If you have some that don't (and are not this #120) open an issue.

@pjanx
Contributor Author

pjanx commented Apr 4, 2021

Mostly getting rid of all those extra degrees of freedom, rotation takes a while to figure out. I don't remember much else.

Edit: and some constraint failures, so they had to be redone. I didn't keep the old models.

@ruevs
Member

ruevs commented Apr 5, 2021

@pjanx
Tried your version - it seems to work just as fast and as correctly as mine. RemoveTagged seems correct, but it's a bit complicated without a free element list.

I created a pull request with my version with a cleaned up history: #1002
It uses your "remove NextAfter" commit.

Next I will try to see if I can measure speed differences between our implementations, but I highly doubt that there will be any.
I will also profile to see if there are some more bottlenecks related to IdList.

After that it boils down to style/aesthetics etc.

@pjanx
Contributor Author

pjanx commented Apr 5, 2021

I think we need an arbiter. It'd be nice to have a memory profile as well, Valgrind can also do that, IIRC.

@Evil-Spirit
Collaborator

Evil-Spirit commented Apr 7, 2021

Hi there!

Things I remember:

  1. memmove and realloc are super fast
  2. if you are writing a container yourself, it is a bad idea to build it on top of other containers like std::vector: you can't control what happens under the hood. If you are afraid of manual memory management, remember that you write the container precisely so that the rest of the program doesn't have to manage memory manually; inside the container you do have to do it yourself, otherwise you lose all the benefits of doing this job
  3. when you need map-style lookup, you should use binary search. When you need to iterate over elements, you may iterate over the linear storage array only - welcome to the cache. But NOT ALL THE TIME: some operations have to be performed in sorted order, like saving the file. You can break the round-trip tests just by not iterating in the hv alphabetic order.

@baryluk

baryluk commented Apr 18, 2021

A bit of my input. With concerns from #932

I didn't check the memory usage, but did a bit of speed testing of these PRs: just dragging an unconstrained extrusion length with the mouse and watching for when the UI is fully interactive again. I used gcc 10.2.1 on Linux, amd64, with -O3 -flto -march=native. With no optimisations, these PRs are actually slower than current master, because of the extra small functions and indirections in the iterators. But with optimizations on, they are a tiny bit faster.

master, -O3 -flto: ~6.6s, 2.78s for Generate::DIRTY
ruevs, -O3 -flto: ~5.4s, 2.31s for Generate::DIRTY
pjanx, -O3 -flto: ~5.5s, 2.27s for Generate::DIRTY

That doesn't sound like a lot to me, and I don't think it is worth merging either of them in the current state. Don't get me wrong, this is good code and a move in the right direction, but it might require more profiling, tuning, and base data structure selection.

(No OpenMP used).

UPDATE:

LeilaManya2, changing last translate group from 3 to 20 repeats.

master: ~138s, 126119ms for Generate::DIRTY
pjanx: ~1.2s, 427ms for Generate::DIRTY
ruevs: ~1.1s, 403ms for Generate::DIRTY

Nice! Ok. I might be convinced.

@ruevs
Member

ruevs commented Apr 20, 2021

(quoting @baryluk's benchmark comment above in full)

I was just about to mention https://github.com/solvespace/solvespace/files/5974674/LeliaManya2.zip when I noticed you updated your comment ;-) Thanks for running it to 20 repeats on master - I did not have the patience :-)

Would you please do the same timing comparison when modifying some dimension in the first group of this model? #841 (comment)

How do you do the measurements on Linux?

@phkahler
Member

phkahler commented May 7, 2021

By the way, why is this? Models from 2.0-2.3 should "just open™". If you have some that don't (and are not this #120) open an issue.

Off topic, but I know some Lathe groups can have issues. Lathe used to create two copies of the sketch entities, like at the start and end of an extrude or revolve, even though both copies were exactly the same. I removed those redundant entities, so any sketch that constrained against them could break in 3.0 depending on which copy the constraint used.

@phkahler
Member

Going with the @ruevs version unless something bad happens.

Successfully merging this pull request may close these issues.

Poor Linking performance
Poor performance when extruding translate groups
