

@fredroy (Contributor) commented Feb 3, 2026

Some time ago I noticed that having a Controller in a scene slows the simulation down dramatically, especially on macOS, even when the controller does nothing. The more Controllers there are, the slower it gets.

DISCLAIMER: this was mostly the work of Claude, which detected the issue (repeated Python method lookups were slowing down the simulation), suggested the fix and generated it.
I only ran the benchmarks and tests to make sure it works, but as with everything involving SofaPython3 and/or pybind11, I cannot guarantee that everything is correct and well done. A thorough review from experts would be appreciated 🫠

In any case, the modifications lead to a dramatic speedup
(see the scene added in this PR, which creates an empty scene with a given number of Controllers that do nothing):

Ubuntu 22.04 (gcc12, i7 13700k)

before:
Scene with 1 controllers and 10000 steps took 2.464146852493286 seconds.
Scene with 5 controllers and 10000 steps took 12.076464414596558 seconds.
Scene with 10 controllers and 10000 steps took 24.062500715255737 seconds.

after:
Scene with 1 controllers and 10000 steps took 0.04976940155029297 seconds.
Scene with 5 controllers and 10000 steps took 0.09446001052856445 seconds.
Scene with 10 controllers and 10000 steps took 0.1459205150604248 seconds.

--> with 10 controllers, about 165x faster... 😮

Windows (MSVC2026, i7 11800h)

before:
Scene with 1 controllers and 10000 steps took 6.102800607681274 seconds.
Scene with 5 controllers and 10000 steps took 27.300215482711792 seconds.
Scene with 10 controllers and 10000 steps took 54.59787082672119 seconds.

after:
Scene with 1 controllers and 10000 steps took 0.12163424491882324 seconds.
Scene with 5 controllers and 10000 steps took 0.18189406394958496 seconds.
Scene with 10 controllers and 10000 steps took 0.27340126037597656 seconds.

--> with 10 controllers, about 200x faster... 😲

macOS (xcode26, M3 max)

before:
Scene with 1 controllers and 10000 steps took 8.079632759094238 seconds.
Scene with 5 controllers and 10000 steps took 40.43093395233154 seconds.
Scene with 10 controllers and 10000 steps took 79.13048505783081 seconds.

after:
Scene with 1 controllers and 10000 steps took 0.03541707992553711 seconds.
Scene with 5 controllers and 10000 steps took 0.06284904479980469 seconds.
Scene with 10 controllers and 10000 steps took 0.09451079368591309 seconds.

--> with 10 controllers, 837x faster... 🤪
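Each speedup factor is the ratio of the 10-controller timing before the change to the timing after it. A small sketch to recompute the factors from the posted numbers (the dictionary simply copies the 10-controller, 10000-step timings reported above):

```python
# 10-controller, 10000-step timings in seconds: (before, after), copied from the results above.
timings = {
    "Ubuntu 22.04": (24.062500715255737, 0.1459205150604248),
    "Windows": (54.59787082672119, 0.27340126037597656),
    "macOS": (79.13048505783081, 0.09451079368591309),
}

# Speedup = time before / time after.
speedups = {platform: before / after for platform, (before, after) in timings.items()}

for platform, factor in speedups.items():
    print(f"{platform}: {factor:.0f}x faster")
# Ubuntu 22.04: 165x faster
# Windows: 200x faster
# macOS: 837x faster
```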

@fredroy added the labels enhancement, pr: status to review, pr: clean-fix, pr: highlighted in next release, and topic for next dev-meeting on Feb 3, 2026
@alxbilger (Contributor) left a comment

Is the cache system also relevant for other trampoline classes?

@fredroy (Contributor, Author) commented Feb 3, 2026

> Is the cache system also relevant for other trampoline classes?

To my limited knowledge, I would say yes. But it is especially useful here because, without the cache, many lookups are needed at every step to implicitly call all the *event() methods.

Summary of Modifications

  The changes in this branch (speedup_controller) optimize the Controller_Trampoline class in the Python bindings by adding a caching mechanism for Python method lookups:

  Key Changes:

  1. New caching infrastructure (in Binding_Controller.h):
  - Added member variables to cache:
    - m_pySelf - cached Python self reference (avoids repeated py::cast(this))
    - m_methodCache - unordered_map storing Python method objects by name
    - m_onEventMethod - cached fallback "onEvent" method
    - m_hasOnEvent / m_cacheInitialized - state flags

  2. New methods (in Binding_Controller.cpp):
  - initializePythonCache() - initializes the cache on first use
  - getCachedMethod() - retrieves methods from cache (or looks them up once and caches)
  - callCachedMethod() - calls a cached Python method with an event
  - Constructor and destructor that manage the cached Python objects while holding the GIL

  3. Optimized handleEvent():
  - Previously: every event caused py::cast(this), py::hasattr(), and attr() lookups
  - Now: uses cached method references, avoiding repeated Python attribute lookups

  4. Optimized getClassName():
  - Uses the cached m_pySelf when available instead of casting each time
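To make the idea concrete, here is a pure-Python analogue of the per-instance cache described above. It is only a sketch, not the C++/pybind11 code from this branch: the `CachedDispatcher` class, the `"on" + event name` naming rule, and caching misses as `None` are assumptions made for illustration. The comments map each field to the C++ member it mimics.

```python
class CachedDispatcher:
    """Hypothetical pure-Python analogue of the cache in Controller_Trampoline."""

    def __init__(self, controller):
        self._self = controller      # plays the role of m_pySelf
        self._method_cache = {}      # plays the role of m_methodCache (name -> bound method or None)
        self._on_event = None        # plays the role of m_onEventMethod
        self._initialized = False    # plays the role of m_cacheInitialized

    def _initialize(self):
        # initializePythonCache() analogue: resolve the "onEvent" fallback once.
        self._on_event = getattr(self._self, "onEvent", None)
        self._initialized = True

    def _get_cached_method(self, name):
        # getCachedMethod() analogue: one attribute lookup per name, then dict hits.
        # Assumption: misses are cached too (stored as None).
        if name not in self._method_cache:
            self._method_cache[name] = getattr(self._self, name, None)
        return self._method_cache[name]

    def handle_event(self, event_name, event):
        if not self._initialized:
            self._initialize()
        method = self._get_cached_method("on" + event_name)
        if method is not None:
            return method(event)
        if self._on_event is not None:
            return self._on_event(event)  # generic fallback handler
        return None


# Usage: a controller with one specific handler and a generic fallback.
class DemoController:
    def __init__(self):
        self.calls = []

    def onAnimateBeginEvent(self, event):
        self.calls.append(("begin", event))
        return True

    def onEvent(self, event):
        self.calls.append(("fallback", event))
        return False


ctrl = DemoController()
dispatcher = CachedDispatcher(ctrl)
dispatcher.handle_event("AnimateBeginEvent", 1)  # specific handler, returns True
dispatcher.handle_event("AnimateEndEvent", 2)    # no specific handler, falls back to onEvent
```

One trade-off of this kind of cache, at least in the sketch: a handler added to or replaced on the instance after its first lookup is not seen, because the earlier entry stays in the dictionary.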

  Purpose:

  This is a performance optimization that reduces the overhead of handling frequent events (such as AnimateBeginEvent and AnimateEndEvent), which fire at every simulation step. The cache removes the repeated Python/C++ boundary crossings that were previously needed for method lookups.
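A rough way to see why per-event lookups add up is to count them. The sketch below is a pure-Python stand-in (the `CountingController` class and both `run_*` functions are hypothetical, not SofaPython3 code): the uncached path mirrors the old hasattr-then-attr pattern for every event, while the cached path resolves each handler name once. It counts attribute lookups instead of measuring time, so the numbers are deterministic.

```python
class CountingController:
    """Hypothetical controller that counts lookups of its on* attributes."""

    lookups = 0

    def __getattribute__(self, name):
        if name.startswith("on"):
            type(self).lookups += 1
        return object.__getattribute__(self, name)

    def onAnimateBeginEvent(self, event):
        pass

    def onAnimateEndEvent(self, event):
        pass


EVENTS = ("AnimateBeginEvent", "AnimateEndEvent")
STEPS = 10000


def run_uncached(ctrl):
    # Old pattern: hasattr() then attribute access, for every event of every step.
    for _ in range(STEPS):
        for event in EVENTS:
            name = "on" + event
            if hasattr(ctrl, name):
                getattr(ctrl, name)(event)


def run_cached(ctrl):
    # Cached pattern: resolve each handler once, then reuse the bound method.
    cache = {}
    for _ in range(STEPS):
        for event in EVENTS:
            name = "on" + event
            if name not in cache:
                cache[name] = getattr(ctrl, name, None)
            if cache[name] is not None:
                cache[name](event)


CountingController.lookups = 0
run_uncached(CountingController())
uncached_lookups = CountingController.lookups

CountingController.lookups = 0
run_cached(CountingController())
cached_lookups = CountingController.lookups

print(uncached_lookups, cached_lookups)  # 40000 2
```

In the real bindings each of these lookups also goes through pybind11 and crosses the C++/Python boundary, which this counting sketch does not model.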
@fredroy force-pushed the speedup_controller branch from 57debb4 to 040d3e6 on February 4, 2026 02:31