vtkDataCache: caching data to avoid duplicate computation

Consider this simple pipeline: reader → calculator. As the user constructs this pipeline, the geometry filter – which is automatically executed to generate polygons for rendering – executes twice on the same geometry and topology: once when the user hits Apply for the reader, and again when the user hits Apply for the calculator. Both times, the geometry filter generates exactly the same surfaces since the mesh is unchanged; only the fields change.

This is just a simple illustration, but we can easily keep extending this pipeline to come up with scenarios where filters redo, over and over again, computation they have already done.

A question that @Will_Schroeder and I have been brainstorming on and off over several months is how we can avoid this – ideally, without making any cumbersome changes to the VTK pipeline or adding memory costs. Will started a discussion in a similar spirit here; however, it didn’t really reach a conclusion.

Here’s a more detailed proposal. I pose it as more of a ParaView concern here; however, it’s broadly applicable to VTK use-cases too.

Let’s add a new global cache, let’s say vtkDataCache.

class vtkDataCache : ... 
{
public:
   // this has to be a singleton to ensure cached data can
   // be shared between different instances of algorithms with
   // ease.
   static vtkDataCache* GetInstance();

   // API to access a cached item.
   template <typename T, typename... KeyT>
   T* GetCachedData(KeyT... key);

   // API to add an item to the cache.
   template <typename T, typename ...KeyT>
   void AddToCache(T* data, vtkObject* context, KeyT... key);
};

Let’s consider an example: a modified version of vtkDataSetSurfaceFilter to extract the exterior surface from a vtkStructuredGrid – something the geometry filter in our illustrative example employs. The algorithm iterates over cells and generates an originalCellIds array identifying the cells in the input dataset that are part of the exterior shell. For simplicity, let’s focus on just this step. The algorithm can easily be changed to avoid re-computation as follows:

vtkDataSetSurfaceFilter* self =  ...

vtkStructuredGrid* sg = ... // input structured grid
auto points = sg->GetPoints();
auto ghostCells = sg->GetCellData()->GetArray(
    vtkDataSetAttributes::GhostArrayName());

auto cache = vtkDataCache::GetInstance();
auto originalCellIds = vtk::MakeSmartPointer(
    cache->GetCachedData<vtkIdTypeArray>(
       "vtkDSF::OriginalCellIds", points, ghostCells));
if (originalCellIds == nullptr)
{
    // iterate over cells and compute the new original cell ids array
    originalCellIds = ...;
    cache->AddToCache(originalCellIds, /*context=*/ self,
       /* key .... */
       "vtkDSF::OriginalCellIds", points, ghostCells);
}

Let’s see how this works:

Avoid re-computation: this is fairly obvious. The cache key is built from the inputs that would change if the structured grid changed in any way affecting the result – here, the input points and ghost-cells arrays. If we have already computed the originalCellIds array for the chosen points and ghost cells, we won’t recompute it. And when we do have to recompute, we update the cache.

Avoid memory overhead: this is the trickier one. To understand it, we’ll have to dive into the implementation of vtkDataCache. vtkDataCache stores vtkObjects; hence it can manage the reference count of a stored object to either keep it around or release it for garbage collection.
When something is added to the cache, AddToCache takes, besides the object being cached, an optional vtkObject* that we call the context, followed by a set of arguments that comprise the key. The key can comprise vtkObject*s and std::strings (any copyable, comparable type could be supported here, but for now let’s just use strings for simplicity). The cache adds ModifiedEvent and DeleteEvent observers to all vtkObjects in the key and context arguments. If any of those events is fired, the cache simply releases its reference to the cached vtkObject, thus freeing it. That is it! This simple trick minimizes memory overhead. With the key, we are assured that if any item affecting the cached result changes, the cache entry will be flushed. With the context, we are assured that if the filter itself changes or goes away, we don’t keep its cached data around. The context could easily be part of the key; however, keeping it separate enables multiple instances of vtkDataSetSurfaceFilter to share the cache while still ensuring that the cache is flushed if the filters go away.
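To make the invalidation trick concrete, here’s a minimal self-contained sketch in plain C++ – std::function callbacks stand in for vtkCommand ModifiedEvent observers, and all names (ToyObject, ToyCache) are hypothetical, not the proposed vtkDataCache API:

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a vtkObject that can notify observers on modification.
struct ToyObject
{
  std::vector<std::function<void()>> Observers;
  void Modified() { for (auto& f : this->Observers) f(); }
};

class ToyCache
{
public:
  void Add(const std::string& key, std::shared_ptr<int> data, ToyObject& dep)
  {
    this->Entries[key] = std::move(data);
    // Mirror of AddObserver(vtkCommand::ModifiedEvent, ...): when `dep`
    // is modified, the cache releases its reference to the entry.
    dep.Observers.push_back([this, key]() { this->Entries.erase(key); });
  }
  int* Get(const std::string& key)
  {
    auto it = this->Entries.find(key);
    return it == this->Entries.end() ? nullptr : it->second.get();
  }
private:
  std::map<std::string, std::shared_ptr<int>> Entries;
};
```

A lookup after the key object fires Modified() comes back empty, so the stale result is never reused.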

The beauty of this approach is that we no longer need to invest in any changes to the VTK pipeline or filters to deal with static meshes. Computation results are now keyed only on the input arrays that affect the result and hence will not be recomputed unless necessary.

The cache can be used to store things like array ranges too. Currently, computing array ranges does not take masks or ghost arrays into account. We can easily start supporting that and use the cache to avoid recomputing ranges.
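As a rough sketch of what ghost-aware range caching could look like, here is a self-contained example – plain std containers stand in for VTK arrays, and all names are hypothetical:

```cpp
#include <algorithm>
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Range = std::pair<double, double>;

// Compute a (min, max) range, skipping values flagged by a ghost array --
// the behaviour the post suggests range computation should gain.
Range ComputeMaskedRange(const std::vector<double>& values,
                         const std::vector<unsigned char>& ghosts)
{
  Range r{std::numeric_limits<double>::max(),
          std::numeric_limits<double>::lowest()};
  for (std::size_t i = 0; i < values.size(); ++i)
  {
    if (!ghosts.empty() && ghosts[i] != 0) continue; // skip ghost values
    r.first = std::min(r.first, values[i]);
    r.second = std::max(r.second, values[i]);
  }
  return r;
}

// Per-key memoization; a real vtkDataCache would key on the array and
// ghost-array objects and drop the entry when either fires ModifiedEvent.
Range GetCachedRange(std::map<std::string, Range>& cache,
                     const std::string& key,
                     const std::vector<double>& values,
                     const std::vector<unsigned char>& ghosts)
{
  auto it = cache.find(key);
  if (it != cache.end()) return it->second;
  return cache[key] = ComputeMaskedRange(values, ghosts);
}
```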

The cache can be used by readers too, to avoid re-reading data from disk when iterating through time or when, for example, the array selection changes.

The cache also works for caching things like point/cell locators without requiring any changes to the dataset API. Thus, there’s no need to start storing point locators with vtkPoints to avoid rebuilding them; the cache can help with that effortlessly!
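The locator use-case can be sketched the same way – here a “locator” is just a sorted copy of the point values enabling binary search, the cache keys on the address of the point set (standing in for keying on the vtkPoints object), and all names are hypothetical:

```cpp
#include <algorithm>
#include <map>
#include <memory>
#include <vector>

// A trivial stand-in for a point locator: a sorted copy supporting lookup.
struct SortedLocator
{
  std::vector<double> Sorted;
  explicit SortedLocator(const std::vector<double>& pts) : Sorted(pts)
  {
    std::sort(this->Sorted.begin(), this->Sorted.end());
  }
  bool Contains(double p) const
  {
    return std::binary_search(this->Sorted.begin(), this->Sorted.end(), p);
  }
};

int BuildCount = 0; // counts how many times a locator was actually built

const SortedLocator& GetLocator(
  std::map<const void*, std::unique_ptr<SortedLocator>>& cache,
  const std::vector<double>& pts)
{
  auto& entry = cache[&pts]; // key on the point set's identity
  if (!entry)
  {
    ++BuildCount; // built only once per point set
    entry.reset(new SortedLocator(pts));
  }
  return *entry;
}
```

Repeated queries against the same points hit the cache, so the locator is built exactly once.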

It’s not limited to a specific use-case either, as is the case with vtkOpenGLVertexBufferObjectCache. The proposed cache can store not only vtkOpenGLVertexBufferObjects, but also things like ranges, locators, you name it. Also, vtkDataCache does not require any explicit Remove calls: using the modification and delete events fired by key and context objects, it can automatically remove obsolete entries.

Thoughts? Suggestions? Critiques?


I would be very happy to see this implemented :slight_smile:

There will probably be some issues with thread access to the cache, especially as vtkSMPTools becomes more prevalent. You might either

  • lock access to the cache with a mutex or
  • make the static instance() method return a thread_local instance and provide some reduce operation for caches across threads.
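For the first option, a minimal sketch of a mutex-guarded singleton cache could look like this (all names are hypothetical, with std::mutex standing in for whatever VTK would actually use):

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Singleton cache whose accessors are serialized by a mutex, so
// concurrent vtkSMPTools-style workers can use it safely.
class LockedCache
{
public:
  static LockedCache& GetInstance()
  {
    static LockedCache instance; // C++11 guarantees thread-safe init
    return instance;
  }
  void Add(const std::string& key, std::shared_ptr<int> data)
  {
    std::lock_guard<std::mutex> guard(this->Lock);
    this->Entries[key] = std::move(data);
  }
  std::shared_ptr<int> Get(const std::string& key)
  {
    std::lock_guard<std::mutex> guard(this->Lock);
    auto it = this->Entries.find(key);
    return it == this->Entries.end() ? nullptr : it->second;
  }
private:
  std::mutex Lock;
  std::map<std::string, std::shared_ptr<int>> Entries;
};
```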

This is fantastic! I second Dave’s comment about threaded access. Probably a mutex around modification and access…

Indeed; ensuring the API on vtkDataCache is thread-safe will be important, as will documenting it so developers are aware of the potential mutex. That may encourage them to move cache access outside the threaded loop, which would be the best approach, performance-wise, anyway.

Nice work Utkarsh!!!

It seems that to use this effectively we need to modify many filters/classes to make them cache-aware and take advantage of this capability. I’d like to do this in an organized way so we don’t end up with another feature in VTK that is only partially implemented. Locators are very high on my list to address. It would be good to work through one use-case to fully understand what the implications are…

This would be a nice feature to have. I was just wondering if a cache size limit would be necessary. In that case, an eviction policy should be defined: LRU, FIFO, LFU, … (as a property of the class or as a template parameter) in order to remove data when the size limit is exceeded.
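For illustration, a tiny sketch of what LRU eviction could look like (hypothetical names, not part of the proposal; the O(n) list scan is fine for a sketch):

```cpp
#include <cstddef>
#include <list>
#include <map>
#include <string>

// When the entry count exceeds Capacity, evict the least-recently-used key.
class LruCache
{
public:
  explicit LruCache(std::size_t capacity) : Capacity(capacity) {}
  void Put(const std::string& key, int value)
  {
    this->Touch(key);
    this->Values[key] = value;
    if (this->Order.size() > this->Capacity)
    {
      this->Values.erase(this->Order.back()); // drop the coldest entry
      this->Order.pop_back();
    }
  }
  const int* Get(const std::string& key)
  {
    auto it = this->Values.find(key);
    if (it == this->Values.end()) return nullptr;
    this->Touch(key); // a hit refreshes recency
    return &it->second;
  }
private:
  void Touch(const std::string& key)
  {
    this->Order.remove(key);     // O(n), acceptable for a sketch
    this->Order.push_front(key); // most recently used at the front
  }
  std::size_t Capacity;
  std::list<std::string> Order;
  std::map<std::string, int> Values;
};
```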

I’ve started putting together an implementation of the vtkDataCache; so far it seems promising. I was worried I was getting too lofty with the variadic arguments, but it seems workable. I’m also building a test that will verify the guarantees made by the cache. Next, I am hoping to modify the geometry filters, especially the ones used in ParaView. That’d be a good place to test the impact of this. We can coordinate how to expand that to locator uses etc.

Perhaps. Originally, I was toying with the idea that the cache itself would only store weak pointers. In the case I described, if the geometry filter put out the original-ids array it generated as an array in the output dataset, the ids array would be held onto by something else, and hence the cache would only be making it accessible rather than actually caching it. This falls apart if people use the cache to store intermediate results that are not added to the output data object, or for things like locators, ranges, etc. Something to ponder for sure.
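The weak-pointer variant can be sketched with std::weak_ptr standing in for vtkWeakPointer (hypothetical names): an entry stays reachable only while something else, such as the output dataset, owns the data.

```cpp
#include <map>
#include <memory>
#include <string>

// A cache that holds only non-owning references: it never keeps data
// alive by itself, it just makes still-alive data findable.
class WeakCache
{
public:
  void Add(const std::string& key, const std::shared_ptr<int>& data)
  {
    this->Entries[key] = data; // stores a weak reference
  }
  std::shared_ptr<int> Get(const std::string& key)
  {
    auto it = this->Entries.find(key);
    // lock() yields nullptr once all external owners are gone
    return it == this->Entries.end() ? nullptr : it->second.lock();
  }
private:
  std::map<std::string, std::weak_ptr<int>> Entries;
};
```

This captures both sides of the trade-off noted above: zero memory overhead, but entries with no external owner (locators, ranges, intermediate results) silently vanish.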

Here’s a quick implementation: !8347.