How to write VTK files in parallel?

Hey, I am trying to visualize data from a large simulation (~1 TB). I have managed to read and process this giant dataset in Python using the distributed computing package Dask. My last bottleneck seems to be how to use the hundreds of cores (which I have already set up with Dask) to help me write VTK files in parallel.

Here is what I have now:

import numpy as np
import pyvtk

def all_time(itime, **kwargs):
    # Materialize this time step from the dask array and attach it to the mesh
    vtk = pyvtk.VtkData(kwargs["vtk_mesh"], pyvtk.PointData(pyvtk.Scalars(
        kwargs["wave"][:, itime].compute(), name='U' + kwargs["channel"])))
    # Serial write of one file per time step -- this is the bottleneck
    vtk.tofile(kwargs["vtk_dir"] + '/' + 'wave%d.vtk' % itime, 'binary')
    return

for itime in np.arange(ntime):
    all_time(itime, vtk_mesh=vtk_mesh, wave=wave_on_slice_channel,
             channel=wave_channel, vtk_dir=vtk_dir)

The function all_time basically extracts the wavefield (i.e. the simulation data) at a particular time step from a dask array, which I think of as just a pointer, so it is effortless to deal with. The .compute() method dereferences that pointer and fetches the actual values, and with Dask this happens across the hundreds of cores I have, so it's also pretty fast. Then I combine this wavefield data with a vtk_mesh that has already been built (with real values in it, not just pointers) via pyvtk.VtkData.
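For context, here is a minimal sketch of the kind of array I am working with; the shape and chunking below are made up for illustration, my real wave_on_slice_channel comes from reading the simulation output:

import dask.array as da

# Hypothetical stand-in for wave_on_slice_channel: (npoints, ntime),
# chunked along time so each time step sits in its own chunk
wave_on_slice_channel = da.random.random((500_000, 6000), chunks=(500_000, 1))

# Slicing is lazy (just a task graph); .compute() pulls the real values
snapshot = wave_on_slice_channel[:, 0].compute()  # numpy array of length npoints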

And here is the bottleneck: the vtk.tofile call. I am wondering how something like this can be done in parallel across many cores?

P.S. Please forgive the many kwargs. I was trying to parallelize over time steps as well, so all_time is written to be compatible with dask.client.map and submitted to my cluster of hundreds of cores. However, with the ~6k time steps this seems to blow up the scheduler (I tried dask.bag and batched inputs, but neither helped), so for now I am simply doing this in a for loop.
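In case it helps, the client.map attempt looked roughly like this (a sketch from memory; client is a dask.distributed.Client connected to my cluster, and scheduler_address is just a placeholder for however the cluster is reached):

from dask.distributed import Client

client = Client(scheduler_address)  # placeholder: however your cluster is reached

# One task per time step; each task computes its slice and writes its own file
futures = client.map(all_time, range(ntime),
                     vtk_mesh=vtk_mesh, wave=wave_on_slice_channel,
                     channel=wave_channel, vtk_dir=vtk_dir)
client.gather(futures)  # block until all ~6k writes have finished

This is the submission that blows up the scheduler for me once the number of time steps gets into the thousands.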