Errors when executing connectivity filter in parallel

Hello,

Here I am computing my integrals and still being too clever for my own good (i.e. not really very clever at all!).

Using the trace functionality I built a pipeline that computes some integrals. The data source is a decomposed OpenFOAM simulation. I take several box clips, then apply a cell-data threshold to each. On each threshold I run a connectivity filter extracting the largest region, followed by a calculator. The integral is taken over a point-data function produced by the calculator, and I also want to save the volume of the region.

I’ve tested the pipeline on a serial data set and it works as expected, but when running in parallel I am seeing the following set of errors on a subset of ranks:

...
(  39.109s) [pvbatch.1       ]vtkPConnectivityFilter.cxx:511    ERR| vtkPConnectivityFilter (0x3128d10): An error occurred on at least one process.
...

(  39.109s) [pvbatch.1       ]       vtkExecutive.cxx:741    ERR| vtkPVCompositeDataPipeline (0x3127a90): Algorithm vtkPConnectivityFilter (0x3128d10) returned failure for request: vtkInformation (0x3fa3f60)
  Debug: Off
  Modified Time: 294148
  Reference Count: 1
  Registered Events: (none)
  Request: REQUEST_DATA
  FROM_OUTPUT_PORT: 0
  ALGORITHM_AFTER_FORWARD: 1
  FORWARD_DIRECTION: 0
...

I suspect this is because the thresholded data is not available on every rank. Is this the right way to do it in pvbatch, or am I missing something? Assuming the integration is OK, is there a way to suppress the errors? I’ve also tried running UpdatePipeline(time, proxy) after the creation of each filter to force the information to propagate, but the same error remains.
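For the suppression I was considering lowering the VTK logger verbosity before executing the pipeline. This is only a sketch, and it assumes the ERR messages above are routed through vtkLogger (the timestamped “[pvbatch.1]” prefix suggests they are); note it would hide all VTK errors, not just these:

```python
# Sketch only: silence vtkLogger's stderr output in pvbatch.
# Assumes the connectivity errors are emitted via vtkLogger.
from vtkmodules.vtkCommonCore import vtkLogger

vtkLogger.SetStderrVerbosity(vtkLogger.VERBOSITY_OFF)
```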

I am attaching a gist of my script below. Please let me know if you have any suggestions.


reader = OpenFOAMReader(
    registrationName='...',
    FileName=f'...')
reader.MeshRegions = ['internalMesh']
reader.CellArrays = ['alpha.water', 'U']
reader.Createcelltopointfiltereddata = 0
reader.CaseType = 'Decomposed Case'

merge_blocks = MergeBlocks(
    registrationName='Merge blocks',
    Input=reader)
# ...
for pn in plate_numbers:
    clips.append(Clip(registrationName=f'Plate {pn}', Input=merge_blocks))
    clip = clips[-1]
    clip.ClipType = 'Box'
    clip.ClipType.Position = [
        clip_dimensions.xmin,
        clip_dimensions.ytop[pn-1],
        clip_dimensions.zmin[pn%2]]
    clip.ClipType.Length = [
        clip_dimensions.span,
        1.0,
        clip_dimensions.length]

    threshold = Threshold(registrationName=f'Threshold over p{pn}', Input=clip)
    threshold.LowerThreshold = 0.5
    threshold.UpperThreshold = 1.0
    threshold.Scalars = ['CELLS', 'alpha.water']

    connectivity = Connectivity(
        registrationName=f'Connectivity over p{pn}',
        Input=threshold)
    connectivity.ExtractionMode = 'Extract Largest Region'

    calculator = Calculator(
        registrationName=f'Calc position over p{pn}',
        Input=connectivity)
    calculator.ResultArrayName = 'Position'
    if pn % 2 == 0:
        calculator.Function = f'abs({dimensions.zmin[pn%2]} - coordsZ)'
    else:
        calculator.Function = f'abs({dimensions.zmin[pn%2] + dimensions.length} - coordsZ)'

    integrate = IntegrateVariables(
        registrationName=f'Integrate position over p{pn}',
        Input=calculator)

Actually, after looking closer at what’s actually happening: it seems that the first time step the integral executes as expected (though it still produces errors), but the subsequent time stepping freezes.

This is how I am stepping through time:

for time in animation_scene.TimeKeeper.TimestepValues:
    # animation_scene.AnimationTime=time
    # animation_scene.UpdateAnimationUsingDataTimeSteps()
    for idx, integral in enumerate(position_integrals):
        integral.UpdatePipeline(time)
        SaveData(
             f'{OUTPUT_PREFIX}/position_{idx}-t{time}.csv',
             proxy=integral,
             ChooseArraysToWrite=1,
             CellDataArrays=['Volume'],
             AddTimeStep=1,
             AddTime=1)

I also tried using UpdateTime(time=time, proxy=integrals). My position integrals are collected into a list.

I don’t think the time stepping itself is the issue, though, as this works fine in serial. But maybe there’s something in this style of pipeline updating that doesn’t quite work in parallel. Maybe I need to collect my entire pipeline into a list and explicitly call update on each filter at every step?
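Something like the following is what I mean. It is just a sketch: `pipeline_filters` is a hypothetical list that would be appended to while building the pipeline above (clips, thresholds, connectivity filters, calculators), and `position_integrals` is the list from my script:

```python
# Sketch: force every stage of the pipeline to execute at each time step
# before touching the integrals, instead of relying on the integrals'
# UpdatePipeline alone to pull updates through.
for time in animation_scene.TimeKeeper.TimestepValues:
    for filt in pipeline_filters:   # hypothetical list built during setup
        filt.UpdatePipeline(time)
    for idx, integral in enumerate(position_integrals):
        integral.UpdatePipeline(time)
```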

Hi @robertsawko , do you see any errors with “No points in data set”?

Any chance you can share a dataset (or tell how to synthesize one) along with the full script so I can investigate further?

Hi, @cory.quammen,

Yes to both. I can see “No points in data set”.

I was just able to reproduce the problem on a four-processor dam-break problem from the OpenFOAM tutorial set. I am going to strip my script down a little, as I think/hope the clips are immaterial, and focus on the connectivity, integration and time stepping.

It also seems to freeze when I feed it into a time filter, e.g. Plot Data Over Time.

I hope to be able to post the minimal example later today.

Here (4.0 KB) is the setup. To regenerate it you will need OpenFOAM v2212 (or later?) compiled, a sourced environment, and to run Allrun-parallel. The data set is 14 MB, but that was too big for this server.

The script to run is script.py. I reproduced the problem locally by running a 4-rank pvserver. If I comment out SaveData and just press “play” in the client, I can see the volume change as expected.

This may be a bug, but please let me know. There are two issues: it produces lots of warnings and errors, and when saving all time steps the whole pipeline freezes.

I also made a small video to check that the errors do indeed crop up from ranks that have none of the thresholded data. You can see here the decomposition, the entire threshold (opaque grey) and the largest region selected by the Connectivity filter (purple).

There’s another random issue: the extracted connectivity data remains rendered on non-zero ranks when I jump back to the first time step.

There is definitely a bug here. Part of the parallel connectivity algorithm involves creating a sub-communicator only on the ranks that have cells. If a rank in that communicator has cells, it should also have points.

Compiling OpenFOAM raises the bar for investigating this further. Could you share the file on https://wetransfer.com/ or a similar file-transfer service?

Thanks for confirming, and for the preliminary analysis.

Here’s a link to my personal Dropbox. It’s a bit larger because I included both the serial and the decomposed data. The pipeline works fine on the serial data (comment out CaseType), but freezes in parallel. I can live with the errors from empty ranks, but the time-stepping freeze is a show stopper for my analysis (pun not intended).
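For what it’s worth, the workaround I had in mind for living with the empty ranks is to skip SaveData whenever an integral produces no cells. A sketch only, reusing `position_integrals` from the script above; I am assuming here that GetDataInformation() gathers counts across ranks, so the check is collective rather than per-rank:

```python
# Sketch: skip saving when nothing survived the threshold/connectivity.
for time in animation_scene.TimeKeeper.TimestepValues:
    for idx, integral in enumerate(position_integrals):
        integral.UpdatePipeline(time)
        if integral.GetDataInformation().GetNumberOfCells() == 0:
            continue  # empty result for this plate at this time step
        SaveData(
            f'{OUTPUT_PREFIX}/position_{idx}-t{time}.csv',
            proxy=integral,
            ChooseArraysToWrite=1,
            CellDataArrays=['Volume'],
            AddTimeStep=1,
            AddTime=1)
```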