Import a giant (50GB) data file into ParaView

Hi all,
I tried to import a 50GB netCDF file into ParaView, but I always run into a MemoryError like this:


So I’m wondering whether there’s a limit on the input file size in ParaView, or whether the MemoryError happened just because my local machine doesn’t have enough memory. Is there any way I can run this on my local machine, or does it have to be done on HPC?

Thanks!

Do you have more than 50GB of RAM? How many timesteps are in the file?

I see this is coming from a vtkPythonAlgorithm. Is this a custom netCDF reader plugin?

If so, you may want to update the RequestData script to read only chunks of the netCDF file based on the current time step or data extent.
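
For reference, in a VTKPythonAlgorithmBase-based reader the requested time step can be pulled from the output information inside RequestData. Here is a minimal sketch, assuming the reader keeps its time values in an attribute like self._timesteps (a placeholder name, not necessarily what PVGeo uses):

```python
import numpy as np
from vtkmodules.vtkCommonExecutionModel import vtkStreamingDemandDrivenPipeline

def RequestData(self, request, inInfo, outInfo):
    info = outInfo.GetInformationObject(0)
    # Time value ParaView is currently asking for; fall back to the first step
    if info.Has(vtkStreamingDemandDrivenPipeline.UPDATE_TIME_STEP()):
        t = info.Get(vtkStreamingDemandDrivenPipeline.UPDATE_TIME_STEP())
    else:
        t = self._timesteps[0]
    # Map the requested time value onto an index into the file's time axis
    idx = int(np.argmin(np.abs(np.asarray(self._timesteps) - t)))
    # ... read only time step `idx` from the netCDF file and build the output ...
    return 1
```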

No, my local machine only has 16GB of RAM, and there are 25 timesteps in the file.

Yeah, that could be a solution; I’ll try it out. And yes, I’m using the PVGeo-CMAQ reader. The file contains 25 timesteps (it looks like the file I sent you last time, just at a larger scale). Do I have to update the script in both PVGeo and PVGeo-HDF5, or only in PVGeo-HDF5?
Thanks!

You will only need to update the code for the PVGeo-CMAQ reader in PVGeo-HDF.

To outline the changes needed: instead of reading the data up front (which is what the PVGeo-CMAQ reader currently does), you’ll update it to get the requested time step and pass that timestep to the reading function, so it only reads the requested part of the file.

So change lines 190-193 to pass the requested timestep on every RequestData call. Then change _ReadUpFront to take the timestep index and restructure that function to only grab the needed data for each time step.
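
Roughly something like the sketch below. This is only an illustration, not the actual PVGeo-CMAQ code; the method and attribute names here (_read_time_step, self._filename, self._keys) are placeholders and will differ in the repo:

```python
import numpy as np
from netCDF4 import Dataset

def _read_time_step(self, idx):
    """Read only the arrays for time step `idx` instead of the whole file."""
    data = {}
    with Dataset(self._filename) as ds:
        for name in self._keys:  # the variables selected for loading
            # Slicing on the leading (time) axis pulls a single hyperslab
            # off disk, so each RequestData call touches ~1/25th of the file.
            data[name] = np.asarray(ds.variables[name][idx, ...])
    return data
```

Then in RequestData, get the requested timestep index (as in the earlier sketch) and call this on every update instead of calling the up-front reader once.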

This might be a bit clunky, but it should allow you to visualize the 50GB data file on your local machine.

Got it, I’ll try it out later. Thanks Bane!
