Displaying 3D data on a GPU server

I displayed 3D data using ParaView-5.13.0-egl-MPI-Linux-Python3.10-x86_64 on a GPU server.
Displaying it takes a long time, and the GPU does not seem to be used (I checked its status with nvidia-smi). Is this normal behavior? The memory indicator in the bottom right of the ParaView window shows CPU memory, and since that number increased, the data appears to have been loaded into CPU memory.
Also, if this is normal behavior, is it possible to reduce the load time on subsequent runs, for example by configuring a cache?
The goal is to reduce the load time when repeatedly analyzing large amounts of data.

You need to check whether you are bound by I/O, processing, or rendering.

This is easy to do: use Tools → Timer Log.
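When the Timer Log is long, it helps to rank its top-level entries to see whether the time goes to reading (reader "Execute" entries), processing (filter or representation updates) or rendering ("Render" entries). The sketch below assumes the log has been copied as plain text in the format pasted later in this thread ("Name,  N seconds", with sub-steps indented); top_level_entries is a hypothetical helper, not part of ParaView, and the sample lines are an excerpt of that posted log.

```python
import re
from typing import List, Tuple

# Matches entries such as "PropertiesPanel::Apply,  875.217 seconds".
# The pattern follows the Timer Log text pasted in this thread; other
# ParaView versions may format entries differently (assumption).
ENTRY = re.compile(r"^(?P<indent>\s*)(?P<name>.+?),\s+(?P<secs>\d+(?:\.\d+)?) seconds\s*$")


def top_level_entries(log_text: str) -> List[Tuple[str, float]]:
    """Return (name, seconds) for unindented entries, slowest first."""
    entries = []
    for line in log_text.splitlines():
        m = ENTRY.match(line)
        # Indented entries are sub-steps of the entry above them; skipping
        # them avoids counting the same time twice.
        if m and not m.group("indent"):
            entries.append((m.group("name"), float(m.group("secs"))))
    return sorted(entries, key=lambda e: e[1], reverse=True)


if __name__ == "__main__":
    local_log = """\
XMLImageDataReader::GatherInformation,  0.067 seconds
PropertiesPanel::Apply,  875.217 seconds
    ParaViewPipelineControllerWithRendering::UpdatePipelineBeforeDisplay,  872.282 seconds
RenderView::Update,  11.639 seconds
Still Render,  0.891 seconds
"""
    for name, secs in top_level_entries(local_log)[:3]:
        print(f"{secs:10.3f} s  {name}")
```

Running the same function on the "Local Process" and "Server" sections separately shows on which side the time is spent.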

“…the load time from the next time onwards by setting the cache, etc.?”

Yes, but first you need to determine what is taking the time.

Here is the Timer Log.
The local “PropertiesPanel::Apply, 875.217 seconds” and the server “Execute output.vti id: 23972, 435.995 seconds” entries are very large. Is there any way to improve this?

-------------------------------------------------------------------------------------------------------------------------
Local Process
    Render (use_lod: 0), (use_distributed_rendering: 0), (use_ordered_compositing: 0)
    Interactive Render (END event without matching START event)
Interactive Render,  0.019 seconds
    Render (use_lod: 0), (use_distributed_rendering: 0), (use_ordered_compositing: 0)
    OpenGL Dev Render,  0.01 seconds
    Render (use_lod: 0), (use_distributed_rendering: 0), (use_ordered_compositing: 0)
Process Proxy definitions,  5.775 seconds
    vtkSIProxyDefinitionManager Load Definitions,  0.12 seconds
Process Proxy definitions,  7.207 seconds
    vtkSIProxyDefinitionManager Load Definitions,  0.11 seconds
CellCenters::GatherInformation,  0.095 seconds
DataSetSurfaceFilter::GatherInformation,  0.064 seconds
CellCenters::GatherInformation,  0.064 seconds
DataSetSurfaceFilter::GatherInformation,  0.107 seconds
RenderView::Update,  0.067 seconds
Interactive Render,  0.04 seconds
    Render (use_lod: 0), (use_distributed_rendering: 0), (use_ordered_compositing: 0)
    OpenGL Dev Render,  0.024 seconds
Process Proxy definitions,  9.391 seconds
    vtkSIProxyDefinitionManager Load Definitions,  0.71 seconds
Process Proxy definitions,  10.198 seconds
    vtkSIProxyDefinitionManager Load Definitions,  0.519 seconds
CellCenters::GatherInformation,  0.064 seconds
DataSetSurfaceFilter::GatherInformation,  0.2 seconds
RenderView::Update,  0.062 seconds
    Render (use_lod: 0), (use_distributed_rendering: 0), (use_ordered_compositing: 0)
XMLImageDataReader::GatherInformation,  0.067 seconds
PropertiesPanel::Apply,  875.217 seconds
    ParaViewPipelineControllerWithRendering::UpdatePipelineBeforeDisplay,  872.282 seconds
        XMLImageDataReader::GatherInformation,  0.271 seconds
    ParaViewPipelineControllerWithRendering::Show,  2.042 seconds
        XMLImageDataReader::GatherInformation,  0.096 seconds
        ParaViewPipelineControllerWithRendering::Show::CreatingRepresentation,  1.897 seconds
            vtkSMRepresentationProxy::GetRepresentedDataInformation,  0.032 seconds
            vtkSMRepresentationProxy::GetRepresentedDataInformation,  0.037 seconds
    RenderView::Update,  0.094 seconds
    RenderView::Update,  0.063 seconds
    vtkSMRepresentationProxy::GetRepresentedDataInformation,  0.033 seconds
        Render (use_lod: 0), (use_distributed_rendering: 0), (use_ordered_compositing: 0)
    Render (use_lod: 0), (use_distributed_rendering: 0), (use_ordered_compositing: 0)
RenderView::Update,  11.639 seconds
vtkSMRepresentationProxy::GetRepresentedDataInformation,  0.037 seconds
Still Render,  0.891 seconds
    Render (use_lod: 0), (use_distributed_rendering: 1), (use_ordered_compositing: 0)
    OpenGL Dev Render,  0.017 seconds
RenderView::UpdateLOD,  0.478 seconds
Interactive Render,  0.732 seconds
    Render (use_lod: 1), (use_distributed_rendering: 1), (use_ordered_compositing: 0)
    OpenGL Dev Render,  0.012 seconds
Interactive Render,  0.967 seconds
    Render (use_lod: 1), (use_distributed_rendering: 1), (use_ordered_compositing: 0)
    OpenGL Dev Render,  0.012 seconds
Still Render,  0.581 seconds
    Render (use_lod: 0), (use_distributed_rendering: 1), (use_ordered_compositing: 0)
    OpenGL Dev Render,  0.011 seconds
Interactive Render,  2.267 seconds
    Render (use_lod: 1), (use_distributed_rendering: 1), (use_ordered_compositing: 0)
    OpenGL Dev Render,  0.01 seconds
    Render (use_lod: 0), (use_distributed_rendering: 1), (use_ordered_compositing: 0)
    OpenGL Dev Render,  0.011 seconds
    Still Render (END event without matching START event)



Server
RenderView::Update,  0.066197 seconds
RenderView::Update,  0.063687 seconds
Execute output.vti id: 23972,  435.995 seconds
RenderView::Update,  0.093006 seconds
RenderView::Update,  0.062857 seconds
RenderView::Update,  11.6392 seconds
    vtkPVView::Update,  11.4835 seconds
        Execute UniformGridRepresentation1/SurfaceRepresentation id: 24280,  11.4778 seconds
Still Render,  0.494377 seconds
    Render (use_lod: 0), (use_distributed_rendering: 1), (use_ordered_compositing: 0)
    OpenGL Dev Render,  0.324693 seconds
    OpenGL Dev Render,  0.038332 seconds
RenderView::UpdateLOD,  0.477957 seconds
Interactive Render,  0.019528 seconds
    Render (use_lod: 1), (use_distributed_rendering: 1), (use_ordered_compositing: 0)
Interactive Render,  0.421796 seconds
    Render (use_lod: 1), (use_distributed_rendering: 1), (use_ordered_compositing: 0)
Still Render,  0.227793 seconds
    Render (use_lod: 0), (use_distributed_rendering: 1), (use_ordered_compositing: 0)
    OpenGL Dev Render,  0.166815 seconds
Interactive Render,  1.52627 seconds
    Render (use_lod: 1), (use_distributed_rendering: 1), (use_ordered_compositing: 0)
Still Render,  0.047278 seconds
    Render (use_lod: 0), (use_distributed_rendering: 1), (use_ordered_compositing: 0)

This is I/O: reading the file from disk. Depending on your filesystem, it can be beneficial to read the file with a distributed server (pvserver running on several MPI ranks), but this is not a given.
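One way to confirm that reading is the bottleneck, and to compare storage locations, is to time a plain sequential read outside ParaView. This is a minimal sketch using only the Python standard library; read_throughput is a hypothetical helper. The operating system's page cache can make a repeated read of the same file look much faster than the first one, so compare files that have not been read recently.

```python
import os
import sys
import time


def read_throughput(path: str, chunk_size: int = 16 * 1024 * 1024) -> float:
    """Read `path` sequentially and return the observed rate in MB/s."""
    total = 0
    start = time.perf_counter()
    # buffering=0 skips Python's own buffer; the OS page cache still applies,
    # so read a file that has not been accessed recently for a "cold" figure.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / 1e6 / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    for p in sys.argv[1:]:
        size_gb = os.path.getsize(p) / 1e9
        print(f"{p}: {size_gb:.2f} GB read at {read_throughput(p):.1f} MB/s")
```

Passing one copy of the data on the file server and one on a local disk as arguments (hypothetical paths) gives a direct comparison of the two storage locations.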

Alternatively, save a less detailed version of the data or use an AMR data format.
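To get a feel for how much a less detailed version saves, the sketch below computes the grid size and the size of one float32 point array when only every rate-th point is kept along each axis. The 2048³ grid is a hypothetical example, since the thread does not state the actual size. In ParaView, a filter such as Extract Subset, which has per-axis sample rates, could produce such a version for saving; check the filter's properties in your version.

```python
from typing import Tuple

Dims = Tuple[int, int, int]


def subsampled_dims(dims: Dims, rate: int) -> Dims:
    """Points kept per axis when keeping every `rate`-th point from index 0."""
    if rate < 1:
        raise ValueError("rate must be >= 1")
    return tuple((n - 1) // rate + 1 for n in dims)


def array_bytes(dims: Dims, components: int = 1, bytes_per_value: int = 4) -> int:
    """Size of one point-data array (float32 scalars by default)."""
    nx, ny, nz = dims
    return nx * ny * nz * components * bytes_per_value


if __name__ == "__main__":
    full = (2048, 2048, 2048)  # hypothetical grid size, not from the thread
    for rate in (1, 2, 4):
        d = subsampled_dims(full, rate)
        print(f"rate {rate}: {d} -> {array_bytes(d) / 2**30:.1f} GiB")
```

Because the reduction applies along all three axes, halving the resolution cuts the data, and roughly the read time, by about a factor of 8.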

Thank you.
I understand that the server’s “Execute output.vti id: 23972, 435.995 seconds” is caused by disk I/O.
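Since the slow step is the reader for output.vti, it can also be worth checking how that file stores its arrays. In the VTK XML format, the VTKFile element may carry a compressor attribute, each DataArray has a format of ascii, binary (inline, base64-encoded) or appended, and appended data is encoded as raw or base64. The sketch below reads only the XML header and reports those settings; vti_header_info and read_vti_header are hypothetical helpers.

```python
import re
from typing import Dict, List, Optional, Union


def vti_header_info(header: str) -> Dict[str, Union[str, List[str], None]]:
    """Extract storage settings from the XML header of a .vti file."""
    def first(pattern: str) -> Optional[str]:
        m = re.search(pattern, header)
        return m.group(1) if m else None

    return {
        "compressor": first(r'<VTKFile\b[^>]*\bcompressor="([^"]*)"'),
        "array_formats": sorted(set(re.findall(r'<DataArray\b[^>]*\bformat="([^"]*)"', header))),
        "appended_encoding": first(r'<AppendedData\b[^>]*\bencoding="([^"]*)"'),
        "whole_extent": first(r'<ImageData\b[^>]*\bWholeExtent="([^"]*)"'),
    }


def read_vti_header(path: str, max_bytes: int = 1 << 20) -> str:
    """Return the XML text up to and including the <AppendedData ...> tag."""
    with open(path, "rb") as f:
        head = f.read(max_bytes)
    # Raw appended data is binary and follows the <AppendedData> tag, so cut
    # there; latin-1 can decode any byte sequence without errors.
    start = head.find(b"<AppendedData")
    if start != -1:
        end = head.find(b">", start)
        if end != -1:
            head = head[: end + 1]
    return head.decode("latin-1")
```

Usage: print(vti_header_info(read_vti_header("output.vti"))). ASCII and base64 data must be decoded as text and compressed data must be decompressed, which both add CPU time to the read; on a slow network filesystem, compression can still pay off because fewer bytes are transferred. Which trade-off wins depends on the storage, so it is worth measuring before re-saving the data.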

What is causing the local “PropertiesPanel::Apply, 875.217 seconds”? According to the client’s Task Manager, plenty of CPU and memory were available.

This is probably the I/O as well.

The cause of the problem was that the data was stored on a file server. After moving it to the server’s internal disk, both the server’s “Execute output.vti id” entry and the local process’s “ParaViewPipelineControllerWithRendering::UpdatePipelineBeforeDisplay” entry improved.
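The fix found here (reading from the server’s internal disk instead of the file server) can also serve as a simple cache for repeated analyses: copy each file to local disk once and reuse the copy while the original is unchanged. This is a minimal sketch that assumes an unchanged size and modification time means unchanged content; stage_locally is a hypothetical helper, and files with the same name from different directories would collide in the cache.

```python
import os
import shutil
import tempfile


def stage_locally(src: str, cache_dir: str) -> str:
    """Return a local copy of `src` in `cache_dir`, copying only when needed.

    The cached copy is treated as current when its size and modification
    time (whole seconds) match the source; shutil.copy2 preserves mtime.
    Files with the same basename from different directories would collide.
    """
    os.makedirs(cache_dir, exist_ok=True)
    dst = os.path.join(cache_dir, os.path.basename(src))
    s = os.stat(src)
    if os.path.exists(dst):
        d = os.stat(dst)
        if d.st_size == s.st_size and int(d.st_mtime) == int(s.st_mtime):
            return dst  # cache hit: nothing is read from the file server
    partial = dst + ".partial"
    shutil.copy2(src, partial)  # copy contents and metadata, including mtime
    os.replace(partial, dst)    # rename so a reader never sees a half-copied file
    return dst


if __name__ == "__main__":
    # Demo with temporary files; in practice src is on the network share
    # and cache_dir is on the server's internal disk.
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "output.vti")
        with open(src, "wb") as f:
            f.write(b"example data")
        cache = os.path.join(workdir, "cache")
        first = stage_locally(src, cache)   # copies the file
        second = stage_locally(src, cache)  # reuses the cached copy
        print(first == second, os.path.getsize(second))
```

In practice one would call something like stage_locally("/mnt/share/output.vti", "/scratch/pv-cache") (hypothetical paths) and open the returned path in ParaView, so only the first analysis pays the network transfer.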
However, ParaViewPipelineControllerWithRendering::UpdatePipelineBeforeDisplay still takes 60 seconds. Is there any way to improve this further?

Sorry to keep asking.

This may not be related to this topic, but when I ran nvidia-smi, I got the following output.

/paraview/bin/pvserver-real is only using 400 MiB of GPU memory. Is this limited by a ParaView setting?

Mon Dec 9 23:22:36 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05             Driver Version: 535.154.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A6000               On  | 00000000:31:00.0 Off |                  Off |
| 30%   25C    P8              29W / 300W |    416MiB / 49140MiB |      0%   E. Process |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A6000               On  | 00000000:B1:00.0 Off |                  Off |
| 30%   24C    P8              32W / 300W |     12MiB / 49140MiB |      0%   E. Process |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2885      G   /usr/libexec/Xorg                             4MiB |
|    0   N/A  N/A   3757912      G   /paraview/bin/pvserver-real                 400MiB |
|    1   N/A  N/A      2885      G   /usr/libexec/Xorg                             4MiB |
+---------------------------------------------------------------------------------------+

That depends on what you are doing with the data. Usually, this time corresponds to extracting a surface, which can take a long time with large image data. What data are you displaying?
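To give a sense of scale, the outer surface of an image with nx × ny × nz points has far fewer points than the volume, but for large images it can still be millions of polygons to generate. The sketch below counts them, assuming the surface representation keeps only the outer boundary of the image (one quad per boundary cell face); surface_counts is a hypothetical helper and the 2048³ size is a hypothetical example.

```python
from typing import Tuple


def surface_counts(dims: Tuple[int, int, int]) -> Tuple[int, int]:
    """(points, quads) on the outer boundary of an nx x ny x nz point grid."""
    nx, ny, nz = dims
    if min(dims) < 2:
        raise ValueError("expects a 3D grid with at least 2 points per axis")
    interior = (nx - 2) * (ny - 2) * (nz - 2)
    points = nx * ny * nz - interior
    # Two opposite faces per axis pair, one quad per cell face on each.
    quads = 2 * ((nx - 1) * (ny - 1) + (ny - 1) * (nz - 1) + (nx - 1) * (nz - 1))
    return points, quads


if __name__ == "__main__":
    dims = (2048, 2048, 2048)  # hypothetical size; the thread does not give it
    pts, quads = surface_counts(dims)
    total = dims[0] * dims[1] * dims[2]
    print(f"{pts:,} surface points ({pts / total:.2%} of {total:,}), {quads:,} quads")
```

As a sanity check, a closed quad surface satisfies points = quads + 2 (Euler's formula), which holds for every grid this function accepts.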

“/paraview/bin/pvserver-real is only using 400MiB of GPU memory”

That is not small by any means; ParaView only sends what is actually needed to the GPU.
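A single nvidia-smi snapshot of an idle GPU (P8, 0% utilization in the output above) says little about usage during rendering. To see how memory and utilization change while you interact with the view, you can poll nvidia-smi using its --query-gpu and --format=csv,noheader,nounits options. This is a minimal sketch; parse_query_gpu, sample_gpus and watch are hypothetical helpers.

```python
import subprocess
import time
from typing import Dict, List, Optional

# Field names accepted by `nvidia-smi --query-gpu` (see --help-query-gpu).
FIELDS = ["index", "memory.used", "memory.total", "utilization.gpu"]


def parse_query_gpu(csv_text: str) -> List[Dict[str, Optional[int]]]:
    """Parse --format=csv,noheader,nounits output into one dict per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        # Unsupported fields are reported as e.g. "[N/A]"; map them to None.
        rows.append({k: int(v) if v.isdigit() else None for k, v in zip(FIELDS, values)})
    return rows


def sample_gpus() -> List[Dict[str, Optional[int]]]:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_query_gpu(out)


def watch(seconds: int = 30, interval: float = 1.0) -> None:
    """Print one line per GPU every `interval` seconds for `seconds` seconds."""
    end = time.monotonic() + seconds
    while time.monotonic() < end:
        for gpu in sample_gpus():
            print(f"GPU {gpu['index']}: {gpu['memory.used']}/{gpu['memory.total']} MiB, "
                  f"{gpu['utilization.gpu']}% busy")
        time.sleep(interval)
```

Call watch() in a terminal on the GPU server (nvidia-smi must be on the PATH), then rotate or zoom in the ParaView client and observe whether memory.used and utilization.gpu change for the GPU that pvserver-real runs on.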

You seem to believe that your GPU is not being used to its full potential, but it is.