Performance problems: data format & amount of data per pvserver?

(Erik Keever) #1


I am hoping to use PV to visualize a structured grid DNS with ~900M cells. As a prelude I’m trying to get client/server mode working and performant on my local cluster, and there are, to put it mildly, some performance problems.

The cluster should not be presenting a problem (100Gbps interconnect, 14 cores and 128GB ram per node, GPFS filesystem) and my laptop’s connection ought to be adequate (1Gbps, 0.3msec ping to head node).

However, even with a far smaller test dataset (30M cells), performance with small (1/2/4) and somewhat larger (8/16) numbers of pvservers is far worse than literally just single-threaded paraview accessing the same data on my laptop: it takes nearly two full minutes of waiting to (1) open, (2) ‘apply’, and (3) generate a contour, plus 1-2 minutes to change frames.

What format/formats work well for larger datasets (suppose ‘larger’ is taken to mean >1GB of binary data per frame)? I can currently either output EnSight Gold directly or use the vtkwrite Matlab utility to generate .vtk files. Is there any giveaway test I can take to our admins to diagnose the problem?
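For scale, a quick back-of-the-envelope sketch of per-frame data volume (a minimal illustration; it assumes one double-precision scalar per cell, and the real variable count will differ):

```python
# Rough per-frame size estimate for a binary field dump.
# Assumption (not from the thread): one value per cell, double precision.
def bytes_per_frame(n_cells, n_vars, bytes_per_value=8):
    """Raw field-data bytes in one frame: cells * variables * value width."""
    return n_cells * n_vars * bytes_per_value

# 30M-cell test case, one float64 scalar: 240 MB of raw data per frame.
test_mb = bytes_per_frame(30_000_000, 1) / 1e6      # -> 240.0
# 900M-cell target, one float64 scalar: 7.2 GB per frame.
target_gb = bytes_per_frame(900_000_000, 1) / 1e9   # -> 7.2
```

Even a single scalar at the target resolution is several GB per frame, so the frame format's read path matters a great deal.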

Thanks in advance,

(Utkarsh Ayachit) #2

Are you by any chance seeing a message when you connect to the remote server indicating that remote rendering is not supported because DISPLAY is not accessible, or anything like that? From what you describe, it seems to me you’re ending up in a local rendering configuration. To confirm, go to the Settings dialog and check Show Annotation on the Render View tab. What does the annotation say for Remote/parallel rendering:?

(Erik Keever) #3

Yes, I am getting forced to local rendering. Unfortunately our cluster nodes aren’t running X servers; I’m trying to get hold of the admins about running them on some of the GPU nodes (problematically, attaching X to a GPU also activates the kernel runtime-limit watchdog and prevents cuda-gdb from using that GPU).

I didn’t think it would be such a problem for relatively simple meshes like the ones my test dataset/contour generates.

(Utkarsh Ayachit) #4

For testing, try this: download the osmesa binary from (we’ve been distributing those since 5.6) and then use that as the pvserver (without MPI) and see how that performs. You should be able to use remote rendering with the osmesa binary.

Totally depends on the size of the geometry that the contour filter generates. The timer log will tell you where most of the time is being spent, so a closer inspection there may be next in order, but try the osmesa route first…let’s isolate the issue.

(Erik Keever) #5

Okay, I have PV using llvm/mesa+gallium on the server side and indeed the “remote rendering unavailable” message did not pop up when I connected to the pvservers this time, so that’s not the problem any longer.

The mesh generated by my contour operation is reported as 900K cells and 450K points, totalling 49MB (the remote-render threshold is the default 20MB, so I am in fact using remote rendering). Once the actual mesh is created, I can move the camera around just fine (both with remote rendering now, and at a full 60fps with local rendering before).

I am still seeing the same times: 15sec to open, 1m15 to apply, 20sec to contour, and 1m20 to change frames.

In particular I note it’s always reporting fully 30 seconds for the read vtk frame operation, which seems unreasonable given the data file size (1.6GB) and the performance of the underlying hardware (a serial instance of dd can read the file off the network FS and copy it to /dev/null at a rate of 1.1GBps). And the timer log (remote osmesa) is indeed reporting that the render times are O(1sec) or O(0.1 sec) depending on what it’s doing, so it’s not slow rendering that’s the problem.
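The raw-throughput check described above can be handed to the admins as a one-liner; this is a sketch, and TESTFILE is a placeholder that should point at the GPFS mount (here it defaults to a local temp file so the script runs anywhere):

```shell
# Sequential-read sanity check for the shared filesystem.
# Point TESTFILE at a path on the GPFS mount; defaults to a local temp file.
TESTFILE="${TESTFILE:-$(mktemp)}"
dd if=/dev/zero of="$TESTFILE" bs=1M count=64 2>/dev/null  # write scratch data
dd if="$TESTFILE" of=/dev/null bs=1M                       # read back; dd reports throughput
rm -f "$TESTFILE"
```

If dd reports ~1GB/s from GPFS but the reader still takes 30 seconds on a 1.6GB file, the bottleneck is in the reader, not the filesystem.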

I use the vtkwrite utility to generate the .vtk files because it is the most convenient option. I haven’t read its code in detail (I’m hoping for a turnkey solution that doesn’t require that), but is it possible/likely that it’s writing them in some naive way and causing a massive slowdown?
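One cheap thing to check without reading vtkwrite’s source: line 3 of a legacy .vtk file header is the literal word ASCII or BINARY, and ASCII output is a classic cause of very slow reads on large files. A minimal sketch (the filename in the comment is hypothetical):

```python
# Report whether a legacy .vtk file was written as ASCII or BINARY.
# The legacy VTK header is: version line, title line, then ASCII|BINARY.
def vtk_format(path):
    with open(path, "rb") as f:
        header = [f.readline().strip() for _ in range(3)]
    return header[2].decode("ascii", errors="replace")

# print(vtk_format("frame0000.vtk"))  # hypothetical filename -> "ASCII" or "BINARY"
```

If this prints ASCII for a 1.6GB frame, that alone could explain a 30-second parse.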

(Utkarsh Ayachit) #6

Can you share the dataset? Also, what version of ParaView are you using? The reader indeed seems to be taking the most time. Splitting the file into multiple partitions may be the solution if reading the same file from multiple ranks is causing the issue, which is likely.
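For context, one partitioned-format route is VTK’s XML parallel structured-grid format: a small .pvts index file points each rank at its own piece file. A sketch of such an index, with made-up extents and filenames (a 300³ grid split in two along k; adjacent pieces share the boundary plane):

```xml
<?xml version="1.0"?>
<!-- Hypothetical .pvts index: two pieces split along the k axis. -->
<VTKFile type="PStructuredGrid" version="0.1" byte_order="LittleEndian">
  <PStructuredGrid WholeExtent="0 299 0 299 0 299" GhostLevel="0">
    <PPointData Scalars="pressure">
      <PDataArray type="Float64" Name="pressure"/>
    </PPointData>
    <PPoints>
      <PDataArray type="Float64" NumberOfComponents="3"/>
    </PPoints>
    <Piece Extent="0 299 0 299 0 149" Source="frame0000_p0.vts"/>
    <Piece Extent="0 299 0 299 149 299" Source="frame0000_p1.vts"/>
  </PStructuredGrid>
</VTKFile>
```

With one .vts piece per (or per few) pvserver ranks, each rank reads only its own extent instead of all ranks parsing the same monolithic legacy file.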