Proposed General Rule for Save and Load of Supported Files on Remote and/or Local Filesystems

patchett2002 · September 22, 2020, 5:55pm

Proposed General Rule for Save and Load of Supported Files on Remote and/or Local Filesystems using the ParaView GUI

Files that are generally considered small, such as images (png, jpg, etc.) and scripts (pvsm, py, catalyst, cinema, etc.), should have the option to be saved and loaded locally or remotely.
Files that are generally considered big, namely all data files (.vt*, pvt*, exo, .h5, etc.), should only have the option to be loaded from the remote file system.

Question: if this were adopted, would it be possible to have a single “Open” that could handle pvsm and data files? (Perhaps the data files would be greyed out on local filesystems?)

This topic is an offshoot of: PVSM ParaView State - Saving-Loading-Productivity

wascott · September 22, 2020, 6:17pm

In my mind, a clarification would be:

Proposed General Rule for Save and Load of Supported Files on Remote and/or Local Filesystems using the ParaView GUI

Files are loaded from the file system accessed by the ParaView server. If that is local, the local file system is accessed; if it is remote, the remote file system is accessed. This includes data and state files. (NOTE: What about Python trace files?)
Files that are generally considered small, such as images (png, jpg, etc.) and scripts (pvsm, py, catalyst, cinema, etc.), should have the option to be saved locally or remotely.
Files that are generally considered big, namely all data files (.vt*, pvt*, exo, .h5, etc.), should only have the option to be saved to the file system accessed by the ParaView server.

It will be possible to have a single “open” that could handle pvsm and data files. pvsm and data files will be on the same file system.
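The clarified rule reduces to a small per-file decision: where a file may be loaded from, where it may be saved, and what a single Open should do with it. The Python sketch below encodes one reading of that rule. The names `FileSystem`, `classify`, `load_location`, `save_locations`, and `open_action` are hypothetical and not part of ParaView's API. The extension lists are only the examples given in the proposal; catalyst and cinema scripts are left out because no extensions were named for them.

```python
from enum import Enum
from fnmatch import fnmatchcase


class FileSystem(Enum):
    CLIENT = "client"
    SERVER = "server"  # the file system the ParaView server sees (local or remote)


# Example patterns from the proposal; illustrative, not exhaustive.
SMALL_PATTERNS = ["*.png", "*.jpg", "*.pvsm", "*.py"]
BIG_PATTERNS = ["*.vt*", "*.pvt*", "*.exo", "*.h5"]


def classify(filename: str) -> str:
    """Return 'small', 'big', or 'unknown' based on the file name."""
    name = filename.lower()
    if any(fnmatchcase(name, p) for p in SMALL_PATTERNS):
        return "small"
    if any(fnmatchcase(name, p) for p in BIG_PATTERNS):
        return "big"
    return "unknown"


def load_location(filename: str) -> FileSystem:
    # Everything, data and state files alike, loads from the server's file system.
    return FileSystem.SERVER


def save_locations(filename: str) -> set:
    # Small files may be saved on either side.
    if classify(filename) == "small":
        return {FileSystem.CLIENT, FileSystem.SERVER}
    # Data files stay with the server; unknown types are treated the same way
    # (a conservative assumption, not part of the proposal).
    return {FileSystem.SERVER}


def open_action(filename: str) -> str:
    # State and data files share one file system, so a single Open dialog
    # can browse it and dispatch on the extension.
    return "load_state" if filename.lower().endswith(".pvsm") else "open_data"
```

Treating unrecognized extensions like data errs on the side of never moving potentially large files.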

wascott · September 22, 2020, 6:17pm

@Kenneth_Moreland @utkarsh.ayachit @cory.quammen

Kenneth_Moreland · September 22, 2020, 10:37pm

tl;dr
Save and open everything on the server.

I had advocated earlier for having Open work only on the server. Most stuff that can be Saved should be saved on the server. The possible exception might be things like images and animations. But as was brought up during our meeting, even things like that can get very large. Plus, it is just confusing trying to figure out what gets saved on the client and what on the server.

What I would prefer not having is extra options to choose which file system we are loading from or saving to. That just gets confusing for users and is a PITA to implement.

wascott · September 22, 2020, 11:10pm

Ken,
This will then require the user to manually move screenshots and movies to the client side around 98% of the time. And this is for some users who don’t understand Linux well. Is this really what we want?

Kenneth_Moreland · September 23, 2020, 3:29am

You could argue that images should be written on the client rather than the host. Arguments for that would be (as you have said):

Images are almost never opened back up in ParaView.
Images are almost always used with apps on the desktop (client)
Images (individually) have a fixed size.

I’m OK with choosing to save certain things to the client. But the main points are:

Let’s not mess with user options to Open or Save on either client or server.
Our default choice should be to Open/Save on the server. The burden is on whoever argues for doing it on the client.

utkarsh.ayachit · September 23, 2020, 11:52am

I agree with @Kenneth_Moreland.

woodscn · September 23, 2020, 2:54pm

@Kenneth_Moreland

I’m confused by your main points. It seems like the first is saying that we shouldn’t give the user an option to choose a file destination (item 1), and that you think everything should happen on the server (item 2). That seems in conflict with your statement that you’re ok choosing to save certain things to the client.

Also, people are getting confused between Image Data (potentially large, 3d, uniform, structured mesh data) and images (individually small, non-interactive, .png-type files). The first kind is opened in ParaView all the time, especially with large datasets.

The only point of this whole discussion is to give users additional options. If we’re not going to do that, then we should stick with the setup we currently have, where data lives on the server and images/states/etc. live on the client. ParaView today is already in the business of moving data back and forth from client to server. I understand not wanting to get into the mess of doing that with very large files, but let’s not apply that standard in cases where it doesn’t really matter.

wascott · September 23, 2020, 5:32pm

I believe I wasn’t clear yesterday as we discussed this topic. Here are my points. (Note, thanks Nathan, I missed the nuance with regards to Image Data. To me, it is just data.).

We have four cases that are important.

Data reads (Datasets, Image Data (single images or stacks), basically something read by a reader). Dataset readers belong on the server. We are NOT in the business of moving large data. That is the user’s responsibility, if they need to move it.
Data writes. These should stay with the server.
Script type files (trace files, state files, etc.). Let’s break these down. The state files really should be with the data. Keeping track of data and state files on different systems sucks. Trace files: this one is hard. Macros belong on the client side? Standalone traces belong … with the data? (After all, you are making the trace to then batch the data.) This is too confusing. Leave all scripts on the client.
Image and movie output. This one is really hard. 95% of the time you want the images, and especially movies, on the CLIENT. If you are doing a viz from Sandia to Trinity, I don’t want to have to pull movies back from Trinity to view them, or try to run a display on Trinity with X forwarding. BUT, for Cinema and Slycat, and a few other use cases, you want the images on the server. This is the one use case where I strongly believe users should have the option to write on the client or server side.

My $0.03 (Inflation).

woodscn · September 23, 2020, 6:17pm

Thanks for breaking those use cases down. I agree with pretty much all of your points, with one exception. I think we should use your recommendations as defaults, give the user the option to choose where they want to save things, and only disallow that option when we have a compelling reason (i.e. moving large data around). You said it yourself in your answer: “Trace files. This one is hard… Image and movie output. This one is really hard.” I just can’t understand how it could be so difficult to let a user pick where they want to save it, and potentially throw an error if the file is bigger than 10MB or something.
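A minimal sketch of the “intelligent default plus override” idea, assuming the four cases from the previous post become categories with per-category defaults. The category names, the `resolve_destination` helper, and the `TransferTooLarge` exception are hypothetical, and the 10 MiB cap is only a stand-in for the “10MB or something” above.

```python
from enum import Enum
from typing import Optional


class FileSystem(Enum):
    CLIENT = "client"
    SERVER = "server"


# Defaults follow the four cases in the previous post (hypothetical category names).
DEFAULTS = {
    "data_read": FileSystem.SERVER,
    "data_write": FileSystem.SERVER,
    "script": FileSystem.CLIENT,
    "image": FileSystem.CLIENT,
}

# Categories where overriding is disallowed: never move data.
LOCKED = {"data_read", "data_write"}

# Illustrative size cap for moving a file across the connection (10 MiB).
MAX_TRANSFER_BYTES = 10 * 1024 * 1024


class TransferTooLarge(Exception):
    """Raised when an override would move a file larger than the cap."""


def resolve_destination(category: str, size_bytes: int,
                        user_choice: Optional[FileSystem] = None) -> FileSystem:
    default = DEFAULTS[category]
    # No choice made (the common case), or the choice matches the default.
    if user_choice is None or user_choice == default:
        return default
    # Overrides are refused outright for data categories.
    if category in LOCKED:
        raise PermissionError(f"{category} files must stay on the {default.value}")
    # Otherwise the override is honored only below the size cap.
    if size_bytes > MAX_TRANSFER_BYTES:
        raise TransferTooLarge(
            f"{size_bytes} bytes exceeds the {MAX_TRANSFER_BYTES}-byte limit")
    return user_choice
```

The common path never prompts the user; the checks only run when someone deliberately overrides a default.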

Kenneth_Moreland · September 23, 2020, 6:20pm

Yeah, I started to say everything should go to the server, then I quickly backed off that when @wascott argued that screenshots and animations should be saved on the client.

When I said “image,” I explicitly meant images generated from screenshots and animations. For the purposes of this discussion, I don’t differentiate “Image Data” (e.g. vtkImageData) from other VTK data structures (e.g. vtkUnstructuredGrid).

I disagree with the statement that “the only point of this whole discussion is to give users additional options.” The point is not necessarily to give additional options. The point is to improve user experience. Adding options does not necessarily improve user experience. In fact, a common criticism of ParaView is that it pushes too many options on users. I’m pushing back on giving users a choice of whether files go to the server or client because I personally will be annoyed if every time I load or save a file I get asked to select client or server.

This discussion started with a feature request from @patchett2002 to consolidate the Open (data) and Load State operations into one simple Open. Such a feature improves user experience (arguably by removing unnecessary options). My suggestion is to move opens and saves principally to the server so that this feature can be implemented and behave more consistently. I think making the user constantly decide whether to operate on the client or the server makes the overall experience worse.

woodscn · September 23, 2020, 6:32pm

With respect to everything you said about reducing the burden on users to pick options: I agree completely. I just think it should be handled with an intelligent default choice and an out-of-the-way menu to change it.

Andy_Bauer · September 24, 2020, 12:18pm

Script files are somewhat useless (at least without hand editing) if you save things like images on the client in a non-working directory but save the script on the server to be run in batch later on. I like being able to save a script on the server because not only does it save me a step, but it will also likely be closer to the directory that I want it saved in, as opposed to my home directory, which is the default when using scp. Not only that, but where I want it to be saved is likely something like /p/mnt/andybauer/project1/somethingelse/anotherthing/ohyeah/andthistoo/thefile.py

So I can see my workflow with the new capability something like:

Connect to remote machine
Start Python Trace
Do some stuff (likely saving an image on the server, maybe in a separate directory)
Stop Python Trace
Edit the script a bit in the editor
Save the script on the remote machine
Log in to the remote machine and launch my batch job

Now, is this convenience worth the extra complexity in the GUI of having to remember where the script and image are being saved? I don’t know. Maybe advanced options or some other GUI interface would make this an improved user interface. I’m with Ken on this, and on making ParaView a better user experience rather than always looking to add options.

wascott · September 29, 2020, 1:50am

OK, I think I know what I would recommend. This is somewhat working off of what we have, and what we have presented above.

Data reads of all kinds stay with the server. (Current functionality.) Reason: never move data.
Data writes of all kinds stay on the server. (Current functionality.) Reason: never move data.
Screenshots and animations stay on the client. (Current functionality.) Reason: with few exceptions, that is where users need these data products.
Extractor products of all kinds stay on the server. (New, implemented functionality.) Reason: data always stays with the server, and this solves the few percent of the time you need images or movies on the server (Slycat, Cinema, images that stay with datasets).
State files of all kinds stay on the server. (New, not implemented functionality.) Reason: data, and the ability to recreate ParaView state, belong together. Note that the definition of “state file” used here may/will include thumbnails in the future.
Traces of all kinds (Macros, Run Script within ParaView, trace output) stay on the client. (Current functionality.) Reason: Macros and the Python View / Run Script inside the ParaView GUI need to be on the client side, so put all traces on the client side. When a Python trace is ready for pvbatch, the user can copy it to the server. This happens relatively infrequently and is not a large burden.

Now, everything has its “place”, and users (with training and experience) know where everything goes.

Thoughts?
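The recommendation above is a fixed table with no user override. A minimal Python transcription, using hypothetical category names rather than ParaView identifiers, makes one consequence easy to check: a single File -> Open can only browse one file system if state files and data reads live in the same place.

```python
from enum import Enum


class FileSystem(Enum):
    CLIENT = "client"
    SERVER = "server"


# Category -> (location, reason), transcribed from the recommendation.
# Category names are hypothetical labels, not ParaView identifiers.
PLACEMENT = {
    "data_read": (FileSystem.SERVER, "never move data"),
    "data_write": (FileSystem.SERVER, "never move data"),
    "screenshot": (FileSystem.CLIENT, "users almost always need these on the desktop"),
    "animation": (FileSystem.CLIENT, "users almost always need these on the desktop"),
    "extractor_output": (FileSystem.SERVER, "Cinema, Slycat, images kept with datasets"),
    "state": (FileSystem.SERVER, "state belongs with the data it recreates"),
    "trace": (FileSystem.CLIENT, "Macros and Run Script live on the client"),
}


def location_for(category: str) -> FileSystem:
    # No user override: each category has exactly one place.
    location, _reason = PLACEMENT[category]
    return location


def single_open_possible() -> bool:
    # One Open dialog browses one file system, so state files and
    # data reads must share a location.
    return location_for("state") == location_for("data_read")
```

With this table the check holds; moving state files back to the client would make it fail.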

woodscn · September 29, 2020, 4:34am

This introduces a new problem, where the two different ways of saving images behave in different ways for no reason that is intuitive to the average user. I’m also not sure that any benefits are worth introducing a breaking change, like the location of state files. You just know that people are going to accidentally save their .pvsm files to scratch space and not realize the mistake until after they’ve been purged.

I like the rule: Never move data. It’s like the Apply button; an imperfect, reasonable, simple-to-explain rule. I’ll note that we do have a GUI option to ignore the Apply button, though.

utkarsh.ayachit · September 29, 2020, 10:30am

This is inconsistent with pvsm state files. The reasons stated for which we say PVSM state files should be on server-side are still applicable to python traces and python states i.e. Python traces/states could refer to data files (as often as pvsm files) and hence should behave similarly.

wascott · September 29, 2020, 4:28pm

@utkarsh.ayachit My reasoning is primarily about Macros. They need to be in a known place, and you want a macro to work across numerous clusters. Thus, Macros should be client side? If Macros are client side, it would be horribly confusing (especially for new users) to have traces sometimes written to the server side… Further, I always think of pvbatch users as more experienced users who know enough to easily make the copy?

The reason state files don’t belong on the client side is that if you want to archive your data, you need to copy the state file to the server side, archive your data (or maybe it’s a big ensemble), and then, before you can use said state file, move it back to the client? That doesn’t make sense.

wascott · October 13, 2020, 12:10am

OK, after talking to Utkarsh and pondering, I would recommend we leave state files and traces where they are now: on the client side. This is simple, expected behavior. Further, it is a simple scp to move files between client and server.

This means that, with Extractors, we have what is desired. No changes required to where different files are written.

Kenneth_Moreland · October 13, 2020, 6:45pm

Doesn’t that leave us with the problem we started with? We wanted one File -> Open for either data files or state files, and that won’t work if data is on the server and state files are on the client.

woodscn · October 13, 2020, 8:20pm

Yep; we’ve basically gone around and back to where we started. I still don’t see why we couldn’t just let the user select “client” or “server” on the left-hand side of the Open File dialog, or in the drop-down menu up top. Is this some sort of OS-provided code that is impractical to mess with, or something?