Adding support for opening any archived single file

mwestphal · November 28, 2023, 4:56pm

I’m wondering if there is any interest in adding support in ParaView for opening archived files directly.

What I mean by that is as follows.

Let’s take a file that ParaView is able to open, file.ext
Let’s archive it using a compression tool, e.g. zip; the archive is named file.ext.zip
ParaView would still be able to open file.ext.zip as if it were not archived at all
Reader detection would still be based on the name of the file; the archive extension would just be ignored
Supported archive extensions would be .zip, .tar.gz, .7z, .tgz, .tar, .tar.bz2
Ideally, the uncompressed file would never be stored on disk
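
To illustrate the "archive extension is ignored" point, here is a minimal Python sketch. It is not ParaView code: the function name `strip_archive_extension` and the constant `ARCHIVE_EXTENSIONS` are hypothetical, and the extension list is simply the one proposed above.

```python
# Extensions taken from the proposal above; the names here are hypothetical.
ARCHIVE_EXTENSIONS = (".zip", ".tar.gz", ".7z", ".tgz", ".tar", ".tar.bz2")


def strip_archive_extension(filename):
    """Return the inner file name used for reader detection.

    Returns None when the name does not end with a known archive extension.
    """
    lower = filename.lower()
    # Check longer suffixes first so multi-part extensions such as
    # ".tar.gz" are matched as a whole.
    for ext in sorted(ARCHIVE_EXTENSIONS, key=len, reverse=True):
        if lower.endswith(ext) and len(filename) > len(ext):
            return filename[: -len(ext)]
    return None
```

With this, `file.ext.zip` and `file.ext.tar.gz` would both be matched against readers as `file.ext`, while a plain `file.ext` is left to the normal detection path.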

To implement the last point, we would need to leverage the relatively recently merged vtkResourceParser: https://www.kitware.com/faster-data-loading-in-vtk/. However, only a handful of readers support it so far. For other readers, the filename would point to a temporary file on disk, which may have unintended side effects.
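
As a rough illustration of the "never stored on disk" idea, the following Python sketch reads the only member of a zip archive into an in-memory stream using the standard library. It does not use VTK at all; `open_single_member` is a hypothetical helper, and a real implementation would hand the stream to a reader that accepts one.

```python
import io
import zipfile


def open_single_member(archive):
    """Return (member_name, in-memory stream) for a zip holding exactly one file.

    `archive` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        # Ignore directory entries; only regular files count as members.
        members = [info for info in zf.infolist() if not info.is_dir()]
        if len(members) != 1:
            raise ValueError(
                f"expected exactly one file in the archive, found {len(members)}"
            )
        data = zf.read(members[0])  # decompressed in memory, nothing written to disk
    return members[0].filename, io.BytesIO(data)
```

The returned name can be used for reader detection and the stream for parsing. Readers that only accept a filename would still need the temporary-file fallback mentioned above.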

In any case, this could be enabled only for a subset of readers in order for users to test the feature.

I think this could help many users keep the size of datasets down while not having too big of an impact in terms of I/O times.

What do you think?

Edit: This concerns single-file archives, as the title suggests.

Andy_Bauer · November 28, 2023, 5:38pm

There are plenty of times when I have an archived file that I need to extract files from. So I like the idea.

wascott · November 28, 2023, 6:05pm

I think it’s a great idea. This opens up the possibility to have clusters post process and compress the data, saving disk space. Further, it would frequently significantly increase read speeds from disk. If you write it up, point Phil and me at it and we will finance it.
@phismith

Kenneth_Moreland · November 28, 2023, 6:17pm

I agree this is a very nice feature. But one design point to consider is how to handle archives that typically have multiple files in them (e.g. .zip, .tar). Should they be supported?

A nice solution would be if the file browser could introspect archives and allow you to navigate into the archive like it was a directory. But if you did that, what would you do with the compression formats that are for a single file (.gz, .bz2, .7z)? Those probably should just be treated as a single file.

ben.boeckel · November 29, 2023, 2:34am

.7z is an archive format and should be inspected as such. I don’t know of libarchive support for .7z files anyways.

I’d be careful with saying “any archived file” because there are ancient formats I’m sure we’re not going to support. Advertise .zip and .tar.*; if someone wants cpio or .a archives supported…we can consider allowing them (modulo libarchive support probably).

mwestphal · November 29, 2023, 8:26am

This is definitely much much harder and will not be supported in a first version.

A nice solution would be if the file browser could introspect archives and allow you to navigate into the archive like it was a directory.

This is not planned at all. In order to be able to do that, you need to extract the data on the fly when browsing, which would be much harder.

In short, this concerns “single-file archives”, with the potential exception of file series. I’ve edited my post.

I’d be careful with saying “any archived file”

Yes, this was a bit of a teaser.

mwestphal · December 1, 2023, 2:28pm

After some internal discussion, another potential solution emerged.

Instead of relying on the vtkResourceParser to abstract the decompression, going one layer deeper, to the filesystem, may prove more useful to end users.

Indeed, way too many formats rely on multiple files, which would not be supported with the proposed implementation, as @Kenneth_Moreland pointed out.

This new solution would instead mount the archive in the host filesystem in a temporary directory.
This is a bit similar to what your desktop environment does when you navigate into a .zip in the file explorer.

Once the archive is mounted as an actual filesystem, ParaView can interact with it as with any file, with the limitation that the filename may be a bit different and point to a temporary directory.

When ParaView closes or the reader is deleted, the temporary folder would disappear.
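
The lifetime described here can be approximated with a short Python sketch. Note that this is plain extraction into a temporary directory, not a true filesystem mount, and that `ExtractedArchive` is a hypothetical name used only for illustration.

```python
import os
import tempfile
import zipfile


class ExtractedArchive:
    """Extract a zip into a temporary directory and delete it on exit."""

    def __init__(self, archive):
        self._archive = archive  # path or binary file-like object
        self._tmp = None

    def __enter__(self):
        self._tmp = tempfile.TemporaryDirectory(prefix="archive_")
        with zipfile.ZipFile(self._archive) as zf:
            zf.extractall(self._tmp.name)
        # Callers see ordinary files, but under a temporary path.
        return self._tmp.name

    def __exit__(self, exc_type, exc, tb):
        # Mirrors "the temporary folder would disappear".
        self._tmp.cleanup()
        return False
```

Inside the `with` block, multi-file datasets work because every file of the archive exists on disk; the cost is that the whole archive is decompressed up front.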

In theory this sounds perfect. However, the implementation would be quite complex, and we would need to rely on a cross-platform tool to do the heavy lifting. Such a tool could be physfs.

Do not hesitate to share any feedback.

mwestphal · January 5, 2024, 1:50pm

Looking more deeply into it, physfs could be a solution, but it requires some work. Indeed, physfs only provides a dedicated API for accessing archives as if they were files; they are not mounted directly onto the host filesystem.

This means that it needs to be integrated much more deeply into VTK and ParaView to be usable, and we won’t be able to just mount an archive in a temporary directory.

So are we back to square one? Well, not really, because physfs does provide some interesting services. In particular, it would let us mount an archive inside the physfs logic, which we could then abstract using a dedicated resource stream as suggested in my initial post.

With this, we would be able to implement a dedicated URI loader (see vtkURILoader) with a custom scheme for physfs.
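
To make the custom-scheme idea more concrete, here is a small Python sketch of how such a URI could be resolved. Everything about it is an assumption: the scheme name `archive`, the convention of putting the archive path in the URI path and the member name in the fragment, and both function names are hypothetical. It uses Python's `zipfile` in place of physfs and does not reflect the vtkURILoader API.

```python
import zipfile
from urllib.parse import unquote, urlsplit

SCHEME = "archive"  # hypothetical scheme name, not defined by VTK or physfs


def parse_archive_uri(uri):
    """Split 'archive:///path/to/data.zip#inner/file.ext' into (archive, member)."""
    parts = urlsplit(uri)
    if parts.scheme != SCHEME:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.fragment:
        raise ValueError("missing member name after '#'")
    return unquote(parts.path), unquote(parts.fragment)


def load_archive_uri(uri):
    """Return the decompressed bytes of the member the URI points to."""
    archive_path, member = parse_archive_uri(uri)
    with zipfile.ZipFile(archive_path) as zf:
        return zf.read(member)
```

The point of the sketch is only that a reader would keep asking for "a resource at a URI" while the loader decides whether that resource lives in an archive.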

So in a way, this solution would integrate the solution highlighted in the first post but rely on a third party for the work.

However, it leaves one issue: for a reader to be compatible, it will need to be implemented using vtkResourceStream (and vtkURILoader when needed).

Another issue is that physfs currently only supports .zip and .7z, so adding support for .tar.gz and .tar.xz would be needed.

The next step would be to create a prototype of this vtkPhysFSResourceStream to check that all this theory can actually be applied.

The doc of physfs for those interested: https://github.com/icculus/physfs/blob/main/src/physfs.h

savalma · August 6, 2024, 9:55pm

Hi Mathieu, I am just wondering if this was developed or if it is in development but yet to be released. Thank you!

Being able to open .gz files would be great. I am currently having the issue that when I use the pyevtk.hl library to create a structured grid .vts file, its file size is way too big, making it impossible to store many time steps at a time.

Best,
Sergio

mwestphal · August 8, 2024, 8:02am

I’m afraid this feature has not been funded. Do not hesitate to reach out to Kitware if you are interested.
