Superbuild USE_SYSTEM_ flags not working properly

This is probably more of an “issue”, but for the moment it is a question - because I want to get this working!

Symptom: I managed to finish my PV custom application with plugins so that it runs locally. Then I also got a “superbuild” run working, including the final ctest -R … step that generates the installation package (for Windows, with NSIS). The installation went through as well - only the application did not start!

And it turned out that a wrong version of hdf5.dll was installed with the generated package.

Further research showed that the software does indeed start from the /install subdirectory of the superbuild “target” directory, and that this directory contains a different version of hdf5.dll - the same holds for libpng16.dll and zlib1.dll.

Now the versions that went into the installation package must have originated from my local OSGeo4W installation (see https://www.osgeo.org/projects/osgeo4w/ and https://trac.osgeo.org/osgeo4w/). My first idea for simply avoiding the problem: update that package so the libraries would match and I would not have to care about anything. However, their “current” versions of the DLLs are no more current than what I already have installed, so there is nothing to gain there - unless I break the consistency of the OSGeo4W package, which of course I do not want.

But actually I already have correct DLLs that do work - they were obviously built during the superbuild, just never put into the final package!

And there even seem to be options taking care of the issue: the USE_SYSTEM_ flags, which I of course set so that the system versions of the above DLLs are NOT used, and my CMakeCache.txt shows them correctly:

...
USE_SYSTEM_boost:BOOL=OFF
USE_SYSTEM_bzip2:BOOL=OFF
USE_SYSTEM_hdf5:BOOL=OFF
USE_SYSTEM_numpy:BOOL=OFF
USE_SYSTEM_png:BOOL=OFF
USE_SYSTEM_python3:BOOL=OFF
USE_SYSTEM_qt5:BOOL=ON
USE_SYSTEM_zlib:BOOL=OFF
...

So exactly what I want! But the point is that these settings do not seem to be respected - which is obviously an issue for me.
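To make sure we are talking about the same expectation, here is my understanding of what such a flag is supposed to do inside the superbuild, as a minimal hand-written sketch (illustrative only, not the actual paraview-superbuild code):

option(USE_SYSTEM_hdf5 "Use the system HDF5 instead of building it" OFF)
if (USE_SYSTEM_hdf5)
  # take whatever HDF5 the system provides (config/module lookup)
  find_package(HDF5 REQUIRED)
else ()
  # download and build the pinned version via ExternalProject,
  # and package THAT copy - which is what does not seem to happen here
endif ()

So with USE_SYSTEM_hdf5=OFF I would expect the self-built hdf5.dll in the package, and nothing else.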

My question is only this: I want the thing running as soon as possible, not only once the issue is eventually fixed, so I am more or less looking for the script or code where the actual transfer from the …/install folder to the …/_CPackPackages/… folder happens, so that I can patch it and continue working.

After already finding so many other things in these endless CMake scripts I may eventually find that one too, but if somebody has a little hint that shortens the exploration I would be happy and say many thanks!

There may be issues with projects finding an hdf5 that the superbuild did not build. My initial suspicion lies with netcdf. I’d check to make sure that its build found the hdf5 the superbuild built rather than some other version.

Current state of the research: at some point during the “ctest -R …” run, an “install.manifest” file is written that seems to be used as the final input for CPack - and it contains the wrong copy source for all the modules that are NOT supposed to be copied “from the system”!

Now my research is directed at finding out how and where this “install.manifest” is written! So far I have not found the right location in the scripts…

But to me it looks pretty obvious that the USE_SYSTEM_… flags are set - and then totally ignored.

Some keys found so far: the SuperbuildInstallMacros-win32.cmake script and the fixup_bundle.windows.py program - but even after throwing logging print statements into these files, the riddle cannot be solved: how do the external libraries like hdf5, zlib etc. get into install.manifest? That Python program is definitely putting things there, but everything it adds refers to either the Windows system or my own application…

Next day - hopefully more successful than yesterday!

Preliminary “solution” - but it remains to be seen whether it holds!

  • It looks like there is a mechanism that resolves “dependencies” during the packaging process, and if some unit asks e.g. for an hdf5 library, it seems to look for it along PATH - without caring whether any USE_SYSTEM_xxx setting is on or off.

  • From this I concluded that there might be a discrepancy between the linking, “installing” and final packaging phases, so not all parts of the process look for the same versions of the same libraries.

  • For this reason I simply started the selective installation procedure of the OSGeo4W tools and selectively uninstalled the hdf5 components. (Uninstalling the entire system is not desirable because a) I am using it and b) it is still required for the GDAL package that I include in my ParaView and derived builds - and GDAL cannot be compiled during the superbuild.)

  • Before doing the packaging, I made sure that the /install path comes first in the PATH variable, so that looking for “system” things will find the superbuild’s own versions instead (see the sketch after this list).

  • Then I turned the USE_SYSTEM_xxx flags ON for some of the libs, including hdf5 - and with this it now indeed uses the self-compiled version, simply because no other “system lib” is available any more!
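The PATH reordering mentioned in the list can also be done inside a small CMake driver script instead of the Windows shell - a sketch with my local install path, which is of course machine-specific, and which only helps if the packaging step is then launched from that same script (e.g. via execute_process):

# prepend the superbuild's install dir so DLL lookups hit the self-built libs
set(ENV{PATH} "C:/sb/install/bin;$ENV{PATH}")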

With this I could finally generate a package that can be installed, and the resulting exe runs. Well, so far only on my development system, so I now have to make sure that it works not just “accidentally” (e.g. by grabbing some libs that are not part of the installation) but reliably on customer systems as well.

The fixup_bundle.*.py files are basically implementations of the linker logic for each platform so that we know what we’re supposed to use and move into the package. They do have hooks for extra search paths (but warn when they’re used because that generally means the binaries aren’t built properly). As for not finding your built HDF5 libraries…

Ooh, this is almost certainly true. I suspect that where we export the Qt5_DIR information to get it into the packaging process (via superbuild_export_variables) and where it gets used (in the paraview.bundle.cmake files; look for the library_paths variables), we’d need to do more to get the same done for the other use-system-aware packages (certainly on Windows, maybe on other platforms).

Well, Windows lacks RPATH, so all you have is PATH to look for DLLs. Linking uses .lib files (which, for shared builds, include the name of the DLL to look for at runtime). Since all you have is the name, you need to search the path for the given DLL name.
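As a toy illustration of that search order (not the actual fixup_bundle code, just the same idea in a few lines of CMake; runnable with cmake -P on Windows, where PATH is already a semicolon-separated list):

set(dll_name "hdf5.dll")        # the bare name recorded in the import .lib
set(path_dirs "$ENV{PATH}")
foreach (dir IN LISTS path_dirs)
  if (EXISTS "${dir}/${dll_name}")
    message(STATUS "first hit wins: ${dir}/${dll_name}")
    break ()
  endif ()
endforeach ()

Whichever directory happens to come first in PATH supplies the DLL - hence the OSGeo4W copy being picked up.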

The ParaView superbuild is definitely tuned for how we build our release packages. This includes a lack of coverage for making packages with external dependencies other than Qt (the other system-able packages are there to support supercomputer deployments, but those don’t use the packaging logic).

Thanks for this: it “converges” with many things that I have found so far (after almost 3 full days of research).
Basically the problem is that there are different methods of looking for libraries and dependencies, and sometimes they work against each other in a rather destructive way:

  • There are the USE_SYSTEM_… flags, which basically seem to trigger the download and build of a module if they are set to OFF

  • Then there is the “find_package” mechanism, i.e. looking for a …config.cmake file in order to include an external module in case USE_SYSTEM_… is set to ON

  • And finally there is the method within the fixup_bundle….py files that tries to follow (or more or less reimplement) the system-specific library loading mechanism

Variants that I tried so far:

  • Set USE_SYSTEM_hdf5 to OFF - because this is what I actually want: use a version of that module that is proven to cooperate with the current version of PV 5.7. This fails because it does indeed download and build a correct hdf5 module, but the fixup_bundle… logic then still takes the installed version of the module from my OSGeo4W package - because that one appears on PATH in order to be accessible to the other parts of that package.

  • Set USE_SYSTEM_hdf5 to ON - in order to at least use the same variant of hdf5 consistently everywhere. In that case the hdf5-config… mechanism works and indeed also finds the OSGeo4W package correctly, but then still complains that it “could not find HDF5”. Well, there is an additional hint: “version 1.10.4 found”. So basically it tells me “it is found but it is not found”, and I concluded that it may be looking for a different version - and indeed: the version that was downloaded and built in the first attempt was 1.10.5. That explains a lot already! (See the little sketch after this list.)

  • So I downloaded the hdf5 1.10.5 sources and built them independently. Then I made sure that the new binary is on PATH before the OSGeo4W package (for the fixup_bundle… step), and I provided an hdf5_DIR setting in order to “guide” the find_package mechanism. Now the build fails because the hdf5-config.cmake seems to initialize certain variables incorrectly, so I get an error that HDF5_TOOLS_DIR does not exist - and it is set to C:/dev/bin, which indeed does not exist: the HDF5 tools are not sitting there but in another folder.

  • The previously mentioned approach that “worked” was actually not really doing it: I set USE_SYSTEM_… to ON and tried to make sure that the fixup_bundle step finds the self-built version from the superbuild - and it did work, but only because I had previously done a build with USE_SYSTEM_… OFF without fully deleting the superbuild target path, so it worked with a leftover from the previous run - not a really working solution.
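For completeness, the “found but it is not found” effect from the second bullet can be reproduced with a plain config-mode lookup; this is a hedged reconstruction, where the version constraint just mimics what the superbuild-built ParaView asks for:

# ask for the superbuild's 1.10.5 while only OSGeo4W's 1.10.4 is findable
find_package(HDF5 1.10.5 CONFIG QUIET)
if (NOT HDF5_FOUND)
  # the 1.10.4 hdf5-config-version.cmake rejects the requested version
  message(STATUS "rejected candidate versions: ${HDF5_CONSIDERED_VERSIONS}")
endif ()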

And as you say, the method of including external modules is not really made for the superbuild: it works for the actual build (if the versions match), but it fails in the superbuild. So in order to still arrive at a “clean” solution, I should set USE_SYSTEM_hdf5 to OFF again and then somehow find a way so that the fixup_bundle… mechanism does NOT find the version from OSGeo4W, but the one that is built during the superbuild.

This is hard because I cannot remove the OSGeo4W stuff completely from PATH or the like (e.g. by temporarily renaming the folder) - the superbuild still needs the GDAL package from there! And from what I have seen so far I get the impression that GDAL also works with HDF5 version 1.10.5 - which would otherwise be the next problem, of course. Anyway - this is how it LOOKS so far…

One more finding: what also makes the research difficult is that I normally do not want to delete the entire superbuild target directory for the next attempt, because a complete superbuild takes many hours, mostly for the complete rebuild of PV. But if I do not delete everything, there may be “working solutions” that do not actually work - they only worked by using “leftovers”.

Well, it’s tough, but little by little…

Hmm. I’d have thought the directory the DLL was living in would be preferred over a PATH entry. That sounds like a bug if that’s not being done.

A good intermediate solution:

  • Delete the install/ directory.
  • Delete superbuild/*/stamp/*-install so that the superbuild reruns the install step of all projects. Deleting individual superbuild/* directories may be useful in case a project itself has stale cache variables from the old settings you used. Surgically deleting the specific entries from the offending CMakeCache.txt to force a “refind” may also save time on things like ParaView.
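If that gets tedious, a throwaway script along these lines could automate the first two steps (a sketch only; the paths assume the standard superbuild layout with the script sitting in the top-level build directory, run with cmake -P cleanup.cmake):

# remove the installed tree so it gets repopulated from scratch
file(REMOVE_RECURSE "${CMAKE_CURRENT_LIST_DIR}/install")
# drop the install stamps so every project reruns its install step
file(GLOB stamp_files "${CMAKE_CURRENT_LIST_DIR}/superbuild/*/stamp/*-install")
file(REMOVE ${stamp_files})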

Thanks for the hints regarding selective deleting!

Regarding the search strategy in fixup_bundle…, I am about to analyze that a bit more: I put some print() calls into the fixup_bundle.windows.py file, which, as far as I understand, runs from the ctest -R nnn step after the build is done - but I still need some patience because I am indeed doing a total rebuild that takes a few hours. I have seen that there is also some code that looks into manifest files, and those print() statements are supposed to tell me which function finds which library, and how.

Status: once again “almost there, but you never know…”

Running that patched fixup_bundle.py did not happen so far - because once again other things went wrong during the cmake, ninja and ctest runs that had never happened before, so I did not even get far enough for that script to run at all. Meaning that even with your hints, there seems to be no way to avoid a full re-run of many hours!

However, tricking that Python script into doing the right thing is certainly not a solution, only a first step towards making sure that the “superbuild” always goes for the correct variant - and after one week of struggling I really want a solution now.

Looking into the sources of ParaView 5.7 (or others), I see that if you build and run it locally, it simply takes whatever HDF5 it finds on the system - which in my case comes from OSGeo4W and is version 1.10.4. And that seems to work with no problem! The problem comes only with the “superbuild”, which insists on going for 1.10.5, which then interferes with the installed 1.10.4 version and generates the entire confusion. So why not convince the “superbuild” to simply use 1.10.4 as well?

Now at the “official superbuild repository of external stuff”, i.e. https://www.paraview.org/files/dependencies/, only 1.10.3 and 1.10.5 (plus many older versions) are available, so I downloaded the sources of 1.10.4 from the official HDF5 site: https://www.hdfgroup.org/downloads/hdf5/. This means also patching the superbuild/versions.cmake file to use the local 1.10.4 sources instead of downloading - which works nicely. I needed one more try to find out that the “superbuild” insists on applying a few patches itself (after I had seen the differences, applied them myself, and then failed with a message that the “patches are not applicable” - well, they were already applied…).
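For reference, the change in superbuild/versions.cmake was roughly of this shape (the local path is mine, and the checksum is a placeholder for the md5 of the actual tarball):

superbuild_set_revision(hdf5
  URL     "file://C:/dev/src/hdf5-1.10.4.tar.gz"
  URL_MD5 <md5-of-the-local-tarball>)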

With this the superbuild managed to build hdf5 1.10.4 successfully - but then failed with freetype, zlib and szip… This is probably because I had indeed deleted the entire “install” subdirectory in the superbuild “target” directory: some files there were “missing”, and they were not rebuilt even after I also deleted freetype & Co. from the /superbuild folder - which otherwise would trigger a selective rebuild.

So I am now about to do the horrible thing again: delete the entire target folder and start a complete superbuild again from scratch…

Meaning: if I am very lucky I will see a successful superbuild, packaging and installation by this evening! But it would not be the first time that I hoped for this - and then it did not happen for some completely stupid reason…

Finally: cmake - ninja - ctest - install and start are all working! (And it’s only just past noon…)

So the last proposed strategy did work indeed - still, of course, with some unexpected struggles: the “superbuild” wants to apply some little patches to the source code, so I learned that I have to “present” it with the unpatched code, directly from the hdfgroup download.

But then even this did not work, and there was a complaint that the patch does not fit! Well, maybe it refers to the wrong version? But reading the *.patch files, it looks like they were initially generated for version 1.10.3 and are now applied unchanged to version 1.10.5 - so why would they not also fit the version in between, i.e. 1.10.4?

Further investigation showed that the “non-fit” indeed resulted from the fact that the code had already been patched in a first run, so the second run failed! This basically means that the patching is only usable if the code is freshly downloaded, not if it is already available locally: in that case the patch step must not be run but rather suppressed in the proper *.cmake script. Ok, finally I also found out how to do that.

Only disadvantage: if I publish this superbuild setup and somebody else tries to follow it, they also need to first download and patch the hdf5 sources! In order to avoid that, I should not have switched from “superbuild_set_revision(hdf5 … URL …)” to “superbuild_set_selectable_source(hdf5 SELECT…)” (like I did), but instead just put in the URL for a direct download from the hdfgroup link. But then there is still another problem to solve: if you unpack their source code, you find the “ParaView-ready” sources only in a subdirectory - and at this moment I am not really motivated to research solving that issue. (Maybe this is the reason why Kitware hosts all the “external” sources for the superbuild on their own server!? Because in the case of HDF5 there is really nothing “new” or “special” in them!) So for the moment I am going to live with that one.
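For the record, what I ended up with is roughly the following shape of a selectable source, with my local unpatched tarball as the default (keywords reproduced from memory, path and checksum again placeholders):

superbuild_set_selectable_source(hdf5
  SELECT 1.10.4 DEFAULT
    URL     "file://C:/dev/src/hdf5-1.10.4.tar.gz"
    URL_MD5 <md5-of-the-local-tarball>)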

And why did I manage to be faster than expected? Actually I “dared” once more to not delete the entire “target” directory of the superbuild but to keep the /superbuild/paraview subdirectory - and the ninja run was then “clever” enough to “see” that there was not much to do…

Now more testing is required, but it still feels a bit like a breakthrough: the first running software that was not only built on PV 5.7, but also packaged and installed (not in the cmake sense of the word, but in the “rest of the world” sense of it!) and successfully started!

We host the dependency sources ourselves because customers who use the superbuild have firewalls in place. We have a hole poked for them, but arbitrary websites are not available. Feel free to send a patch; I can upload the tarball and rewrite the URL in the MR.

I could of course send you a tarball of the hdf5 1.10.4 sources - and I would be happy to then integrate the hosted version into my own superbuild!

I am not sure whether it makes sense to adapt the “official PV superbuild” to that version, because I am a bit of a special case: I have to refer not only to ParaView/VTK but also to the OSGeo4W project, and they are still at the older version. This is probably not the case for most other PV users and programmers - and 1.10.5 is definitely more “future-proof” than the older version.

Still: if a 1.10.4 tarball were on your website, I would certainly change my superbuild to use it. Because then the setup change would indeed only be changing 1.10.5 to 1.10.4 in superbuild/versions.cmake - and everything else should work completely unchanged, with no further patch required (as far as I can see at the moment).

I just see that this is already exactly the tarball that could be put on the Kitware server:

https://s3.amazonaws.com/hdf-wordpress-1/wp-content/uploads/manual/HDF5/HDF5_1_10_4/hdf5-1.10.4.tar.gz

Or else the one that I downloaded first - where the actual code sits one more subdirectory level down:

https://s3.amazonaws.com/hdf-wordpress-1/wp-content/uploads/manual/HDF5/HDF5_1_10_4/CMake-hdf5-1.10.4.zip

Of course I can try to use the first of the two directly in my own superbuild scripts - and see if it works that way, or if that location is not directly accessible from a script for whatever reason…

In any case, both links come from this website (already mentioned once above):

https://www.hdfgroup.org/downloads/hdf5/