Standardizing string substitutions

utkarsh.ayachit · June 30, 2020, 4:48pm

We have several cases in ParaView where strings (or filenames) are substituted by runtime values. e.g.

In Chart views, we can use ${TIME} to have it replaced with the current view time.
In the AnnotationTime source, we use %f or another printf-style format specifier, which is substituted with the time.
The newly added extract-generators support using %t or %ts, which are replaced with the time or the timestep.

Similarly (but not exactly the same), there are a few other places where substitutions happen:

The Python calculator defines time and time_index as variables for accessing the current time and timestep index.
ParaView pvsc server configuration files use $varname$.

It’s probably time to standardize these. Here’s a proposal:

We add fmt as a third-party library dependency. fmt supports named arguments with format specifiers. The context defines which named variables are available. We start by defining some variables in global scope, such as username, systemname, date, etc., things that are currently available only through the EnvironmentAnnotation filter. Views, filters, and sources will have additional variables, such as current-time, available. We write a document that describes the various scopes and the variables available within each scope.

Now, for any string in the application, we transform it using fmt together with the variables defined in the context. For example, for a Chart Title that uses today’s date and the current time, one can use "Created on {TODAYS_DATE} for time {VIEW_TIME}". The fmt syntax defines a rich set of ways to control how the values are formatted, including leading 0s, text alignment, etc.
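
To make the idea concrete, here is a minimal sketch in Python rather than C++, since Python's str.format uses a replacement-field syntax that fmt's closely resembles. The variable names are the placeholder ones from the example above, and the values are made up for illustration; none of this is an actual ParaView API.

```python
from datetime import date

# Hypothetical context: names come from the example in the post, values are
# invented for illustration only.
context = {
    "TODAYS_DATE": date(2020, 6, 30),
    "VIEW_TIME": 1.5,
}

# Named replacement fields are looked up in the context.
title = "Created on {TODAYS_DATE} for time {VIEW_TIME}".format(**context)

# Format specifiers after the colon control padding, precision, and alignment.
padded = "{VIEW_TIME:08.3f}".format(**context)        # zero-padded to 8 columns
centered = "[{VIEW_TIME:^10.2f}]".format(**context)   # centered in 10 columns

print(title)
print(padded)
print(centered)
```

The same string with a different context would produce a different title, which is the point of separating the format string from the variables that a given scope makes available.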

I am not too thrilled about the variable names I used in this example, but I am sure we can come up with a nice way for namespacing which is readable and succinct.

The same could apply to filenames. Writers could use formatted string substitutions to generate filenames before writing. Maybe the same for readers? This one’s tricky, though. We’ll need to clearly define the context for a reader so that it does not include time, since the time comes from the reader itself.
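
As a rough sketch of the filename case, again in Python for brevity: the helper make_filename and the variable names timestep and time are hypothetical, chosen only to illustrate a writer context that has time information and a reader context that does not.

```python
def make_filename(pattern, **context):
    """Expand a filename pattern using only the variables in `context`."""
    return pattern.format(**context)


# A writer context could expose the timestep, so zero-padded names are easy.
names = [
    make_filename("output_{timestep:04d}.vtk", timestep=i, time=0.1 * i)
    for i in range(3)
]

# A reader context would not define time-related variables, so a pattern
# that asks for one cannot be expanded (here it raises KeyError).
try:
    make_filename("input_{timestep:04d}.vtk", username="alice")
    reader_expanded = True
except KeyError:
    reader_expanded = False

print(names)
print(reader_expanded)
```

Whether an unknown variable should raise an error or be left untouched in the output is a design choice this sketch does not settle.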

Thoughts? Suggestions?

dcthomp · June 30, 2020, 5:17pm

I like the idea of consistent string formatting. The cmb superbuild already has a dependency on PEGTL, which doesn’t do what fmt does, but would be pretty easy to use if you wanted to implement parsing string-format specifiers yourself. I mention it because it’s header-only and we have found it useful. You could imagine extending the parser to allow people to refer to field data or other data that fmt might have a hard time dealing with.

wascott · June 30, 2020, 5:51pm

From a black box user of ParaView, +1.

Kenneth_Moreland · June 30, 2020, 7:02pm

Sounds cool.

Are there thoughts to use this in the Calculator filter? That would help a lot for field names with spaces or hyphens.

How would this interact with parts of ParaView that use, for example, the Python interpreter, such as Python Calculator and the find dialog?

dcthomp · June 30, 2020, 8:54pm

I think these are both orthogonal to strings for annotation/chart legends, but interesting in their own right. PEGTL sure would make a rewrite of vtkArrayCalculator a lot easier and more robust. I’d love to be able to reference global field-data in array-calculations on point/cell data.

As far as the Python Calculator, I think its string expression should continue to be as close as possible to valid python code that is fed to the interpreter. Or we should take Python out of the name.

utkarsh.ayachit · June 30, 2020, 11:04pm

Maybe even more tangential: it’s probably worthwhile to revisit vtkArrayCalculator. Several libraries now exist that we may be able to leverage easily.

dcthomp · July 1, 2020, 12:04pm

Another math-parsing library we’ve been pointed at recently is METL, which is based on PEGTL and looks pretty interesting.

utkarsh.ayachit · July 3, 2020, 5:36pm

Going back to the string formatting topic and without explicitly discussing which library to use, here’s a proposal:

We add a new type of domain (or hint) which can be added to string-vector-properties that indicates to the application that the value is a formattable string. This avoids weird scenarios where we end up formatting array names and such.
Such property values can contain “replacement fields” surrounded by curly braces {}. Anything that is not contained in braces is considered literal text, which is copied unchanged to the output. If you need to include a brace character in the literal text, it can be escaped by doubling: {{ and }}.
Supported field names are defined in named context maps/dictionaries and can be referenced as {context_name[key]}. For example, {GLOBALS[username]} gets replaced by the username during execution.
Predefined named contexts available are GLOBALS and ENV. Individual processing objects, such as the TimeToTextFilter or ChartView, can define additional context maps. GLOBALS gets populated when the process is launched and will include things like username, system name, current date, etc. ENV provides access to environment variables. Note, these don’t change over the lifetime of the process. This is crucial since that helps us implement a solution that works with any vtkObject-subclass that simply takes in character strings. The format substitution is handled by ParaView before calling the Set..(const char*..) method on the vtkObject-subclass when updating the property value.
Specific sources/filters/views may provide additional named contexts. For example, Chart View may define a named context called VIEW with items like ‘time’. The format substitution happens every time the view renders. Thus for a title with time, one can use something like Plot for time {VIEW[time]}. Extract Generators, similarly, define a context that allows users to use time or timestep-index in format strings when specifying output filenames.
To make things simpler, one can skip the name of the context object i.e. instead of using {VIEW[time]} one can just write {time}. In that case, the named contexts are evaluated in a predetermined order: starting with the local, vtkObject-specific, context map such as VIEW, then GLOBALS and finally ENV.
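
The lookup rules above can be sketched with Python's string.Formatter, whose field syntax happens to accept {CONTEXT[key]} and doubled braces as described. This is only an illustration under assumptions: the class name ContextFormatter and the sample values are hypothetical, a plain dictionary stands in for the process environment, and the real implementation may look quite different.

```python
import string


class ContextFormatter(string.Formatter):
    """Resolve {CONTEXT[key]} and bare {key} against ordered named contexts."""

    def __init__(self, contexts):
        # `contexts` maps context name -> dict, in lookup order:
        # object-specific contexts first, then GLOBALS, then ENV.
        super().__init__()
        self.contexts = contexts

    def get_value(self, key, args, kwargs):
        if isinstance(key, int):
            raise KeyError("positional fields are not supported")
        # Qualified form: {VIEW[time]} first resolves the context itself.
        if key in self.contexts:
            return self.contexts[key]
        # Unqualified form: {time} searches the contexts in order.
        for ctx in self.contexts.values():
            if key in ctx:
                return ctx[key]
        raise KeyError(key)


contexts = {
    "VIEW": {"time": 2.5},
    "GLOBALS": {"username": "alice"},
    "ENV": {"HOME": "/home/alice"},  # stand-in for real environment variables
}
formatter = ContextFormatter(contexts)

qualified = formatter.format("Plot for time {VIEW[time]}")
unqualified = formatter.format("{username} at t={time:.2f} in {{braces}}")

print(qualified)
print(unqualified)
```

Because the contexts are searched in insertion order, a key defined in the object-specific context shadows one with the same name in GLOBALS or ENV, matching the order described above.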

By design, there are no data-dependent variables except those defined specifically by a vtkObject-subclass. For example, if one wants something that converts data array values to strings with formatting, one has to write that as a filter that provides a local context map for accessing input data variables. This helps us explicitly define when the substitutions happen: in this case, since it’s a filter, that will happen any time the filter re-executes.

Kenneth_Moreland · July 4, 2020, 2:06pm

I like these ideas very much.

One question. For contexts provided by objects (i.e. sources/filters/views), how do you determine which instance is being referred to? For example, it is common to have multiple views. In that case, how do you resolve something like {VIEW[foo]}, assuming both views provide a foo variable?

utkarsh.ayachit · July 4, 2020, 9:36pm

Such named-contexts are only available for text within the corresponding object. For example, in current ParaView, we have Annotate Time Filter. The Text property for that filter will not have access to VIEW at all. Only properties on a view proxy itself will have access to it. Currently, we only have potentially formattable string properties on Chart Views e.g. Chart Title, Left Axis Title etc. For these properties, VIEW is clearly defined even in multi-view setups – it’s the view on which the property is being specified.

The advantage of doing so is that we have better control over when values may change and need to be regenerated. If we allowed the Annotate Time Filter to access VIEW, we would need to make the filter re-execute any time the VIEW’s parameters could change, which can get complicated very quickly.
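
A small sketch of this scoping rule, in Python and under assumptions: the function contexts_for and the owner kinds "view" and "filter" are hypothetical, and treating an unavailable context as an error is a guess, since how such a case should be reported has not been decided here.

```python
def contexts_for(owner_kind, view_context):
    """Return the named contexts visible to a property on `owner_kind`."""
    shared = {"GLOBALS": {"username": "alice"}}
    if owner_kind == "view":
        # Only properties on the view itself see that view's context.
        return {"VIEW": view_context, **shared}
    return shared


# A chart title is a property on the view, so VIEW is unambiguous.
view_title = "Plot for time {VIEW[time]}".format(
    **contexts_for("view", {"time": 2.5})
)

# A filter's text property has no VIEW context at all.
try:
    "Time: {VIEW[time]}".format(**contexts_for("filter", {"time": 2.5}))
    filter_sees_view = True
except KeyError:
    filter_sees_view = False

print(view_title)
print(filter_sees_view)
```

With several views open, each view would call the equivalent of contexts_for with its own view_context, so two views that both define the same key never conflict.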

utkarsh.ayachit · July 8, 2020, 5:30pm

I’ve created an issue to track and prioritize this proposal implementation.