NXapm_paraprobe_results_transcoder

Status:

application definition, extends NXobject

Description:

Results of a paraprobe-transcoder tool run.

Symbols:

The symbols used in the schema to specify e.g. dimensions of arrays.

n_ions: The total number of ions in the reconstruction.

n_ivec_max: Maximum number of allowed atoms per (molecular) ion (fragment). Needs to match maximum_number_of_atoms_per_molecular_ion.

n_ranges: Number of mass-to-charge-state-ratio intervals mapped on this ion type.

n_topology: Total number of integers in the supplementary XDMF topology array.

Groups cited:

NXcoordinate_system_set, NXcs_computer, NXcs_cpu, NXcs_gpu, NXcs_io_obj, NXcs_io_sys, NXcs_mm_sys, NXcs_profiling_event, NXcs_profiling, NXentry, NXfabrication, NXinstrument, NXion, NXprocess, NXtransformations, NXuser

Structure:

ENTRY: (required) NXentry

@version: (required) NX_CHAR

Version specifier of this application definition.

definition: (required) NX_CHAR

Official NeXus NXDL schema with which this file was written.

Obligatory value: NXapm_paraprobe_results_transcoder

program: (required) NX_CHAR

Given name of the program/software/tool with which this NeXus (configuration) file was generated.

@version: (required) NX_CHAR

Ideally the program version plus build number, a commit hash, or a description of a persistent resource where the source code of the program and build instructions can be found, so that the program can be configured in such a manner that the result of this computational process is recreatable in the same deterministic manner.

analysis_identifier: (optional) NX_CHAR

Ideally, a (globally persistent) unique identifier for referring to this analysis.

analysis_description: (optional) NX_CHAR

Possibility for leaving a free-text description about this analysis.

start_time: (required) NX_DATE_TIME

ISO 8601 formatted time code, with the local time zone offset to UTC included, of when the analysis behind this results file was started, i.e. when the paraprobe-tool executable started as a process.

end_time: (required) NX_DATE_TIME

ISO 8601 formatted time code, with the local time zone offset to UTC included, of when the analysis behind this results file was completed and the paraprobe-tool executable exited as a process.
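
As an illustration of the time code format described for start_time and end_time, the following minimal sketch uses only the Python standard library. The helper name local_iso8601_now is hypothetical and not part of paraprobe; the actual tools may produce their time stamps differently.

```python
from datetime import datetime


def local_iso8601_now() -> str:
    """Return the current local time as ISO 8601 with the UTC offset included.

    Hypothetical helper for illustration only.
    """
    # astimezone() without arguments attaches the local time zone of the system,
    # so isoformat() emits an explicit offset such as +02:00.
    return datetime.now().astimezone().isoformat(timespec="seconds")


start_time = local_iso8601_now()
# ... the analysis would run here ...
end_time = local_iso8601_now()
```

Both strings carry an explicit offset, so they can be compared unambiguously across machines in different time zones.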

config_filename: (required) NX_CHAR

The absolute path and name of the config file for this analysis.

@version: (required) NX_CHAR

A hash of the specific config file, at least as strong as SHA256, for tracking provenance.
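
A SHA256 hash of a config file can be computed, for example, as in the following sketch based on Python's hashlib module. The function name sha256_of_file and the chunk size are assumptions made for illustration; they do not describe how the paraprobe tools compute the hash.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal SHA256 digest of the file at path.

    Hypothetical helper; reads the file in chunks so that large
    files do not need to fit into memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned hexadecimal string is what would be stored in the @version attribute of config_filename under this sketch.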

results_path: (optional) NX_CHAR

Path to the directory where the tool should store NeXus/HDF5 results of this analysis. If not specified, results will be stored in the current working directory.

status: (required) NX_CHAR

A statement whether the paraprobe-tool executable managed to process the analysis or failed prematurely.

This status is written to the results file after the end_time, at which point the executable must not compute any further analysis. The user should consider the results only when this status message is present and shows success. In all other cases the executable may have terminated prematurely or another error may have occurred.

Any of these values: success | failure
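
The rule for the status field can be sketched as a simple check. In this sketch a plain dictionary stands in for the content of the ENTRY group, and the function name results_are_trustworthy is hypothetical; a real consumer would read the field from the NeXus/HDF5 file instead.

```python
def results_are_trustworthy(entry: dict) -> bool:
    """Return True only if a status field is present and reads 'success'.

    entry is a stand-in (assumption) for the fields of the ENTRY group.
    A missing status is treated like a failure, because the executable
    may have terminated before it could write the field.
    """
    return entry.get("status") == "success"
```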

USER: (recommended) NXuser

If used, contact information and possibly further details of at least the person who performed this analysis.

name: (required) NX_CHAR

affiliation: (recommended) NX_CHAR

address: (optional) NX_CHAR

email: (recommended) NX_CHAR

orcid: (recommended) NX_CHAR

orcid_platform: (recommended) NX_CHAR

telephone_number: (optional) NX_CHAR

role: (recommended) NX_CHAR

social_media_name: (optional) NX_CHAR

social_media_platform: (optional) NX_CHAR

COORDINATE_SYSTEM_SET: (required) NXcoordinate_system_set

Details about the coordinate system conventions used.

TRANSFORMATIONS: (required) NXtransformations

The individual coordinate systems which should be used. Field names should be prefixed with the following controlled terms indicating which individual coordinate system is described:

  • paraprobe

  • lab

  • specimen

  • laser

  • leap

  • detector

  • recon

visualization: (recommended) NXprocess

xdmf_topology: (required) NX_UINT (Rank: 1, Dimensions: [n_topology]) {units=NX_UNITLESS}

An array of triplets of integers which can serve as a supplementary array for Paraview to display the reconstruction. In each triplet, the first integer is the XDMF datatype (here 1), the second is the number of primitives (here 1 per triplet), and the last is the identifier of the point, starting from zero.
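
The layout of xdmf_topology can be sketched in a few lines of Python. The function name build_xdmf_topology is hypothetical, and a plain list is used in place of the unsigned integer array that would be written to the file.

```python
from typing import List


def build_xdmf_topology(n_ions: int) -> List[int]:
    """Build the flat triplet array (datatype, primitive count, point identifier).

    Hypothetical helper; the result has n_topology = 3 * n_ions entries.
    """
    topology: List[int] = []
    for ion_id in range(n_ions):
        # datatype 1, one primitive per triplet, then the zero-based point identifier
        topology.extend((1, 1, ion_id))
    return topology
```

For three ions this yields 1, 1, 0, 1, 1, 1, 1, 1, 2.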

atom_probe: (required) NXinstrument

In the medium term we would like to evolve the paraprobe-toolbox to an implementation stage where it works exclusively with completely provenance-tracked formats, both for the configuration of the workflow step and/or analysis with each tool and for the output of these analyses in the form of so-called tool-specific results files. Currently, the Hierarchical Data Format 5 (HDF5) is used to store such data.

Different file formats can be used to inject reconstructed datasets and ranging definitions into the toolbox. Traditionally, these are the POS, ePOS, and APT files with the tomographic reconstruction and other metadata, and the RNG and RRNG file formats for the ranging definitions, i.e. how mass-to-charge-state-ratio values map onto (molecular) ion types. Such input should be injected via specific NeXus/HDF5 files which are documented in compliance with the NXapm application definition.

So far the paraprobe-toolbox was used as a standalone tool. Therefore, interoperability was not a focus during its development. Essentially, paraprobe-transcoder was used as a parser to transcode data in the above-mentioned file formats into a paraprobe-specific representation. This transcoding should become deprecated. Here we describe the steps we have taken in this direction.

With the work in the FAIRmat project and the desire to make the paraprobe-toolbox also accessible as a cloud-computing capable service in the Nomad Remote Tools Hub (NORTH), the topic of interoperability became more important and eventually the NXapm application definition was proposed. NORTH is a GUI and related service in a NOMAD OASIS instance which allows users to spawn preconfigured docker containers via JupyterHub. Currently, NORTH includes the so-called apm container, a container with tools specific for analyzing data from atom probe microscopy as well as for processing point cloud and mesh data.

The NXapm application definition and related implementation work within NOMAD OASIS enable users to parse the content of POS, ePOS, APT, RNG, and RRNG files, plus key metadata from vendor-agnostic electronic lab notebook solutions, directly into NOMAD OASIS via the uploads section. The process is automated and yields an NXapm-compliant NeXus/HDF5 file inside the uploads section in return.

With these improvements made, there is no longer a need, at least for the users of a NOMAD OASIS and NORTH instance, to use the deprecated PARAPROBE.Transcoder.Results.*.h5 files. Ideally, paraprobe should automatically detect that the input can now be an NXapm-compliant NeXus/HDF5 file and in response work with this file directly. To remain compatible, however, with users who do not have or do not wish to use a NOMAD OASIS or NXapm or NeXus at all right now, the solution is as follows:

Calling the configuration stage of paraprobe-transcoder is always mandatory. It is always the first step of working with the toolbox. In this process the user defines the input files. These can either be nxs files, i.e. the NXapm/NeXus/HDF5 file from e.g. the upload section, or such a file obtained from a colleague with a NOMAD OASIS instance. In all other cases, users can pass the reconstruction and ranging definitions using the traditional POS, ePOS, or APT and RNG or RRNG file formats respectively.

Based on which input the user delivers, the parmsetup-transcoder tool then creates a configuration file PARAPROBE.Transcoder.Config.SimID.*.nxs and informs the user whether the input was NeXus (and thus all relevant input is already available) or whether the paraprobe-transcoder tool needs to be executed to first convert the content of the vendor files into a format which paraprobe can provenance track and understand. In the latter case, the PARAPROBE.Transcoder.Config.SimID.*.nxs file is used to communicate to all subsequently used tools in which files they can expect to find the reconstruction and ranging definitions.
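
The distinction between NeXus input and vendor input can be sketched as follows. This is an illustration under stated assumptions, not the logic of parmsetup-transcoder: the function name classify_input is hypothetical, and detecting the format purely from the file name extension is an assumption made to keep the example short.

```python
from pathlib import Path

# Extensions of the traditional formats named in the text (assumed lower case)
RECONSTRUCTION_FORMATS = {".pos", ".epos", ".apt"}
RANGING_FORMATS = {".rng", ".rrng"}


def classify_input(reconstruction_file: str, ranging_file: str) -> str:
    """Return 'nexus' or 'vendor' depending on the file name extensions.

    Hypothetical helper; raises ValueError for unsupported combinations.
    """
    recon_ext = Path(reconstruction_file).suffix.lower()
    range_ext = Path(ranging_file).suffix.lower()
    if recon_ext == ".nxs" and range_ext == ".nxs":
        # An NXapm-compliant file already holds reconstruction and ranging data
        return "nexus"
    if recon_ext in RECONSTRUCTION_FORMATS and range_ext in RANGING_FORMATS:
        # Vendor files need to be converted by paraprobe-transcoder first
        return "vendor"
    raise ValueError(
        f"unsupported input combination: {reconstruction_file}, {ranging_file}"
    )
```

With NeXus input, the same file name would be passed for both arguments in this sketch, because a single NXapm file carries both the reconstruction and the ranging definitions.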

All subsequent analysis steps also start with a tool-specific configuration. This configuration step reads in (among others) the PARAPROBE.Transcoder.Config.SimID.*.nxs file, from which the configuration tool identifies automatically whether to read the reconstruction and ranging data from PARAPROBE.Transcoder.Results.SimID.*.h5 or directly from the NXapm-compliant NeXus/HDF5 file that was created upon preparing the upload or that was shared by a colleague. This design removes the need for unnecessary copies of the data. Currently, users should nevertheless still execute the transcoder step, as it generates a supplementary XDMF topology field with which the data in either the NeXus/HDF5 or the transcoded vendor files can be displayed using e.g. Paraview.

Ideally, the APT community would at some point converge on a common data exchange file format. To this end, AMETEK/Cameca’s APT file format could be a good starting point, but so far it lacks a consistent way to store generalized ranging definitions and post-processing results. POS, ePOS, Rouen’s ATO, as well as other representations of data used so far, such as CSV or text files, have, to the best of our current knowledge, no concept of how to marry reconstruction and (optional) ranging data into one self-descriptive format.

This summarizes the rationale behind the current choices of the I/O for paraprobe. It also explains why the fundamental design of always splitting an analysis into steps of configuration (with parmsetup), task execution (with the respective C/C++ or Python tool of the toolbox), and post-processing (e.g. with autoreporter) is useful: it offers a clear description of provenance tracking. This is a necessary step to make atom probe microscopy data better aligned with the aims of the FAIR principles.

The internal organization of the data entries in the atom_probe group in this application definition for paraprobe-transcoder results files mirrors the definitions of NXapm for consistency reasons.

mass_to_charge_conversion: (required) NXprocess

mass_to_charge: (required) NX_FLOAT (Rank: 1, Dimensions: [n_ions]) {units=NX_ANY}

Mass-to-charge-state ratio values.

reconstruction: (required) NXprocess

reconstructed_positions: (required) NX_FLOAT (Rank: 2, Dimensions: [n_ions, 3]) {units=NX_LENGTH}

Three-dimensional reconstructed positions of the ions. Interleaved array of x, y, z positions in the specimen space.
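
To illustrate what an interleaved array of x, y, z positions means, the following sketch regroups a flat sequence of coordinates into one row per ion, matching the dimensions [n_ions, 3]. The function name split_interleaved is hypothetical and plain Python lists stand in for the stored array.

```python
from typing import List, Sequence, Tuple


def split_interleaved(flat: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Regroup x0, y0, z0, x1, y1, z1, ... into one (x, y, z) tuple per ion.

    Hypothetical helper for illustration only.
    """
    if len(flat) % 3 != 0:
        raise ValueError("length of an interleaved xyz array must be a multiple of 3")
    # The coordinates of ion i occupy the flat indices 3*i, 3*i + 1 and 3*i + 2
    return [
        (flat[i], flat[i + 1], flat[i + 2])
        for i in range(0, len(flat), 3)
    ]
```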

ranging: (required) NXprocess

peak_identification: (required) NXprocess

Details about how peaks, taking error models into account, were interpreted as ion types or not.

ION: (required) NXion

isotope_vector: (required) NX_UINT

nuclid_list: (recommended) NX_UINT

charge_state: (required) NX_INT

mass_to_charge_range: (required) NX_FLOAT

performance: (required) NXcs_profiling

current_working_directory: (required) NX_CHAR

command_line_call: (optional) NX_CHAR

start_time: (recommended) NX_DATE_TIME

end_time: (recommended) NX_DATE_TIME

total_elapsed_time: (required) NX_NUMBER

number_of_processes: (required) NX_POSINT

number_of_threads: (required) NX_POSINT

number_of_gpus: (required) NX_POSINT

CS_COMPUTER: (recommended) NXcs_computer

name: (recommended) NX_CHAR

operating_system: (required) NX_CHAR

@version: (required) NX_CHAR

uuid: (optional) NX_CHAR

CS_CPU: (optional) NXcs_cpu

name: (optional) NX_CHAR

FABRICATION: (recommended) NXfabrication

identifier: (optional) NX_CHAR

capabilities: (optional) NX_CHAR

CS_GPU: (optional) NXcs_gpu

name: (optional) NX_CHAR

FABRICATION: (recommended) NXfabrication

identifier: (optional) NX_CHAR

capabilities: (optional) NX_CHAR

CS_MM_SYS: (optional) NXcs_mm_sys

total_physical_memory: (required) NX_NUMBER

CS_IO_SYS: (optional) NXcs_io_sys

CS_IO_OBJ: (required) NXcs_io_obj

technology: (required) NX_CHAR

max_physical_capacity: (required) NX_NUMBER

name: (optional) NX_CHAR

FABRICATION: (recommended) NXfabrication

identifier: (optional) NX_CHAR

capabilities: (optional) NX_CHAR

CS_PROFILING_EVENT: (required) NXcs_profiling_event

start_time: (optional) NX_DATE_TIME

end_time: (optional) NX_DATE_TIME

description: (required) NX_CHAR

elapsed_time: (required) NX_NUMBER

number_of_processes: (required) NX_POSINT

Specify if it was different from the number_of_processes in the NXcs_profiling super class.

number_of_threads: (required) NX_POSINT

Specify if it was different from the number_of_threads in the NXcs_profiling super class.

number_of_gpus: (required) NX_POSINT

Specify if it was different from the number_of_gpus in the NXcs_profiling super class.

max_virtual_memory_snapshot: (recommended) NX_NUMBER

max_resident_memory_snapshot: (recommended) NX_NUMBER

NXDL Source:

https://github.com/FAIRmat-Experimental/nexus_definitions/tree/fairmat/contributed_definitions/NXapm_paraprobe_results_transcoder.nxdl.xml