NXsimilarity_grouping

Status:

base class, extends NXobject

Description:

Metadata for the results of a similarity grouping analysis.

Similarity grouping analyses can be supervised segmentation or machine learning clustering algorithms. These are routine methods that partition the members of a set of objects or geometric primitives into (sub-)groups, i.e., features of different types. A plethora of algorithms has been proposed, and many of them can also be applied to geometric primitives such as points and triangles, or to (abstract) features, also known as objects (including categorical sub-groups).

This base class describes the metadata and results of one similarity grouping analysis applied to a set in which each object is categorized either as noise or as belonging to a cluster. As a result of the analysis, each similarity group, here called a feature (also known as an object), can be given a number of numerical and/or categorical labels.

Symbols:

The symbols used in the schema to specify e.g. dimensions of arrays.

c: Cardinality of the set.

n_lbl_num: Number of numerical labels per object.

n_lbl_cat: Number of categorical labels per object.

n_features: Total number of similarity groups, also called features, objects, or clusters.

Groups cited:

NXprocess

Structure:

cardinality: (optional) NX_UINT {units=NX_UNITLESS}

Number of members in the set which is partitioned into features.

number_of_numeric_labels: (optional) NX_UINT {units=NX_UNITLESS}

Number of numerical labels per object, i.e., n_lbl_num.

number_of_categorical_labels: (optional) NX_UINT {units=NX_UNITLESS}

Number of categorical labels per object, i.e., n_lbl_cat.

identifier_offset: (optional) NX_UINT (Rank: 1, Dimensions: [n_lbl_num]) {units=NX_UNITLESS}

The first identifier used to label a cluster.

The value should be chosen such that the following special values can be resolved:
* identifier_offset - 1 indicates that an object belongs to no cluster (unassigned).
* identifier_offset - 2 indicates that an object belongs to the noise category.

Setting identifier_offset to 1, for instance, recovers the common convention in which objects of the noise category get the value -1 and unassigned points get the value 0. Numerical identifiers have to be strictly increasing.
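The special values above can be decoded mechanically. The following Python sketch maps a single numerical label to noise, unassigned, or cluster for a given identifier_offset. The function classify and the Category enum are hypothetical helpers, not part of NeXus, and values are handled as plain (signed) Python integers.

```python
from enum import Enum


class Category(Enum):
    NOISE = "noise"
    UNASSIGNED = "unassigned"
    CLUSTER = "cluster"


def classify(identifier: int, identifier_offset: int) -> Category:
    """Map one numerical label to its category (hypothetical helper).

    identifier_offset - 2 -> noise
    identifier_offset - 1 -> unassigned (belongs to no cluster)
    >= identifier_offset  -> a cluster identifier
    """
    if identifier == identifier_offset - 2:
        return Category.NOISE
    if identifier == identifier_offset - 1:
        return Category.UNASSIGNED
    if identifier >= identifier_offset:
        return Category.CLUSTER
    raise ValueError(
        f"identifier {identifier} is below identifier_offset - 2 = {identifier_offset - 2}"
    )


# With identifier_offset = 1: -1 is noise, 0 is unassigned, 1, 2, ... are clusters.
print([classify(i, 1).value for i in (-1, 0, 1, 5)])
# → ['noise', 'unassigned', 'cluster', 'cluster']
```

Raising on values below identifier_offset - 2 makes labels that fit none of the three categories fail loudly instead of being silently counted.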

numerical_label: (optional) NX_UINT (Rank: 2, Dimensions: [c, n_lbl_num]) {units=NX_UNITLESS}

Matrix of numerical labels for each member of the set. For classical clustering algorithms, this can, for instance, encode the cluster_identifier.
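As a sketch of how such a matrix might be filled, the following Python function takes one or more labelings of the same c members and writes one column per labeling, re-encoded with that column's identifier_offset. It assumes a hypothetical input convention, loosely modeled on scikit-learn's DBSCAN, in which noise is labeled -1 and clusters are numbered from 0. None is used here to mark unassigned members. The name to_numerical_label is illustrative only.

```python
from typing import List, Optional, Sequence


def to_numerical_label(
    labelings: Sequence[Sequence[Optional[int]]],
    identifier_offset: Sequence[int],
) -> List[List[int]]:
    """Build a c x n_lbl_num matrix from several per-member labelings.

    Assumed input convention (hypothetical): -1 = noise, None = unassigned,
    0, 1, 2, ... = clusters. Each labeling becomes one column.
    """
    if not labelings:
        raise ValueError("need at least one labeling")
    if len(labelings) != len(identifier_offset):
        raise ValueError("need one identifier_offset per labeling")
    c = len(labelings[0])
    if any(len(labels) != c for labels in labelings):
        raise ValueError("all labelings must cover the same c members")

    matrix = [[0] * len(labelings) for _ in range(c)]
    for j, (labels, offset) in enumerate(zip(labelings, identifier_offset)):
        for i, lab in enumerate(labels):
            if lab is None:
                matrix[i][j] = offset - 1  # unassigned
            elif lab == -1:
                matrix[i][j] = offset - 2  # noise
            elif lab >= 0:
                matrix[i][j] = offset + lab  # cluster k -> offset + k
            else:
                raise ValueError(f"unexpected label {lab} for member {i}")
    return matrix


# Two labelings of the same five members: column 0 uses offset 1, column 1 offset 10.
m = to_numerical_label([[0, 0, -1, 1, None], [2, -1, 0, 0, 1]], [1, 10])
print(m)
# → [[1, 12], [1, 8], [-1, 10], [2, 10], [0, 11]]
```

Shifting every cluster by the same offset keeps the relative order of the input cluster numbers, so identifiers remain strictly increasing in the order the clusters were numbered.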

categorical_label: (optional) NX_CHAR (Rank: 2, Dimensions: [c, n_lbl_cat])

Matrix of categorical attribute data for each member in the set.
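The dimension symbols tie the per-member matrices together: both have c rows, with n_lbl_num and n_lbl_cat columns respectively. A minimal Python check of that consistency could look like the following. The function name check_dimensions and the category strings in the usage line are hypothetical.

```python
from typing import Sequence


def check_dimensions(
    cardinality: int,
    number_of_numeric_labels: int,
    number_of_categorical_labels: int,
    numerical_label: Sequence[Sequence[int]],
    categorical_label: Sequence[Sequence[str]],
) -> None:
    """Raise ValueError unless numerical_label is [c, n_lbl_num] and
    categorical_label is [c, n_lbl_cat] (hypothetical helper)."""
    for name, matrix, n_cols in (
        ("numerical_label", numerical_label, number_of_numeric_labels),
        ("categorical_label", categorical_label, number_of_categorical_labels),
    ):
        if len(matrix) != cardinality:
            raise ValueError(f"{name} has {len(matrix)} rows, expected c = {cardinality}")
        for i, row in enumerate(matrix):
            if len(row) != n_cols:
                raise ValueError(
                    f"{name} row {i} has {len(row)} columns, expected {n_cols}"
                )


# Three members, one numerical and two categorical labels each (made-up values).
check_dimensions(
    3, 1, 2,
    numerical_label=[[1], [2], [-1]],
    categorical_label=[["A", "x"], ["A", "y"], ["B", "x"]],
)
```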

statistics: (optional) NXprocess

In addition to the detailed record of which member was assigned to which feature/group, summary statistics are stored under this group.

number_of_unassigned_members: (optional) NX_UINT (Rank: 1, Dimensions: [n_lbl_num]) {units=NX_UNITLESS}

Total number of members in the set which are categorized as unassigned.

noise: (optional) NX_UINT (Rank: 1, Dimensions: [n_lbl_num]) {units=NX_UNITLESS}

Total number of members in the set which are categorized as noise.

number_of_features: (optional) NX_UINT {units=NX_UNITLESS}

Total number of clusters (excluding the noise and unassigned categories).

feature_identifier: (optional) NX_UINT (Rank: 2, Dimensions: [n_features, n_lbl_num]) {units=NX_UNITLESS}

Array of numerical identifiers, one for each feature (cluster).

feature_member_count: (optional) NX_UINT (Rank: 2, Dimensions: [n_features, n_lbl_num]) {units=NX_UNITLESS}

Array of the number of members in each feature.
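The statistics can be derived from numerical_label and identifier_offset. The Python sketch below counts unassigned and noise members per label column and tallies members per cluster identifier, sorting the identifiers so that they are strictly increasing. Because feature_identifier and feature_member_count share the single dimension n_features, the sketch assumes every label column yields the same number of features and raises an error otherwise. That interpretation, like the function name summary_statistics, is an assumption.

```python
from collections import Counter
from typing import Dict, List, Sequence


def summary_statistics(
    numerical_label: Sequence[Sequence[int]],
    identifier_offset: Sequence[int],
) -> Dict[str, object]:
    """Derive the statistics fields from a c x n_lbl_num label matrix."""
    n_lbl_num = len(identifier_offset)
    unassigned = [0] * n_lbl_num
    noise = [0] * n_lbl_num
    counts: List[Counter] = [Counter() for _ in range(n_lbl_num)]

    for i, row in enumerate(numerical_label):
        if len(row) != n_lbl_num:
            raise ValueError(f"row {i} has {len(row)} labels, expected {n_lbl_num}")
        for j, (value, offset) in enumerate(zip(row, identifier_offset)):
            if value == offset - 1:
                unassigned[j] += 1  # identifier_offset - 1: belongs to no cluster
            elif value == offset - 2:
                noise[j] += 1  # identifier_offset - 2: noise
            elif value >= offset:
                counts[j][value] += 1  # a cluster identifier
            else:
                raise ValueError(f"value {value} in row {i} is below offset - 2")

    # Assumption: all label columns must yield the same n_features.
    sizes = {len(cnt) for cnt in counts}
    if len(sizes) != 1:
        raise ValueError("label columns disagree on the number of features")
    (n_features,) = sizes

    # Sorting makes the identifiers in each column strictly increasing.
    ids = [sorted(cnt) for cnt in counts]
    feature_identifier = [[ids[j][k] for j in range(n_lbl_num)] for k in range(n_features)]
    feature_member_count = [
        [counts[j][ids[j][k]] for j in range(n_lbl_num)] for k in range(n_features)
    ]
    return {
        "number_of_unassigned_members": unassigned,
        "noise": noise,
        "number_of_features": n_features,
        "feature_identifier": feature_identifier,
        "feature_member_count": feature_member_count,
    }


# Seven members, one label column, identifier_offset = 1.
stats = summary_statistics([[1], [1], [-1], [2], [0], [2], [2]], [1])
print(stats["number_of_features"], stats["feature_identifier"], stats["feature_member_count"])
# → 2 [[1], [2]] [[2], [3]]
```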


NXDL Source:

https://github.com/FAIRmat-Experimental/nexus_definitions/tree/fairmat/contributed_definitions/NXsimilarity_grouping.nxdl.xml