Should I normalize SOAP vectors before applying kernels? Also getting 'ValueError: Input X contains NaN.' #151

@alejandroarche

Hello!

I’m using SOAP descriptors to build a similarity measure, and I’m wondering whether I should explicitly normalize each SOAP vector before computing the kernel. Some references suggest dividing by the vector norm (or normalizing the kernel), but it’s not obvious from the DScribe documentation whether this is done automatically. Your example does the following:

a_features = desc.create(a)
b_features = desc.create(b)
re = AverageKernel(metric="linear")
re_kernel = re.create([a_features, b_features])
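For reference, the "normalizing the kernel" option can be sketched in plain NumPy as dividing each entry K[i, j] by sqrt(K[i, i] * K[j, j]). This is only an illustration on a toy matrix: the helper name normalize_kernel is hypothetical, and it is not meant to describe what DScribe does internally.

```python
import numpy as np

def normalize_kernel(K):
    """Return K[i, j] / sqrt(K[i, i] * K[j, j]) for a square kernel matrix.

    Hypothetical helper for illustration. A zero on the diagonal would
    cause a division by zero here, so the diagonal should be checked first
    on real data.
    """
    K = np.asarray(K, dtype=float)
    diag = np.sqrt(np.diag(K))
    return K / np.outer(diag, diag)

# Toy 2x2 kernel standing in for the output of a kernel's create() call.
K = np.array([[4.0, 2.0],
              [2.0, 9.0]])
K_norm = normalize_kernel(K)
```

After this step the self-similarities on the diagonal are exactly 1, so off-diagonal values can be compared across structures of different size.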

Would it be better to normalize the SOAP vectors beforehand? Something like:

import numpy as np

# Each row of a_features / b_features is one per-atom SOAP vector,
# so the norm is taken along axis=1 on the full 2D array.
a_norm = np.linalg.norm(a_features, axis=1, keepdims=True)
b_norm = np.linalg.norm(b_features, axis=1, keepdims=True)

soap_list_norm = [a_features / a_norm, b_features / b_norm]
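A guarded variant of this normalization could look like the sketch below. It assumes the features are a dense 2D NumPy array with one row per atom; the helper name normalize_rows and the eps threshold are hypothetical choices. Rows with (near-)zero norm are left as zeros instead of being divided by zero, which is one way a NaN could otherwise enter the kernel input.

```python
import numpy as np

def normalize_rows(features, eps=1e-12):
    """Scale each row (one per-atom vector) to unit L2 norm.

    Rows whose norm is not above eps are left unchanged (all zeros in
    practice) rather than divided by ~0, which would give NaN or inf.
    """
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    safe_norms = np.where(norms > eps, norms, 1.0)
    return features / safe_norms

# Toy stand-in for SOAP output: 3 "atoms", 4 features, one all-zero row.
toy = np.array([[3.0, 4.0, 0.0, 0.0],
                [0.0, 0.0, 0.0, 0.0],
                [1.0, 1.0, 1.0, 1.0]])
normalized = normalize_rows(toy)
```

Whether an all-zero row should be kept, dropped, or treated as an error depends on why it is zero in the first place, so the guard only hides the symptom.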

I’m asking because I frequently encounter the error 'ValueError: Input X contains NaN.' when using the REMatchKernel. This might indicate an issue with descriptor generation or a division by zero in the kernel computation. Do you have any guidelines on best practices for avoiding NaNs in DScribe?
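To narrow down where the NaN comes from, the feature arrays can be inspected before they are passed to any kernel. The sketch below uses only NumPy on a toy array; find_bad_rows is a hypothetical helper, and it assumes the same dense 2D layout (one row per atom) as above.

```python
import numpy as np

def find_bad_rows(features):
    """Return indices of rows that are non-finite or have zero norm.

    Hypothetical diagnostic helper: non-finite rows contain NaN or inf,
    zero-norm rows would produce NaN if divided by their own norm.
    """
    features = np.asarray(features, dtype=float)
    non_finite = ~np.isfinite(features).all(axis=1)
    zero_norm = np.zeros(len(features), dtype=bool)
    finite = ~non_finite
    # Only compute norms on finite rows so NaN/inf do not leak into the check.
    zero_norm[finite] = np.linalg.norm(features[finite], axis=1) == 0.0
    return {
        "non_finite": np.flatnonzero(non_finite),
        "zero_norm": np.flatnonzero(zero_norm),
    }

# Toy array with one NaN row, one all-zero row, and one inf row.
toy = np.array([[1.0, 2.0],
                [np.nan, 0.5],
                [0.0, 0.0],
                [np.inf, 1.0]])
bad = find_bad_rows(toy)
```

If both index lists are empty for every structure, the descriptors themselves are probably fine and the NaN is more likely introduced later, for example during manual normalization or inside the kernel computation.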

Thanks for your help!

Best,
Alejandro
