Should I normalize SOAP vectors before applying kernels? Also getting 'ValueError: Input X contains NaN.'

Hello!

I’m using SOAP descriptors to build a similarity measure between structures, and I’m wondering whether I should explicitly normalize each SOAP vector before computing the kernel. Some references suggest dividing by the vector norm (or normalizing the kernel), but it’s not obvious from the DScribe documentation whether this is done automatically. Your example does the following:

```
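from dscribe.kernels import AverageKernel

# desc is the SOAP descriptor; a and b are the two structures being compared
a_features = desc.create(a)
b_features = desc.create(b)
re = AverageKernel(metric="linear")
re_kernel = re.create([a_features, b_features])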
```
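To make concrete what I mean by "normalizing the kernel", here is a minimal NumPy-only sketch. It assumes a square similarity matrix with strictly positive diagonal entries; `cosine_normalize` is a hypothetical helper name of my own, not part of DScribe, and I don't know whether DScribe already does the equivalent internally.

```python
import numpy as np

def cosine_normalize(K):
    """Return K_ij / sqrt(K_ii * K_jj) for a square similarity matrix K.

    Hypothetical helper for illustration; assumes every diagonal entry is > 0.
    """
    K = np.asarray(K, dtype=float)
    diag = np.sqrt(np.diag(K))
    return K / np.outer(diag, diag)

# Toy 2x2 similarity matrix (made-up numbers, not real SOAP output)
K = np.array([[4.0, 2.0],
              [2.0, 9.0]])
K_norm = cosine_normalize(K)
```

After this step every self-similarity on the diagonal equals 1, so the off-diagonal values are comparable between structures of different size.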

Would it be better to normalize the SOAP vectors beforehand? Something like:
```
import numpy as np

soap_list_norm = []
for features in (a_features, b_features):
    # Each array has shape (n_atoms, n_features): scale every atomic row to unit length
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    soap_list_norm.append(features / norms)
```
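One thing I'm unsure about is what to do when a SOAP row is all zeros, since dividing by its norm would give 0/0 and produce NaN. A minimal sketch of a guarded version, assuming each feature array is a dense 2D array of shape `(n_atoms, n_features)`; `normalize_rows` and the `eps` threshold are my own hypothetical choices, not DScribe API:

```python
import numpy as np

def normalize_rows(features, eps=1e-12):
    """Scale each row to unit L2 norm, leaving (near-)zero rows unchanged.

    Hypothetical helper for illustration; eps is an arbitrary threshold.
    """
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    # Rows whose norm is below eps are divided by 1 instead, avoiding 0/0 -> NaN
    safe_norms = np.where(norms > eps, norms, 1.0)
    return features / safe_norms

# Toy input: one ordinary row and one all-zero row
demo = np.array([[3.0, 4.0],
                 [0.0, 0.0]])
normalized = normalize_rows(demo)
```

With this guard the all-zero row stays zero rather than turning into NaN, although I'm not sure whether a zero vector is meaningful input for the kernel in the first place.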

I’m asking because I frequently encounter the error: `ValueError: Input X contains NaN` when using the REMatchKernel. This might indicate an issue with descriptor generation or a division by zero in the kernel computation. Do you have any guidelines on best practices for avoiding NaNs in DScribe?
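To narrow down where the NaNs come from, this is the kind of check I could run on each feature array before passing it to a kernel. It is only a sketch under the assumption that the features are dense NumPy arrays of shape `(n_atoms, n_features)`; `find_bad_rows` is a hypothetical helper of mine, not something DScribe provides.

```python
import numpy as np

def find_bad_rows(features, eps=1e-12):
    """Return (rows containing NaN/inf, rows with near-zero norm).

    Hypothetical diagnostic helper; eps is an arbitrary threshold.
    """
    features = np.asarray(features, dtype=float)
    non_finite = ~np.isfinite(features).all(axis=1)
    with np.errstate(invalid="ignore"):
        # NaN norms compare as False here, so those rows are only reported as non-finite
        zero_norm = np.linalg.norm(features, axis=1) <= eps
    return np.flatnonzero(non_finite), np.flatnonzero(zero_norm & ~non_finite)

# Toy input: a normal row, an all-zero row, and a row containing NaN
demo = np.array([[1.0, 2.0],
                 [0.0, 0.0],
                 [np.nan, 1.0]])
non_finite_rows, zero_rows = find_bad_rows(demo)
```

If the non-finite list is already non-empty for the raw descriptor output, the problem would be in descriptor generation; if only zero-norm rows show up, a division by zero during normalization seems the more likely cause.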

Thanks for your help!

Best,
Alejandro

Should I normalize SOAP vectors before applying kernels? Also getting 'ValueError: Input X contains NaN.' #151
