Skip to content

Commit b1970d9

Browse files
authored
Merge pull request #388 from naik-aakash/update_featurizers
fix sitewise bwdf
2 parents 123d346 + 77c5940 commit b1970d9

15 files changed

Lines changed: 720 additions & 79 deletions

File tree

README.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -92,6 +92,10 @@ If you use any of the following Featurizers, also cite the respective papers:
9292
* `FeaturizeCharges`: R. Nelson, C. Ertural, P. C. Müller, R. Dronskowski, in Comprehensive Inorganic Chemistry III, *Elsevier*, **2023**, pp. 141–201. [https://doi.org/10.1016/B978-0-12-823144-9.00120-5](https://doi.org/10.1016/B978-0-12-823144-9.00120-5)
9393
* `FeaturizeIcoxxlist`: V. L. Deringer, W. Zhang, M. Lumeij, S. Maintz, M. Wuttig, R. Mazzarello, R. Dronskowski, *Angewandte Chemie International Edition 2014*, *53*, 10817–10820. [https://doi.org/10.1002/anie.201404223](https://doi.org/10.1002/anie.201404223)
9494

95+
Cite the following paper as well if using aysmmetry index:
96+
97+
* F. Belli, E. Zurek, I. Errea, 2025 [https://arxiv.org/abs/2501.14420](https://arxiv.org/abs/2501.14420).
98+
9599
Please cite [pymatgen](https://github.com/materialsproject/pymatgen), [Lobster](https://schmeling.ac.rwth-aachen.de/cohp/index.php?menuID=1), and [ChemEnv](https://doi.org/10.1107/S2052520620007994) correctly as well.
96100

97101

docs/fundamentals/index.ipynb

Lines changed: 14 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -281,8 +281,9 @@
281281
]
282282
},
283283
{
284-
"metadata": {},
285284
"cell_type": "markdown",
285+
"id": "a2f623edafa5efc9",
286+
"metadata": {},
286287
"source": [
287288
"### ICOXX based features (ICOXX: ICOHP / ICOBI / ICOOP)\n",
288289
"\n",
@@ -292,9 +293,18 @@
292293
"\n",
293294
"In the formula above, $\\delta$ is the Dirac delta function, $r$ is the distance between atoms A and B, and $B_{\\mathrm{AB}}$ is the bond strength between atoms A and B. The bond strength can be ICOHPs, ICOBIs, or ICOOPs. The BWDF is thus a histogram of bond strengths as a function of bond length. One can compute this for entire structure, for each unique atom pair in the structure, per site or per bond label.\n",
294295
"\n",
295-
"Here, from the BWDF mainly statistical features like mean, standard deviation, skewness, kurtosis, weighted mean, and weighted standard deviation are computed.\n"
296-
],
297-
"id": "a2f623edafa5efc9"
296+
"Here, from the BWDF mainly statistical features like mean, standard deviation, skewness, kurtosis, weighted mean, and weighted standard deviation are computed.\n",
297+
"\n",
298+
"As proposed by [F. Belli, E. Zurek, I. Errea, 2025](https://arxiv.org/abs/2501.14420), one can also generate the asymmetry index (ASI) of the local bonding environment using ICOBI / ICOHP / ICOOP for a site using following formulation\n",
299+
"\n",
300+
"$$\n",
301+
"\\boldsymbol{V}_{x} = \\frac{1}{B_{x}} \\sum_{\\alpha=1}^{B_{x}} \\mathrm{iCOXX}(x, \\alpha) \\, \\boldsymbol{i}_{x \\alpha}\n",
302+
"$$\n",
303+
"\n",
304+
"$V_x$, is computed for each atom $x$ in the structure. The vector is the sum of the iCOXX values for all of the interactions between an atom and its neighboring atoms $\\alpha$ (iCOXX(x, $\\alpha$)) weighted by a unit vector, $i_{xα}$, which denotes the direction of each interaction. This summation is performed over all neighbors for which the iCOXX values fall above a user-defined threshold, and divided by the number of interactions considered, $B_x$.\n",
305+
"\n",
306+
"One can then get statistics of ASI from all sites in the structure."
307+
]
298308
}
299309
],
300310
"metadata": {

docs/tutorial/commandlineinterface.rst

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -228,6 +228,10 @@ command. Below is an example and sample output using this command.
228228
:file: tutorial_assets/CdF2.html
229229

230230

231+
- ``lobsterpy plot-auto-ia --save-plot-json auto_plot_data.json`` can be used to save underlying plot data from the automatic analysis as a json file. This json file can be then loaded using monty package via ``monty.serialization.loadfn`` function.
232+
233+
Note that ``--save-plot-json`` also works in combination with ``lobsterpy plot-auto`` subcommand.
234+
231235
2. Plotting of COHPs/COBIs/COOPs
232236
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
233237

@@ -274,7 +278,7 @@ You can plot COHPs/COBIs/COOPs from the command line.
274278

275279
.. code:: bash
276280
277-
lobsterpy plot-bwdf
281+
lobsterpy plot-bwdf --plot-negative --sigma 0.15
278282
279283
.. image:: tutorial_assets/example_bwdf.png
280284

docs/tutorial/tutorial.ipynb

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1203,7 +1203,7 @@
12031203
"outputs": [],
12041204
"source": [
12051205
"# get the BWDF stats df\n",
1206-
"batch_icohp.get_df()"
1206+
"batch_icohp.get_bwdf_df()"
12071207
]
12081208
},
12091209
{
3.02 KB
Loading

src/lobsterpy/cli.py

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -599,6 +599,14 @@ def get_parser() -> argparse.ArgumentParser:
599599
help="Normalization of the BWDFs. Options: 'formula_units', 'area', 'counts' and 'none'. "
600600
"Default: 'formula_units'.",
601601
)
602+
bwdf_plotting_group.add_argument(
603+
"-plotneg",
604+
"--plot-negative",
605+
dest="plotneg",
606+
action="store_true",
607+
default=False,
608+
help="If True, will plot -1*ICOHPs. Works only for ICOHPs. Default: True.",
609+
)
602610
bwdf_plotting_group.add_argument(
603611
"-siteindex",
604612
"--site-index",
@@ -1092,7 +1100,7 @@ def run(args):
10921100
for bwdf_dict in plot_data:
10931101
pair = next(iter(bwdf_dict.keys()))
10941102
bwdf_plotter.add_bwdf(bwdf=bwdf_dict, label=label)
1095-
plot = bwdf_plotter.get_plot(sigma=args.sigma, xlim=args.xlim, ylim=args.ylim)
1103+
plot = bwdf_plotter.get_plot(sigma=args.sigma, xlim=args.xlim, ylim=args.ylim, plot_negative=args.plotneg)
10961104
title = f"{args.title} : {pair}" if args.title else ""
10971105
plot.title(title)
10981106

src/lobsterpy/featurize/batch.py

Lines changed: 99 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -850,17 +850,25 @@ class BatchIcoxxlistFeaturizer:
850850
:param max_length: maximum bond length for BWDF computation
851851
:param min_length: minimum bond length for BWDF computation
852852
:param normalization: normalization strategy for BWDF
853+
:param n_electrons_scaling: bool indicating if ICOXX values should be scaled by number of electrons
853854
:param bin_width: bin width for BWDF
854855
:param bwdf_df_type: Type of BWDF dataframe to generate
855856
856-
- "binned": Binned BWDF function.
857-
- "stats": Statistical features of BWDF function.
857+
- "binned": Binned BWDF.
858+
- "stats": Statistical features of BWDF.
858859
- "sorted_bwdf": BWDF values sorted by distances, ascending.
859860
- "sorted_dists": Distances sorted by BWDF values (either only positive or negative),
860861
sorted descending by absolute values.
861862
:param sorted_dists_mode: only applies if bwdf_df_type=="sorted_dists".
862-
Corresponds to param "mode" of get_sorted_dist_df, defines whether BWDF values above or
863+
Corresponds to the param "mode" of get_sorted_dist_df, defines whether BWDF values above or
863864
below zero are considered for distance featurization.
865+
:param stats_type: type of statistics to compute from BWDF.
866+
867+
- "atompair": compute stats from unique atom pairs BWDFs.
868+
- "site": compute stats from site BWDFs.
869+
- "summed": compute stats from structure BWDFs.
870+
- "all": concatenated dataframe from `atompair`, `site` and `summed` options.
871+
864872
:read_icobis: bool to state to read ICOBILIST.lobster from the path
865873
:read_icoops: bool to state to read ICOOPLIST.lobster from the path
866874
:param n_jobs: number of parallel processes to run
@@ -871,9 +879,11 @@ def __init__(
871879
self,
872880
path_to_lobster_calcs: str | Path,
873881
normalization: Literal["formula_units", "area", "counts", "none"] = "formula_units",
882+
n_electrons_scaling: bool = False,
874883
bin_width: float = 0.02,
875884
bwdf_df_type: Literal["binned", "stats", "sorted_bwdf", "sorted_dists"] = "stats",
876885
sorted_dists_mode: Literal["positive", "negative"] = "negative",
886+
stats_type: Literal["atompair", "site", "summed", "all"] = "all",
877887
interactions_tol: float = 1e-3,
878888
max_length: float = 6.0,
879889
min_length: float = 0.0,
@@ -888,6 +898,8 @@ def __init__(
888898
:param max_length: maximum bond length for BWDF computation
889899
:param min_length: minimum bond length for BWDF computation
890900
:param normalization: normalization strategy for BWDF
901+
:param n_electrons_scaling: bool indicating if ICOXX values should be scaled by number of electrons.
902+
Only for testing purposes. Should not affect the results in any meaningful way.
891903
:param bin_width: bin width for BWDF
892904
:param bwdf_df_type: Type of BWDF dataframe to generate
893905
@@ -899,59 +911,76 @@ def __init__(
899911
:param sorted_dists_mode: only applies if bwdf_df_type=="sorted_dists".
900912
Corresponds to param "mode" of get_sorted_dist_df, defines whether BWDF values above or
901913
below zero are considered for distance featurization.
914+
:param stats_type: type of BWDF stats to be computed. Only applies if bwdf_df_type=="stats".
915+
916+
- "atompair": compute stats from unique atom pairs BWDFs.
917+
- "site": compute stats from site BWDFs.
918+
- "summed": compute stats from structure BWDFs.
919+
- "all": concatenated dataframe from `atompair`, `site` and `summed` options.
902920
:param interactions_tol: tolerance for interactions
903921
:param read_icobis: bool to state to read ICOBILIST.lobster from the path
904922
:param read_icoops: bool to state to read ICOOPLIST.lobster from the path
905923
:param n_jobs: number of parallel processes to run
906924
"""
907925
self.path_to_lobster_calcs = path_to_lobster_calcs
908926
self.normalization = normalization
927+
self.n_electrons_scaling = n_electrons_scaling
909928
self.max_length = max_length
910929
self.min_length = min_length
911930
self.bin_width = bin_width
912931
self.interactions_tol = interactions_tol
913932
self.bwdf_df_type = bwdf_df_type
914933
self.sorted_dists_mode = sorted_dists_mode
934+
self.stats_type = stats_type
915935
self.read_icobis = read_icobis
916936
self.read_icoops = read_icoops
917937
self.n_jobs = n_jobs
918938

919-
def _get_icoxxlist_bwdf_df(self, path_to_lobster_calc: str | Path) -> pd.DataFrame:
939+
def _get_featurizer_obj(self, path_to_lobster_calc: str | Path) -> FeaturizeIcoxxlist:
920940
"""
921-
Featurize ICOXXLIST data using FeaturizeCOXX.
941+
Set up the FeaturizeIcoxxlist object based on the provided parameters.
922942
923943
:param path_to_lobster_calc: path to root LOBSTER calculation directory
924944
925945
Returns:
926-
A pandas dataframe with computed ICOXXLIST moment features
946+
A FeaturizeIcoxxlist object
927947
"""
928948
if self.read_icobis:
929949
file_paths = get_file_paths(
930950
path_to_lobster_calc=path_to_lobster_calc,
931-
requested_files=["structure", "icobilist"],
951+
requested_files=["structure", "icobilist", "grosspop"]
952+
if self.n_electrons_scaling
953+
else ["structure", "icobilist"],
932954
)
933955
feat_icoxx = FeaturizeIcoxxlist(
934956
path_to_icoxxlist=file_paths.get("icobilist"),
935957
path_to_structure=file_paths.get("structure"),
958+
path_to_grosspop=file_paths.get("grosspop", None),
936959
bin_width=self.bin_width,
937960
interactions_tol=self.interactions_tol,
938961
normalization=self.normalization,
962+
n_electrons_scaling=self.n_electrons_scaling,
939963
max_length=self.max_length,
940964
min_length=self.min_length,
941965
are_cobis=self.read_icobis,
942966
are_coops=self.read_icoops,
943967
)
968+
944969
elif self.read_icoops:
945970
file_paths = get_file_paths(
946971
path_to_lobster_calc=path_to_lobster_calc,
947-
requested_files=["structure", "icooplist"],
972+
requested_files=["structure", "icooplist", "grosspop"]
973+
if self.n_electrons_scaling
974+
else ["structure", "icooplist"],
948975
)
949976
feat_icoxx = FeaturizeIcoxxlist(
950977
path_to_icoxxlist=file_paths.get("icooplist"),
951978
path_to_structure=file_paths.get("structure"),
979+
path_to_grosspop=file_paths.get("grosspop", None),
952980
bin_width=self.bin_width,
953981
interactions_tol=self.interactions_tol,
954982
normalization=self.normalization,
983+
n_electrons_scaling=self.n_electrons_scaling,
955984
max_length=self.max_length,
956985
min_length=self.min_length,
957986
are_cobis=self.read_icobis,
@@ -960,29 +989,59 @@ def _get_icoxxlist_bwdf_df(self, path_to_lobster_calc: str | Path) -> pd.DataFra
960989
else:
961990
file_paths = get_file_paths(
962991
path_to_lobster_calc=path_to_lobster_calc,
963-
requested_files=["structure", "icohplist"],
992+
requested_files=["structure", "icohplist", "grosspop"]
993+
if self.n_electrons_scaling
994+
else ["structure", "icohplist"],
964995
)
965996
feat_icoxx = FeaturizeIcoxxlist(
966997
path_to_icoxxlist=file_paths.get("icohplist"),
967998
path_to_structure=file_paths.get("structure"),
999+
path_to_grosspop=file_paths.get("grosspop", None),
9681000
bin_width=self.bin_width,
9691001
interactions_tol=self.interactions_tol,
9701002
normalization=self.normalization,
1003+
n_electrons_scaling=self.n_electrons_scaling,
9711004
max_length=self.max_length,
9721005
min_length=self.min_length,
9731006
are_cobis=self.read_icobis,
9741007
are_coops=self.read_icoops,
9751008
)
9761009

1010+
return feat_icoxx
1011+
1012+
def _get_icoxxlist_bwdf_df(self, path_to_lobster_calc: str | Path) -> pd.DataFrame:
1013+
"""
1014+
Featurize ICOXXLIST data using FeaturizeIcoxxlist.
1015+
1016+
:param path_to_lobster_calc: path to root LOBSTER calculation directory
1017+
1018+
Returns:
1019+
A pandas dataframe with computed ICOXXLIST moment features
1020+
"""
1021+
feat_icoxx = self._get_featurizer_obj(path_to_lobster_calc)
1022+
9771023
if self.bwdf_df_type == "binned":
9781024
return feat_icoxx.get_binned_bwdf_df()
9791025
if self.bwdf_df_type == "sorted_bwdf":
9801026
return feat_icoxx.get_sorted_bwdf_df()
9811027
if self.bwdf_df_type == "sorted_dists":
9821028
return feat_icoxx.get_sorted_dist_df(mode=self.sorted_dists_mode)
983-
return feat_icoxx.get_stats_df()
1029+
return feat_icoxx.get_stats_df(stats_type=self.stats_type)
9841030

985-
def get_df(self) -> pd.DataFrame:
1031+
def _get_asymmetry_index_df(self, path_to_lobster_calc: str | Path) -> pd.DataFrame:
1032+
"""
1033+
Generate a pandas dataframe with asymmetry index stats features.
1034+
1035+
:param path_to_lobster_calc: path to root LOBSTER calculation directory
1036+
1037+
Returns:
1038+
A pandas dataframe with asymmetry index stats features as columns.
1039+
"""
1040+
feat_icoxx = self._get_featurizer_obj(path_to_lobster_calc)
1041+
1042+
return feat_icoxx.get_asymmetry_index_stats_df()
1043+
1044+
def get_bwdf_df(self) -> pd.DataFrame:
9861045
"""
9871046
Generate a pandas dataframe with BWDF for all calcs.
9881047
@@ -1013,3 +1072,32 @@ def get_df(self) -> pd.DataFrame:
10131072
df_icoxxlist.sort_index(inplace=True) # noqa: PD002
10141073

10151074
return df_icoxxlist
1075+
1076+
def get_asymmetry_index_df(self) -> pd.DataFrame:
1077+
"""
1078+
Generate a pandas dataframe with site asymmetry index stats for all calcs.
1079+
1080+
Returns:
1081+
A pandas dataframe with site asymmetry index stats as columns.
1082+
"""
1083+
paths = [
1084+
os.path.join(self.path_to_lobster_calcs, f)
1085+
for f in os.listdir(self.path_to_lobster_calcs)
1086+
if not f.startswith("t")
1087+
and not f.startswith(".")
1088+
and os.path.isdir(os.path.join(self.path_to_lobster_calcs, f))
1089+
]
1090+
row = []
1091+
with (
1092+
mp.Pool(processes=self.n_jobs, maxtasksperchild=1) as pool,
1093+
tqdm(total=len(paths), desc="Generating ASI from ICOXXLIST") as pbar,
1094+
):
1095+
for _, result in enumerate(pool.imap_unordered(self._get_asymmetry_index_df, paths, chunksize=1)):
1096+
pbar.update()
1097+
row.append(result)
1098+
1099+
df_icoxxlist = pd.concat(row)
1100+
1101+
df_icoxxlist.sort_index(inplace=True) # noqa: PD002
1102+
1103+
return df_icoxxlist

0 commit comments

Comments
 (0)