Hi,
i was having an issue with the ctapipe-optimize-event-selection tool, see cta-observatory/ctapipe#2861. It seems that in cuts.py, the 'cut' columns was not initialized (filled with ones).
After iterating with chat GPT, I understand it is linke to the fact that np.nanpercentile does not support masked arrays.
Below is a function that works for me :
def calculate_percentile_cut(
values,
bin_values,
bins,
fill_value,
percentile=68,
min_value=None,
max_value=None,
smoothing=None,
min_events=10,
):
"""
Calculate cuts as the percentile of a given quantity in bins of another
quantity.
Parameters
----------
values: ``~numpy.ndarray`` or ``~astropy.units.Quantity``
The values for which the cut should be calculated
bin_values: ``~numpy.ndarray`` or ``~astropy.units.Quantity``
The values used to sort the ``values`` into bins
edges: ``~numpy.ndarray`` or ``~astropy.units.Quantity``
Bin edges
fill_value: float or quantity
Value for bins with less than ``min_events``,
must have same unit as values
percentile: float
The percentile to calculate in each bin as a percentage,
i.e. 0 <= percentile <= 100.
min_value: float or quantity or None
If given, cuts smaller than this value are replaced with ``min_value``
max_value: float or quantity or None
If given, cuts larger than this value are replaced with ``max_value``
smoothing: float or None
If given, apply a gaussian filter of width ``sigma`` in terms
of bins.
min_events: int
Bins with less events than this number are replaced with ``fill_value``
"""
# create a table to make use of groupby operations
# we use a normal table here to avoid astropy/astropy#13840
table = Table({"values": values}, copy=COPY_IF_NEEDED)
unit = table["values"].unit
# make sure units match
if unit is not None:
fill_value = u.Quantity(fill_value).to(unit)
if min_value is not None:
min_value = u.Quantity(min_value).to_value(unit)
if max_value is not None:
max_value = u.Quantity(max_value).to_value(unit)
bin_index, valid = calculate_bin_indices(bin_values, bins)
by_bin = table[valid].group_by(bin_index[valid])
n_bins = len(bins) - 1
cut_table = QTable()
cut_table["low"] = bins[:-1]
cut_table["high"] = bins[1:]
cut_table["center"] = bin_center(bins)
cut_table["n_events"] = 0
unit = None
if hasattr(fill_value, 'unit'):
unit = fill_value.unit
fill_value = fill_value.value
# Extract scalar from masked arrays if needed
if hasattr(fill_value, 'unmasked'):
fill_value = float(fill_value.unmasked)
elif hasattr(fill_value, 'item'):
fill_value = float(fill_value.item())
else:
fill_value = float(fill_value)
percentile = np.asanyarray(percentile)
# Initialize cut column with proper shape based on percentile
if percentile.shape == ():
cut_table["cut"] = np.full(n_bins, fill_value, dtype=float)
else:
cut_table["cut"] = np.full((n_bins, len(percentile)), fill_value, dtype=float)
if unit is not None:
cut_table["cut"].unit = unit
for bin_idx, group in zip(by_bin.groups.keys, by_bin.groups):
# replace bins with too few events with fill_value
n_events = len(group)
cut_table["n_events"][bin_idx] = n_events
if n_events < min_events:
cut_table["cut"].value[bin_idx] = fill_value
else:
# Extract the actual data from masked arrays
group_values = group["values"]
# Handle different types of masked arrays
if hasattr(group_values, 'unmasked'):
# Astropy MaskedNDArray - get unmasked data and handle mask separately
data = np.asarray(group_values.unmasked)
if hasattr(group_values, 'mask') and group_values.mask is not None:
mask = np.asarray(group_values.mask)
data = np.where(mask, np.nan, data)
elif hasattr(group_values, 'filled'):
# numpy.ma.MaskedArray - fill masked values with nan
data = group_values.filled(np.nan)
else:
# Regular array
data = np.asarray(group_values, dtype=float)
value = np.nanpercentile(data, percentile)
if min_value is not None or max_value is not None:
value = np.clip(value, min_value, max_value)
cut_table["cut"].value[bin_idx] = value
if smoothing is not None:
cut_table["cut"].value[:] = gaussian_filter1d(
cut_table["cut"].value,
smoothing,
mode="nearest",
)
return cut_table
Hi,
i was having an issue with the
ctapipe-optimize-event-selectiontool, see cta-observatory/ctapipe#2861. It seems that in cuts.py, the 'cut' columns was not initialized (filled with ones).After iterating with chat GPT, I understand it is linke to the fact that np.nanpercentile does not support masked arrays.
Below is a function that works for me :