kcpdi package

Submodules

kcpdi.kcp_ds module

This module, kcp_ds (Kernel Change Point Detect Select), encapsulates functions from the Ruptures package and includes a function for penalized variable selection.

Specific details and remarks:
1. The function outputs Python indices along the time axis (first dimension) of “data” corresponding to the detected change points. Real-world times associated with the data are not considered in this function.
2. The ruptures class KernelCPD has a minimum allowed segment size (min_size). It is currently set to 2 by default and is not an input parameter for kcp_ds.
3. The default kernel is “linear”, with “cosine” and “rbf” available as options. Kernel-specific settings are passed through the params input, which defaults to None and can be set for the rbf kernel (e.g., params = {"gamma": value_of_gamma}).
4. Due to memory constraints, offline time-series data longer than max_n_time_points is processed in segments of length max_n_time_points; the function then outputs the full set of change points.
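Remark 4 can be pictured with a minimal sketch. This is not the package's implementation: detect_in_chunks and toy_detector are hypothetical names, and the toy detector merely stands in for the kernel change point step. The sketch only shows how indices found inside each block of at most max_n_time_points samples can be shifted back to positions in the full series, and how block end points can be recorded.

```python
from typing import Callable, List, Sequence, Tuple


def detect_in_chunks(
    series: Sequence[float],
    detector: Callable[[Sequence[float]], List[int]],
    max_n_time_points: int = 2000,
) -> Tuple[List[int], List[int]]:
    """Run `detector` on consecutive blocks and return global indices.

    Hypothetical helper: `detector` stands in for the change point step
    and must return indices local to the block it receives.
    """
    change_points: List[int] = []
    interval_end_points: List[int] = []
    for start in range(0, len(series), max_n_time_points):
        block = series[start:start + max_n_time_points]
        # Shift block-local indices back to positions in the full series.
        change_points.extend(start + i for i in detector(block))
        interval_end_points.append(start + len(block))
    return change_points, interval_end_points


def toy_detector(block: Sequence[float]) -> List[int]:
    """Toy stand-in: flag every index where the value jumps by more than 1."""
    return [i for i in range(1, len(block)) if abs(block[i] - block[i - 1]) > 1]


series = [0.0] * 5 + [5.0] * 5 + [0.0] * 5
cps, ends = detect_in_chunks(series, toy_detector, max_n_time_points=6)
```

In this toy version a jump that falls exactly on a block boundary would go unnoticed; the actual function may treat block boundaries differently.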

Note: Please handle real-world times associated with the data independently as this module focuses on change point detection and variable selection.
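As a small illustration of handling times separately, the snippet below maps detected indices back to timestamps. The time grid and the index values are made up for the example; only the idea of indexing a true_times sequence with the detected change points comes from this documentation.

```python
# Hypothetical regular time grid: one sample per minute, as epoch seconds.
true_times = [1_700_000_000 + 60 * i for i in range(10)]

# Indices as they might be returned by the detection step (made-up values).
detected_change_points = [3, 7]

# Look up the real-world time attached to each detected index.
change_times = [true_times[i] for i in detected_change_points]
```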

kcpdi.kcp_ds.kcp_ds(data: array, kernel: Literal['linear', 'cosine', 'rbf'] = 'linear', params: Dict[str, Any] | None = None, max_n_time_points: int = 2000, min_n_time_points: int = 10, expected_frac_anomaly: float = 0.001) → Tuple[List[int], List[int]][source]

Return the set of important change points obtained by running kernel change point detection followed by penalized model selection using the slope heuristic method.

Because the raw criterion can respond to noise (overfitting), penalized variable selection is performed to decide how many of the detected change points are truly significant rather than fitted noise.
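The selection step can be sketched as follows. This is a simplified illustration, not the code used by kcp_ds: it assumes a penalty proportional to the number of segments, whereas the penalty shape used by the package is not stated here. The names fit_slope and select_n_segments, and the cost values, are hypothetical.

```python
from typing import Sequence


def fit_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def select_n_segments(costs: Sequence[float]) -> int:
    """Pick a number of segments with a simplified slope heuristic.

    costs[k] is the cost of the best segmentation into k + 1 segments.
    Assumption: the penalty is proportional to the number of segments.
    """
    n_segments = list(range(1, len(costs) + 1))
    # Estimate the slope on the most complex models, where the cost is
    # assumed to decrease only because noise is being fitted.
    half = len(costs) // 2
    slope = fit_slope(n_segments[half:], costs[half:])
    # Slope heuristic: penalize with twice the estimated minimal constant.
    penalized = [c + 2.0 * (-slope) * d for c, d in zip(costs, n_segments)]
    return n_segments[penalized.index(min(penalized))]


# Made-up costs: sharp drops up to 3 segments, then a slow linear decrease.
costs = [100.0, 50.0, 10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0]
best = select_n_segments(costs)
n_change_points = best - 1
```

With these made-up costs the penalized criterion is smallest at three segments, so two change points are kept; the later, slower decrease is treated as noise.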

Parameters:

data – array of dimension (number of time points) x (number of time-series) containing the data points.
kernel – kernel used for kernel change point detection. If it is rbf, params must include a “gamma” parameter with a positive real value.
params – parameters for the kernel instance
max_n_time_points – maximum number of time points processed at once. It sets the largest matrix size, (max_n_time_points) x (max_n_time_points), that the computer system is expected to process quickly.
min_n_time_points – minimum number of time points in the current dataset for which it makes sense to run the detection algorithm again. If there are fewer than min_n_time_points points in the dataset, no computations will be run and the outputs will be empty.
expected_frac_anomaly – This parameter encodes prior knowledge of users as to how often anomalies might occur. Results can be quite dependent on the choice of this parameter, so choose carefully!

kcpdi.kcp_ss_learner module

This module, kcp_ss_learner (Kernel Change Point Sample Scorer Learner), includes a wrapper to perform anomaly scoring as understood in the TADkit formalism, based on this library’s change-point detection methodology.

The kernel change-point detection algorithm outputs a final list of time indices at which it thinks that “something happened”.

Other methods output a score at all time indices, where the higher the score is, the more we believe that “something happened there”.

In order to allow the kernel change-point method to be integrated into a Python package based around score samples, we have implemented a function kcp_ss which takes the kcp_ds output list of change-points and turns them into scores at all time indices.

Remember that the kernel change-point detection algorithm truly believes that it has found the true and only set of change-points. However, due to random noise, it could have been that a true change-point was immediately before or immediately after a detected change-point, i.e., 1 or 2 or maybe even 3 time indices before or after.

We give a score of 1 at each detected change-point. Other points are assigned a score that decreases as they move away from the detected change-point, in a way parameterized by a decay parameter (gamma).
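A minimal sketch of such a scoring rule is given below. The exponential form and the name change_points_to_scores are assumptions made for illustration; the text above only states that the score is 1 at a detected change-point and decreases with distance at a rate set by the decay parameter.

```python
import math
from typing import List, Sequence


def change_points_to_scores(
    n_time_points: int,
    change_points: Sequence[int],
    decay_param: float = 1.0,
) -> List[float]:
    """Hypothetical rule: exp(-decay_param * distance to nearest change point)."""
    if not change_points:
        # No detected change point: nothing to score.
        return [0.0] * n_time_points
    scores = []
    for t in range(n_time_points):
        distance = min(abs(t - cp) for cp in change_points)
        scores.append(math.exp(-decay_param * distance))
    return scores


scores = change_points_to_scores(8, [2, 6], decay_param=1.0)
```

A larger decay_param concentrates the score on the detected indices, while a smaller one spreads it over neighbouring time points.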

class kcpdi.kcp_ss_learner.KcpLearner(kernel: Literal['linear', 'cosine', 'rbf'] = 'linear', params: Dict[str, Any] | None = None, max_n_time_points: int = 2000, min_n_time_points: int = 10, expected_frac_anomaly: float = 0.001, decay_param: float = 1.0)[source]

Bases: BaseEstimator, OutlierMixin

fit(X, y=None)[source]

params_description = {'decay_param': {'default': 1, 'description': 'How fast the score decays around the change-points.', 'family': 'postprocessing', 'log_start': -2, 'log_step': 0.1, 'log_stop': 2, 'value_type': 'log_range'}}

required_properties = []

score_samples(X)[source]

kcpdi.kcp_v module

This module, kcp_v (Kernel Change Point Visualisation), includes a function for visualization and explainability. It extracts intervals around each detected change point, illustrating the contribution of each sensor to the change point criterion. Each interval is bounded on the left by either 0 or the previous change point location and on the right by either the next change point or the time series end-point. The visualization highlights the change point’s location with a vertical dotted red line and displays the algorithm criterion as a blue curve.
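The interval rule described above can be sketched as follows. The helper intervals_around is hypothetical and ignores the block-wise processing controlled by max_n_time_points; it also treats the series length as the right-hand bound, which is an assumption.

```python
from typing import List, Sequence, Tuple


def intervals_around(
    change_points: Sequence[int], n_time_points: int
) -> List[Tuple[int, int, int]]:
    """Return (left, change_point, right) for each change point.

    left is 0 or the previous change point; right is the next change
    point or the end of the series.
    """
    cps = sorted(change_points)
    bounds = [0] + cps + [n_time_points]
    return [(bounds[i], bounds[i + 1], bounds[i + 2]) for i in range(len(cps))]


intervals = intervals_around([30, 70], n_time_points=100)
```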

Remarks:
1. For clarity in visualization, individual time series related to different change points are visualized separately, resulting in one plot per detected change point.
2. The visualization function accommodates the algorithm’s processing of a maximum of “max_n_time_points” in each loop. If the total number of time points exceeds “max_n_time_points”, the interval calculation differs for the last change point found in each loop. Hence, “max_n_time_points” must be consistent between the change point detection and visualization functions.
3. Only the linear kernel provides criterion lines for each individual time series that sum up to the value of the global criterion line. Although designed for the linear kernel, the visualization function can be used for other kernels (if the same kernel is used in the kcp_ds function).
4. data: the input data array must be exactly the same array as the one used in the kcp_ds function.
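Remark 3 rests on the fact that a linear kernel is a sum of per-coordinate products, so a kernel cost splits additively across time series. The sketch below checks this on a tiny segment using the standard kernel segment cost; the criterion actually plotted by kcp_v may be defined differently, and the function names are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Linear kernel k(a, b) = <a, b>."""
    return sum(x * y for x, y in zip(a, b))


def per_series_cost(segment: Sequence[Sequence[float]]) -> List[float]:
    """Within-segment sum of squared deviations, one value per time series.

    segment has shape (time points) x (time series).
    """
    n = len(segment)
    costs = []
    for j in range(len(segment[0])):
        column = [row[j] for row in segment]
        mean = sum(column) / n
        costs.append(sum((v - mean) ** 2 for v in column))
    return costs


def global_linear_cost(segment: Sequence[Sequence[float]]) -> float:
    """Kernel cost: sum_t k(x_t, x_t) - (1/n) * sum_{s,t} k(x_s, x_t)."""
    n = len(segment)
    diag = sum(dot(row, row) for row in segment)
    full = sum(dot(a, b) for a in segment for b in segment)
    return diag - full / n


# Three time points, two time series (made-up values).
segment = [[0.0, 1.0], [2.0, 1.0], [4.0, 4.0]]
contributions = per_series_cost(segment)
total = global_linear_cost(segment)
```

The per-series contributions add up to the global cost here; with the cosine or rbf kernel no such exact split is available, which is consistent with the remark above.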

kcpdi.kcp_v.kcp_v(data: array, true_times: Sequence[int], detected_change_points: Sequence[int], interval_end_points: Sequence[int], n_legend: int = 10, save_plots: bool = False) → None[source]

Visualize the post-hoc importance of each individual time series in the detection of each shared change point by the kernel change point detection algorithm.

Parameters:

data – array of dimension (number of time points) x (number of time-series) containing the data points.
true_times – the dataset-dependent true time points (after any postprocessing steps required to interpolate data to a fixed time grid). This is required to make more meaningful plot outputs, or to obtain the actual true times of predicted change points. true_times should have the length of the first dimension of data.
detected_change_points – the detected change points. It should be the first output of the kcp_ds function.
interval_end_points – indices of the ends of intervals of length max_n_time_points. It should be the second output of the kcp_ds function.
n_legend – the number of “important” individual time series to highlight in color in the final plot.

kcpdi.utils module

kcpdi.utils.fig_ax(figsize=(15, 5), dpi=150)[source]: Generate a matplotlib figure and axes object with the given size and resolution.

kcpdi.utils.get_sum_of_cost(algo, n_bkps) → float[source]

Calculate the sum of costs for a segmentation with n_bkps change points.

Utility function used to do penalized variable selection and obtain a final list of anomaly time points.