Anomaly Detection Algorithms  

Anomaly detection is an unsupervised machine learning technique that focuses on identifying data points, events, or observations that deviate significantly from the majority of the data. These outliers can represent critical incidents such as fraud, network intrusions, equipment failures, or data quality issues. Instead of predicting predefined labels, anomaly detection models learn normal behavior within the data and flag deviations that may require attention.

Note: The Anomaly Detection feature is available in early access across all data centers. To use it, request access via support@zohocatalyst.com.
 
For a conceptual understanding of anomaly detection, refer to the Anomaly Detection Help Doc.

Anomaly detection in Catalyst QuickML is broadly categorized into Time Series and Non-Time Series, each offering its own set of algorithms and configurable parameters.

Time-Series Anomaly Detection Algorithms 

Within Catalyst QuickML, anomaly detection for time series data focuses on identifying abnormal patterns or deviations in sequential data points over time. These methods account for temporal dependencies and seasonality, enabling the detection of sudden spikes, drops, or trend changes.

Before exploring the individual time-series anomaly detection algorithms, it’s important to understand the common parameters used for Anomaly Sensitivity Control, as shown in the image below. These parameters are consistent across all time-series anomaly detection algorithms in Catalyst QuickML and are essential for controlling how anomalies are detected and interpreted. They allow users to fine-tune the threshold, window size, and anomaly scoring method, ensuring that the system accurately differentiates between normal fluctuations and true anomalous behavior.

[Image: Anomaly Sensitivity Control parameters]

Common parameters:

  • Threshold: Determines how sensitive the model is to deviations from expected behavior. A lower threshold increases sensitivity, flagging smaller fluctuations as anomalies, while a higher threshold reduces false positives.

  • Window size: Defines the rolling window of data points used for analysis; a maximum of 24 time steps is supported.

  • Anomaly scoring method: Currently supports Z-Score, which measures how far a point deviates from the mean in terms of standard deviations (see the sketch below).
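
To make the Z-Score method concrete, the sketch below scores each point against the mean and standard deviation of the trailing window and flags points whose absolute score exceeds the threshold. This is a minimal illustration in Python with NumPy; the rolling_zscore_anomalies function and its window_size and threshold arguments are illustrative names, not part of any QuickML API.

    import numpy as np

    def rolling_zscore_anomalies(series, window_size=12, threshold=3.0):
        """Flag points whose z-score against the trailing window exceeds the threshold."""
        y = np.asarray(series, dtype=float)
        flags = np.zeros(len(y), dtype=bool)
        for t in range(window_size, len(y)):
            window = y[t - window_size:t]
            mu, sigma = window.mean(), window.std()
            if sigma > 0 and abs((y[t] - mu) / sigma) > threshold:
                flags[t] = True
        return flags

    # Example: a smooth series with one injected spike.
    y = np.sin(np.linspace(0, 12, 200))
    y[150] += 5.0
    print(np.where(rolling_zscore_anomalies(y, window_size=24))[0])  # expected: [150]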

Below are the time series anomaly detection algorithms supported in Catalyst QuickML, along with their explanations, real-world use cases, and key parameters.

Auto Regressor   

Explanation: The Auto Regressor (AR) model predicts future values based on a linear combination of past observations. It assumes that past behavior directly influences the present, with coefficients representing the relationship strength between past and current values. QuickML extends this with flexible regression backends, including Linear Regressor, Random Forest Regressor, AdaBoost Regressor, and Gradient Boosting Regressor, enabling both linear and non-linear temporal modeling.

Mathematical intuition:

 Yₜ = c + ∑ᵢ₌₁ᵖ φᵢ Yₜ₋ᵢ + εₜ 

Where:

  • Yₜ = current value
  • c = intercept (constant term)
  • φᵢ = autoregressive coefficients
  • Yₜ₋ᵢ = lagged observations
  • εₜ = random noise

Use case: Used in financial forecasting, sensor drift detection, and predictive process monitoring, where recent history significantly affects the next outcome — e.g., predicting short-term stock trends or detecting abnormal fluctuations in temperature sensors.

Key parameters:

Note: In parameter ranges, square brackets [ ] include boundaries, and parentheses ( ) exclude them.
 
  • select_model_to_fit: Chooses the regression model for fitting the time series. Type: string. Possible values: {'Linear Regressor', 'Random-Forest Regressor', 'AdaBoost Regressor', 'Gradient Boosting Regressor'}. Default: 'Linear Regressor'.
  • max_lag: Defines how many past time steps are used to predict the current value. Type: int. Possible values: [1, ∞). Default: 1.
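
As a rough sketch of this idea (not QuickML's implementation), the snippet below builds a lag matrix, fits a pluggable scikit-learn regressor, and z-scores the residuals; passing RandomForestRegressor() instead of LinearRegression() mirrors the select_model_to_fit choice. The function name and arguments are illustrative.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression

    def ar_residual_zscores(y, max_lag=3, model=None):
        """Regress y[t] on its previous max_lag values and z-score the residuals."""
        y = np.asarray(y, dtype=float)
        X = np.array([y[t - max_lag:t] for t in range(max_lag, len(y))])
        target = y[max_lag:]
        model = model if model is not None else LinearRegression()
        model.fit(X, target)
        resid = target - model.predict(X)
        return (resid - resid.mean()) / resid.std()

    y = np.sin(np.linspace(0, 20, 300))
    y[250] += 4.0                                    # inject an anomaly
    z = ar_residual_zscores(y, max_lag=3)            # aligned with y[max_lag:]
    print(np.where(np.abs(z) > 3)[0] + 3)            # should flag points at or near index 250
    # Non-linear backend, analogous to 'Random-Forest Regressor':
    # z = ar_residual_zscores(y, max_lag=3, model=RandomForestRegressor())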

Moving Average (MA) 

Explanation: The Moving Average model focuses on modeling noise or random shocks within a time series. Instead of using past values directly, it predicts current observations as a weighted sum of past forecast errors. This approach helps smooth short-term volatility and is ideal for detecting unusual deviations from expected residual patterns.

Mathematical intuition:

 Yₜ = μ + ∑ᵢ₌₁^q θᵢ εₜ₋ᵢ + εₜ 

Where:

  • Yₜ = current observation
  • μ = mean of the series
  • θᵢ = moving average coefficients
  • εₜ₋ᵢ = past forecast errors

Use case: Applied in stock price anomaly detection, manufacturing output monitoring, and short-term demand forecasting, where identifying irregular error patterns helps uncover transient anomalies or process instability.

Key parameters:

  • ma_lag (q): Specifies the number of past forecast errors used to predict future values. Type: int. Possible values: [0, 5]. Default: 1.
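
As an illustrative sketch (using statsmodels, which is not necessarily QuickML's backend), an MA(q) model can be fit as an ARIMA with order (0, 0, q), with anomalies scored on its residuals:

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(0)
    y = rng.normal(0.0, 1.0, 300)          # stationary series around a fixed mean
    y[200] += 6.0                          # transient shock

    fit = ARIMA(y, order=(0, 0, 1)).fit()  # MA(1): the last tuple entry is q (ma_lag)
    resid = fit.resid
    z = (resid - resid.mean()) / resid.std()
    print(np.where(np.abs(z) > 3)[0])      # expected to include index 200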

ARIMA (AutoRegressive Integrated Moving Average)   

Explanation: ARIMA integrates three components:

  • AR (AutoRegressive) – Uses past values
  • I (Integrated) – Removes trend and non-stationarity through differencing
  • MA (Moving Average) – Models residual noise

By combining these, ARIMA captures both temporal dependence and trend shifts. It’s highly effective for stationary time series and detecting deviations that violate established statistical relationships over time.

Mathematical intuition:

 Y′ₜ = c + ∑ᵢ₌₁ᵖ φᵢ Y′ₜ₋ᵢ + ∑ⱼ₌₁^q θⱼ εₜ₋ⱼ + εₜ 

Where Y′ₜ is the differenced series after applying differencing of order d.

Use case: Widely used in economic indicator monitoring, system log anomaly detection, and production line forecasting, where detecting shifts from long-term trends or cyclical stability is crucial.

Key parameters:

  • ar_lag (p): Number of lag observations included in the model. Type: int. Possible values: [0, 5]. Default: 1.
  • ma_lag (q): Number of lagged forecast errors in the prediction equation. Type: int. Possible values: [0, 5]. Default: 1.
  • integration (d): Number of times the data is differenced to achieve stationarity. Type: int. Possible values: [0, 5]. Default: 0.
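
The role of the differencing order d can be sanity-checked with a unit-root test. The sketch below (a statsmodels illustration, not QuickML's internals) applies the Augmented Dickey-Fuller test to a random walk before and after differencing, then fits an ARIMA(1, 1, 1):

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA
    from statsmodels.tsa.stattools import adfuller

    rng = np.random.default_rng(1)
    y = rng.normal(0.0, 1.0, 300).cumsum()     # random walk: non-stationary

    print(adfuller(y)[1])                      # p-value typically high: unit root not rejected
    print(adfuller(np.diff(y))[1])             # p-value near 0 after one difference

    fit = ARIMA(y, order=(1, 1, 1)).fit()      # d = 1 applies the differencing internally
    z = (fit.resid - fit.resid.mean()) / fit.resid.std()   # threshold |z| to flag anomalies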

ARMA (AutoRegressive Moving Average)   

Explanation: ARMA combines the strengths of AR and MA models but assumes the series is stationary (no differencing). It captures both dependency on past observations and correlations in error terms, providing a balance between trend sensitivity and noise filtering.

Mathematical intuition:

 Yₜ = c + ∑ᵢ₌₁ᵖ φᵢ Yₜ₋ᵢ + ∑ⱼ₌₁^q θⱼ εₜ₋ⱼ + εₜ 

Use case: Used in network latency monitoring, server performance analysis, and stable process anomaly detection, where consistent behavior over time makes small deviations highly significant.

Key parameters:

  • ar_lag (p): Number of lag observations used in the model. Type: int. Possible values: [0, 5]. Default: 1.
  • ma_lag (q): Number of lagged forecast errors in the model. Type: int. Possible values: [0, 5]. Default: 1.
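
Since ARMA is simply ARIMA with d = 0, a quick way to build intuition is to simulate an ARMA process and recover its coefficients; a sketch using statsmodels' ArmaProcess (illustrative only):

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA
    from statsmodels.tsa.arima_process import ArmaProcess

    # Lag-polynomial convention: AR polynomial 1 - 0.6B, MA polynomial 1 + 0.4B.
    process = ArmaProcess(ar=[1, -0.6], ma=[1, 0.4])
    y = process.generate_sample(nsample=500)

    fit = ARIMA(y, order=(1, 0, 1)).fit()  # ARMA(p, q) == ARIMA(p, 0, q)
    print(fit.params)                      # ar.L1 and ma.L1 should land near 0.6 and 0.4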

Auto ARIMA   

Explanation: Auto ARIMA automates ARIMA model selection by evaluating multiple combinations of parameters (p,d,q) and optionally seasonal parameters based on information criteria (like AIC or BIC). This allows optimal fitting without manual parameter tuning and ensures that the model adapts to changing time dynamics.

Mathematical intuition: Automatically selects parameters minimizing:

 AIC = 2k − 2ln(L) 

Where k = the number of estimated parameters and L = the maximized value of the likelihood function.

Use case: Ideal for seasonal retail forecasting, utility consumption anomaly tracking, and automated operational monitoring, where time series exhibit both periodic and non-periodic irregularities.

Key parameters:

  • seasonal: Indicates whether the model should account for seasonality. Type: bool. Possible values: {True, False}. Default: True.
  • ar_lag (p): Non-seasonal AR lag parameter. Type: int. Possible values: [0, 5]. Default: 1.
  • ma_lag (q): Non-seasonal MA lag parameter. Type: int. Possible values: [0, 5]. Default: 1.
  • integration (d): Non-seasonal differencing order. Type: int. Possible values: [0, 5]. Default: 0.
  • periodicity (s): Seasonal period length (e.g., 12 for monthly data). Type: int. Possible values: [0, ∞). Default: 12.
  • integration (D): Seasonal differencing order. Type: int. Possible values: [0, 5]. Default: 0.
  • max_order: Maximum total order of the model (optional). Type: int. Possible values: [0, 5]. Default: 5.
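
QuickML's search procedure is not exposed, but the open-source pmdarima package illustrates the same idea: a stepwise search over (p, d, q) and seasonal (P, D, Q, s) that minimizes an information criterion. This assumes pmdarima is installed; the keyword names below are pmdarima's, not QuickML's.

    import numpy as np
    import pmdarima as pm

    rng = np.random.default_rng(2)
    t = np.arange(240)
    y = 10 + 0.05 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, 240)

    model = pm.auto_arima(y, seasonal=True, m=12, information_criterion='aic',
                          stepwise=True, suppress_warnings=True)
    print(model.order, model.seasonal_order)   # selected (p, d, q) and (P, D, Q, s)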

SARIMA (Seasonal ARIMA)

Explanation: SARIMA extends ARIMA by including seasonal autoregressive, differencing, and moving average components, allowing it to model periodic fluctuations (e.g., daily, monthly, yearly cycles). It excels in detecting anomalies that occur relative to seasonal expectations.

Mathematical intuition:

 Φ_P(Bˢ) φ_p(B) (1 − B)ᵈ (1 − Bˢ)ᴰ Yₜ = Θ_Q(Bˢ) θ_q(B) εₜ 

Where:

  • s = seasonal period
  • (p,d,q) = non-seasonal parameters
  • (P,D,Q) = seasonal parameters

Use case: Applied in energy load anomaly detection, climate monitoring, and sales seasonality analysis, where deviations from expected seasonal behavior signal potential anomalies.

Key parameters:

  • ar_lag (p): Non-seasonal AR term. Type: int. Possible values: [0, 5]. Default: 1.
  • ma_lag (q): Non-seasonal MA term. Type: int. Possible values: [0, 5]. Default: 1.
  • integration (d): Non-seasonal differencing. Type: int. Possible values: [0, 5]. Default: 0.
  • seasonal_ar (P): Seasonal AR term. Type: int. Possible values: [0, 5]. Default: 1.
  • seasonal_ma (Q): Seasonal MA term. Type: int. Possible values: [0, 5]. Default: 1.
  • seasonal_integration (D): Seasonal differencing. Type: int. Possible values: [0, 5]. Default: 0.
  • periodicity (s): Seasonal period (e.g., 12 for monthly seasonality). Type: int. Possible values: [0, ∞). Default: 12.
  • enforce_stationarity: Whether to enforce stationarity in the model. Type: bool. Possible values: {True, False}. Default: False.
  • enforce_invertibility: Whether to enforce invertibility of the model. Type: bool. Possible values: {True, False}. Default: False.
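
statsmodels' SARIMAX mirrors these parameters almost one-to-one (order, seasonal_order, enforce_stationarity, enforce_invertibility); the sketch below flags a point that breaks an otherwise regular seasonal pattern (illustrative, not QuickML's backend):

    import numpy as np
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    rng = np.random.default_rng(3)
    t = np.arange(240)
    y = 5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, 240)
    y[120] += 8.0                              # violates the seasonal expectation

    fit = SARIMAX(y, order=(1, 0, 1), seasonal_order=(1, 0, 1, 12),
                  enforce_stationarity=False, enforce_invertibility=False).fit(disp=False)
    z = (fit.resid - fit.resid.mean()) / fit.resid.std()
    print(np.where(np.abs(z) > 3)[0])          # expected to include index 120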

Exponential Smoothing

Explanation: Exponential Smoothing predicts future values by giving exponentially decreasing weights to older observations. This emphasizes recent data while maintaining awareness of the overall trend. It’s highly responsive to sudden changes in trend or level.

Mathematical intuition:

 Ŷₜ₊₁ = αYₜ + (1 − α)Ŷₜ 

Where:

  • Ŷₜ₊₁ = forecast for the next time step
  • α = smoothing parameter (0 ≤ α ≤ 1)
  • Yₜ = actual value

Use case: Common in inventory control, sales trend monitoring, and machine telemetry, where rapid detection of shifts or decays in recent behavior is vital.

Key parameters:

  • damped_trends: Whether to apply damping to the trend component. Type: bool. Possible values: {True, False}. Default: False.
  • season: Type of seasonality to apply. Type: string. Possible values: {'Add', 'Mul'}. Default: 'Add'.
  • seasonal_periods: Number of periods in a full seasonal cycle. Type: int. Possible values: [1, ∞). Default: 12.
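
The recursion above takes only a few lines to implement. Below is a minimal NumPy sketch of simple exponential smoothing; large gaps between actual values and the one-step-ahead forecasts signal sudden shifts. The function name is illustrative.

    import numpy as np

    def simple_exp_smoothing(y, alpha=0.5):
        """One-step-ahead forecasts: yhat[t] = alpha * y[t-1] + (1 - alpha) * yhat[t-1]."""
        y = np.asarray(y, dtype=float)
        yhat = np.empty_like(y)
        yhat[0] = y[0]                         # seed the first forecast with the first actual
        for t in range(1, len(y)):
            yhat[t] = alpha * y[t - 1] + (1 - alpha) * yhat[t - 1]
        return yhat

    # Large |y - simple_exp_smoothing(y)| relative to its typical spread marks a sudden shift.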

Holt-Winters Method

Explanation: The Holt-Winters method (Triple Exponential Smoothing) extends exponential smoothing by adding trend and seasonal components. It can adapt to level changes, upward or downward trends, and cyclical variations, providing robust anomaly detection for periodic time series.

Mathematical intuition:

Lₜ = α (Yₜ / Sₜ₋ₛ) + (1 − α)(Lₜ₋₁ + Tₜ₋₁)
Tₜ = β (Lₜ − Lₜ₋₁) + (1 − β)Tₜ₋₁
Sₜ = γ (Yₜ / Lₜ) + (1 − γ)Sₜ₋ₛ
Ŷₜ₊ₘ = (Lₜ + mTₜ) Sₜ₋ₛ₊ₘ

Where Lₜ = level, Tₜ = trend, Sₜ = seasonal component, m = forecast horizon (steps ahead), and s = length of the seasonal cycle.

Use case: Widely used in retail demand forecasting, resource utilization monitoring, and temperature anomaly detection, where both seasonality and trend shifts must be captured for accurate anomaly identification.

Key parameters:

  • smoothing_level: Smoothing factor for the level component. Type: float. Possible values: [0, 1]. Default: 0.8.
  • smoothing_trend: Smoothing factor for the trend component. Type: float. Possible values: [0, 1]. Default: 0.2.
  • damping_trend: Controls the damping of the trend component. Type: bool. Possible values: {True, False}. Default: True.
  • optimise: Specifies whether parameters should be optimized automatically. Type: string. Possible values: {'Select', 'Manual'}. Default: 'Select'.
  • exponential: Whether to use exponential trend smoothing. Type: bool. Possible values: {True, False}. Default: False.
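
statsmodels implements these recursions as ExponentialSmoothing; below is a sketch with additive trend and multiplicative seasonality. The keyword names are statsmodels' and differ slightly from the parameters above (e.g., damped_trend rather than damping_trend).

    import numpy as np
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    rng = np.random.default_rng(4)
    t = np.arange(144)
    y = (10 + 0.1 * t) * (1 + 0.3 * np.sin(2 * np.pi * t / 12)) + rng.normal(0, 0.3, 144)

    fit = ExponentialSmoothing(y, trend='add', seasonal='mul',
                               seasonal_periods=12, damped_trend=True).fit()
    resid = y - fit.fittedvalues
    z = (resid - resid.mean()) / resid.std()   # threshold |z| to flag anomalies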

Non-Time Series Anomaly Detection Algorithms

One-Class SVM

Explanation: One-Class SVM (Support Vector Machine) learns a decision boundary around the majority of (normal) data points in feature space. Points that fall outside this boundary are classified as anomalies. It works well in high-dimensional datasets and is effective when anomalies are rare and distinct from normal data.

Mathematical Intuition:

The algorithm finds a function f(x) that is positive for regions with high data density (normal points) and negative for low-density regions (anomalies). It aims to solve:

min (1/2) ||w||² + (1 / (νn)) Σ ξᵢ − ρ
subject to:
(w · φ(xᵢ)) ≥ ρ − ξᵢ , ξᵢ ≥ 0 

Where:

  • ν: an upper bound on the fraction of outliers (and a lower bound on the fraction of support vectors)
  • ξᵢ: slack variables allowing soft margins
  • φ(x): kernel feature map projecting data into a higher-dimensional space

Use case: Used in fraud detection, network intrusion detection, or novelty detection in industrial systems where normal behavior is well-defined but anomalies are rare.

Key parameters:

Note: In parameter ranges, square brackets [ ] include the boundary, while parentheses ( ) exclude it. Example: [1, ∞) → includes 1 and extends to infinity; (0, 1] → excludes 0 but includes 1.

  • kernel: Specifies the kernel type to use. If none is given, 'rbf' is used. Type: string. Possible values: {'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'}. Default: 'rbf'.
  • degree: Degree of the polynomial kernel function ('poly'). Ignored by other kernels. Type: int. Possible values: [1, ∞). Default: 3.
  • gamma: Kernel coefficient for 'rbf', 'poly', and 'sigmoid'. Type: string. Possible values: {'scale', 'auto'}. Default: 'scale'.
  • coef0: Independent term in the kernel function, significant for 'poly' and 'sigmoid'. Type: float. Possible values: [0, 1]. Default: 0.0.
  • tol: Tolerance for the stopping criterion. Type: float. Possible values: [0, 1]. Default: 1e-3.
  • nu: Upper bound on the fraction of training errors and lower bound on the fraction of support vectors. Type: float. Possible values: (0, 1]. Default: 0.5.
  • shrinking: Whether to use the shrinking heuristic. Type: bool. Possible values: {False, True}. Default: True.
  • max_iter: Hard limit on solver iterations; -1 means no limit. Type: int. Possible values: [1, ∞) or -1. Default: -1.
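
scikit-learn's OneClassSVM exposes the same parameters listed above; a minimal sketch on synthetic 2-D data (illustrative; QuickML's wrapper may differ):

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(5)
    normal = rng.normal(0, 1, size=(500, 2))       # dense cluster of normal points
    outliers = rng.uniform(-6, 6, size=(10, 2))    # scattered, far-flung points
    X = np.vstack([normal, outliers])

    model = OneClassSVM(kernel='rbf', gamma='scale', nu=0.05).fit(X)
    pred = model.predict(X)                        # +1 = inlier, -1 = anomaly
    print(int((pred[-10:] == -1).sum()), 'of 10 injected points flagged')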

Isolation Forest

Explanation: Isolation Forest identifies anomalies by isolating observations instead of modeling normal data points. It randomly selects a feature and splits the data based on a random threshold. Since anomalies are few and different, they are easier to isolate and require fewer splits. The average path length of trees is shorter for anomalies and longer for normal points.

Mathematical intuition:

The anomaly score is computed as:

 s(x, n) = 2^(− E(h(x)) / c(n)) 

Where:

  • E(h(x)): average path length of observation x across the trees
  • c(n): average path length of an unsuccessful search in a binary search tree with n nodes (a normalization factor)
  • Scores close to 1 → anomalies; scores near 0.5 → normal

Use case: Commonly used in fraud detection, network intrusion detection, manufacturing defect detection, and IoT sensor anomaly identification. In fraud detection, it helps uncover suspicious transactions that deviate from normal spending behavior. In network intrusion detection, it identifies abnormal access patterns or traffic spikes that may indicate a security breach.

Key parameters:

  • n_estimators: Number of base estimators (trees) in the ensemble. Type: int. Possible values: [1, ∞). Default: 100.
  • max_samples: Number of samples drawn to train each estimator. Type: string, int, or float. Possible values: [1, ∞) (int), [0, 1] (float), or 'auto'. Default: 'auto'.
  • contamination: Proportion of outliers in the dataset. Type: string or float. Possible values: 'auto' or (0, 0.5]. Default: 'auto'.
  • max_features: Number of features drawn for each estimator. Type: int or float. Possible values: [1, ∞) (int) or [0, 1] (float). Default: 1.0.
  • bootstrap: Whether to sample training data with replacement. Type: bool. Possible values: {False, True}. Default: False.
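
scikit-learn's IsolationForest uses the same parameter names as the table above; a short sketch (illustrative only):

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(6)
    X = np.vstack([rng.normal(0, 1, size=(500, 2)),     # normal cluster
                   rng.uniform(-6, 6, size=(10, 2))])   # injected anomalies

    forest = IsolationForest(n_estimators=100, contamination='auto', random_state=0)
    labels = forest.fit_predict(X)        # +1 = inlier, -1 = anomaly
    scores = forest.score_samples(X)      # lower scores = shorter average paths = more anomalous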

Local Outlier Factor (LOF)

Explanation: Local Outlier Factor detects anomalies by comparing the local density of a data point to that of its neighbors. If a point has a substantially lower density than its neighbors, it’s considered an anomaly. LOF is particularly effective in detecting local anomalies rather than global ones.

Mathematical intuition:

LOF is based on the concept of local reachability density (LRD). The LOF score of a data point A is given by:

 LOFₖ(A) = ( Σ_{B ∈ Nₖ(A)} [LRDₖ(B) / LRDₖ(A)] ) / |Nₖ(A)| 

Where:

  • Nₖ(A): k-nearest neighbors of A
  • LRDₖ(A): local reachability density of A
  • Values ≈ 1 → normal; values > 1 → outlier

Use case: It is effectively utilized across diverse domains where identifying local deviations from normal patterns is essential. In fraud detection, it highlights customers or transactions that behave differently from their peer groups, helping uncover subtle or evolving fraud patterns. For customer segmentation, it detects outlier profiles within defined segments, such as unusually high-value or inactive customers, enabling more accurate targeting and retention strategies.

Key parameters:

  • n_neighbors: Number of neighbors to use for kneighbors queries. Type: int. Possible values: [1, n_samples). Default: 20.
  • algorithm: Algorithm used to compute nearest neighbors. Type: string. Possible values: {'auto', 'ball_tree', 'kd_tree', 'brute'}. Default: 'auto'.
  • leaf_size: Leaf size for BallTree or KDTree, affecting speed and memory. Type: int. Possible values: [1, ∞). Default: 20.
  • metric: Metric for distance computation. Type: string. Possible values: {'minkowski', 'cityblock', 'cosine', 'euclidean', 'haversine', 'l1', 'l2', 'manhattan', 'nan_euclidean'}. Default: 'minkowski'.
  • p: Parameter for the Minkowski metric (1 = Manhattan, 2 = Euclidean). Type: float. Possible values: [1, ∞). Default: 2.
  • contamination: Proportion of outliers in the dataset. Type: float or string. Possible values: (0, 0.5] or 'auto'. Default: 'auto'.
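
scikit-learn's LocalOutlierFactor accepts the parameters listed above. The sketch below highlights the "local" aspect: the flagged point is unremarkable in global terms but isolated relative to the dense cluster nearest to it (illustrative only):

    import numpy as np
    from sklearn.neighbors import LocalOutlierFactor

    rng = np.random.default_rng(7)
    dense = rng.normal(0, 0.3, size=(200, 2))      # tight, high-density cluster
    sparse = rng.normal(8, 2.0, size=(200, 2))     # loose, low-density cluster
    odd = np.array([[1.5, 1.5]])                   # isolated only relative to the dense cluster
    X = np.vstack([dense, sparse, odd])

    lof = LocalOutlierFactor(n_neighbors=20, contamination='auto')
    labels = lof.fit_predict(X)                    # +1 = inlier, -1 = anomaly
    print(labels[-1], lof.negative_outlier_factor_[-1])  # expected: -1 and a strongly negative score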
