Recommendation Algorithms

Recommendation algorithms use datasets such as historical transactions or interaction data, item attributes, and user demographics to analyze patterns in user behavior and generate personalized recommendations that align with user interests. They play a critical role in driving interactions, improving retention, and enhancing the user experience across platforms.

In QuickML, we currently support Information Retrieval algorithms designed to meet various business needs for building diverse recommendation models.

Types of Recommendation Systems

The types of recommendation models used in real-time environments include:

  • Sequential recommendation system
  • Personalized recommendation system
  • Recurrence cycle recommendation system

a. Information retrieval

A recommender system is an intelligent tool that analyzes users’ past interactions, preferences, and behavior to suggest products that are likely to interest each user. Its algorithms model each user’s tastes and needs so that the recommendations match that user’s unique preferences.

The information retrieval algorithms aim to generate accurate suggestions that enhance user experience and engagement by providing personalized recommendations.

The algorithms used in each type of recommender system within QuickML are:

  1. Sequential recommendation system

    Sequential recommendation algorithms use machine learning techniques to analyze historical interaction data and predict the next item or items a user is likely to consume in a sequence. They consider the order of a user’s past interactions to suggest products that align with that user’s buying patterns.

    The algorithm used to build these models is:

    i. SubSequence

    SuBSeq (Succinct BWT-Based Sequence Prediction) is an algorithm designed specifically for sequential recommendation systems. It uses the Burrows-Wheeler Transform (BWT) to extract meaningful patterns from transaction and interaction data. It focuses on subsequence mining, which lets it identify recurring sequences and intricate patterns within user behavior sequences. By combining efficient data processing with pattern recognition, SuBSeq captures temporal dependencies, user preferences, and contextual nuances, producing recommendations tailored to individual behavior. Because it stores the training sequences in a compact, BWT-based structure, it is particularly efficient in terms of memory usage and computational complexity.
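    To illustrate why the BWT helps, the sketch below builds the BWT of a single item sequence and then counts how often a pattern of items occurs using FM-index backward search, which works on the transformed column alone instead of rescanning the original sequence. This is a naive, illustrative version and not QuickML's implementation: it sorts all rotations (quadratic cost), works on one sequence rather than a whole training set, and assumes the sentinel "$" sorts before every item ID.

```python
from collections import Counter

SENTINEL = "$"  # end marker; assumed to sort before every item ID


def bwt(tokens):
    """Naive Burrows-Wheeler Transform of a token sequence.

    Sorts all rotations of tokens + [SENTINEL] and returns the last column.
    """
    text = list(tokens) + [SENTINEL]
    n = len(text)
    order = sorted(range(n), key=lambda i: text[i:] + text[:i])
    return [text[i - 1] for i in order]  # text[-1] wraps around when i == 0


def count_occurrences(last_column, pattern):
    """FM-index backward search: count occurrences of pattern in the original sequence."""
    counts = Counter(last_column)
    first_row = {}  # C[c]: number of tokens that sort before token c
    total = 0
    for token in sorted(counts):
        first_row[token] = total
        total += counts[token]

    lo, hi = 0, len(last_column)  # current range of sorted rotations
    for token in reversed(pattern):  # extend the match one token to the left
        if token not in first_row:
            return 0
        lo = first_row[token] + last_column[:lo].count(token)
        hi = first_row[token] + last_column[:hi].count(token)
        if lo >= hi:
            return 0
    return hi - lo


history = ["phone", "case", "charger", "phone", "case"]
column = bwt(history)
print(count_occurrences(column, ["phone", "case"]))  # 2
```

    A production index would replace the list slicing and counting with rank structures that answer "how many times does this token appear before row k" in constant or logarithmic time.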

    Hyper-parameters:

    | Parameter | Description | Data Type | Possible Values | Default Value |
    |---|---|---|---|---|
    | transactions_column | Column name for itemsets in the input data | str | Column name in the dataset | User-specified |
    | min_similar_sequence | Minimum number of similar sequences required | int | Any positive integer | 2 |
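    The role of min_similar_sequence can be illustrated with a simplified predictor. In this hypothetical sketch, each training sequence is a list of item IDs (as would be read from transactions_column), a training sequence counts as "similar" when it contains the user's recent items in order, and the items that follow the match are counted as candidates. If fewer than min_similar_sequence similar sequences are found, the query is shortened and retried. The back-off strategy, the matching rule, and top_k are assumptions for illustration; SuBSeq itself performs these searches with its BWT index rather than linear scans.

```python
from collections import Counter


def match_end(query, sequence):
    """Index just after the earliest in-order (subsequence) match of query, or None."""
    pos = 0
    for item in query:
        try:
            pos = sequence.index(item, pos) + 1
        except ValueError:
            return None
    return pos


def predict_next(history, training_sequences, min_similar_sequence=2, top_k=3):
    """Hypothetical SubSeq-style predictor that backs off from the longest recent suffix."""
    for length in range(len(history), 0, -1):
        query = history[-length:]
        candidates = Counter()
        similar = 0
        for seq in training_sequences:
            end = match_end(query, seq)
            if end is not None and end < len(seq):
                similar += 1
                # count each following item once per similar sequence
                for item in dict.fromkeys(seq[end:]):
                    candidates[item] += 1
        if similar >= min_similar_sequence:
            return [item for item, _ in candidates.most_common(top_k)]
    return []  # not enough evidence at any query length


training = [
    ["comedy1", "comedy2", "doc1", "doc2"],
    ["comedy1", "doc1", "doc3"],
    ["comedy2", "doc1", "doc2"],
    ["drama1", "drama2"],
]
print(predict_next(["comedy1", "doc1"], training))  # ['doc2', 'doc3']
```

    Raising min_similar_sequence makes predictions more conservative: for example, a history of ["drama1"] matches only one training sequence, so it yields no recommendation with the default of 2.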

    Sample scenarios to choose this algorithm:

    An online media platform aims to increase user engagement by predicting the sequence of items users are likely to consume next, based on their past behavior and interactions. Anticipating user preferences lets the platform provide personalized content recommendations. For instance, if a user watches three comedy movies and then switches to historical documentaries, a sequential recommendation algorithm identifies this change in consumption pattern and suggests more documentaries or a mix of both genres.

  2. Personalized Recommendation system

    Personalized recommendation algorithms aim to provide tailored recommendations to individual users based on their preferences, past interactions, product attributes, and demographic information. These algorithms leverage user-item interaction data to identify patterns and similarities among users and items.

    Algorithms that are used to build these models are:

    i. LightFM

    LightFM is a hybrid recommendation algorithm that uses user profiles, product details, and interaction data to provide personalized recommendations. It combines collaborative filtering with content-based features in a matrix factorization model: each user and each item is represented as the sum of the latent embeddings of its features, so the model learns from both interaction patterns and user and item characteristics. This lets it recommend even for new users or items that have features but little interaction history, simplifies the discovery of related products, and adapts recommendations to user behavior. Drawing on user demographics, preferences, and interaction history, LightFM supports recommendation use cases across a wide range of industries and applications.
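    The feature-sum idea can be shown in a few lines. This pure-Python sketch reproduces only LightFM's scoring step, as described in its published model: the user vector q_u and item vector p_i are sums of feature embeddings, and the score is q_u · p_i + b_u + b_i. The embeddings are hand-set and hypothetical (normally they are learned, with length no_components), feature names such as "age:25-34" are illustrative, and training is omitted.

```python
def represent(feature_ids, embeddings, biases):
    """LightFM-style representation: sum the embeddings and biases of an entity's features."""
    dim = len(next(iter(embeddings.values())))
    vector = [0.0] * dim
    bias = 0.0
    for feature in feature_ids:
        vector = [v + e for v, e in zip(vector, embeddings[feature])]
        bias += biases.get(feature, 0.0)
    return vector, bias


def score(user_features, item_features, user_params, item_params):
    """Raw affinity q_u . p_i + b_u + b_i; only the order matters for ranking.

    With a logistic loss, this value is passed through a sigmoid during training.
    """
    q_u, b_u = represent(user_features, *user_params)
    p_i, b_i = represent(item_features, *item_params)
    return sum(a * b for a, b in zip(q_u, p_i)) + b_u + b_i


# Hypothetical, hand-set parameters with no_components = 2.
# Each entity has an identity feature ("user:u1", "item:p9") plus metadata features.
user_params = ({"user:u1": [1.0, 0.0], "age:25-34": [0.0, 1.0]}, {"user:u1": 0.1})
item_params = ({"item:p9": [1.0, 1.0], "category:phone": [0.5, 0.0]}, {})

print(score(["user:u1", "age:25-34"], ["item:p9", "category:phone"], user_params, item_params))
```

    Because a new item's vector is built from its metadata features, it can be scored before anyone has interacted with it, which is where the hybrid design pays off.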

    Hyper-parameters:

    | Parameter | Description | Data Type | Possible Values | Default Value |
    |---|---|---|---|---|
    | no_components | The dimensionality of the latent feature embeddings. Higher values capture more complex relationships but may overfit if too large for the dataset. | int | Any positive integer | 10 |
    | n | For k-OS training, the maximum number of positive samples considered for each user during each update step. | int | Any positive integer | 10 |
    | learning_schedule | The learning rate schedule to use. Both options are adaptive schedules that adjust learning rates based on gradient information during training. | str | 'adagrad', 'adadelta' | 'adagrad' |
    | loss | The loss function to optimize: 'logistic' (logistic loss, useful when both positive and negative interactions are present); 'bpr' (Bayesian Personalized Ranking, which maximizes the score difference between a positive item and a randomly sampled negative item); 'warp' (Weighted Approximate-Rank Pairwise, which samples negative items until it finds one ranked too close to or above the positive item and weights the update by the estimated rank); 'warp-kos' (k-th order statistic loss, a WARP variant that uses the k-th positive item of each user as the basis for its updates). | str | 'logistic', 'bpr', 'warp', 'warp-kos' | 'bpr' |
    | learning_rate | The initial learning rate for the gradient descent optimizer. Smaller values result in slower but potentially more stable convergence. | float | Any positive float | 0.05 |
    | item_alpha | L2 regularization strength on item feature embeddings. Helps prevent overfitting by penalizing large weights. Higher values enforce stronger regularization. | float | Any non-negative float | 0.0 |
    | user_alpha | L2 regularization strength on user feature embeddings. Helps prevent overfitting by penalizing large weights. Behaves like item_alpha. | float | Any non-negative float | 0.0 |
    | train_split_ratio | The proportion of the data used for training; the remaining data is reserved for evaluation. A higher ratio means more data for training but less for testing. | float | Between 0 and 1 | 0.8 |
    | td_uid_column | The name of the column in the transactions dataset representing unique user IDs. | str | Any valid column name in the dataset | User-specified |
    | ud_uid_column | The name of the column in the user features dataset representing unique user IDs. | str | Any valid column name in the dataset | User-specified |
    | pd_pid_column | The name of the column in the product features dataset representing unique product IDs. | str | Any valid column name in the dataset | User-specified |
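    The ranking losses in the table differ mainly in how they use sampled negative items. The sketch below computes the per-example quantities from their standard definitions: the BPR loss for one (user, positive, negative) triple, and the WARP update weight, which grows with how quickly a violating negative is found. The margin of 1 and the cap on negative draws (max_sampled) follow the open-source LightFM library's behavior; max_sampled is not a QuickML parameter listed above, so treat it as an assumption. Gradient updates are omitted.

```python
import math
import random


def bpr_loss(pos_score, neg_score):
    """BPR loss for one triple: -ln(sigmoid(pos_score - neg_score)), computed stably."""
    x = pos_score - neg_score
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def warp_weight(pos_score, negative_scores, rng, max_sampled=10):
    """WARP: sample negatives until one violates the margin, then weight by estimated rank.

    Returns 0.0 (no update) if no violating negative is found within max_sampled draws.
    """
    n_items = len(negative_scores) + 1  # negatives plus the positive item
    for trials in range(1, max_sampled + 1):
        neg = rng.choice(negative_scores)
        if neg > pos_score - 1.0:  # negative scored within the margin of the positive
            estimated_rank = (n_items - 1) // trials
            return math.log(max(1, estimated_rank))
    return 0.0


rng = random.Random(0)
print(bpr_loss(2.0, 0.5))                  # small loss: positive already ranked higher
print(warp_weight(0.0, [5.0] * 10, rng))   # first draw violates: large weight log(10)
```

    A violation found on the first draw suggests the positive item is ranked poorly, so WARP applies a large update; when violations are rare, the weight shrinks toward zero.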

    Sample scenarios to choose this algorithm:

    • In e-commerce platforms, personalized recommendation algorithms suggest relevant items based on real-time user interactions and search activity. By analyzing user behavior, item characteristics, and transaction data, these models generate highly personalized suggestions that resonate with individual users. This approach not only enhances the user experience but also increases business value by promoting relevant and related products.
    • In e-learning platforms, these models facilitate the discovery of related courses that complement users’ learning interests and objectives. By analyzing course similarities and user interests, they increase exploration and engagement.

    ii. Pixie

    Building systems that provide high-quality personalized recommendations is a major challenge because of the massive pool of items and the large number of users. The recommendations must also be generated on demand, in response to user actions.

    Pixie is a scalable, real-time, graph-based recommendation system that addresses this problem with random walks: starting from query nodes, such as the items a user has interacted with, it repeatedly walks across a large graph of interconnected items and users and recommends the items visited most often. In the graph, nodes represent items or users, and edges represent relationships or interactions between them. The algorithm is designed to be highly scalable, so it operates efficiently even with a vast number of items and users.

    Hyper-parameters:

    | Parameter | Description | Data Type | Possible Values | Default Value |
    |---|---|---|---|---|
    | user_id_column | Name of the column containing unique user IDs in the user dataset | str | Column name in the dataset | User-specified |
    | product_id_column | Name of the column containing unique product IDs in the product dataset | str | Column name in the dataset | User-specified |
    | depth | Number of steps in each random walk. | int | Any positive integer | 10 |
    | n_epochs | Maximum number of steps for the random walk process. | int | Any positive integer | 50 |
    | higher_weight | Weight for biasing graph edges toward more likely connections. | float | A float between 0 and 1 | 1.0 |
    | lower_weight | Weight for biasing graph edges toward less likely connections. | float | A float between 0 and 1 | 0.0001 |
    | recommendation_type | Type of prediction to make: 'fbt' (Frequently Bought Together) or 'cwbab' (Customers Who Bought Also Bought). | str | 'fbt', 'cwbab' | 'cwbab' |
    | with_feature_encoding | Whether to encode user features. | bool | True, False | False |
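    The random-walk idea can be sketched on a small bipartite user-item graph. In this hypothetical version, each walk restarts at the query item and takes at most depth steps, n_epochs caps the total number of steps across all walks, and items are ranked by visit count. Edge weights stand in for higher_weight and lower_weight (for example, a strong signal such as a purchase could get higher_weight and a weak one lower_weight); that mapping, the function names, and the restart rule are assumptions, not QuickML's documented behavior.

```python
import random
from collections import Counter, defaultdict


def build_graph(interactions):
    """Bipartite graph with weighted user<->item edges from (user, item, weight) tuples."""
    graph = defaultdict(list)
    for user, item, weight in interactions:
        graph[("user", user)].append((("item", item), weight))
        graph[("item", item)].append((("user", user), weight))
    return graph


def pixie_recommend(graph, query_item, depth=10, n_epochs=50, top_k=3, seed=0):
    """Weighted random walks with restarts from query_item; recommend the most-visited items."""
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    start = ("item", query_item)
    if start not in graph:
        return []
    rng = random.Random(seed)
    visits = Counter()
    steps = 0
    while steps < n_epochs:  # total step budget across all walks
        node = start  # each walk restarts at the query item
        for _ in range(depth):  # at most `depth` steps per walk
            if steps >= n_epochs:
                break
            neighbors, weights = zip(*graph[node])
            node = rng.choices(neighbors, weights=weights, k=1)[0]
            steps += 1
            if node[0] == "item" and node != start:
                visits[node[1]] += 1
    return [item for item, _ in visits.most_common(top_k)]


interactions = [
    ("u1", "phone", 1.0), ("u1", "charger", 1.0),
    ("u2", "phone", 1.0), ("u2", "charger", 1.0), ("u2", "case", 0.0001),
    ("u3", "laptop", 1.0), ("u3", "mouse", 1.0),
]
print(pixie_recommend(build_graph(interactions), "phone"))
```

    Items that are not connected to the query item, such as the laptop above, are never visited, and edges with a low weight are traversed rarely, which is how weighting biases the walk.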

    Sample scenarios to choose this algorithm:

    Pixie can suggest products to users based on their browsing history and purchase behavior, leading to improved product discovery and higher sales conversion rates.

    For example:

    An e-commerce platform specializing in consumer electronics leverages Pixie to improve its recommendation system.

    When a user purchases a smartphone, the algorithm uses recommendation_type: fbt (Frequently Bought Together) to recommend necessary co-purchases, such as screen protectors or chargers, which are immediately relevant to the current purchase.

    With recommendation_type: cwbab (Customers Who Bought Also Bought), the algorithm instead suggests complementary products, such as smartwatches or Bluetooth speakers, that the customer might consider buying later. By differentiating between immediate and potential future needs, the platform not only increases the average order value but also builds long-term customer engagement and satisfaction.

  3. Recurrence cycle recommendation system

    The recurrence cycle recommendation model is trained to identify and suggest items based on recurring patterns in user behavior. By analyzing historical data, it detects the cycles or intervals at which users are likely to repeat specific interactions, such as purchases, subscriptions, or engagements. The Recurrence Finder algorithm predicts the items users are most likely to repurchase, which simplifies restocking products or re-engaging with preferred items.

    The algorithm used to build recurrence cycle recommendation models is:

    i. Recurrence Finder

    Recurrence Finder identifies and predicts recurring events, such as customer product purchases, event attendance, and daily alarm settings, using historical timestamps to forecast when each event is likely to occur next. These forecasts support planning and decision-making and help optimize strategies for customer retention, event management, and time management tools.

    With its ability to predict future occurrences, Recurrence Finder enhances efficiency and productivity across various domains, from e-commerce to healthcare.

    Hyper-parameters:

    | Parameter | Description | Data Type | Possible Values | Default Value |
    |---|---|---|---|---|
    | user | The name of the column in the dataset that contains unique user IDs. This column is essential for identifying the recurrence of transactions based on users. | str | Any valid column name in the dataset | User-specified |
    | product | The name of the column in the dataset that contains unique product IDs. This column is used to track the recurrence of specific products in transactions. | str | Any valid column name in the dataset | User-specified |
    | timestamp_column | The name of the column in the dataset that contains timestamps for the transactions. This column is critical for calculating the time intervals between recurring transactions. | str | Any valid column name in the dataset | User-specified |
    | quantity | A boolean flag indicating whether the recurrence analysis should consider transaction quantities. If True, the quantity_column is used to incorporate quantity-based recurrence patterns. | bool | True, False | User-specified |
    | quantity_column | The name of the column in the dataset that contains the quantity of products purchased in each transaction. Required if the quantity parameter is set to True. | str | Any valid column name in the dataset | User-specified |
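    A minimal way to turn these columns into a forecast is to average the gaps between a user's past purchases of a product and add that average to the latest purchase. The sketch below does exactly that, and, when quantity is True, first converts each gap into time per unit purchased and then scales by the size of the latest purchase. Both rules are assumptions chosen for illustration; the parameter names mirror the table, but the function itself is hypothetical and not QuickML's implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def predict_next_occurrence(rows, user="user", product="product",
                            timestamp_column="timestamp", quantity=False,
                            quantity_column="quantity"):
    """Predict the next (user, product) occurrence from the mean gap between past ones."""
    history = defaultdict(list)
    for row in rows:
        history[(row[user], row[product])].append(row)

    predictions = {}
    for key, events in history.items():
        events.sort(key=lambda r: r[timestamp_column])
        if len(events) < 2:
            continue  # a cycle needs at least two occurrences
        gaps = [(later[timestamp_column] - earlier[timestamp_column]).total_seconds()
                for earlier, later in zip(events, events[1:])]
        if quantity:
            # seconds each purchased unit lasted, scaled by the latest purchase size
            per_unit = mean(gap / earlier[quantity_column] for gap, earlier in zip(gaps, events))
            interval = per_unit * events[-1][quantity_column]
        else:
            interval = mean(gaps)
        predictions[key] = events[-1][timestamp_column] + timedelta(seconds=interval)
    return predictions


rows = [
    {"user": "u1", "product": "milk", "timestamp": datetime(2024, 1, 1), "quantity": 1},
    {"user": "u1", "product": "milk", "timestamp": datetime(2024, 1, 8), "quantity": 2},
    {"user": "u1", "product": "milk", "timestamp": datetime(2024, 1, 22), "quantity": 1},
]
print(predict_next_occurrence(rows))                 # mean gap 10.5 days -> 2024-02-01 12:00
print(predict_next_occurrence(rows, quantity=True))  # 7 days per unit -> 2024-01-29
```

    The quantity option matters when purchase sizes vary: the 14-day gap after buying two units reflects the same consumption rate as the 7-day gap after buying one.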

    Sample scenarios to choose this algorithm:

    • Subscription-based businesses can utilize recurrence predictions to forecast when customers are likely to renew their subscriptions. This allows them to implement targeted retention strategies, such as personalized offers or reminders, to increase renewal rates and customer loyalty.
    • Service-based businesses, such as healthcare providers or salons, can use recurrence predictions to schedule appointments efficiently. By anticipating when clients are likely to book appointments, they can optimize staff schedules, minimize wait times, and enhance customer satisfaction.

    Data Validation Criteria

    Recommendation models are trained using three datasets:

    1. Transactions/Interactions data
    2. Users’ demographic data
    3. Items’ attribute data

    The Transactions dataset contains transaction details, such as transactionID, userID, itemID, order value, purchase date, timestamp, etc., capturing each purchase the user has made.

    The Users dataset holds demographic information about the users of the business, providing insights into their characteristics and preferences.

    The Items/Product features dataset contains attributes and characteristics of the items being purchased by users, such as category, brand, and price.

    Before model training begins, the algorithm automatically validates the three datasets. If any of the following criteria are not met, training stops and an error is thrown:

    • No missing values should be present in the transactions, users, or items datasets.
    • Any userID or itemID/productID present in the Transactions dataset must also exist in their respective Users or Items datasets.
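    The two criteria can be expressed as a simple pre-training check. This sketch treats each dataset as a list of row dictionaries and uses the column names userID and itemID from the dataset description above; the function names, the choice of what counts as missing (None, empty string, or NaN), and the error messages are illustrative assumptions rather than QuickML's actual checks.

```python
import math


def is_missing(value):
    """Treat None, empty strings, and float NaN as missing values."""
    return value is None or value == "" or (isinstance(value, float) and math.isnan(value))


def validate_datasets(transactions, users, items, user_id="userID", item_id="itemID"):
    """Raise ValueError on missing values or transactions that reference unknown IDs."""
    for name, rows in (("transactions", transactions), ("users", users), ("items", items)):
        for i, row in enumerate(rows):
            missing = [col for col, value in row.items() if is_missing(value)]
            if missing:
                raise ValueError(f"{name} row {i} has missing values in {missing}")

    known_users = {row[user_id] for row in users}
    known_items = {row[item_id] for row in items}
    for i, row in enumerate(transactions):
        if row[user_id] not in known_users:
            raise ValueError(f"transactions row {i}: unknown {user_id} {row[user_id]!r}")
        if row[item_id] not in known_items:
            raise ValueError(f"transactions row {i}: unknown {item_id} {row[item_id]!r}")


users = [{"userID": "u1", "age": 30}]
items = [{"itemID": "p1", "category": "phone"}]
transactions = [{"transactionID": "t1", "userID": "u1", "itemID": "p1"}]
validate_datasets(transactions, users, items)  # passes silently
```

    Checking missing values first means an ID comparison never runs against an empty or null key.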

Last Updated 2024-12-27 14:14:58 +0530