Recommendation Algorithms

Recommendation algorithms use datasets such as historical transactions or interaction data, item attributes, and user demographics to analyze patterns in user behavior and generate personalized recommendations that align with user interests. They play a critical role in driving interactions, improving retention, and enhancing the user experience across platforms.

In QuickML, we currently support Information Retrieval algorithms designed to meet various business needs for building diverse recommendation models.

Types of Recommendation Systems

Types of recommendation models that are being used in real-time environments include:

Sequential recommendation system
Personalized recommendation system
Recurrence cycle recommendation system

a. Information retrieval

A recommender system is an intelligent tool that analyzes past interactions, preferences, and behavior to suggest personalized products that are likely to be of interest to each user. It uses advanced algorithms to understand users' tastes and needs, making recommendations that match their individual preferences.

The information retrieval algorithms aim to generate accurate suggestions that enhance user experience and engagement by providing personalized recommendations.

The algorithms used in each type of recommender system within QuickML are described below.

Sequential recommendation system

Sequential Recommendation algorithms utilize machine learning techniques to analyze the historical interaction data and predict the next item or items likely to be consumed in a sequence. This algorithm considers the sequential order of your past interactions to suggest products that align with your buying patterns.

The algorithm used to build these models is:

i. SubSequence

SuBSeq, or Succinct BWT-Based Sequence Prediction, is an algorithm built on the Burrows-Wheeler Transform (BWT) and designed specifically for sequential recommendation systems. It extracts meaningful patterns from transaction or interaction data through subsequence mining, which lets it identify recurring sequences and intricate patterns within user behavior. By combining efficient data processing with pattern recognition, SuBSeq captures temporal dependencies, user preferences, and contextual nuances, leading to personalized recommendations tailored to individual behavior. It is particularly efficient in terms of memory usage and computational complexity.
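As a rough illustration of the transform that SuBSeq builds on, the sketch below computes a naive Burrows-Wheeler Transform over a sequence of item tokens. It is the textbook construction (sort all rotations, keep the last column), not QuickML's implementation; the function name `bwt` and the `$` sentinel are assumptions made for this example.

```python
def bwt(sequence, sentinel="$"):
    """Return the Burrows-Wheeler Transform of a sequence of item tokens."""
    # Assumption: the sentinel sorts before every real token in the sequence.
    items = list(sequence) + [sentinel]
    n = len(items)
    # Build every rotation of the sequence and sort them lexicographically.
    rotations = sorted(items[i:] + items[:i] for i in range(n))
    # The transform is the last element of each sorted rotation.
    return [rotation[-1] for rotation in rotations]


# Characters stand in for item IDs here; any sortable tokens would work.
print("".join(bwt("banana")))  # annb$aa
```

The transform groups tokens that share the same following context, which is what makes compact indexing of long interaction histories possible.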

Hyper-parameters:

Parameter	Description	Data Type	Possible Values	Default Values
transactions_column	Column name for itemsets in the input data	str	Any valid column name in the dataset	User-specified
min_similar_sequence	Minimum number of similar sequences required	int	any positive integer	2

Sample scenarios to choose this algorithm:

An online media platform aims to enhance user engagement by predicting the sequences of items users might consume based on their past behavior and interactions. The goal is to understand and anticipate user preferences, enabling the platform to provide personalized content recommendations. For instance, if a user watches three comedy movies and then switches to historical documentaries, a sequential recommendation algorithm identifies this change in consumption pattern and suggests more documentaries or a mix of both genres.
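The idea of predicting the next item from similar past sequences can be sketched in a few lines. This is a simplified suffix-matching predictor, not the BWT-based SuBSeq algorithm itself; the function names, the `max_suffix` argument, and the sample data are hypothetical, and `min_similar_sequence` is only assumed to behave like the hyper-parameter of the same name.

```python
from collections import Counter


def followers_of(sequence, pattern):
    """Yield the item that follows each contiguous occurrence of pattern."""
    k = len(pattern)
    for i in range(len(sequence) - k):
        if sequence[i:i + k] == pattern:
            yield sequence[i + k]


def predict_next(history, training_sequences, min_similar_sequence=2, max_suffix=3):
    """Predict the next item by matching the longest recent suffix of history."""
    for k in range(min(max_suffix, len(history)), 0, -1):
        suffix = history[-k:]
        followers = Counter()
        matching_sequences = 0
        for seq in training_sequences:
            next_items = list(followers_of(seq, suffix))
            if next_items:
                matching_sequences += 1
                followers.update(next_items)
        # Accept this suffix only if enough training sequences support it.
        if matching_sequences >= min_similar_sequence:
            return followers.most_common(1)[0][0]
    return None  # no suffix had enough support


sessions = [
    ["comedy_1", "comedy_2", "doc_1", "doc_2"],
    ["comedy_2", "doc_1", "doc_2", "doc_3"],
    ["comedy_1", "doc_1", "doc_3"],
]
print(predict_next(["comedy_2", "doc_1"], sessions))  # doc_2
```

Longer suffixes are tried first so that more specific context wins when it has enough support; shorter suffixes act as a fallback.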

Personalized Recommendation system

Personalized recommendation algorithms aim to provide tailored recommendations to individual users based on their preferences, past interactions, product attributes, and demographic information. These algorithms leverage user-item interaction data to identify patterns and similarities among users and items.

Algorithms that are used to build these models are:

i. LightFM

LightFM is a hybrid recommendation algorithm that uses user profiles, product details, and interaction data to provide personalized recommendations. By combining collaborative filtering with content-based features in a matrix factorization model, LightFM captures both user preferences and item features. Analyzing user profiles and item characteristics lets it deliver more relevant suggestions, simplify the discovery of related products, and adjust recommendations promptly based on user behavior. With insights into user demographics, preferences, and interaction history, LightFM can be applied across diverse industries and applications to improve engagement and satisfaction.
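To make the role of user and item features concrete, the sketch below shows LightFM-style scoring: each user and item is represented as the sum of the latent vectors of its features, and the score is the dot product of the two representations. The embedding values, feature names, and helper functions are invented for illustration, and bias terms are omitted; in practice the embeddings are learned during training.

```python
def representation(features, embeddings):
    """Sum the latent vectors of all features that describe a user or an item."""
    dims = len(next(iter(embeddings.values())))
    total = [0.0] * dims
    for feature in features:
        for d, value in enumerate(embeddings[feature]):
            total[d] += value
    return total


def score(user_features, item_features, user_emb, item_emb):
    """Dot product of the user and item representations (bias terms omitted)."""
    u = representation(user_features, user_emb)
    v = representation(item_features, item_emb)
    return sum(a * b for a, b in zip(u, v))


# Hypothetical 2-component embeddings (no_components = 2 in this toy example).
user_emb = {"user:42": [0.5, 0.0], "age:18-25": [0.5, 1.0]}
item_emb = {
    "item:phone": [1.0, 0.0],
    "category:electronics": [1.0, 1.0],
    "item:novel": [-1.0, 0.0],
    "category:books": [0.0, -1.0],
}

user = ["user:42", "age:18-25"]
catalog = {
    "phone": ["item:phone", "category:electronics"],
    "novel": ["item:novel", "category:books"],
}
ranked = sorted(catalog, key=lambda name: score(user, catalog[name], user_emb, item_emb), reverse=True)
print(ranked)  # ['phone', 'novel']
```

Because representations are sums over features, a new item with known attributes can still be scored through its category or brand embeddings even before it has any interactions.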

Hyper-parameters:

Parameter	Description	Data Type	Possible Values	Default Values
no_components	The dimensionality of the latent feature embeddings. Higher values capture more complex relationships but may overfit if too large for the dataset.	int	Any positive integer	10
n	For k-OS training, this specifies the maximum number of positive samples considered for each user during each update step.	int	Any positive integer	10
learning_schedule	Determines the learning rate schedule to be used. Options include adaptive schedules like adagrad or adadelta, which adjust learning rates based on gradient information during training.	str	'adagrad', 'adadelta'	'adagrad'
loss	Specifies the loss function to optimize: 'logistic': logistic loss; 'bpr': Bayesian Personalized Ranking pairwise loss, which is the default choice; 'warp': Weighted Approximate-Rank Pairwise loss; 'warp-kos': the k-th order statistic variant of WARP.	str	'logistic', 'bpr', 'warp', 'warp-kos'	'bpr'
learning_rate	The initial learning rate for the gradient descent optimizer. Smaller values result in slower but potentially more stable convergence.	float	Any positive float	0.05
item_alpha	L2 regularization strength on item feature embeddings. Helps prevent overfitting by penalizing large weights. Higher values enforce stronger regularization.	float	Any non-negative float	0.0
user_alpha	L2 regularization strength on user feature embeddings. Helps prevent overfitting by penalizing large weights. Similar behavior to item_alpha.	float	Any non-negative float	0.0
train_split_ratio	The fraction of the data used for training. The remaining data is reserved for evaluation. A higher ratio means more data for training but less for testing.	float	Any float between 0 and 1	0.8
td_uid_column	The name of the column in the transactions dataset representing unique user IDs.	str	Any valid column name in the dataset	User-specified
ud_uid_column	The name of the column in the user features dataset representing unique user IDs.	str	Any valid column name in the dataset	User-specified
pd_pid_column	The name of the column in the product features dataset representing unique product IDs.	str	Any valid column name in the dataset	User-specified

Sample scenarios to choose this algorithm:

In e-commerce platforms, personalized recommendation algorithms suggest relevant items based on real-time user interactions and search activity. By analyzing user behavior, item characteristics, and transaction data, these models generate highly personalized suggestions that resonate with individual users. This approach not only enhances the user experience but also increases business value by promoting relevant and related products.
In e-learning platforms, personalized recommendation algorithms facilitate the discovery of related courses that complement users' learning interests and objectives. These models analyze course similarities and user interests, ultimately leading to increased exploration and engagement.

ii. Pixie

Building systems that provide high-quality personalized recommendations is a major challenge when the pool of items is massive and the number of users is large. These recommendations must also be generated on demand, in response to user actions.

Pixie is a scalable, real-time, graph-based recommendation system that addresses this problem with random walks: it explores a large graph of interconnected items to find items relevant to the user. The graph comprises nodes and edges, where nodes represent items or users, and edges represent relationships or interactions between them. The algorithm is designed to be highly scalable, allowing it to operate efficiently in environments with a vast number of items and users.

Hyper-parameters:

Parameter	Description	Data Type	Possible Values	Default Values
user_id_column	Name of the column that holds unique user IDs in the users dataset	str	Any valid column name in the dataset	User-specified
product_id_column	Name of the column that holds unique product IDs in the products dataset	str	Any valid column name in the dataset	User-specified
depth	Number of steps in each random walk.	int	Any positive integer	10
n_epochs	Maximum number of steps for the random walk process.	int	Any positive integer	50
higher_weight	Weight for biasing the graph edge towards more likely connections.	float	A float between 0 and 1	1.0
lower_weight	Weight for biasing the graph edge towards less likely connections.	float	A float between 0 and 1	0.0001
recommendation_type	Type of prediction to be made.	str	'fbt', 'cwbab'	'cwbab'
with_feature_encoding	Whether to use encoding of user features.	bool	True, False	False

Sample scenarios to choose this algorithm:

Pixie can suggest products to users based on their browsing history and purchase behavior, leading to improved product discovery and higher sales conversion rates.

For example:

An e-commerce platform specializing in consumer electronics leverages Pixie to improve its recommendation system.

When a user purchases a smartphone, the algorithm uses recommendation_type: fbt (Frequently Bought Together) to recommend necessary co-purchases, such as screen protectors or chargers, which are immediately relevant to the current purchase.

At the same time, the algorithm can use recommendation_type: cwbab (Customers Who Bought Also Bought) to suggest complementary products, like smartwatches or Bluetooth speakers, which the customer might consider buying later. By differentiating between immediate and potential future needs, the platform not only increases the average order value but also builds long-term customer engagement and satisfaction.
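The random-walk idea behind Pixie can be sketched on a small user-item graph. This is a minimal illustration, not QuickML's implementation: `depth` is assumed to mean the number of steps per walk as in the table above, while `n_walks`, `top_k`, `seed`, and both function names are hypothetical, and the edge weighting controlled by higher_weight and lower_weight is not modeled.

```python
import random
from collections import Counter


def build_graph(interactions):
    """Build a bipartite user-item adjacency map from (user, item) pairs."""
    graph = {}
    for user, item in interactions:
        graph.setdefault(("user", user), []).append(("item", item))
        graph.setdefault(("item", item), []).append(("user", user))
    return graph


def random_walk_recommend(graph, start_item, depth=10, n_walks=50, top_k=3, seed=0):
    """Rank items by how often short random walks from start_item visit them."""
    start = ("item", start_item)
    if start not in graph:
        return []
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    visits = Counter()
    for _ in range(n_walks):
        node = start
        for _ in range(depth):
            node = rng.choice(graph[node])  # hop to a uniformly chosen neighbor
            if node[0] == "item" and node != start:
                visits[node[1]] += 1
    return [item for item, _ in visits.most_common(top_k)]


interactions = [
    ("u1", "phone"), ("u1", "charger"),
    ("u2", "phone"), ("u2", "screen_protector"), ("u2", "charger"),
    ("u3", "novel"), ("u3", "bookmark"),
]
graph = build_graph(interactions)
print(random_walk_recommend(graph, "phone"))
```

Items that share many users with the starting item are visited more often, so they rise to the top; items in a disconnected part of the graph, such as the novel here, are never reached.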

Recurrence cycle recommendation system

The Recurrence Cycle Recommendation Model is trained to identify and suggest items based on recurring patterns in user behavior. By analyzing historical data, it detects the recurring cycles or intervals at which users are likely to repeat specific interactions, such as purchases, subscriptions, or engagements. The Recurrence Finder algorithm predicts the items users are most likely to repurchase, simplifying the process of restocking products or re-engaging with preferred items.

The algorithm used to build the recurring recommendation models is:

i. Recurrence Finder

Recurrence Finder identifies and predicts recurring events, such as customer product purchases, event attendance, and daily alarm settings, using historical timestamps to forecast future occurrences. By analyzing patterns in past events, it estimates when those events are likely to occur again, aiding planning and decision-making. This model helps optimize strategies for customer retention, event management, and time management tools.

With its ability to predict future occurrences, Recurrence Finder enhances efficiency and productivity across various domains, from e-commerce to healthcare.

Hyper-parameters:

Parameter	Description	Data Type	Possible Values	Default Values
user	The name of the column in the dataset that contains unique user IDs. This column is essential for identifying the recurrence of transactions based on users.	str	Any valid column name in the dataset	User-specified
product	The name of the column in the dataset that contains unique product IDs. This column is used to track the recurrence of specific products in transactions.	str	Any valid column name in the dataset	User-specified
timestamp_column	The name of the column in the dataset that contains timestamps for the transactions. This column is critical for calculating the time intervals between recurring transactions.	str	Any valid column name in the dataset	User-specified
quantity	A boolean flag indicating whether the recurrence analysis should consider transaction quantities. If True, the quantity_column will be used to incorporate quantity-based recurrence patterns.	bool	True, False	User-specified
quantity_column	The name of the column in the dataset that contains the quantity of products purchased in each transaction. This is required if the quantity parameter is set to True.	str	Any valid column name in the dataset	User-specified

Sample scenarios to choose this algorithm:

Subscription-based businesses can utilize recurrence predictions to forecast when customers are likely to renew their subscriptions. This allows them to implement targeted retention strategies, such as personalized offers or reminders, to increase renewal rates and customer loyalty.
Service-based businesses, such as healthcare providers or salons, can use recurrence predictions to schedule appointments efficiently. By anticipating when clients are likely to book appointments, they can optimize staff schedules, minimize wait times, and enhance customer satisfaction.
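One simple way to picture recurrence prediction is to average the gaps between a user's past purchases of a product and project that gap forward. The sketch below does exactly that; it is an assumption-laden illustration rather than the Recurrence Finder algorithm, and the dictionary keys `user`, `product`, and `timestamp` are hypothetical stand-ins for the columns named by the hyper-parameters above.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def predict_next_purchase(transactions):
    """Map each (user, product) pair to its predicted next purchase time."""
    history = defaultdict(list)
    for row in transactions:
        history[(row["user"], row["product"])].append(row["timestamp"])

    predictions = {}
    for key, stamps in history.items():
        stamps.sort()
        if len(stamps) < 2:
            continue  # a single purchase gives no interval to learn from
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        mean_gap = sum(gaps, timedelta()) / len(gaps)
        predictions[key] = stamps[-1] + mean_gap
    return predictions


transactions = [
    {"user": "u1", "product": "coffee", "timestamp": datetime(2025, 1, 1)},
    {"user": "u1", "product": "coffee", "timestamp": datetime(2025, 1, 31)},
    {"user": "u1", "product": "coffee", "timestamp": datetime(2025, 3, 2)},
    {"user": "u1", "product": "filter", "timestamp": datetime(2025, 1, 5)},
]
predictions = predict_next_purchase(transactions)
print(predictions[("u1", "coffee")])  # 2025-04-01 00:00:00
```

A mean interval is the simplest possible estimator; a production model would likely also account for irregular cycles and, when the quantity flag is enabled, for how much was bought each time.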

Data Validation criteria

Recommendation models are trained using three datasets:

Transactions/Interactions data
Users’ demographic data
Items’ attribute data

The Transactions dataset contains transaction details, such as transactionID, userID, itemID, order value, purchase date, timestamp, etc., capturing each purchase the user has made.

The Users dataset holds demographic information about the users of the business, providing insights into their characteristics and preferences.

The Items/Product features dataset contains attributes and characteristics of the items being purchased by users, such as category, brand, and price.

Before model training begins, the algorithm automatically runs a validation check across the three datasets. If any of the following criteria are not met, the algorithm stops training and throws an error:

No missing values should be present in the transactions, users, or items datasets.
Any userID or itemID/productID present in the Transactions dataset must also exist in their respective Users or Items datasets.
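The two criteria above can be checked before upload with a short script. This is a sketch under assumptions: datasets are represented as lists of dictionaries, the key names `user_id` and `item_id` are hypothetical, and a value of `None` or an empty string is treated as missing. QuickML's own check may differ in detail.

```python
def validate_datasets(transactions, users, items, user_key="user_id", item_key="item_id"):
    """Return a list of validation errors; an empty list means the data passes."""
    errors = []

    # Criterion 1: no missing values in any of the three datasets.
    for name, rows in (("transactions", transactions), ("users", users), ("items", items)):
        for index, row in enumerate(rows):
            missing = [col for col, value in row.items() if value is None or value == ""]
            if missing:
                errors.append(f"{name}[{index}] has missing values in {missing}")

    # Criterion 2: every ID referenced by a transaction exists in users/items.
    known_users = {row[user_key] for row in users}
    known_items = {row[item_key] for row in items}
    for index, row in enumerate(transactions):
        if row[user_key] not in known_users:
            errors.append(f"transactions[{index}] references unknown user {row[user_key]!r}")
        if row[item_key] not in known_items:
            errors.append(f"transactions[{index}] references unknown item {row[item_key]!r}")
    return errors


users = [{"user_id": "u1", "age": 30}]
items = [{"item_id": "i1", "brand": "Acme"}]
transactions = [
    {"user_id": "u1", "item_id": "i1", "amount": 10.0},
    {"user_id": "u2", "item_id": "i1", "amount": None},
]
for problem in validate_datasets(transactions, users, items):
    print(problem)
```

Running a check like this locally surfaces both kinds of problems at once, instead of discovering them one at a time when training stops with an error.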

Last Updated 2025-01-30 17:13:56 +0530
