Feature Generation

Feature generation is the process of creating new features from one or more existing features, potentially for use in statistical analysis. The new features make additional information available during model construction, which can result in a more accurate model.

  1. ExploreKit

    ExploreKit generates a large set of candidate features by combining information in the original features, with the aim of maximizing predictive performance according to user-selected criteria.

    The search starts from the original feature set, F0, and runs for a fixed number of iterations. In each iteration, we generate candidate features from the current feature set and rank them using a background classifier. Starting with the highest-ranking candidate, we evaluate the current feature set plus that candidate on the chosen classifier, which is a decision tree, a support vector machine, or a random forest. Depending on the outcome of this evaluation, we do one of three things: add the candidate to the current feature set and end the iteration; mark the candidate as the best so far and continue to the next candidate in the ranking; or discard the candidate and continue to the next candidate.
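
The loop described above can be sketched roughly as follows. This is a minimal illustration, not ExploreKit's actual implementation: the function name `greedy_feature_search`, the callables `generate`, `rank_score`, and `evaluate`, and the `accept_gain` threshold are all hypothetical stand-ins. In particular, `rank_score` stands in for the background classifier's ranking, and adding the best-so-far candidate at the end of an iteration is an assumption about what happens when no candidate is accepted outright.

```python
from typing import Callable, List, Optional, Sequence

Feature = str  # assumption: a feature is identified by its name


def greedy_feature_search(
    original: Sequence[Feature],
    generate: Callable[[List[Feature]], List[Feature]],
    rank_score: Callable[[Feature], float],
    evaluate: Callable[[List[Feature]], float],
    iterations: int = 3,
    accept_gain: float = 0.05,
) -> List[Feature]:
    """Greedy candidate search in the spirit of the loop described above."""
    current = list(original)  # F0, the original feature set
    for _ in range(iterations):
        baseline = evaluate(current)
        # Rank candidates, highest score first (stand-in for the background classifier).
        candidates = sorted(generate(current), key=rank_score, reverse=True)
        best: Optional[Feature] = None
        best_gain = 0.0
        for cand in candidates:
            gain = evaluate(current + [cand]) - baseline
            if gain >= accept_gain:
                # Clear improvement: accept this candidate and end the iteration.
                best, best_gain = cand, gain
                break
            if gain > best_gain:
                # Small improvement: remember it as best so far and keep looking.
                best, best_gain = cand, gain
            # Otherwise the candidate is discarded.
        if best is not None:
            current.append(best)
    return current


# Toy usage: each feature contributes a fixed amount to a made-up score.
value = {"a": 0.2, "b": 0.1, "a*b": 0.3, "a+b": 0.02, "a-b": -0.1}

selected = greedy_feature_search(
    original=["a", "b"],
    generate=lambda cur: [f for f in ("a*b", "a+b", "a-b") if f not in cur],
    rank_score=lambda f: value[f],
    evaluate=lambda feats: sum(value[f] for f in feats),
)
print(selected)
```

In the toy run, `a*b` is accepted immediately in the first iteration, `a+b` is kept as the best-so-far candidate in the second, and `a-b` is discarded because it lowers the score.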

  2. AutoLearn

    AutoLearn is a regression-based feature-learning algorithm. Because it is data-driven, it requires no domain knowledge and is therefore generic. It learns new features by mining pairwise feature associations, identifying the linear or non-linear relationship between each pair, applying regression, and selecting those relationships that are stable and improve prediction performance.

    If the number of numerical (continuous) columns is greater than the number of categorical columns, use AutoLearn; otherwise, use ExploreKit.
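
A simplified sketch of the pairwise idea, together with the selection rule above, might look like this. It is an illustration under stated assumptions rather than AutoLearn itself: the names `choose_generator`, `pearson`, and `pairwise_regression_features` and the `min_abs_corr` threshold are hypothetical, Pearson correlation stands in for the association measure, only the linear case is fitted, and the stability and performance-based selection steps are omitted.

```python
from itertools import permutations
from statistics import mean
from typing import Dict, List


def choose_generator(n_numeric: int, n_categorical: int) -> str:
    """Selection rule from the text: more numerical than categorical columns -> AutoLearn."""
    return "autolearn" if n_numeric > n_categorical else "explorekit"


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either column is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def pairwise_regression_features(
    columns: Dict[str, List[float]], min_abs_corr: float = 0.7
) -> Dict[str, List[float]]:
    """For each associated ordered pair (src, dst), regress dst on src and
    emit the prediction and the residual as two new features."""
    new: Dict[str, List[float]] = {}
    for src, dst in permutations(columns, 2):
        x, y = columns[src], columns[dst]
        mx, my = mean(x), mean(y)
        sxx = sum((a - mx) ** 2 for a in x)
        # Skip constant sources and weakly associated pairs.
        if sxx == 0 or abs(pearson(x, y)) < min_abs_corr:
            continue
        # Ordinary least-squares fit of dst = slope * src + intercept.
        slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
        intercept = my - slope * mx
        pred = [slope * a + intercept for a in x]
        new[f"pred({dst}|{src})"] = pred
        new[f"resid({dst}|{src})"] = [b - p for b, p in zip(y, pred)]
    return new


columns = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [5, 1, 4, 2]}
features = pairwise_regression_features(columns)
print(sorted(features))
print(choose_generator(n_numeric=3, n_categorical=0))
```

Here only `x` and `y` are strongly associated, so new features are generated for that pair in both directions, while pairs involving `z` are skipped.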

Last Updated 2023-06-15 17:14:14 +0530
