Selecting factors is a core task in quantitative finance. With the explosion of available information, it is becoming harder to select and monitor these factors without help from automated algorithms. In machine learning, this topic is studied under the name feature selection.

Factor selection in finance is a specialised problem with no off-the-shelf solution. Instead, one should apply adaptive approaches to factor selection in a well-controlled manner. Two key properties to evaluate are predictability and diversity.

Predictability

Predictability means that the chosen factors have predictive power for a target variable. For example, if the target variable is the S&P 500 index, we start from a collection of factors that are potentially related to the index, such as technical indicators or macroeconomic variables.

The factor selection algorithm is usually backed by a supervised model, such as lasso regression or a gradient boosting machine. These models produce a measure of importance for each input, which serves to select the most useful factors from a larger set.
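
As a rough illustration of importance-based selection, the sketch below fits a small lasso by coordinate descent and keeps the factors whose weights are not shrunk to zero. It is a minimal sketch, not the system described here: the data is synthetic, the factor names f0 to f5, the penalty alpha=0.1 and the helper functions are all assumptions made for the example, and a real pipeline would more likely use an established library implementation.

```python
import random


def standardise(values):
    """Rescale a series to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]


def soft_threshold(value, threshold):
    """Shrink value towards zero; this step is what zeroes out weak factors."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_weights(columns, target, alpha, sweeps=100):
    """Coordinate-descent lasso on standardised columns and a centred target."""
    n = len(target)
    weights = [0.0] * len(columns)
    residual = list(target)  # all weights start at zero, so residual == target
    for _ in range(sweeps):
        for j, column in enumerate(columns):
            # Covariance of factor j with the residual, with the factor's
            # own current contribution added back in.
            rho = sum(x * (r + weights[j] * x) for x, r in zip(column, residual)) / n
            new_weight = soft_threshold(rho, alpha)
            delta = new_weight - weights[j]
            if delta != 0.0:
                residual = [r - delta * x for x, r in zip(column, residual)]
                weights[j] = new_weight
    return weights


# Synthetic data (an assumption for illustration): only f0 and f3 drive the target.
rng = random.Random(0)
n_obs = 300
names = ["f0", "f1", "f2", "f3", "f4", "f5"]
raw = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in names]
target = [2.0 * raw[0][i] - 1.0 * raw[3][i] + rng.gauss(0.0, 0.1) for i in range(n_obs)]

columns = [standardise(column) for column in raw]
mean_target = sum(target) / n_obs
centred_target = [t - mean_target for t in target]

weights = lasso_weights(columns, centred_target, alpha=0.1)

# Importance here is the absolute lasso weight; zero means "dropped".
importance = dict(zip(names, (abs(w) for w in weights)))
selected = sorted((name for name in names if importance[name] > 0.0),
                  key=importance.get, reverse=True)
print(selected)
```

The penalty alpha controls how aggressive the selection is: a larger value zeroes out more factors, so in practice it would be tuned, for example by cross-validation, rather than fixed by hand as it is here.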

Diversity

However, blindly applying factor selection algorithms can lead to a poor result, such as a set of selected factors that carry redundant information. To alleviate this issue, we introduce the second property: diversity. Diversity means that the selected factors should have low correlation with one another.

To enforce diversity, one could group the variables into highly correlated clusters using a clustering or dimensionality reduction algorithm. In this process, we want high correlation within each group and low correlation across groups. Each group then becomes a new derived variable on which we can run the factor selection algorithm again.
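
The grouping step can be sketched as follows. This is only one simple way to do it, chosen for brevity: factors are linked whenever their pairwise correlation exceeds a threshold, linked factors form a group, and each group is replaced by the average of its standardised members. The threshold of 0.8, the factor names and the synthetic series are assumptions for the example, and a hierarchical clustering or principal component approach could be swapped in.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def correlated_groups(series_by_name, threshold):
    """Group factors linked by a correlation of at least threshold (union-find).

    Signed correlation is used, so strongly negative pairs stay apart;
    flip their sign first if they should share a group.
    """
    names = list(series_by_name)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(series_by_name[a], series_by_name[b]) >= threshold:
                parent[find(b)] = find(a)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


def group_average(series_by_name, members):
    """Derived variable for a group: the average of its standardised members."""
    standardised = []
    for name in members:
        values = series_by_name[name]
        n = len(values)
        mean = sum(values) / n
        std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
        standardised.append([(v - mean) / std for v in values])
    return [sum(column) / len(members) for column in zip(*standardised)]


# Synthetic data (an assumption): two independent drivers, each observed
# through several noisy, and therefore redundant, factors.
rng = random.Random(1)
n_obs = 250
trend = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
macro = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]


def noisy(base):
    return [v + rng.gauss(0.0, 0.2) for v in base]


factors = {
    "trend_a": noisy(trend), "trend_b": noisy(trend), "trend_c": noisy(trend),
    "macro_a": noisy(macro), "macro_b": noisy(macro),
}

groups = correlated_groups(factors, threshold=0.8)
derived = [group_average(factors, members) for members in groups]
print(groups)
```

The derived series can then be fed back into the selection step in place of the original factors, so that redundant factors compete as one candidate rather than several.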

Factor Elimination

A benefit of the system is that, instead of producing the result in a single pass, we can run it several times in an iterative fashion. This lets us eliminate factors gradually, refining the selection at each round. With cloud compute, this analysis can be run efficiently over very large data sets.

For example, suppose we want to predict the next trading day's S&P 500 index from around 1000 candidate factors. It is clearly unwise to use all 1000 factors; we want a more concise set that can be included in the final model.

Suppose we run the system for 5 rounds, and each round roughly halves the number of remaining factors. We end up with about 30 factors (1000 / 2^5 ≈ 31), which can then be manually inspected to ensure they are sensible.
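
The halving schedule above can be sketched as a short loop. The function iterative_elimination and the scoring callback are hypothetical names introduced for this example. The toy scorer simply looks up a fixed random number per factor, standing in for the refit of a supervised model on the surviving factors that a real round would perform.

```python
import random


def iterative_elimination(factors, score, rounds=5, keep_fraction=0.5):
    """Repeatedly re-score the surviving factors and keep the best fraction."""
    remaining = list(factors)
    history = [len(remaining)]
    for _ in range(rounds):
        scores = score(remaining)  # scored on the survivors only
        ranked = sorted(remaining, key=lambda name: scores[name], reverse=True)
        keep = max(1, int(len(ranked) * keep_fraction))
        remaining = ranked[:keep]
        history.append(len(remaining))
    return remaining, history


# Toy importance values (an assumption): one fixed random score per factor.
rng = random.Random(42)
toy_importance = {f"factor_{i}": rng.random() for i in range(1000)}


def toy_score(names):
    """Stand-in for refitting a model and reading off factor importances."""
    return {name: toy_importance[name] for name in names}


survivors, history = iterative_elimination(toy_importance, toy_score)
print(history)  # [1000, 500, 250, 125, 62, 31]
```

Because the scores are recomputed inside the loop, a factor that looked useful only because of a since-removed companion can drop out in a later round, which is the point of running several rounds instead of cutting to 30 factors in one step.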

For more information, stay tuned for our next Cognitive Cloud event, focused on AI and Machine Learning in Trading.