Selecting factors is important in quantitative finance. With the explosion of available information, it is becoming harder to select and monitor these factors without additional help, such as automatic algorithms. In machine learning, this topic is studied under the name feature selection. Factor selection in finance can be a unique and specialised challenge with no off-the-shelf solution. Instead, one should consider adaptive approaches to factor selection, applied in a well-controlled manner. Two key properties that we want to evaluate are predictability and diversity.


Predictability means that the chosen factors have predictive power for a target variable. For example, if the target variable is the S&P 500 index, there should be a collection of factors that are potentially related to the index, such as technical indicators or macroeconomic variables. The factor selection algorithm is usually backed by supervised models, such as lasso regression or a gradient boosting machine. These models can produce a measure of the importance of each input, which can then be used to select the most useful factors from a larger set.
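As an illustration of the lasso route, here is a minimal sketch that fits a lasso by cyclic coordinate descent on standardised factors and keeps the factors whose coefficients stay non-zero, ranked by absolute size. It is not the system described here: the function names, the penalty `alpha=0.1` and the use of |coefficient| as the importance measure are assumptions for illustration, and in practice one would more likely use a library implementation such as scikit-learn's `Lasso`.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma (the lasso proximal operator)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/2n)||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent.

    Assumes the columns of X are standardised and y is centred,
    so no intercept is fitted.
    """
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove every factor's contribution except factor j's
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w

def select_factors(X, y, names, alpha=0.1):
    """Return (name, coefficient) for factors with a non-zero lasso coefficient,
    ordered by decreasing |coefficient| (hypothetical importance measure)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    w = lasso_coordinate_descent(Xs, yc, alpha)
    order = np.argsort(-np.abs(w))
    return [(names[j], float(w[j])) for j in order if w[j] != 0.0]
```

Because the L1 penalty sets weak coefficients exactly to zero, the surviving factors form the selected set; raising `alpha` selects fewer factors.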


However, blindly applying factor selection algorithms may give an unhelpful answer, for example a set of selected factors that carry largely redundant information. To alleviate this issue, we introduce the second property: diversity. Diversity means that the factors selected by the system should have low correlation with one another. To enforce diversity, one could group the variables into highly correlated clusters using clustering or dimensionality reduction algorithms. In this process, we want high correlation within groups and low correlation across groups. Each group then becomes a new derived variable on which we can run the factor selection algorithm.
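One way to enforce diversity is sketched below, under assumptions of our own: factors are grouped by single linkage on absolute correlation (any pair with |correlation| at or above a hypothetical `threshold` of 0.7 ends up in the same group), and each group is then collapsed into one derived variable, its first principal component. Both the threshold and the choice of principal components as the dimensionality reduction step are illustrative, not the method used by the system described here.

```python
import numpy as np

def correlation_groups(X, threshold=0.7):
    """Group factors so that any pair with |corr| >= threshold shares a group
    (single-linkage clustering implemented with union-find)."""
    p = X.shape[1]
    corr = np.corrcoef(X, rowvar=False)
    parent = list(range(p))

    def find(i):
        # Follow parent links to the group root, halving the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            if abs(corr[i, j]) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(p):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def group_to_variable(X, members):
    """Summarise a group by the first principal component of its standardised members."""
    Z = X[:, members]
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    if Z.shape[1] == 1:
        return Z[:, 0]
    # The leading right singular vector is the first principal direction
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[0]
```

Because no two factors in different groups have |correlation| at or above the threshold, the derived variables tend to be far less redundant than the raw factors, which makes them better inputs for the next selection step.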

Factor Elimination

A benefit of the system is that, instead of producing the result in a single pass, it can be run several times iteratively. This allows us to refine the elimination of factors gradually. With cloud computing, we can process very large data sets efficiently to perform this analysis. For example, suppose we want to predict the S&P 500 index for the next trading day from around 1000 candidate factors. It is clearly unwise to use all 1000 factors; we want a more concise set that can be included in the final model. If we run the system for 5 rounds and each round roughly halves the number of remaining factors, we end up with about 30 factors (1000 / 2^5 ≈ 31), which can then be inspected manually to ensure they are sensible. For more information, stay tuned for our next Cognitive Cloud event, focused on AI and machine learning in trading.
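The round-by-round elimination can be sketched as a loop that refits an importance model on the surviving factors and keeps the top fraction each round. This is a hypothetical sketch: `importance_scores` uses absolute least-squares coefficients on standardised factors as a stand-in for whatever supervised model the system actually uses, and `keep_fraction=0.5` mirrors the "roughly halves" description.

```python
import numpy as np

def importance_scores(X, y):
    """Hypothetical stand-in for a supervised model's importance measure:
    absolute coefficients of a least-squares fit on standardised factors."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    coef, *_ = np.linalg.lstsq(Xs, y - y.mean(), rcond=None)
    return np.abs(coef)

def iterative_elimination(X, y, n_rounds=5, keep_fraction=0.5):
    """Each round, refit on the surviving factors and keep the top `keep_fraction`.

    Returns the surviving column indices and the factor count after each round.
    """
    remaining = np.arange(X.shape[1])
    history = [len(remaining)]
    for _ in range(n_rounds):
        scores = importance_scores(X[:, remaining], y)
        n_keep = max(1, round(len(remaining) * keep_fraction))
        # Positions (within `remaining`) of the highest-scoring factors
        best = np.argsort(-scores)[:n_keep]
        remaining = remaining[np.sort(best)]
        history.append(len(remaining))
    return remaining, history
```

Refitting each round matters because a factor's importance can change once correlated competitors are removed. Starting from 1000 factors, five halving rounds leave 1000, 500, 250, 125, 62 and finally 31 factors, in line with the roughly 30 factors mentioned above.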


Date(s) - 01/01/1970
12:00 AM - 12:00 AM


600 5th ave. NY, NY

Machine Learning Overview

Machine learning is on the rise. The technique behind technologies such as Google’s game-playing AlphaGo, self-driving cars and cancer diagnosis assistants is now one of the more promising trends in the financial industry.

Many of us have come to know it under different names: artificial intelligence (AI), machine learning (ML) or deep learning (DL). We will start by clarifying these concepts.


Artificial intelligence is intelligence exhibited by machines. It is the general topic that encompasses a vast range of subfields. Machine learning is one of the most successful approaches to realising AI; in fact, some researchers believe it is the key to true AI. Deep learning originated from neural networks, a classic machine learning algorithm loosely modelled on the human brain and nervous system, and is now a powerful toolbox within machine learning. Many of the most recent successes of machine learning in application areas, such as image and speech recognition, have been driven by deep learning.

Practical Applications

Machine learning has outperformed humans on several well-defined problems, such as recognising objects in photos and identifying people from their faces. These problems had been studied intensively for decades, and some mature non-machine-learning techniques worked well. However, machine learning solutions have pushed performance to a level that traditional methods could not reach, and these breakthroughs have happened only in the last five years. Machines beating humans is no longer just a news headline; such systems are now used in real-world applications.

Apart from these end-to-end applications, machine learning now plays an important role in more complex systems such as Siri or Google Now. Machine learning also fuels related areas such as data mining: for example, when you search on Google or Amazon, you may see personalised results that differ from what other users see. Machine learning is not only for high-tech internet companies but also for many established industries, with applications such as drug design in medicine, automatic document summarisation in law, and robo-advisors in trading.

The Rising Trend of Deep Learning

Deep learning significantly lowers the barrier to building an in-house solution, because the frameworks are highly standardised and the software for them is freely available.

As an example, many experienced PhDs have spent years building facial recognition systems that could achieve 80% accuracy, publishing thousands of research papers along the way. Now, someone with a bachelor's degree and basic programming skills can build a system with 90% accuracy in a day by following a blog post. With some slightly more advanced techniques, the system can be improved to roughly 99% accuracy, which was nearly impossible ten years ago, even for the most successful researchers and industry practitioners in the field.

Readers can try this themselves with the Microsoft Cognitive Services Face API.

Interestingly, the top-tier players in the industry still have clear advantages, because improving from 99% to 99.1% is extremely hard for beginners in deep learning. For a facial recognition start-up, the entry level is now 99.95%.
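Looking at error rates rather than accuracy makes the gap clearer. The figures below simply restate the percentages quoted above as errors:

```latex
\begin{aligned}
1 - 0.99   &= 0.01   && \text{(1 error per 100 faces)}\\
1 - 0.991  &= 0.009  && \text{(a 10\% relative reduction in errors)}\\
1 - 0.9995 &= 0.0005 && \text{(1 error per 2000 faces, } 20\times \text{ fewer than at 99\%)}
\end{aligned}
```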

Machine Learning in Finance


The FICO score is a well-known measure of a customer’s credit risk in the lending industry. Early versions of the FICO score were based on logistic regression. Perhaps surprisingly, logistic regression is still popular in the era of deep learning, although much-improved variants of it now exist. The decision engines of many lending companies, especially internet-based lenders, are built on machine learning algorithms, and such fully automatic systems are widely regarded as more reliable than human decision-making.
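To make the logistic regression point concrete, here is a minimal sketch of fitting a default-probability model by batch gradient descent. It is not the FICO methodology, which is proprietary; the features, the optional L2 penalty (one example of an improved variant) and all parameter values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.5, n_iter=2000, l2=0.0):
    """Fit P(default | x) = sigmoid(b + x @ w) by batch gradient descent
    on the mean log-loss, with an optional L2 penalty on w."""
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(n_iter):
        prob = sigmoid(b + X @ w)
        err = prob - y  # gradient of the log-loss with respect to the logit
        w -= lr * (X.T @ err / n + l2 * w)
        b -= lr * err.mean()
    return w, b
```

Each coefficient reads as the change in the log-odds of default per unit change in a (standardised) feature, which is a large part of why the model remains popular in lending: its decisions are easy to explain.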


Robo-advisors, such as Betterment, Schwab and Vanguard, provide algorithm-driven portfolio management. Although these algorithms were largely rule-based in the early stages, machine learning appears to be the future direction. The rule-based approach depends heavily on human expertise, whereas some machine learning algorithms can generate rules without human effort. However, this does not mean that machine learning solutions are designed to exclude human input. In fact, one area of machine learning research investigates how to combine human expertise with machine learning algorithms, for example by expressing that expertise as constraints.
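One simple way to encode human expertise as a constraint is to let an expert fix the sign of some coefficients and fit the rest from data. Here is a sketch for a linear model using projected gradient descent; it is a hypothetical illustration of the idea, not a method used by any of the robo-advisors named above, and the `signs` convention is our own.

```python
import numpy as np

def fit_with_sign_constraints(X, y, signs, lr=0.1, n_iter=3000):
    """Least squares where an expert fixes the sign of some coefficients.

    signs[j] = +1 forces w[j] >= 0, -1 forces w[j] <= 0, 0 leaves w[j] free.
    Each iteration takes a gradient step, then projects back onto the
    feasible set by clipping any sign-violating coefficient to zero.
    """
    n, p = X.shape
    signs = np.asarray(signs)
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w -= lr * grad
        # Projection step: enforce the expert-specified signs
        w[(signs > 0) & (w < 0)] = 0.0
        w[(signs < 0) & (w > 0)] = 0.0
    return w
```

If the data disagree with the expert's sign, the constrained coefficient is held at zero rather than flipped, so the expert view acts as a guardrail that the data cannot override.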

For more information, join our April 20th Cognitive Cloud workshop in London, which will be focused on AI and machine learning in trading.

