
www.technicalanalysts.com

NEWS

Introducing our new Compliance Officer:
Meet Vince Harvey, Compliance Cubed Ltd
Interview with Richard Adcock MSTA, STA Company Secretary

Richard: Delighted to meet you Vince and to have you join the STA Ethics and Compliance Committee as our Compliance Officer. Tell us a little about yourself.

Vince: I'm married to Brigitte and we have three sons, three daughters in law and five grandchildren. I have worked in financial services pretty much since I graduated from university and in that time have worked for a range of businesses, moving into a compliance role in 2000. Compliance Cubed was set up in 2013 and keeps me very busy, fortunately. I am looking forward to contributing to the discussions in the Ethics and Compliance Committee and hope that I am able to add some value.

Richard: What sort of clients do you advise?

Vince: Throughout my career, the word diversification has been central to many conversations. Having been made redundant four times previously and found new employers, when it happened the fifth time, I finally got the message that I should do something for myself. While many of my clients are investment firms, I also work with some insurance brokers, payment-service businesses, employee-benefits advisers, credit brokers and mortgage advisers. This provides a varied workload and some protection for the business if one sector hits a difficult patch.

Part of my work has been to help new firms obtain FCA authorisation, but an increasing proportion of my time is now spent advising existing authorised businesses on how to stay on the right side of the regulators.

Richard: What do you think are the biggest challenges facing organisations when it comes to navigating the regulatory requirements?

Vince: The issue I hear most often is about moving goal posts - it is difficult to run a business as well as monitor regulators' websites to keep on track. The other is the sheer volume and 'legalese' - I mustn't complain though because, if understanding the rules was easy, firms wouldn't need compliance consultants.

Richard: Will there be an end to regulatory 'creep'?

Vince: Unfortunately, not for the foreseeable future! The UK has brought into UK legislation all of the previous EU regulations, but we shall see in 2021 and beyond whether our Government tries to maintain equivalence or allow financial services regulations to diverge. It's possible, given the volume of financial services activity with other European countries, that we will attempt to follow EU developments - without a seat at the table in determining what shape those developments take. Alternatively, we could look to strike out with our own set of rules and seek to compete globally. Either way, our rules will continue to change.

Richard: As a new 'normal' begins to emerge, what can companies do to ensure they are Covid proof?

Vince: Having a clear business plan is a good start - too many businesses are ticking along doing the same things they have done for years. Many people have said to me that virtual meetings were thought to be likely to be normal in around five years - what Covid has done is accelerate their acceptance. The ability to communicate and collaborate as a team has been challenged, but businesses have found ways to make it work. With a clear business plan that is effectively communicated to staff, new ways to build teams will emerge and new technologies will become available. The FCA, for example, has been talking about innovation as the key to reaching a wider audience - it has calculated that in the UK there are around 7.5m people with investible assets of £10,000 or more that are sitting on deposit. Individually, those clients aren't attractive to mainstream advice firms but - with growing acceptance of virtual interactions and online services - this is a market that could become more attractive. Resilience is another area which the FCA has been emphasising in its communications. Having enough cash to weather storms is one thing, but making sure that systems are robust and that the right people are engaged requires thought on what the new 'normal' could look like.

Richard: What impact do you think leaving Europe will have on the regulatory landscape?

Vince: As indicated in an earlier answer, the impact depends on the political lead provided; our regulators will have to implement the will of Parliament. As I write, the terms of our relationship after the transition are still to be agreed and I have little confidence that it will be a 'good' deal. The EU cannot appear to be generous to the UK as that would cause political issues with other nations. My guess is that, in financial services, we will adopt equivalence (maybe not officially) and so our rules will evolve as though we were still in Europe.

Richard: Thank you, Vince, for your time. We look forward to utilising your valuable professional expertise.

For more information on Vince Harvey visit Compliance Cubed Ltd.

RESEARCH

Cluster-based Feature Selection

Xin Man and Ernest P. Chan

Xin Man works as a quantitative research consultant for PredictNow.ai and QTS Capital Management, LLC. She holds a master's degree in Financial Mathematics from McMaster University, Canada.

Dr Ernest Chan is the Managing Member of QTS Capital Management, LLC, a commodity-pool operator and trading advisor specialising in crisis alpha and machine learning. He also runs PredictNow.ai, a financial machine-learning start-up.

Abstract

Feature importance in machine learning shows how much information a feature contributes when building a supervised learning model, in order that we can exclude uninformative features from the predictive model (feature selection). It also improves human interpretation of the resulting model. Recently, Man & Chan (2021) compared the stability of features selected by different methods such as MDA, SHAP or LIME when they are subjected to the computational randomness of the selection algorithms. In this article, we study whether the cluster-based MDA (cMDA) method proposed by López de Prado (2020) improves predictive performance, feature stability and model interpretability. We applied cMDA to two synthetic datasets, a clinical public dataset and two financial datasets. In all cases, the stability and interpretability of the cMDA-selected features are superior to those of the MDA-selected features.

1. Introduction

Financial investors are often reluctant to trust machine-learning algorithms because of their "black-box" nature: there is no transparency and no justification of how they arrive at their predictions. Feature selection is a technique that attempts to improve the transparency and interpretability of machine learning models by ranking the importance of the features used. However, feature selection algorithms often suffer from a stability problem, as discussed by Man & Chan (2021). As we change the random seed in training a machine learning model, the top selected features may also change, reducing interpretability. In this paper, we investigate a cluster-based technique pioneered by López de Prado (2020) to see whether it can improve the stability and interpretability of the important features.

Clustering is an effective unsupervised method for grouping features so that those in the same group are more like each other than those in other groups. A good clustering algorithm minimises the between-cluster similarity and maximises the within-cluster similarity. Popular clustering approaches include K-means and hierarchical clustering. The K-means algorithm requires the user to fix the number K of clusters. It is intended for situations in which all features are numerical and Euclidean distance is chosen as the metric. By contrast, hierarchical algorithms do not require K to be predetermined and can also adapt to categorical features and non-Euclidean metrics. Clusters at each level of the hierarchy are created by merging clusters at a lower level, and we end up with a single cluster containing all features at the highest level. More details regarding clustering algorithms can be found in Hastie, Tibshirani and Friedman (2009) and López de Prado (2020).

With a clustering algorithm, the full feature space is split into multiple non-overlapping clusters. Using the rank-based importance score, we calculate an importance score for each cluster. Those clusters with scores higher than a chosen threshold can be selected for training a machine learning model. Clustering features with 'similar' information is a straightforward way to isolate irrelevant features. Features that do not belong to any important cluster can be dropped. The chance of losing useful information is also reduced, as any feature belonging to a significantly informative cluster will not be discarded.

Clustering improves interpretability, as a cluster offers a higher level of abstraction. For example, in finance, we may find that volatilities computed using three-, five-
and seven-day lookback periods are in the same cluster. That clearly identifies the cluster as "historical volatility". In addition, clustering may improve the stability of the important features, since their relative importance within a cluster won't cause some of them to be dropped, and it is less likely that the importance rank of a whole cluster of features will change drastically if we use a different random seed. As discussed in Man & Chan (2021), stability of features improves interpretability.

The rest of this article is organised as follows:

• Section 2 introduces the cMDA algorithm and its use of hierarchical clustering to compute importance scores at the cluster level;
• Section 3 compares the predictive performance of MDA vs cMDA using two synthetic datasets;
• Section 4 compares the predictive performance of MDA vs cMDA on two popular datasets, including a financial dataset that uses technical and fundamental indicators to predict the S&P 500 stock index excess returns;
• Finally, the algorithm is applied to our proprietary trading strategy returns dataset to see if it can identify interpretable clusters and improve the strategy's performance. We find high stability and interpretability of the selected clusters in these financial applications, which should make machine learning employing this technique appealing to investors.

2. cMDA using hierarchical clustering

Cluster-based feature selection consists of two steps: clustering features and ranking clusters. To begin clustering features, we define a distance matrix from the pair-wise correlations rho_ij of the features: D_ij = sqrt((1 - rho_ij) / 2). As discussed in López de Prado (2020), the ideal distance matrix should be based on one of the information-theoretic metrics, but the correlation matrix is still the one most commonly used in finance. The selection of the distance matrix won't affect the subsequent procedures, though it may affect the predictive performance.

Next, a clustering method should be used to split the feature set into smaller sets according to the distance matrix. K-means and hierarchical algorithms are popular clustering methods. The K-means clustering algorithm fixes the number K of clusters, and the observations are assigned to each cluster based on the distance to the centre point. By contrast, hierarchical clustering works in a 'bottom-up' manner. Starting from the bottom, every single feature is taken as a cluster. As we ascend to the next level, the two closest clusters are merged. At the end of the process, all the features are included in a single cluster. We then cut the hierarchical tree at the proper level to create an optimal set of clusters. The outputs of hierarchical clustering have more structure and are more informative than the unstructured set of flat clusters returned by the K-means algorithm.

In the following analysis, we use the hierarchical algorithm as the clustering method.

The number of clusters is determined by finding the number (from 2 to the number of samples minus 1) that maximises the "clustering quality" q. The clustering quality is related to the silhouette coefficient (Rousseeuw, 1987), which represents how similar a sample is to samples in its own cluster compared with those in other clusters. For data sample i, the silhouette coefficient is defined as S_i = (b_i - a_i) / max{a_i, b_i}, where a_i is the average distance between i and all other samples in the same cluster, and b_i is the average distance between i and all the samples in the nearest cluster of which i is not a member. Then, for a given partition, the measure of clustering quality q is defined as q = E[S] / Std[S], where E[S] and Std[S] are the mean and standard deviation of the silhouette coefficients over all samples in the training data.

After finding the optimal number of clusters based on maximising q and assigning the features to each cluster, the feature importance algorithm is performed on the clusters rather than on individual features. This means that during MDA feature selection, all the features in a cluster are permuted at the same time, as described in López de Prado (2020). Since this article focuses on how the clustering method can add value to model performance rather than on comparisons across different feature importance algorithms, we omit the implementation of clustered LIME and SHAP and only discuss clustered MDA. If a cluster contains only a single feature, MDA and clustered MDA treat it identically. The feature importance is measured by the rank-based score proposed by Man & Chan (2021). As the importance score of a cluster is determined by the mean of the importance scores of the features contained in it, a large cluster won't necessarily be more important than a smaller cluster with fewer features.

3. Predictive Performance on Synthetic Data

To test how the proposed method responds to synthetic data, we construct a dataset composed of both informative and noisy features. As defined by López de Prado (2018), we have:

1) informative features that are used to determine the label; and
2) noisy features that bear no information on determining the labels and are drawn from standard normal distributions.

According to the descriptions in scikit-learn.org, the informative features are drawn independently from a standard normal distribution. However, to introduce clusters into our data, we first randomly draw multiple 'centroids' and generate informative features around them from normal distributions centred on these centroids. We provide a detailed description of the algorithm for creating these clustered features, and how they map to the labels, in Appendix 2.

The dataset has 1,000 samples and 40 features, comprising 20 informative and 20 noisy features. These 20 informative features form three clusters with six or seven features in each cluster.

The selected features are analysed in Table 1. From Panel A, cMDA tends to keep all the informative features, but it also includes a small number of noisy features. In contrast, MDA chooses far fewer features but filters out all the noisy features. The downside is that it also drops a lot of informative features. We may figuratively say that cMDA has a higher recall but lower precision than MDA.

Denote 'I_m_n' as the m-th informative feature, which is assigned to the n-th synthetic cluster. For example, 'I_20_2' means the 20th informative feature, which belongs to the 2nd synthetic cluster; 'N_m' represents the m-th noisy feature. Panel B shows that all the features in the '0' synthetic classification cluster are put into the most important selected cluster. The informative features in the '1' and '2' synthetic clusters are not recovered by the algorithm, since each of them has two features in the same selected cluster and the rest of their features grouped into another selected cluster. Panel C shows that for the synthetic regression data the algorithm selects all the informative features of their original clusters to form the top two most important selected clusters, but each cluster also includes one or two noisy features.
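The clustering step described in Section 2 (a correlation-based distance matrix, hierarchical linkage, and choosing the number of clusters that maximises q = E[S]/Std[S]) can be sketched as follows. This is our own minimal illustration, not the authors' code; `cluster_features` and all parameter values here are hypothetical:

```python
# Sketch of the clustering step: correlation distance D_ij = sqrt((1 - rho_ij)/2),
# hierarchical (average-linkage) clustering, and picking the number of clusters K
# that maximises the clustering quality q = mean(silhouette) / std(silhouette).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.metrics import silhouette_samples

def cluster_features(X, max_clusters=10):
    corr = np.corrcoef(X, rowvar=False)                    # feature-by-feature correlations
    dist = np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None)) # D_ij = sqrt((1 - rho_ij)/2)
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    best_q, best_labels = -np.inf, None
    kmax = min(max_clusters, X.shape[1] - 1)               # silhouettes need K <= n - 1
    for k in range(2, kmax + 1):
        labels = fcluster(Z, k, criterion="maxclust")
        if len(set(labels)) < 2:
            continue
        S = silhouette_samples(dist, labels, metric="precomputed")
        q = S.mean() / max(S.std(), 1e-12)                 # clustering quality q
        if q > best_q:
            best_q, best_labels = q, labels
    return best_labels

# Two tight groups of three correlated features each:
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.hstack([base[:, [0]] + 0.1 * rng.normal(size=(200, 3)),
               base[:, [1]] + 0.1 * rng.normal(size=(200, 3))])
labels = cluster_features(X)
print(labels)  # features 0-2 share one label, features 3-5 another
```

The same routine applies unchanged whether the matrix columns are technical indicators, fundamental factors or medical-image measurements, since only the correlation structure is used.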
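The clustered synthetic data just described (and detailed in Appendix 2) can be generated roughly as follows for the regression case. This is our own simplified reading of the recipe, with illustrative parameter values; `make_clustered_data` is not the authors' code:

```python
# Sketch of the synthetic-cluster generator: K centroids drawn from Uniform(-10, 10),
# informative features drawn from N(centroid_i, 0.5) around each centroid, a random
# linear map of the informative features to continuous labels, plus pure-noise features.
import numpy as np

def make_clustered_data(n=1000, K=3, m=20, p=20, seed=0):
    rng = np.random.default_rng(seed)
    centroids = rng.uniform(-10, 10, size=K)                        # step 1
    sizes = [m // K + (1 if i < m % K else 0) for i in range(K)]    # step 2: floor(m/K) or +1
    informative = np.hstack([rng.normal(centroids[i], 0.5, size=(n, sizes[i]))
                             for i in range(K)])                    # step 3
    M = rng.uniform(-1, 1, size=(m, m))                             # step 5 (regression)
    beta = rng.uniform(0, 100, size=m)
    y = (M @ informative.T).T @ beta                                # y = (M X)^T beta
    noise = rng.normal(size=(n, p))                                 # step 6: noisy features
    return np.hstack([informative, noise]), y

X, y = make_clustered_data()
print(X.shape, y.shape)  # (1000, 40) (1000,)
```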

Table 1: Selected Features on Synthetic Datasets

Panel A: Number of informative features selected by cMDA and MDA

        Synthetic Classification          Synthetic Regression
cMDA    All 20 informative features,      All 20 informative features,
        1 noisy feature                   5 noisy features
MDA     Only 11 informative features,     Only 9 informative features,
        0 noisy features                  0 noisy features

Panel B: Selected clusters in classification data

Cluster Importance Score    Features
0.328                       I_2_2, I_3_1, I_4_0, I_6_2, I_8_0, I_9_0, I_12_0, I_14_0, I_16_0, I_17_0, I_19_1
0.164                       I_0_1, I_1_2, I_5_2, I_7_2, I_10_2, I_11_1, I_13_1, I_15_1, I_18_1, N_2

Panel C: Selected clusters in regression data

Cluster Importance Score    Features
0.256                       I_0_1, I_3_1, I_11_1, I_13_1, I_15_1, I_18_1, I_19_1, N_10, N_16
0.212                       I_1_2, I_2_2, I_5_2, I_6_2, I_7_2, I_10_2, N_2
0.116                       I_4_0, I_8_0, I_9_0, I_12_0, I_14_0, I_16_0, I_17_0, N_6, N_12

We can see that cMDA does a good job of grouping together related informative features, at least for the top cluster. This clustering improves human interpretability, reduces the substitution effect and can potentially improve predictive accuracy.

For most of the datasets in this paper, the data is split into training, validation and testing sets in the ratio 60:20:20 (some datasets are split differently, and this is noted in the text). The model is trained and the features are clustered in the training set. The clusters are ranked in the validation set, and then the features in the top clusters with above-average importance scores are selected. This would be just the top cluster in both the synthetic classification and regression examples. Using the selected features, the prediction performance is evaluated on the testing set. In Table 2, we compare the out-of-sample results based on the full feature set versus the selected feature subset. The cMDA approach outperforms the full set in both datasets. cMDA also outperforms MDA in the classification dataset but underperforms it in the regression dataset.

Table 2: Prediction Performance Comparison on Synthetic Datasets

        Synthetic Classification     Synthetic Regression
        F1      AUC     Acc          MSE          MAE      R2
cMDA    0.975   0.998   0.973        545460.60    585.39   0.9626
MDA     0.960   0.996   0.957        436676.44    510.08   0.9700
Full    0.975   0.995   0.973        718870.06    662.79   0.9607

Given that the predictive performances of cMDA and MDA are close, cMDA should be favoured given the increase in interpretability and, as we shall see later, the stability of the selected features.

4. Cluster Interpretability and Stability

First, we take the Breast Cancer dataset¹ as an example. This is a binary classification dataset with target variables showing whether the cancer is malignant or benign, and 30 features which are characteristics of each of 569 medical images. The clustering algorithm groups those 30 features into eight clusters. The cluster importance scores, and the features within them, are listed in Table 3. Since these clusters have clearly human-interpretable themes, we also apply a descriptive "Topic" to them.

¹ The data is taken from https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic).

Table 3: Feature Clustering for Breast Cancer Dataset

Topic                 Cluster Importance Score    Features
Geometry summary      0.360                       mean radius, mean perimeter, mean area, mean compactness, mean concavity, mean concave points, radius error, perimeter error, area error, worst radius, worst perimeter, worst area, worst compactness, worst concavity, worst concave points
Texture summary       0.174                       mean texture, worst texture
Geometry error        0.112                       compactness error, concavity error, concave points error, fractal dimension error

Smoothness error      0.092                       smoothness error
Symmetry error        0.062                       symmetry error
Texture error         0.056                       texture error
Symmetry summary      0.055                       mean symmetry, worst symmetry
Fractal dimension     0.049                       mean fractal dimension, worst fractal dimension
Smoothness summary    0.042                       mean smoothness, worst smoothness

As the scores of the clusters with topics 'Geometry summary' and 'Texture summary' are greater than the average, these two clusters (17 features in total) are selected. While individual feature importance results give 'worst concave points', 'worst perimeter', 'worst radius', 'mean concavity', 'area error' and 'worst texture' as the most important features, we can easily see here that the geometry of the tumour is the most important cluster, while texture is the second most important.

The rank-based 'instability' of cluster j is defined as the variance of its rank across n runs: V_j = Var(r_1j, ..., r_nj). If we apply this only to the top k clusters, the 'instability index' is defined as I = sqrt((V_(1) + ... + V_(k)) / k), where V_(k) is the rank variance of the kth most important cluster. According to Figure 1, the instability index increases with k; the most important cluster (Geometry summary) is ranked in 1st place for all 100 runs, and the second most important cluster (Texture summary) is ranked in 2nd place for 99 runs. Notably, the features selected from these two clusters are almost always positioned at the top.

Figure 1: Instability Analysis for Breast Cancer Dataset
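The instability index just defined can be computed directly from the history of cluster ranks across runs. A small self-contained illustration (our own, with made-up ranks):

```python
# Sketch of the instability index: V_(j) is the variance of the j-th most important
# cluster's rank across repeated runs, and I = sqrt((V_(1) + ... + V_(k)) / k).
import numpy as np

def instability_index(rank_history, k):
    # rank_history[i, j] = rank of cluster j in run i (1 = most important)
    variances = rank_history.var(axis=0)
    mean_rank = rank_history.mean(axis=0)
    top_k = np.argsort(mean_rank)[:k]      # clusters with the best (lowest) average rank
    return float(np.sqrt(variances[top_k].mean()))

# Hypothetical ranks for 3 clusters over 4 runs: cluster 0 is always 1st,
# while clusters 1 and 2 swap 2nd and 3rd place between runs.
ranks = np.array([[1, 2, 3],
                  [1, 3, 2],
                  [1, 2, 3],
                  [1, 3, 2]])
print(instability_index(ranks, 1))  # -> 0.0: the top cluster never moves
print(instability_index(ranks, 2))  # -> ~0.354: second place is unstable
```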


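The full pipeline used for these experiments (cluster on the training set, rank the clusters on a validation set, keep clusters with above-average importance, then evaluate on the test set) can be sketched end to end. This is our own toy version: the clusters below are simply consecutive groups of five columns rather than the hierarchical clusters the article uses, and plain accuracy stands in for the rank-based score:

```python
# Toy cluster-selection pipeline on the Breast Cancer data: clustered MDA on the
# validation set (all features in a cluster permuted together), keep above-average
# clusters, retrain on train+validation, evaluate on the held-out test set.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_trv, X_te, y_trv, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X_trv, y_trv, test_size=0.25, random_state=0)

# Stand-in clusters: every five consecutive columns (the article clusters by correlation).
clusters = [list(range(i, i + 5)) for i in range(0, X.shape[1], 5)]

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
base = model.score(X_val, y_val)
rng = np.random.default_rng(0)
imp = []
for cols in clusters:                                   # clustered MDA on validation set
    Xp = X_val.copy()
    Xp[:, cols] = Xp[rng.permutation(len(X_val))][:, cols]
    imp.append(base - model.score(Xp, y_val))           # score drop = cluster importance

keep = [c for c, s in zip(clusters, imp) if s > np.mean(imp)]   # above-average clusters
cols = [j for c in keep for j in c]
final = RandomForestClassifier(random_state=0).fit(X_trv[:, cols], y_trv)
print(f"test accuracy with {len(cols)} selected features: "
      f"{final.score(X_te[:, cols], y_te):.3f}")
```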

After selecting the features from the top two clusters, we train a new random forest on the combined training and validation set and use that to make predictions on the testing set. We can see that cMDA has the best out-of-sample performance on AUC, but its F1 and Acc underperform the non-clustering methods.

Table 4: Prediction Performance Comparison on Breast Cancer Dataset

        F1      AUC     Acc
cMDA    0.953   0.990   0.953
MDA     0.981   0.982   0.974
Full    0.954   0.980   0.939

Next, we conduct the analysis on predicting S&P 500 excess returns using economic factors, as discussed in Man and Chan (2021). The data ranges from January 1945 to December 2019. Excess return is defined as the monthly SPX index return minus the risk-free rate. The features are a set of fundamental and technical factors that include the dividend price ratio (d/p), dividend yield (d/y), earning price ratio (e/p), dividend payout ratio (d/e), stock variance (svar), book to market (b/m), net equity expansion (ntis), T-Bill rate (tbl), long term yield (lty), long term return (ltr), term spread (tms), default yield spread (dfy), default return spread (dfr) and inflation (infl). Fractional differentiation (López de Prado, 2018) is applied to all these features prior to the machine learning process. The clustering algorithm groups these features into two clusters, as shown in Table 5.

Table 5: Feature Clustering for S&P Dataset

Topic          Cluster Scores    Features
Fundamental    0.63              d/p, d/y, e/p, d/e, svar, ntis, ltr, tms
Technical      0.37              b/m, tbl, lty, dfy, dfr, infl

As these clusters are highly human-interpretable, we again apply descriptive topics to them. The 'Fundamental' cluster contains eight features and has the higher importance score. The 'Technical' cluster contains six features. This cluster can also be called the 'unimportant' cluster, since we only have two clusters, and it is not selected to train the final random forest model.

Figure 2 shows that these two clusters are very stable. The instability index remains zero whether we involve one or two clusters: the 'Fundamental' and 'Technical' clusters are ranked in first and second places respectively for all 100 runs.

Figure 2: Instability Analysis for S&P Dataset
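As noted above, fractional differentiation (López de Prado, 2018) is applied to the features before modelling. A minimal fixed-window sketch of that transformation (our own; the window length and the d value below are illustrative, not the article's settings):

```python
# Fixed-window fractional differentiation: weights w_0 = 1,
# w_k = -w_{k-1} * (d - k + 1) / k, applied as a rolling dot product.
import numpy as np

def frac_diff_weights(d, window):
    w = [1.0]
    for k in range(1, window):
        w.append(-w[-1] * (d - k + 1) / k)
    return np.array(w[::-1])                      # oldest weight first

def frac_diff(series, d=0.4, window=20):
    w = frac_diff_weights(d, window)
    return np.array([w @ series[t - window + 1 : t + 1]
                     for t in range(window - 1, len(series))])

# Hypothetical price series: a random walk around 100.
prices = np.cumsum(np.random.default_rng(0).normal(size=200)) + 100
print(frac_diff(prices).shape)  # (181,)
```

With d = 0 the weights collapse to the identity (the series is returned unchanged), and with d = 1 they reduce to ordinary first differencing; fractional d trades off stationarity against memory.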

We split the data into training, validation and testing sets over the periods January 1945-December 2005, January 2006-December 2015 and January 2016-December 2019, respectively. The out-of-sample prediction performance on the testing set is summarised in Table 6.

Table 6: Predictive Performance Comparison on S&P Dataset

        F1      AUC     Acc
cMDA    0.558   0.561   0.537
MDA     0.551   0.514   0.533
Full    0.432   0.461   0.483

The metrics F1 score, AUC score and accuracy obtained on the testing set are shown in Table 6. We can see that cMDA outperforms MDA in out-of-sample prediction on all metrics for this dataset.

Application to Trading Strategy Meta-Labelling

In this section, we apply clustering-based feature selection to a dataset with the labels equal to the sign of the actual historical returns of our proprietary Tail Reaper trading strategy.² We want to see if this algorithm can select stable features and improve the trading performance. This application of financial machine learning is termed "meta-labelling" (López de Prado, 2018).

² See www.qtscm.com/accounts for more details.

The data is from January 2013 to June 2020, with 160 features. We split the data into training/validation/testing sets over the periods 2013-2017, 2018-2019 and 2020. cMDA groups the 160 features into 44 clusters. Among them, eight clusters containing 81 features with above-average importance scores are selected to train a new random forest model. Since the features are proprietary, we do not display the clusters that identify them. Suffice to say that the top two clusters are highly human-interpretable, while the lower-ranked clusters are mixed bags of disparate features.

From Figure 3, the instability index increases with the number of clusters, and the most and second most important clusters are steadily ranked in 1st and 2nd places respectively for all 100 runs. The third most important cluster is not as stable as the first two. Given that the third cluster is a mixed bag of features with no interpretable theme, this is not a surprising result.

Figure 3: Instability Analysis for Trading Data

Table 7 compares 'cMDA' with the 81 features in the selected clusters, the original 'MDA' with 20 selected features and the 'Full' feature set with all 160 features. As the top two clusters selected by cMDA are intuitively interpretable, we also show the results of 'cMDA (top 2)', which uses the 41 features from the top two clusters. The out-of-sample (test set) performance of a predictive model based on cMDA significantly outperforms all the others on F1, AUC and accuracy.

Table 7: Prediction Performance Comparison

              F1      AUC     Acc
cMDA          0.658   0.672   0.614
cMDA (top 2)  0.595   0.640   0.571
MDA           0.602   0.537   0.529
Full          0.481   0.416   0.414
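The meta-labelling setup just described can be illustrated with a toy example. Since the Tail Reaper features and returns are proprietary, everything below is a synthetic stand-in of our own: the label is the sign of the strategy's next return, a classifier predicts it from the features, and the strategy trades only when the predicted probability of a positive return is high:

```python
# Toy meta-labelling sketch: label = sign of the strategy's return, a random
# forest predicts the label, and returns are kept only when the model is confident.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000
signal = rng.normal(size=n)                               # hypothetical predictive feature
noise = rng.normal(size=(n, 4))                           # uninformative features
strategy_ret = 0.01 * signal + 0.01 * rng.normal(size=n)  # stand-in strategy returns

X = np.column_stack([signal, noise])
y = (strategy_ret > 0).astype(int)                        # meta-label: sign of the return

split = 700                                               # chronological train/test split
clf = RandomForestClassifier(random_state=0).fit(X[:split], y[:split])
prob = clf.predict_proba(X[split:])[:, 1]

take = prob > 0.5                                         # trade only on confident days
filtered = strategy_ret[split:][take]
print(f"unfiltered mean return:    {strategy_ret[split:].mean():.5f}")
print(f"meta-labelled mean return: {filtered.mean():.5f}")
```

In the article, the same idea is applied with the cMDA-selected feature clusters in place of the raw features, which is where the performance gains in Table 7 come from.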

Conclusions

Ranking a cluster is more stable than ranking a feature, and such stability enhances model interpretability. It is also easier to interpret the clusters by examining the common characteristics of the features contained within each cluster. For example, for the S&P 500 excess returns dataset, we can identify the top cluster as fundamental indicators and the second-ranked cluster as mainly technical indicators.

The clustering algorithm also improves the predictive performance over non-clustered MDA feature selection on the S&P 500 excess returns dataset and the proprietary Tail Reaper strategy returns dataset, though not on the synthetic datasets. Their predictive performances on the Breast Cancer dataset are similar.

In this article, the clustering algorithm is driven by a correlation-based metric. As the distance matrix just needs to satisfy non-negativity, identity, symmetry and sub-additivity, we may be able to improve the model performance by choosing other information-theoretic metrics which also satisfy these conditions. We also chose hierarchical clustering instead of K-means; we discuss our reasons for doing so in Appendix 1. Further work can also investigate whether clustering can improve the SHAP and LIME feature-selection methods that we compared in Man and Chan (2021).

ACKNOWLEDGEMENTS

We thank Radu Ciobanu, Sayooj Balakrishnan, and Roger Hunter for many useful suggestions and technical assistance.

References

• Hastie, T., Tibshirani, R. and Friedman, J. (2009) The Elements of Statistical Learning. 2nd Edition, Springer.
• López de Prado, M. (2018) Advances in Financial Machine Learning. John Wiley & Sons.
• López de Prado, M. (2020) Machine Learning for Asset Managers. Cambridge University Press.
• Man, X. and Chan, E.P. (2021) The best way to select features: Comparing MDA, LIME and SHAP. Journal of Financial Data Science, Winter 2021. DOI: https://doi.org/10.3905/jfds.2020.1.047
• Rousseeuw, P. (1987) Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20: 53-65.
• Xiong, H., Wu, J. and Chen, J. (2008) K-means clustering versus validation measures: a data distribution perspective. IEEE Transactions on Systems, Man, and Cybernetics, Part B 39(2): 318-331.

Appendix 1: Reasons to choose hierarchical clustering

K-means uses the Euclidean distance metric and results in nearly identical cluster sizes³, with a limited number of clusters. By contrast, hierarchical clustering can generate many more clusters, and other forms of distance metric can be used. For example, Jaccard similarity, which measures the distance between two binary categorical variables, is not a Euclidean metric and cannot be applied to K-means clustering, but it can be used for hierarchical clustering. In the example datasets we studied, some clusters do contain more features than others. The number of features in each cluster should adapt to the nature of the features, and a tendency to produce clusters of equal size is not desirable.

³ Due to the 'uniform effect' proposed and discussed in Xiong, Wu and Chen (2008), K-means tends to generate clusters with relatively uniform sizes.

Appendix 2: The algorithm for generating synthetic clusters

Suppose we want to generate a dataset of n samples with K synthetic clusters, m informative features and p noisy features. We can follow the procedure below:

1. Sample K centroids independently from Uniform(-10, 10);
2. The number of informative features per cluster, c_1, c_2, ..., c_K, is floor(m/K) or floor(m/K)+1;
3. For the ith cluster, independently generate c_i features by sampling n times per feature from a univariate normal distribution with mean equal to the value of the ith centroid and with standard deviation 0.5. In other words, we draw n x c_i random numbers from N(mu_i, 0.5), where mu_i is the ith centroid, to populate all the features within the ith cluster for all n samples;
4. For a classification model, randomly assign the label 0 or 1 with probability 0.5 to each sample. Create a random matrix M (m x m) by sampling from Uniform(-1, 1). Form the product of M and the informative feature matrix X (m x (n/2)) of each class label to create MX (m x (n/2)). Stack the matrices of the two classes to get MX (m x n). In other words, we map two different linear combinations of informative features to the two class labels, where the coefficients of the linear combinations are random but fixed over the samples with the same label;
5. For a regression model, create a random matrix M (m x m) by sampling from Uniform(-1, 1). Form the product of M and the entire informative feature matrix X (m x n) to create MX (m x n). Create a random vector beta (m x 1) by sampling from Uniform(0, 100), and then set the labels as y (n x 1) = (MX)^T beta. In other words, we map a linear combination of informative features to the continuous labels, where the coefficients of the linear combinations are random but fixed over all samples;
6. Add the p x n noisy-feature matrix by sampling from a standard normal distribution.

Not a major top for Nasdaq-100

Bruno Estier CFTe

Bruno Estier is a Global Market Advisor and Technical Analyst coach in Geneva, Switzerland for professional Traders and Portfolio Managers. Past President of the Swiss Association of Market Technicians (SAMT) for 12 years, he also served as Chairman and as Secretary on the board of directors of IFTA. Bruno founded the French Society of Technical Analysts (AFATE) in 1990. He holds the Diploma from the STA and professional certifications from IFTA. You can find his work at Bruno Estier Strategic Technicals, www.estier.net/bruno and [email protected]; here he shares his US equities outlook, written in November 2020 for Wealthgram.

Introduction

As the US equity market has been undergoing a correction over the last three weeks, and as last month we mentioned that October to April is the seasonally bullish period, a review of the leading sector of the US market, the Nasdaq 100, is necessary to evaluate the strength of its uptrend.

Still in Uptrend

The Relative Strength (RS) of the Nasdaq 100 versus the S&P 500 (dotted green line on the upper panel) has been stalling below its July top, taking the form of a rising triangle, and in Oct 2020 stood at levels well above the low made in early September, though having flattened. But it still shows an uptrend in Oct, as the chart does not display a lower low. Therefore it is likely that the Technology sector, mainly represented within the Nasdaq 100, is still a leading sector for the US equity market.

The bullishness is not limited to a few large Technology stocks, as we note that the Relative Strength of Small Caps versus the S&P 500 (black line on the upper panel) has been bottoming and rose from mid-September to October 2020, which is a classic bullish sign of widening breadth. Thus, the pullback of the Nasdaq 100 in Nov, toward the 40-week moving average (9931), is seen as a pause in the Bull market rather than the beginning of a Bear market. This pullback relieves an overbought situation, which was highlighted by its rise since May 2020 between the first and second Bollinger bands and by a spike in late August, and a retest in October, of the second upper Band. The fear can be noted on the VXN (orange dotted line on the upper panel) reaching the previous spike high of September near 41.30%.

However, volatility above 40% in the VXN is rare and often signals a nearby low on the underlying Equity index! So overall, in Nov 2020 it may well be time to be contrarian and not to panic along with the classic price momentum indicators, like STOCHASTICS or MACD on the lower panel, which are crossing down. Of course, the Nasdaq 100 needs to display a move up to avoid breaking below the previous low of 10,677, ideally holding above 10,900, the rising former resistance trend line dating from October 2018. Such a rebound in price would validate the ranging pattern between 12,430 and 10,700, which in the medium term would open the door for higher prices toward 14,000. So, trend-followers beware!