Can you imagine someone telling you how likely you are to win a trade in oil if you open it today? How much would you pay for such information, assuming it was real? I stumbled upon an interview with Frans van Houten, President of Philips, who said that his company is working on products that monitor and collect data on our lifestyle, habits and health, and that in the future they will be able to predict a heart attack 8 hours before it occurs. Wow! If that is the case, why not predict whether a trader will win or lose? Is it possible to collect enough data to predict how profitable a trader will be?
Today, the amount of data accumulated and the relative ease of access to it make it possible to collect variables on a person’s profile, habits and lifestyle, and to compare that data to millions of other people (traders). It may still be impossible to predict the exact time and size of a person’s trade, but it may be possible to predict his success rate and trading patterns. Such predictions could save lives in the case of heart attacks, and could potentially help people increase their revenues and reduce their losses.
The Tricky Part about Predictions
Scientists, when asked to make predictions, will politely discuss associations: when A happens then B is likely to occur. They will talk about correlations but be very careful about claiming causality.
Let’s take an example: imagine you are an account manager doing retention work for a broker. Now I tell you that a particular account that made a first deposit of $300 is likely to trade a volume of $2,000. A week goes by and this account trades a volume of $2,100. Now you say “Wow. Amazing prediction!” Right? Not necessarily. This information could be a self-fulfilling prophecy. Meaning, the information that I gave you directed your thoughts and actions, which led you to this result (I think it is called “The Pygmalion Effect”).
If I had told you this account would trade a volume of only $300, you might have reduced your efforts and the account would have quit (and you would then assume, again, that the prediction was true).
My point is that if you want to test predictions, you need carefully controlled testing in a double-blind setting, which is very hard to do. Example: some accounts should be handled as a control group by account managers who see neither the predictions nor the results while the test is running.
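As a rough illustration of the control-group idea, here is a minimal Python sketch that randomly splits accounts into a group whose managers see the prediction and a control group whose managers do not. The function name `assign_groups`, the group labels and the account IDs are all hypothetical, and the blinding itself is a matter of process that code cannot enforce.

```python
import random


def assign_groups(account_ids, seed=None):
    """Randomly split accounts into a 'shown' group and a blinded 'control' group.

    Random assignment keeps the account manager's own judgment from
    deciding which accounts receive the prediction.
    """
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    shuffled = list(account_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "shown": sorted(shuffled[:half]),    # managers see the prediction
        "control": sorted(shuffled[half:]),  # managers see nothing
    }


# Hypothetical account IDs, for illustration only.
accounts = [f"acct-{i:03d}" for i in range(1, 21)]
groups = assign_groups(accounts, seed=42)
print(len(groups["shown"]), "accounts shown the prediction")
print(len(groups["control"]), "accounts in the control group")
```

If the predictions are about as accurate in the control group as in the group that saw them, that is some evidence the prediction is not simply fulfilling itself.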
How It All Started
Did you know that ‘Big Data’ was originally a technical term? Think about how much data each of us produces in one day: daily activities, browsing, purchasing stuff, or just getting updates on Finance Magnates.
So a few years back, large companies, especially in the online industry, realized that they needed to manage a tremendous amount of data that their standard infrastructure just could not cope with. Then an industry of servers and databases started emerging to help companies store and manage their data without crashing every 2 seconds.
If I may put my own definition here: we can define ‘Big Data’ today as anything that is not a small sample of data and that involves data from a variety of sources. Example: browsing history combined with many other personal details (age, sex, geography, education and so on).
People today have begun to realize that large data sets could be turned into the fuel that makes every business work.
Size Matters but Also Other Stuff
Those who took basic statistics classes (and listened) probably remember the issue of sample size. Statisticians would tell you that with a large enough sample you can make claims with higher validity, meaning the results resemble reality with less bias. Example: if you run a questionnaire about how satisfied people are with your website and collect 100 responses from 150 visitors, then you can have high confidence that the result reflects the general satisfaction level of all your visitors.
However, if you have 10,000 visitors and your sample is still 100, then you have heard from only 1% of your audience and you can be less certain that the result represents the other 9,900. So the more data you collect, the more valid your results are.
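One way to put rough numbers on the two questionnaire scenarios is the textbook margin of error for a survey proportion, with a finite population correction. The sketch below is only an approximation: it assumes the 100 respondents are a random sample of visitors (real questionnaires rarely manage that), a 95% confidence level, and the worst-case satisfaction rate of 50%. The function name `margin_of_error` is my own.

```python
import math


def margin_of_error(sample_size, population_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a surveyed proportion.

    Normal approximation with a finite population correction.
    proportion=0.5 is the worst case; z=1.96 corresponds to 95% confidence.
    """
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    # The correction shrinks the error as the sample nears the whole population.
    correction = math.sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * correction


small_site = margin_of_error(sample_size=100, population_size=150)
large_site = margin_of_error(sample_size=100, population_size=10_000)
print(f"100 of 150 visitors:    +/- {small_site:.1%}")   # about 5.7%
print(f"100 of 10,000 visitors: +/- {large_site:.1%}")   # about 9.8%
```

Under these assumptions the same 100 responses are noticeably less precise when they stand for 10,000 visitors than when they stand for 150.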
Today, technology is heaven for data freaks: it gives us access to a large amount of data, enabling us to not just take a sample but to observe the entire target audience.
Other than size, you need data to be reliable: clean of noise, errors and holes. It needs to be accurate and complete. Sounds simple, but when you try to create a data set from multiple sources it can become very challenging.
The higher the quantity and quality of your data the more valid your results/conclusions will be.
And most importantly: data can tell you an interesting story but you also need to be able to ask the right questions. For this you need good people with the right skills. It all goes back to people!
From Data to Revenues
This is not an easy task. But if I had all the data I wanted on a group of a million traders (sex, age, country, residence location, occupation, family status, education level and grades, medical history, medication used and so on), I would have an endless list of variables. Machine learning algorithms today can search for interconnections between these variables and the trading history of each individual.
Now that we know predictions are tricky, it is important to first understand the complexity of the data we see. Let the data reveal its story. This is something that should be done by professionals. Why? Because we are all imperfect and have blind spots. Professionals are expected to understand that and try to see the real story behind the numbers.
I think that if companies, including financial brokerages, focused on understanding their own data, it would be an important step forward. Hold off on predictions and start with understanding. We see many who are already doing that. Companies can always gain a better understanding of their clients and employees, their preferences and tendencies, and what works and what doesn’t.
They can use their own (big) data to make better assumptions, become more sensitive to needs and perceptions, be more flexible and adaptive, and create better products and better services. Relying on small data sets from a limited time period and from few sources could lead you to see only a small fraction of reality and reach the wrong conclusions. Working properly with large data sets could and should be one of the foundations of any business. From what I see, we are all going in that direction.