TY - UNPB
T1 - On-chain Analytics for Sentiment-driven Statistical Causality in Cryptocurrencies
AU - Chalkiadakis, Ioannis
AU - Zaremba, Anna
AU - Peters, Gareth
AU - Chantler, Michael John
PY - 2021/2/11
Y1 - 2021/2/11
N2 - This paper presents an efficient algorithm for multimodal statistical causality analysis based on Multiple-Output Gaussian Processes. Signals from different information sources (modalities) are jointly modelled as a Multiple-Output Gaussian Process, and then, using a novel approach to statistical causality based on Gaussian Processes (GP), we study linear and non-linear causal effects between the different modalities. We demonstrate the effectiveness of our approach in a novel machine learning application that studies the relationship between cryptocurrency spot price dynamics and natural language data specific to the crypto sector, which we conjecture influences retail investor behaviour. Investor sentiment is extracted from the natural language data using methods from Natural Language Processing (NLP), an area of statistical machine learning, and we develop novel sentiment index models that extend existing approaches. To capture sentiment, we present a novel framework for text-to-time-series embedding, which we then use to construct a sentiment index from publicly available news articles. We compare our statistical sentiment index model with alternative methods from the NLP literature. With regard to multimodal causality, investor sentiment is our primary modality of exploration, alongside price and a technology-related indicator (hash rate). Our analysis shows that the approach is effective in modelling causal structures of varying complexity between heterogeneous data sources, and it illustrates the impact that certain modelling choices for the different modalities can have on detecting causality.
AB - This paper presents an efficient algorithm for multimodal statistical causality analysis based on Multiple-Output Gaussian Processes. Signals from different information sources (modalities) are jointly modelled as a Multiple-Output Gaussian Process, and then, using a novel approach to statistical causality based on Gaussian Processes (GP), we study linear and non-linear causal effects between the different modalities. We demonstrate the effectiveness of our approach in a novel machine learning application that studies the relationship between cryptocurrency spot price dynamics and natural language data specific to the crypto sector, which we conjecture influences retail investor behaviour. Investor sentiment is extracted from the natural language data using methods from Natural Language Processing (NLP), an area of statistical machine learning, and we develop novel sentiment index models that extend existing approaches. To capture sentiment, we present a novel framework for text-to-time-series embedding, which we then use to construct a sentiment index from publicly available news articles. We compare our statistical sentiment index model with alternative methods from the NLP literature. With regard to multimodal causality, investor sentiment is our primary modality of exploration, alongside price and a technology-related indicator (hash rate). Our analysis shows that the approach is effective in modelling causal structures of varying complexity between heterogeneous data sources, and it illustrates the impact that certain modelling choices for the different modalities can have on detecting causality.
KW - Multiple-Output Gaussian Process
KW - Granger causality
KW - sentiment index
KW - sentiment analysis
KW - text mining
KW - multimodal systems
KW - heterogeneous data
KW - cryptocurrencies
KW - cryptocoin markets
KW - natural language processing
U2 - 10.2139/ssrn.3742063
DO - 10.2139/ssrn.3742063
M3 - Preprint
BT - On-chain Analytics for Sentiment-driven Statistical Causality in Cryptocurrencies
ER -