Machine Learning

  • Treasury Bond Pricing Via No Arbitrage Arguments and Machine Learning: Evidence from China
    This paper proposes a novel methodology for predicting bond returns (price or yield curve) that unifies the classical no-arbitrage pricing framework, a fundamental theoretical building block of mathematical finance, with empirical asset (bond) pricing methodologies, e.g., Bianchi, Büchner, & Tamoni (2021) for treasury bonds and Gu, Kelly, & Xiu (2020) for equities. Our method is mathematically rigorous and arbitrage-free, while retaining the flexibility of the empirical asset pricing framework, i.e., a potentially rich factor structure and accurate function approximation via machine learning regression. Real-market back-tests show that our predictions are accurate, in the sense that the equally-weighted treasury bond portfolios formed in China's exchange-based markets earn significantly positive returns. The average hit rate for yield curve prediction reaches 77.71% across all tenors, and the related long-only trading strategy based on the predictions delivers an annualized absolute return as high as 12.35%, with a Calmar ratio of 7.31 for equally-weighted portfolios. As a by-product of our prediction framework, spot yield curves can be predicted accurately in an arbitrage-free manner.
  • AI-mimicked Behavior and Fundamental Momentum: The Evidence from China
    We track the behavior of fundamental informed traders (FITs) and document a fundamental momentum effect in the Chinese stock market. We train a deep learning model on a set of fundamental characteristics to extract the fundamental-implied component of realized returns. This fundamental component characterizes the price movement driven by FITs. Fundamental momentum is distinct from the fundamental trend and is not the quality-minus-junk (QMJ) factor. Underreaction bias helps explain the strategy, as it generates stronger profits during periods of low investor sentiment and low aggregate idiosyncratic volatility. Fundamental momentum is not sensitive to changing beta and is robust across subsamples and machine learning models.
  • Memory and Beliefs in Financial Markets: A Machine Learning Approach
    We develop a machine learning (ML) approach to establish new insights into how memory affects financial market participants’ belief formation processes in the field. Using analyst forecasts as proxies for market beliefs, we extract analysts’ mental contexts and recalls that shape forecasts by training an ML memory model. First, we find that long-term memories are salient in analysts’ recalls. However, compared to an ML benchmark trained to fit realized earnings, analysts pay more attention to distant episodes in regular times but less during crisis times, leading to recall distortions and therefore forecast errors. Second, we decompose analysts’ mental contexts and show that they are mainly shaped by past earnings and forecasting decisions instead of current firm fundamentals as indicated by the ML benchmark. This difference in contexts further explains the recall distortion. Third, our comprehensive memory model reveals the significance of specific memory features and channels in analysts’ belief formation, including the temporal contiguity effect and selective forgetting.
  • Predicting Stock Moves: An Example from China
    In this paper, we examine prediction performance using principal component analysis (PCA). In particular, we perform a PCA to identify significant factors (principal components) and then use these factors to predict stock price movements. We apply this strategy to the Chinese stock markets. Using data from January 2, 2019 to September 16, 2021, the empirical results show that the PCA-based predictions substantially outperform both a naïve buy-and-hold strategy and single time-series predictions of individual stocks. Next, we examine whether the factors retrieved from PCA are indeed important in explaining stock price movements. To do this, we adopt random forest, a machine learning technique popular in studies of stock performance. We find that, compared to widely used descriptive factors such as industry sector, geographical location, and market type (known as “board”, or “ban” in Mandarin), the principal components rank very highly in importance.
  • New Forecasting Framework for Portfolio Decisions with Machine Learning Algorithms: Evidence from Stock Markets
    This paper proposes a new forecasting framework for the stock market that combines machine learning algorithms with several technical analyses. The paper considers three different algorithms: the Random Forests (RF), the Gradient-boosted Trees (GBT), and the Deep Neural Networks (DNN), and performs forecasting tasks and statistical arbitrage strategies. The portfolio weight optimization strategy is also proposed to capture the model's return and risk information from output probabilities. The paper then uses the stock data in the Chinese A-share market from January 1, 2011, to December 31, 2020, and observes that all three machine learning models achieve significant returns in the Chinese stock market. The DNN achieves an average daily return of 0.78% before transaction costs, outperforming the 0.58% of the RF and 0.48% of the GBT, far exceeding the general market level. The performance of the weighted portfolio based on the ESG score is also improved in all three machine learning strategies compared to the equally weighted portfolio. These results help bridge the gap between academic research and professional investments and offer practical implications for financial asset pricing modelling and corporate investment decisions.
  • Cutting Operational Costs by Integrating Fintech into Traditional Banking Firms
    Fintech firms mobilize information technology to provide intermediation services using a broker methodology, whereas dealer banks intermediate using leveraged balance sheets. The integration of Fintech into banking may reduce the unit cost of intermediation by shifting the production function from dealer to broker. A “Fintech score” is derived using nonlinear and machine learning algorithms that show on-balance sheet lending for low Fintech score dealer banks versus securitization, brokered deposits, and non-interest income for high score, broker banks. Using Data Envelopment and Stochastic Cost Frontier Analyses, we find that banks with higher Fintech scores are more operationally efficient and resilient in crises.
  • Detecting Short-selling in US-listed Chinese Firms Using Ensemble Learning
    This paper uses ensemble learning to build a predictive model to analyze the short-selling mechanism of short-selling institutions. We demonstrate the value of combining domain knowledge and machine learning methods in financial markets. Building on the benchmark model, we use three types of input data (stock prices, financial data, and textual data) and employ one of the most powerful machine learning methods, ensemble learning, rather than the commonly used logistic regression. Specifically, we use LSTM-AdaBoost and CART-AdaBoost for prediction. The results show that the trained model has strong predictive ability for short-selling, and that a company's financial text data is more likely to bear on whether it will be shorted.
  • Language and Domain Specificity: A Chinese Financial Sentiment Dictionary
    We use supervised machine learning to develop a Chinese language financial sentiment dictionary from 3.1 million financial news articles. Our dictionary maps semantically similar words to a subset of human-expert generated financial sentiment words. In article-level validation tests, our dictionary scores the sentiment of articles consistently with a human reading of full articles. In return validation tests, our dictionary outperforms and subsumes previous Chinese financial sentiment dictionaries such as direct translations of Loughran and McDonald’s (2011) financial words. We also generate a list of politically-related positive words that is unique to China; this list has a weaker association with returns than does the list of otherwise positive words. We demonstrate that state media exhibits a sentiment bias by using more politically-related positive and fewer negative words, and this bias renders state media’s sentiment less return-informative. Our findings demonstrate that dictionary-based sentiment analysis exhibits strong language and domain specificity.
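
The first abstract above reports a hit rate and a Calmar ratio. As a rough illustration of what these two metrics measure, here is a minimal Python sketch. It uses the conventional definitions (share of periods in which the predicted and realized changes have the same sign, and annualized compound return divided by maximum drawdown), which are assumptions here: the abstract does not state the paper's exact formulas. The function names and the 252-period year are likewise hypothetical choices.

```python
from typing import Sequence


def hit_rate(predicted: Sequence[float], realized: Sequence[float]) -> float:
    """Share of periods where predicted and realized changes have the same sign.

    Assumption: a zero on either side counts as a miss.
    """
    if len(predicted) != len(realized) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, r in zip(predicted, realized) if p * r > 0)
    return hits / len(predicted)


def max_drawdown(period_returns: Sequence[float]) -> float:
    """Largest peak-to-trough loss of cumulative wealth, as a positive fraction."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in period_returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = max(worst, (peak - wealth) / peak)
    return worst


def annualized_return(period_returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Compound return per year; 252 trading days per year is an assumption."""
    if not period_returns:
        raise ValueError("period_returns must be non-empty")
    wealth = 1.0
    for r in period_returns:
        wealth *= 1.0 + r
    years = len(period_returns) / periods_per_year
    return wealth ** (1.0 / years) - 1.0


def calmar_ratio(period_returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualized return divided by maximum drawdown (infinite if no drawdown)."""
    drawdown = max_drawdown(period_returns)
    if drawdown == 0.0:
        return float("inf")
    return annualized_return(period_returns, periods_per_year) / drawdown
```

For example, `hit_rate([0.1, -0.2, 0.3, 0.1], [0.2, -0.1, -0.3, 0.4])` returns `0.75`, since three of the four predicted directions match the realized ones.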