Machine Learning

  • Different Opinion or Information Asymmetry: Machine-Based Measure and Consequences
    We leverage machine learning to introduce belief dispersion measures that distinguish different opinion (DO) from information asymmetry (IA). Our measures align with the human-based measure and relate to economic outcomes in a manner consistent with theoretical predictions: DO is positively related to trading volume and negatively related to the bid-ask spread, whereas IA shows the opposite effects. Moreover, IA negatively predicts the cross-section of stock returns, while DO positively predicts returns for underpriced stocks and negatively predicts returns for overpriced ones. Our findings reconcile conflicting disagreement-return relations in the literature and are consistent with the model of Atmaz and Basak (2018). We also show that the return predictability of DO and IA stems from their unique economic rationales, underscoring that components of disagreement can influence market equilibrium via distinct mechanisms.
  • Risk-Based Peer Networks and Return Predictability: Evidence from textual analysis on 10-K filings
    We construct a novel risk-based similarity peer network by applying machine learning techniques to extract a comprehensive set of disclosed risk factors from firms' annual reports. We find that a firm's future returns can be significantly predicted by the past returns of its risk-similar peers, even after excluding firms within the same industry. A long-short portfolio, formed based on the returns of these risk-similar peers, generates an alpha of 84 basis points per month. This return predictability is particularly pronounced for negative-return stocks and those with limited investor attention, suggesting that the effect is driven by slow information diffusion across firms with similar risk exposures. Our findings highlight that the risk factors disclosed in 10-K filings contain valuable information that is often overlooked by investors.
  • The Transformative Role of Artificial Intelligence and Big Data in Banking
    This paper examines how the integration of artificial intelligence (AI) and big data affects banking operations, emphasizing the crucial role of big data in unlocking the full potential of AI. Leveraging a comprehensive dataset of over 4.5 million loans issued by a leading commercial bank in China and exploiting a policy mandate as an exogenous shock, we document significant improvements in credit rating accuracy and loan performance, particularly for SMEs. Specifically, the adoption of AI and big data reduces the rate of unclassified credit ratings by 40.1% and decreases loan default rates by 29.6%. Analyzing the bank's phased implementation, we find that integrating big data analytics substantially enhances the effectiveness of AI models. We further identify significant heterogeneity: improvements are especially pronounced for unsecured and short-term loans, borrowers with incomplete financial records, first-time borrowers, long-distance borrowers, and firms located in economically underdeveloped or linguistically diverse regions. Our findings underscore the powerful synergy between big data and AI, demonstrating their joint capability to alleviate information frictions and enhance credit allocation efficiency.
  • How Does China's Household Portfolio Selection Vary with Financial Inclusion?
    Portfolio underdiversification is one of the most costly losses accumulated over a household’s life cycle. We provide new evidence on the impact of financial inclusion services on households’ portfolio choice and investment efficiency using 2015, 2017, and 2019 survey data for Chinese households. We hypothesize that higher financial inclusion penetration encourages households to participate in the financial market, leading to better portfolio diversification and investment efficiency. The results of the baseline model are consistent with our proposed hypothesis that higher accessibility to financial inclusion encourages households to invest in risky assets and increases investment efficiency. We further estimate a dynamic double machine learning model to quantitatively investigate the non-linear causal effects and track the dynamic change of those effects over time. We observe that the marginal effect increases over time, and those effects are more pronounced among low-asset, less-educated households and those located in non-rural areas, except for investment efficiency for high-asset households.
  • Uncertainty and Market Efficiency: An Information Choice Perspective
    We develop an information choice model where information costs are sticky and co-move with firm-level intrinsic uncertainty as opposed to temporal variations in uncertainty. Incorporating analysts' forecasts, we predict a negative relationship between information costs and information acquisition, as proxied by the predictability of analysts' forecast biases. Finally, the model shows a contrasting pattern between information acquisition and intrinsic and temporal uncertainty, where intrinsic uncertainty strengthens return predictability of analysts' biases through the information cost channel, while temporal uncertainty weakens it through the information benefit channel. We empirically confirm these opposing relationships that existing theories struggle to explain.
  • Chinese Housing Market Sentiment Index: A Generative AI Approach and An Application to Monetary Policy Transmission
    We construct a daily Chinese Housing Market Sentiment Index by applying GPT-4o to Chinese news articles. Our method outperforms traditional models in several validation tests, including a test based on a suite of machine learning models. Applying this index to household-level data, we find that after monetary easing, an important group of homebuyers (who have a college degree and are aged between 30 and 50) in cities with more optimistic housing sentiment have lower responses in non-housing consumption, whereas for homebuyers in other age-education groups, such a pattern does not exist. This suggests that current monetary easing might be more effective in boosting non-housing consumption than in the past for China due to weaker crowding-out effects from pessimistic housing sentiment. The paper also highlights the need for complementary structural reforms to enhance monetary policy transmission in China, a lesson relevant for other similar countries. Methodologically, it offers a tool for monitoring housing sentiment and lays out some principles for applying generative AI models, adaptable to other studies globally.
  • Disagreement on Tail
    We propose a novel measure, DOT, to capture belief divergence on extreme tail events in stock returns. Defined as the standard deviation of expected probability forecasts generated by distinct information processing functions and neural network models, DOT exhibits significant predictive power for future stock returns. A value-weighted (equal-weighted) long-short portfolio based on DOT yields an average return of -1.07% (-0.98%) per month. Furthermore, we document novel evidence supporting a risk-sharing channel underlying the negative relation between DOT and the equity premium following extreme negative shocks. Finally, our findings are also in line with a mispricing channel in normal periods.
  • Spatiotemporal Correlation in Stock Liquidity Through Corporate Networks from Information Disclosure Texts
    The healthy operation of the stock market relies on sound liquidity. We use the semantic information in disclosure texts of companies listed on the China Science and Technology Innovation Board (STAR Market) to construct a daily corporate network. Through empirical tests and performance analyses of machine learning models, we elucidate the relationship between the similarity of company disclosure texts and the temporal and spatial correlations of stock liquidity. Our liquidity indicators span four dimensions: trading costs, market depth, trading speed, and price impact. Furthermore, we reveal that the information loss caused by employing a Minimum Spanning Tree (MST) topology significantly affects the explanatory power of network topology indicators for stock liquidity, with a more pronounced impact at the document level. Subsequently, by building a neural network model to predict next-day liquidity indicators, we demonstrate the temporal relationship of stock liquidity. We formulate a liquidity prediction task and train a daily liquidity prediction model incorporating Graph Convolutional Network (GCN) modules to solve it. Compared with models that have the same parameter structure but contain only fully connected layers, the GCN prediction model, which leverages company network structure information, exhibits stronger performance and faster convergence. We provide new insights for research on company disclosure and capital market liquidity.
  • Customers’ emotional impact on star rating and thumbs-up behavior towards food delivery service Apps
    This study explores the relationship between emotional cues present in food delivery app reviews, normative ratings, and reader engagement. Using lexicon-based unsupervised machine learning, we identify eight distinct emotional states within user reviews sourced from the Google Play Store. Our primary goal is to understand how reviewer star ratings affect reader engagement, particularly through thumbs-up reactions. By analyzing the influence of emotional expressions in user-generated content on review scores and subsequent reader engagement, we seek to provide insights into their interplay. The findings reveal an inverse correlation between review length and positive sentiment, emphasizing the importance of concise feedback. Additionally, the study highlights the differential impact of emotional tones on review scores and reader engagement metrics. Surprisingly, user-assigned ratings negatively affect reader engagement, suggesting potential disparities between perceived quality and reader preferences. In summary, this study applies machine learning techniques to unravel the relationship between emotional cues in customer evaluations, normative ratings, and subsequent reader engagement within the food delivery app context.
  • Do Enterprises Adopting Digital Finance Exhibit Higher Values? Based on Textual Analysis
    In this paper, we investigate whether enterprises that adopt digital finance exhibit higher values. Using a fintech-related lexicon constructed with the machine learning-based Word2Vec model, we employ the frequency of fintech-related words (phrases) in the management discussion sections of annual reports as a proxy for the degree to which enterprises apply digital finance. We apply panel data regression and mediation models to data on Chinese A-share listed companies from 2016 to 2022 and explore the impact of this degree of digital finance application on enterprise value. We find that the degree to which enterprises apply digital finance raises their values. The in-depth integration of digital technology and finance enhances enterprise value by reducing financing costs. Additionally, the effects are more evident among small-scale firms and enterprises located in regions with lower marketization levels. However, during the COVID-19 pandemic, the positive effects on enterprises were relatively weak.
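
The word-frequency proxy described in the last entry above (the share of fintech-related terms in a management discussion section) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the lexicon terms, the function name `fintech_term_frequency`, the English tokenization, and the normalization by token count. The paper's actual Word2Vec-expanded lexicon and its treatment of Chinese text are not given in the abstract.

```python
import re

# Hypothetical seed lexicon for illustration only; the paper's actual
# Word2Vec-derived lexicon is not reproduced here.
FINTECH_LEXICON = {"blockchain", "mobile payment", "big data", "cloud computing"}


def fintech_term_frequency(text, lexicon=FINTECH_LEXICON):
    """Return lexicon hits per token in `text` (assumed normalization)."""
    lowered = text.lower()
    # Simple alphanumeric tokenization; an assumption suited to English text.
    tokens = re.findall(r"[a-z0-9]+", lowered)
    if not tokens:
        return 0.0
    # Count whole-word (or whole-phrase) occurrences of each lexicon term.
    hits = sum(
        len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        for term in lexicon
    )
    return hits / len(tokens)
```

A score computed this way for each firm-year could then serve as the explanatory variable in a panel regression of enterprise value, which is the role the abstract assigns to the proxy.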
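
The "Disagreement on Tail" entry above defines DOT as the standard deviation of tail-event probability forecasts produced by distinct models. A minimal sketch of that calculation follows, assuming the forecasts are already available as one list of probabilities per model. The function name `disagreement_on_tail`, the input layout, and the use of the population standard deviation are assumptions; the abstract does not specify them.

```python
from statistics import pstdev


def disagreement_on_tail(tail_probs_by_model):
    """Cross-model standard deviation of tail-event probabilities, per stock.

    `tail_probs_by_model` is a list with one row per model; each row holds
    that model's forecast probability of a tail event for every stock.
    """
    if not tail_probs_by_model:
        raise ValueError("at least one model's forecasts are required")
    n_stocks = len(tail_probs_by_model[0])
    if any(len(row) != n_stocks for row in tail_probs_by_model):
        raise ValueError("every model must forecast the same set of stocks")
    # Population standard deviation across models (an assumed convention).
    return [
        pstdev([row[j] for row in tail_probs_by_model])
        for j in range(n_stocks)
    ]
```

Stocks could then be sorted on this value to form the long-short portfolios the abstract describes; a higher value means the models disagree more about the likelihood of an extreme return.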