semantic embeddings

  • Technological Momentum in China: Large Language Model Meets Simple Classifications
    This study applies large language models (LLMs) to measure technological links between firms and examines their predictive power in the Chinese stock market. Using the BAAI General Embedding (BGE) model, we extract semantic information from patent text to construct a technological momentum measure. For comparison, a measure based on the traditional International Patent Classification (IPC) is also considered. Empirical analysis shows that both measures significantly predict stock returns and that they capture complementary dimensions of technological links. A stratified analysis further reveals the critical role of investor inattention in explaining their differential performance: among stocks with low investor inattention, the IPC-based measure loses its predictive power while the BGE-based measure remains significant, indicating that straightforward information is fully priced in whereas complex semantic relationships require greater cognitive processing; among stocks with high investor inattention, both measures exhibit predictability, with the BGE-based measure showing stronger effects. These findings support behavioral finance theories suggesting that complex information diffuses more slowly in markets, especially under significant cognitive constraints, and demonstrate LLMs’ advantage in uncovering subtle technological connections that traditional methods overlook.
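    Below is a minimal sketch of how a BGE-based technological-link measure of this kind might be constructed from patent text. The abstract does not specify the exact construction, so the model variant (BAAI/bge-large-zh-v1.5), the averaging of patent embeddings into a firm-level vector, and the similarity-weighted peer-return formula for "technological momentum" are all illustrative assumptions, not the paper's specification.

    ```python
    # Hedged sketch: one plausible way to build a BGE-based technological-link
    # measure from patent abstracts. Model choice, firm-level aggregation, and
    # the similarity-weighted momentum formula are illustrative assumptions.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    # BAAI General Embedding (BGE) model; the Chinese variant is assumed here
    # because the underlying patents are Chinese-language.
    model = SentenceTransformer("BAAI/bge-large-zh-v1.5")

    def firm_embedding(patent_abstracts: list[str]) -> np.ndarray:
        """Embed a firm's patent abstracts and average them into one unit vector."""
        vecs = model.encode(patent_abstracts, normalize_embeddings=True)
        mean_vec = vecs.mean(axis=0)
        return mean_vec / np.linalg.norm(mean_vec)

    def tech_momentum(focal_patents: list[str],
                      peer_patents: dict[str, list[str]],
                      peer_returns: dict[str, float]) -> float:
        """Similarity-weighted average of technology-linked peers' past returns
        (an assumed proxy for the paper's technological momentum measure)."""
        f_vec = firm_embedding(focal_patents)
        weights, rets = [], []
        for name, patents in peer_patents.items():
            sim = float(f_vec @ firm_embedding(patents))  # cosine similarity
            if sim > 0:  # keep only positively related technology peers
                weights.append(sim)
                rets.append(peer_returns[name])
        if not weights:
            return 0.0
        w = np.array(weights)
        return float(np.array(rets) @ (w / w.sum()))
    ```

    An IPC-based analogue would replace the cosine similarity above with an overlap measure on patent classification codes, which is what makes the two measures directly comparable in the cross-sectional return tests described in the abstract.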