1. Introduction
With the rapid development of Internet technology, the Internet has become the most extensive and fastest-reacting media channel for social change. Online social media platforms provide a convenient space for public participation, sharing, and interaction, producing a large volume of text that expresses personal views and is rich in users' emotions and attitudes towards products, events, and services. Understanding and analyzing this emotional information is crucial for product optimization and service improvement. Thus, aspect-based sentiment analysis (ABSA) of text reviews has grown increasingly popular in both research and applications. ABSA aims to determine the sentiment polarities of given aspects in an input sentence. Its core is to accurately identify and deeply analyze sentiment elements at different aspect levels, such as aspect terms, categories, opinion expressions, and sentiment polarities.
ABSA can be classified into explicit ABSA (EABSA) and implicit ABSA (IABSA), depending on the presence or absence of obvious opinion expressions in the statements. Relevant studies indicate that implicit sentiment sentences account for over 30% of sentiment expressions in text-type statements, making them a crucial component. To explore this phenomenon in depth, we selected four examples from the Restaurant dataset to compare implicit and explicit sentiment sentences, as shown in Table 1. In explicit sentiment sentences, the aspect terms "delivery times" and "food" are paired with clear opinion expressions "fastest" and "fresh and hot," so it is easy to determine that the sentiment polarity is positive. However, in implicit sentiment sentences, the aspect terms "waitress" and "server" lack obvious opinion expressions, making it challenging to infer the corresponding sentiment polarity. Thus, the focus of this study is on uncovering hidden sentiment elements in sentiment sentences to construct a comprehensive sentiment map, thereby reducing the difficulty of implicit sentiment analysis.
Examples of explicit and implicit sentiment samples
With the emergence of the new natural language processing paradigm of "pre-training, fine-tuning, and prediction" [1,2], downstream tasks can be reformulated with the aid of prompt texts. This makes them resemble the tasks solved during original large language model (LLM) training, narrowing the gap between pre-trained models and downstream tasks. Moreover, with deepening research on human reasoning processes, Wei et al. first proposed the concept of Chain-of-Thought (COT) prompting. By using a series of intermediate reasoning steps, COT enables the model to build up the task's logical reasoning before generating the result, so that the model learns to output reasoning steps one by one until the final answer is obtained [3,4].
Inspired by previous research, this study transforms the IABSA task into a natural language inference (NLI) task. Through carefully designed prompt texts, the model is guided step-by-step to infer the sentiment elements required for IABSA. Meanwhile, the self-correction mechanism of LLMs is utilized to ensure the accuracy of inference results. To enrich the semantic and syntactic information of implicit review sentences, this work introduces relevant concept representations from external knowledge bases and uses LLMs to integrate the acquired knowledge into implicit review sentences, thereby generating synthetic sentences. Subsequently, syntactic passivation transformations are applied to these synthetic sentences to generate additional syntactic information and strengthen the training data. Consequently, this study presents an implicit sentiment analysis model that incorporates data augmentation and automatic feedback correction within a three-stage cascaded prompt reasoning framework (DA-AFC-TSCPR). The main contributions are as follows:
· DA-AFC-TSCPR offers an approach to address implicit sentiment analysis by simulating human reasoning processes. It breaks down complex tasks into simple subtasks. Through the design of three prompt sentences, it guides the inference of three key elements essential for aspect-based sentiment analysis step by step, thus compensating for the obscurity of opinion expressions in implicit review sentences.
· This study alleviates the scarcity of limited-labeled data and enriches the semantic information of implicit review sentences by integrating concept knowledge from the OpenHowNet knowledge base. Additionally, it employs the syntactic passivation transformation method to generate sentences with more syntactic information, thereby overcoming the lack of grammatical sensitivity in NLI.
· An automated feedback mechanism leverages LLMs to refine the inference outcomes of pre-trained models, enhancing the accuracy of the inferences.
2. Related Works
In the practical applications of ABSA, EABSA remains the dominant scenario in daily life. However, on social media platforms, netizens' language expressions are rather implicit and non-intuitive, with a large number of implicit sentiment expressions. This indicates that IABSA has clear and extensive application scenarios, such as online public opinion analysis, e-commerce reviews, and online anti-fraud.
To address the challenge of IABSA caused by the lack of sentiment-feature words, researchers have made extensive efforts using methods such as attention mechanisms, graph neural networks, and knowledge enhancement. Xu et al. [5] proposed an implicit sentiment analysis model that combines knowledge enhancement and context features, using knowledge graphs to supplement implicit sentiment expressions. This model effectively integrates external knowledge and context features through a co-attention mechanism. Li et al. [6] applied supervised contrastive pre-training when handling large-scale sentiment-annotated corpora retrieved from domain-specific corpora. This approach aligns the representations of implicit sentiment expressions with those having the same sentiment labels, enabling the pre-training process to more effectively capture implicit and explicit sentiment inclinations in reviews. Ouyang et al. [7] introduced syntactic-distance weighting and non-likelihood contrastive regularization techniques during the training phase to guide the model to generate explicit opinion words consistent with the sentiment polarity of the input sentences. They also employed a constrained beam-search method to ensure that the augmented content is closely related to specific aspects. Zuo et al. [8] proposed a context-specific heterogeneous graph convolutional network framework, leveraging the complete context of implicit sentiment sentences. This framework improves the accuracy of implicit sentiment sentence analysis using context semantic information.
IABSA, a key subfield of ABSA, has been widely studied. Existing methods have improved performance on specific implicit sentiment analysis tasks but remain in the "pre-training, fine-tuning" stage: they require adapting pre-trained language models (PLMs) to downstream tasks, suffer from low classification accuracy, are not applicable to all implicit sentiment sentences, and deliver unsatisfactory practical performance compared to explicit sentiment analysis tasks. In contrast, the proposed DA-AFC-TSCPR method combines LLMs and PLMs. It uses a prompt-based multi-hop reasoning approach to infer hidden sentiment elements step by step, enhancing sentence sentiment features. With data augmentation to expand the training data and an automatic feedback correction mechanism to optimize reasoning, it helps the model infer the final sentiment polarity more easily and improves IABSA performance.
3. Model
3.1 Problem Definition
In this paper, the implicit aspect-based sentiment analysis task is formalized as follows:
Input: X = <C, t>, where C represents the context information of the input sentence, and t is the entity target mentioned in the input sentence.
Output: Y=<A, O, S>, which represents the sentiment elements regarding entity t. Here, A is the aspect term related to entity t, O is the opinion expression corresponding to aspect term A, and S is the sentiment orientation corresponding to aspect term A.
The objective of the model in this study is to predict the sentiment polarity S (i.e., positive, negative, or neutral) of entity target t within sentence X.
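The problem definition above can be sketched as a pair of data containers; this is an illustrative rendering only (the field names and the example values are ours, echoing the delivery-order example discussed later), not the paper's implementation:

```python
from dataclasses import dataclass

@dataclass
class ABSAInput:
    """Input X = <C, t>: sentence context and entity target."""
    context: str   # C: context information of the input sentence
    target: str    # t: entity target mentioned in the sentence

@dataclass
class ABSAOutput:
    """Output Y = <A, O, S>: sentiment elements inferred for target t."""
    aspect: str      # A: aspect term related to t
    opinion: str     # O: opinion expression corresponding to A
    sentiment: str   # S: "positive", "negative", or "neutral"

# Hypothetical instance of the task:
x = ABSAInput(context="I waited a long time for the order to be delivered.",
              target="order")
y = ABSAOutput(aspect="delivery speed", opinion="slow", sentiment="negative")
```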
3.2 Model Architecture
Existing sentiment analysis models struggle to accurately analyze implicit sentiment, yet humans can easily infer the true intent behind implicit sentiment sentences. Thus, this study aims to endow sentiment analysis models with human-like reasoning capabilities. For example, consider the implicit sentiment sentence “I waited a long time for the order to be delivered.” If a sentiment analysis model could decompose the process of inferring sentiment polarity as humans do—first identifying the aspect term “delivery speed” related to “order” and then determining the latent opinion expression “slow speed”—it could accurately predict the “negative” sentiment polarity regarding “order” in the sentence. To enable sentiment analysis models to break down complex tasks into simpler subtasks like humans, we designed the DA-AFC-TSCPR model for implicit sentiment analysis, which combines data augmentation and automatic feedback correction with a three-stage cascaded prompt reasoning mechanism, as shown in Fig. 1. This model consists of three modules: data augmentation, three-stage cascaded prompt reasoning, and automatic feedback correction.
The overall architecture of DA-AFC-TSCPR.
Examples of knowledge enhancement.
3.2.1 Data augmentation module
Implicit sentiment analysis encounters difficulties in obtaining annotated data. Moreover, due to the lack of explicit sentiment cues, it suffers from a scarcity of semantic information. Meanwhile, the performance of LLMs critically depends on the quality and quantity of training data. To address this, we employ knowledge enhancement techniques to mitigate the scarcity of annotated data and the ambiguity of semantic information. Examples of the enhanced data are shown in Fig. 2.
For NLI tasks, Min et al. [9] have demonstrated the necessity of being sensitive to syntactic structures. Given the limited ability of the T5 model to capture syntax, we adopt the syntactic passivation transformation method to enhance the syntactic information of newly generated copies, aiming to produce sentences with richer syntactic structures. Eventually, the model can learn more features from the augmented data. The processes of knowledge enhancement and syntactic enhancement are illustrated in Algorithm 1.
Knowledge enhancement and syntactic enhancement
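The two augmentation steps can be sketched as follows. This is a toy illustration under our own assumptions: `lookup_concepts` stands in for an OpenHowNet concept query (with a hard-coded miniature knowledge base), and `to_passive` is a naive rule-based passivizer for a single subject-verb-object clause; the paper's actual Algorithm 1 is not reproduced here.

```python
def lookup_concepts(token):
    """Hypothetical stand-in for an OpenHowNet concept lookup."""
    toy_kb = {"waitress": ["human", "occupation", "serve"],
              "order": ["event", "commerce", "buy"]}
    return toy_kb.get(token, [])

def knowledge_enhance(sentence, target):
    """Append concept knowledge about the target to the review sentence."""
    concepts = lookup_concepts(target)
    if not concepts:
        return sentence
    return f"{sentence} ({target} relates to: {', '.join(concepts)})"

def to_passive(subject, verb_past, obj):
    """Toy active-to-passive transformation for a simple SVO clause."""
    return f"{obj} was {verb_past} by {subject}"

def augment(sentence, target, svo=None):
    """Generate knowledge- and syntax-enhanced copies of a review sentence."""
    copies = [knowledge_enhance(sentence, target)]
    if svo:  # optional syntactic passivation copy
        copies.append(to_passive(*svo))
    return copies

aug = augment("The waitress ignored us.", "waitress",
              svo=("the waitress", "ignored", "us"))
```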
Process of three-stage cascaded prompt inference.
3.2.2 Three-stage cascaded prompt reasoning module
Inspired by human reasoning processes and the COT prompting method, in this study, instead of directly asking LLMs for the sentiment polarity S result, we propose a three-stage cascaded prompt reasoning method. We aim to have LLMs infer the underlying aspect terms and opinion expressions step-by-step before determining the final sentiment polarity S. An example of this method is illustrated in Fig. 3.
This module's main task is to construct prompt texts carrying sentiment knowledge via the prompt mechanism. Each stage's prompt incorporates the knowledge from the prior stage, enhancing the model's understanding of context and aspect terms and unifying the sentiment-analysis subtasks. The key is designing a semantically relevant prompt template. We first define the possible aspect terms a and potential opinion expressions o in sentiment statements. The design process of the three-stage cascaded prompt text template based on the prompt mechanism is as follows:
Step 1: The objective of this step is to obtain the possible aspect term for a given target t in sentence X. We use the context semantic information C of the original input sentence and the given target t as sentiment knowledge to construct the prompt text. The design process of the text template for the first-level prompt is shown in Fig. 4. Here, C1 represents the context for the first-level prompt, and Px1 is the prompt text constructed based on the sentiment knowledge of the first level.
Design procedure of the first-stage prompt text template.
Step 2: Based on the possible aspect a obtained from C1 and the first-level prompt, we further construct a template to obtain the latent opinion o regarding the target t in sentence X. The specific design process of the text template is shown in Fig. 5. Here, C2 represents the context of the second-level prompt that connects C, t, and a, and Px2 is the prompt text constructed based on the sentiment knowledge of the second level.
Design procedure of the second-stage prompt text template.
Step 3: Using the complete sentiment framework (X, t, a, and o) as the context, we construct a prompt template to infer the final sentiment polarity S with an LLM. The design process of the text template for the third-level prompt is shown in Fig. 6. Here, C3 represents the context of the third-level prompt, and Px3 is the prompt text constructed based on the sentiment knowledge of the third level.
Design procedure of the third-stage prompt text template.
Given the integration of LLMs in this study, the aforementioned three-stage cascaded prompt templates require further modification to construct new templates adaptable to LLMs. Table 2 presents specific examples of the three-stage prompt templates.
Examples of prompt text template design based on LLM
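The three-stage cascade can be sketched as below. The template wording here is our own paraphrase, not the exact templates of Table 2, and `query_llm` is a placeholder for any LLM call (a stub is used for the usage example):

```python
def build_prompts(C, t, query_llm):
    """Cascade three prompts: aspect -> opinion -> sentiment polarity."""
    # Stage 1: infer the possible aspect term a for target t (prompt Px1).
    Px1 = f'Given the sentence "{C}", which aspect of {t} is discussed?'
    a = query_llm(Px1)
    # Stage 2: infer the latent opinion o, conditioned on a (prompt Px2).
    Px2 = (f'Given the sentence "{C}", the aspect of {t} is {a}. '
           f'What is the implied opinion on {a}?')
    o = query_llm(Px2)
    # Stage 3: infer the final polarity S from the full frame (prompt Px3).
    Px3 = (f'Given the sentence "{C}", the aspect of {t} is {a}, and the '
           f'implied opinion is {o}. Is the sentiment positive, negative, '
           f'or neutral?')
    return query_llm(Px3)

def stub_llm(prompt):
    """Hypothetical stub standing in for an LLM decoder."""
    if "which aspect" in prompt:
        return "delivery speed"
    if "implied opinion on" in prompt:
        return "slow"
    return "negative"

S = build_prompts("I waited a long time for the order to be delivered.",
                  "order", stub_llm)
```

The design point the sketch illustrates is that each stage's answer is spliced into the next stage's context, so the final polarity query sees the full <X, t, a, o> sentiment frame rather than the bare sentence.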
3.2.3 Automatic feedback correction module
In the three-stage cascaded prompt reasoning module, each stage incorporates the sentiment knowledge from the previous one. An incorrect output in an earlier stage can thus compromise the performance of subsequent-stage prompt reasoning. To mitigate this, we implement an automated feedback correction strategy based on the COT self-consistency mechanism proposed by Wang et al. [10] to ensure reasoning accuracy. During each of the three reasoning stages, we use the decoder of an LLM to generate a set of candidate answers, which may yield diverse predictions for the aspect term, opinion expression, and sentiment polarity. At each stage, we filter candidate answers that show high consistency in predicting aspects, opinions, or sentiment polarities, select the answer with the highest confidence from the filtered set, and input it into another LLM. The LLM instances then present and debate their responses over multiple rounds to reach a consensus on the final answer. This approach enhances reasoning accuracy and reliability.
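The self-consistency filtering step can be sketched as a majority vote over sampled decoder outputs (after Wang et al. [10]). This is a minimal sketch only: the multi-round debate between LLM instances is omitted, and `samples` would in practice come from repeated stochastic decoding of an LLM rather than a hard-coded list.

```python
from collections import Counter

def self_consistent_answer(samples):
    """Majority-vote over sampled decoder outputs for one reasoning stage.

    Returns the most consistent answer and its vote share, which can serve
    as the confidence used to pick the answer passed to the next stage.
    """
    counts = Counter(samples)
    answer, votes = counts.most_common(1)[0]
    confidence = votes / len(samples)
    return answer, confidence

# Hypothetical stage-3 samples for one implicit review sentence:
samples = ["negative", "negative", "neutral", "negative", "negative"]
ans, conf = self_consistent_answer(samples)
```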
4. Experimental Results and Analysis
4.1 Experimental Data and Evaluation Metrics
To validate the effectiveness of the proposed DA-AFC-TSCPR method, we conducted experiments on the publicly available SemEval2014 Laptop and Restaurant datasets. Following the work of Li et al. [6], we classified all instances in these two datasets into explicit and implicit sentiment. Given that the original datasets did not provide a validation set, we selected the last 150 data points from the training set as the validation dataset for hyperparameter tuning. Table 3 presents the detailed information on dataset partitioning.
To validate the sentiment classification performance of the method proposed in this paper, we adopt accuracy and F1-score as evaluation metrics. Accuracy reflects the overall prediction performance of the model, while the F1-score is a robust measure for imbalanced samples. Denote the number of correctly classified samples as T and the total number of data samples as N. The evaluation metrics are calculated as shown in Eqs. (1) and (2):

Accuracy = T / N, (1)

F1 = 2 × P × R / (P + R), (2)

Here, P represents the precision rate, and R represents the recall rate.
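The two metrics can be computed as follows; this is a generic sketch of accuracy (T over N) and per-label F1 built from precision P and recall R, with example labels of our own invention:

```python
def accuracy(gold, pred):
    """Accuracy: correctly classified samples T over total samples N."""
    T = sum(g == p for g, p in zip(gold, pred))
    return T / len(gold)

def f1_score(gold, pred, label):
    """Per-label F1 = 2PR / (P + R) from precision P and recall R."""
    tp = sum(g == p == label for g, p in zip(gold, pred))
    pred_pos = sum(p == label for p in pred)  # predicted positives
    gold_pos = sum(g == label for g in gold)  # actual positives
    if tp == 0:
        return 0.0
    P, R = tp / pred_pos, tp / gold_pos
    return 2 * P * R / (P + R)

# Hypothetical predictions over four review sentences:
gold = ["positive", "negative", "neutral", "negative"]
pred = ["positive", "negative", "negative", "negative"]
acc = accuracy(gold, pred)
```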
4.2 Experimental Setup
The experiments were conducted using Python 3.9 as the development environment and PyTorch 2.5.1 as the development framework. The encoder-decoder style Flan-T5 was adopted as our main LLM. The experimental hyperparameter configurations are presented in Table 4.
4.3 Comparative Experiments
To further validate the performance of the proposed DA-AFC-TSCPR, we compared it with current mainstream sentiment analysis models. The comparison models were selected by referring to relevant literature in the field of implicit sentiment analysis and grouped into five categories. The specific experimental data and results are presented in Table 5. Here, F1-All and F1-ISA denote the F1-scores obtained on all data and on implicit-sentiment data, respectively.
Experimental results (unit: %)
Best results are marked in bold, and the second-best results are marked with underlines.
Based on the comparison experimental results presented in Table 5, our proposed DA-AFC-TSCPR method demonstrates outstanding performance on both the Laptop and Restaurant datasets. Notably, it achieves significant improvements in the F1-ISA metric, validating the effectiveness of the proposed approach.
Compared with attention-based models, MGAN captures semantic information at different granularities but may neglect implicit expressions due to its reliance on explicit sentiment words. In contrast, this study's three-stage cascaded prompt reasoning module simplifies implicit sentiment analysis, improving F1-ISA. Versus graph neural network-based models such as ASGCN, BiGCN, and RGAT, which rely on explicit dependencies, this study's data augmentation module uses OpenHowNet for knowledge-aware representations and syntactic passivation for grammar-aware ones, resulting in a significant F1-ISA improvement.
Knowledge-enhanced models such as BERT+PT, CapsNet+BERT, BERT+SPC, and BERT+ADA underperform the optimal baseline BERTAsp+SCAPT. BERTAsp+SCAPT, pre-trained on large sentiment-aspect corpora, excels at learning implicit sentiment. Our DA-AFC-TSCPR outperforms BERTAsp+SCAPT, with F1-score increases on the SemEval14 datasets. This is because BERTAsp+SCAPT may rely too heavily on its pre-training corpora, whereas our model employs an automated feedback correction mechanism for self-correction and conducts knowledge enhancement and syntactic enhancement based on prompts, thus enriching the sentiment features of implicit samples and reducing the difficulty of implicit sentiment analysis. Compared with prompt-based models such as BERT+Prompt, Flan-T5+Prompt(250M), and Flan-T5+THOR(250M) [11], our proposed model, in addition to using prompts, incorporates data augmentation and an LLM-based automatic feedback correction strategy. This enriches the semantic and syntactic information of implicit sentiment sentences and reduces the likelihood of LLM hallucinations.
4.4 Ablation Study
To further validate the effectiveness of the data augmentation (DA) module, three-stage cascaded prompt reasoning (TSCPR) module, and automatic feedback correction (AFC) module in our proposed method, we conducted ablation experiments on the implicit sentiment dataset. The F1-scores of different experimental groups are presented in Table 6. All other experimental parameter configurations remain the same.
Experiment 1 only fine-tuned the T5 model with prompts. Experiment 2 applied data augmentation to fine-tune the T5+prompt model, increasing F1-scores on two datasets by 1.87% and 0.51%, showing the importance of knowledge and syntactic data augmentation for handling implicit sentiment. Experiment 3 added three-stage cascaded prompts and LLM automatic feedback correction, boosting F1-scores by 4.16% and 1.84% compared to Experiment 1, indicating the approach simplifies problems and the correction refines answers. Experiment 4, our proposed method, combined data augmentation, three-stage cascaded prompts, and automatic feedback correction, achieving the best F1-scores. The three modules help the model extract latent sentiment and determine polarities accurately.
In the automatic feedback correction module, the choice of LLM significantly impacts the answer-correction ability. Fig. 7 presents the accuracy comparison of different LLMs on the Laptop and Restaurant datasets. Evidently, the Qwen2.5 model exhibits the highest correction accuracy. Qwen2.5 can dynamically optimize based on real-time error feedback to adjust its output. It performs more stably in complex scenarios, such as the Restaurant dataset. Moreover, its feedback mechanism enables the model to rapidly adapt to the characteristics of new datasets, for example, the domain differences between the Laptop and Restaurant datasets.
4.5 Case Study Analysis
To analyze the practical performance of the proposed DA-AFC-TSCPR, we selected samples whose sentiment polarities were misidentified by the comparison models for case-by-case analysis. Specifically, two typical samples were chosen from the Laptop and Restaurant test datasets for instance analysis, and the results are presented in Table 7. Given that the base model of DA-AFC-TSCPR is T5, we selected the Flan-T5+THOR(250M) model, which is also based on T5, for comparison.
For instance, consider the example sentence in Table 7, “When asked, we had to ask more detailed questions so that we knew what the specials were.” Through knowledge enhancement, syntactic enhancement, and prompt-based reasoning proposed in this study, it is quite straightforward to obtain the possible aspect “service” and the latent opinion “unsatisfactory” of this sentence. This enriches the sentiment features of the implicit sentiment sentence. Furthermore, in the third-level prompt, the model can effortlessly analyze the ultimate “negative” sentiment polarity of this sentiment-laden statement.
Experimental results for different LLMs.
5. Conclusion
We propose an innovative IABSA model, DA-AFC-TSCPR. By integrating knowledge enhancement, syntactic passivation transformation, self-feedback correction, and a three-stage cascaded prompt reasoning mechanism, it effectively addresses the challenges of semantic ambiguity and the lack of explicit sentiment elements in IABSA. Experimental results show that on the Laptop and Restaurant datasets, the F1-ISA metrics of this model reach 76.17% and 74.63%, respectively. Compared with existing models, it significantly improves the accuracy of implicit sentiment analysis. This achievement not only advances sentiment analysis technology but also offers new perspectives and tools for understanding and analyzing implicit sentiment expressions in online social media. However, the current method is limited by the simplicity of manually designed prompt templates. In future research, we will explore automated prompt engineering to handle more complex implicit sentiment analysis tasks and further enhance the model's generalization and robustness.
Conflict of Interest
The authors declare that they have no competing interests.
Funding
This work was supported by the Scientific Research Fund for the Higher Education Institutions of Liaoning Province of China under Grant LJ212410152070.