1. Introduction
Knowledge graphs (KGs), such as NELL [1], Wikidata [2], and Freebase [3], represent factual information as structured triples of the form (head entity, relation, tail entity). These KGs have demonstrated significant success in numerous downstream applications [4], including recommendation systems [5], information retrieval [6], and question answering [7,8]. Although existing KGs are vast, they often suffer from incompleteness, meaning many potential facts are missing. Consequently, a popular approach is knowledge graph completion (KGC), which endeavors to infer unobserved facts from the established ones.
Knowledge embedding methods have become a dominant technique for KGC [9, 10]. While successful, these methods typically require substantial training examples for each relation. However, real-world KGs commonly exhibit a pronounced long-tail relation distribution, in which the majority of relations have only a handful of observed triples [11, 12]. This data scarcity severely degrades the performance of traditional KGC methods on tail relations. Thus, effectively completing KGs under such limited-data conditions remains a crucial yet challenging research problem.
To tackle this issue, few-shot knowledge graph completion (FKGC) has emerged as a prominent research direction attracting significant scholarly interest. FKGC endeavors to predict absent entities (primarily tail entities) in query triples by leveraging limited available examples, commonly referred to as reference triples. Contemporary FKGC frameworks predominantly employ metric-based approaches or meta-learning paradigms. A substantial limitation shared across these methodologies is their insufficient capability to handle noisy reference triples during the process of relation embedding learning. Given the extremely small number of reference triples available, any noisy examples can disproportionately impact the learned representations.
To address this issue, our attention-based meta-relational (ATMR) framework utilizes a relational learner to generate a unique feature representation for each reference triple. Then, an attention-based weighting framework is deployed to ascertain the relative significance of individual reference triples within the learning process, assigning higher weights to informative examples while mitigating the impact of potential noise. This process yields robust final few-shot relation representations. Ultimately, we leverage TransE [13], an established knowledge graph representation framework, to evaluate candidate triples and optimize the integrated model architecture. Exhaustive performance assessment confirms that ATMR consistently surpasses current state-of-the-art methodologies across key evaluation metrics.
The main contributions of this article are as follows:
1) We design an attention allocation mechanism that successfully diminishes the interference from inconsistent reference triples while strategically enhancing the influence of dependable exemplars.
2) We propose an attention-based relational learning model for FKGC that integrates relational learning with attention mechanisms, enabling more robust few-shot relation representations.
3) Comprehensive experiments confirm the effectiveness of our proposed framework relative to state-of-the-art methods. Notably, on the NELL-One dataset for 5-shot KGC, our model outperforms the strongest baselines by 3.6%, 8.5%, 6.4%, and 0.9% in mean reciprocal rank (MRR), Hits@10, Hits@5, and Hits@1, respectively.
2. Preliminaries
Conventional KGC methods generally embed entities and relations into low-dimensional continuous vector spaces to capture their semantic characteristics. The plausibility of triples is then evaluated using scoring functions based on these embeddings. Contemporary knowledge graph embedding (KGE) techniques largely fall into two frameworks: translational distance models and semantic matching models. Early translational models (e.g., TransE) conceptualize relations as translation operators in the embedding space. While effective for frequent relations, models of this form degrade sharply when a relation has only a few observed triples. Subsequent refinements aimed to address TransE's limitations: TransH [14] introduced relation-specific hyperplane projections, while TransR [15] and TransD [16] further advanced this by mapping entities into distinct relation-specific spaces. These enhancements allow entities to exhibit different characteristics depending on the relation being modeled. Semantic matching techniques, alternatively, evaluate triples according to the latent semantic correspondences encoded in their embeddings. Notable examples include RESCAL [17], which employs tensor factorization; DistMult [18], which improves computational efficiency by restricting relation matrices to diagonal form; and ComplEx [19], which introduces complex-valued representations to better model asymmetric relations. Nevertheless, these KGE frameworks share a fundamental limitation: they depend on abundant training instances per relation, which substantially compromises their effectiveness in the sparse-data conditions of few-shot learning.
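To make the two families concrete, the following NumPy sketch shows the scoring functions of the three models named above (TransE, DistMult, ComplEx). Embedding dimensions and variable names are illustrative, not taken from any particular implementation.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE: plausibility as negative translation distance ||h + r - t||."""
    return -np.linalg.norm(h + r - t)

def distmult_score(h, r, t):
    """DistMult: bilinear score with a diagonal relation matrix."""
    return float(np.sum(h * r * t))

def complex_score(h, r, t):
    """ComplEx: Re(<h, r, conj(t)>) over complex-valued embeddings,
    which lets the score be asymmetric in h and t."""
    return float(np.real(np.sum(h * r * np.conj(t))))
```

For real-valued embeddings, ComplEx reduces to DistMult; the asymmetry only appears once the imaginary parts are non-zero.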
Recently, FKGC research has primarily explored two main paradigms: metric learning and meta-learning. Distance-based learning methodologies typically implement comparative network architectures to evaluate the alignment between exemplar (reference) collections and interrogation instances. Gmatching [12] pioneered this for one-shot KGC, computing similarity via neighbor encoding and multi-step matching. Building on this, FSRL [20] incorporated attentive neighbors and LSTM aggregation for few-shot scenarios, while FAAN [21] further enhanced entity representations using relation-specific adaptive neighbors and transformer-based encoders to better capture relational patterns. Meta-learning strategies, in contrast, focus on rapid adaptation, aiming to learn transferable knowledge that can be quickly applied to new, few-shot relations. Key examples include MetaR [11], which employs a meta-learner for relation-specific information alongside gradient-based meta-updates; MetaP [22], introducing a meta-pattern learning framework; and Meta-KGR [23], proposing meta-optimized multi-hop reasoning within a reinforcement learning context. Despite their different mechanisms, a common limitation across many existing methods is that they often overlook the potential impact of noise within the reference triples. This oversight can significantly degrade the quality of the learned representations, especially given the limited number of examples in few-shot settings.
2.1 Problem Formulation
A KG formally represents factual information through a set of triples, as shown in formula (1), with [TeX:] $$\mathcal{E}$$ representing entities and [TeX:] $$\mathcal{R}$$ denoting relations. FKGC aims to predict missing triples involving a few-shot relation r, utilizing only a restricted set of support triples associated with relation r.
The FKGC challenge can be formally articulated as follows: when presented with a few-shot relation r and its corresponding K support triples, as shown in formula (2), the objective is to predict unknown triples in the query set, as shown in formula (3), by learning from the support set. This is referred to as the K-shot KGC.
Following the standard setting in FKGC, we first identify a set of few-shot relations [TeX:] $$\mathcal{R} \text{ from } \mathcal{G}.$$ These relations are then divided into disjoint [TeX:] $$\mathcal{R}_{train }, \mathcal{R}_{valid } \text {, and } \mathcal{R}_{test } .$$ During training, we sample a task relation r from [TeX:] $$\mathcal{R}_{train },$$ then construct its support set [TeX:] $$\mathcal{S}_r$$ and query set [TeX:] $$\mathcal{Q}_r.$$ The model is trained on tasks sampled from [TeX:] $$\mathcal{R}_{train }=\left\{\mathcal{S}_i, \mathcal{Q}_i\right\} .$$ After training, the model is evaluated on [TeX:] $$\mathcal{R}_{test }=\left\{\mathcal{S}_j, \mathcal{Q}_j\right\}.$$ Specifically, given a few-shot relation r in [TeX:] $$\mathcal{R}_{test }$$ and its corresponding support set [TeX:] $$\mathcal{S}_j,$$ the model aims to predict missing facts in the query set [TeX:] $$\mathcal{Q}_j.$$
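The episode construction described above can be sketched as follows; the function and dictionary names are illustrative, and in practice negative query triples are also sampled, which is omitted here.

```python
import random

def sample_task(triples_by_relation, relation, k):
    """Split one few-shot relation's triples into a K-shot support set
    and a disjoint query set (one training task / episode)."""
    triples = list(triples_by_relation[relation])
    random.shuffle(triples)
    support, query = triples[:k], triples[k:]
    return support, query
```

Repeating this sampling over the relations in the training split yields the stream of tasks on which the meta-learner is optimized.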
3. Methodology
The architecture of our proposed ATMR learning framework is illustrated in Fig. 1. The framework is systematically arranged into two fundamental operational phases:
1) Few-shot relation representation: This process utilizes both a relational learner and an attention mechanism to derive a generalized representation for few-shot relations. Specifically, the relational learner extracts relation specific meta-information, while the attention mechanism identifies and highlights more informative reference triples.
2) Triple scoring: This process uses a KGE model to encode triples and score their plausibility. The model is trained using few-shot training instances and subsequently evaluated on the test set to assess its few-shot prediction capability.
Fig. 1. Illustration of ATMR model architecture.
3.1 Generalized Representations for Few-Shot Relations
As visually detailed in the first half of Fig. 1, our approach to learning a generalized representation for a few-shot relation involves two key stages: a “Relational Learner” and an “Attention” module. First, for a given few-shot relation r and its K reference triples, the Relational Learner processes each triple’s head and tail entity pair to extract an initial relation-specific representation (meta-information). Subsequently, the Attention module assesses the informativeness of these individual representations, assigning a weight to each one. This allows the model to prioritize more reliable triples and mitigate the impact of noisy ones when producing the final, aggregated relation representation. This two-stage process ensures a robust representation even with very few examples.
Given a specific few-shot relation r and its corresponding collection of entity pairs [TeX:] $$\left\{\left(h_i, t_i\right)\right\}_{i=1}^K,$$ our first step is to concatenate the embedding vectors of each head entity [TeX:] $$h_i$$ and its respective tail entity [TeX:] $$t_i,$$ as shown in formula (4):
Then, we input the connected entity pair representations, denoted as [TeX:] $$X_0,$$ into our designed relational learner, which then learns the meta-information of relation r, as shown in formula (5) and (6):
Here, W and b denote the weight and bias parameters, respectively, and L is the total number of network layers, with [TeX:] $$l \in\{0, \ldots, L-1\};$$ [TeX:] $$r_i^{\prime}$$ denotes the meta-information of relation r learned from the i-th reference triple. tanh is the hyperbolic tangent activation function, and LayerNorm refers to layer normalization, which stabilizes learning by normalizing the inputs across the feature dimension.
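A minimal NumPy sketch of this relational learner follows. Formulas (5)-(6) are not reproduced in the text, so the exact composition (tanh applied before LayerNorm at every layer, and the layer widths) is an assumption made for illustration.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance across features."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def relational_learner(h, t, weights, biases):
    """Map one (head, tail) embedding pair to relation meta-information
    r'_i: concatenation (formula (4)) followed by L layers of
    tanh + LayerNorm (formulas (5)-(6), composition assumed)."""
    x = np.concatenate([h, t])           # formula (4): concatenated pair
    for W, b in zip(weights, biases):    # L stacked layers
        x = layer_norm(np.tanh(W @ x + b))
    return x
```

Applying this learner independently to each of the K reference pairs yields the K meta-information vectors that the attention module then weighs.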
Subsequently, we implement the relational attention mechanism that assigns differential weights to meta-information derived from each reference triple. This mechanism helps in selecting more significant information while reducing the impact of noise triples. The attention weight for each [TeX:] $$r_i$$ is computed using the softmax function (7):
where K denotes the total number of reference triples, and [TeX:] $$\alpha_i$$ is the normalized attention weight of the meta-relational information derived from the i-th triple.
Finally, we aggregate the meta-information from all reference triples using a weighted sum, as shown in formula (8):
where [TeX:] $$\mathbf{R}_{T_r}$$ represents the composite representation of the few-shot relation r synthesized from K reference triples. The process of relation learning corresponds to lines 2–5 of Algorithm 1.
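The weighting and aggregation of formulas (7)-(8) can be sketched as below. The text does not specify how the unnormalized attention score for each [TeX:] $$r_i^{\prime}$$ is computed, so scoring each vector by its dot product with the mean representation is an illustrative assumption, not the paper's exact mechanism.

```python
import numpy as np

def aggregate_meta(meta_infos):
    """Softmax-weighted aggregation of per-triple meta-information.
    Relevance scoring (dot product with the mean) is an assumed stand-in
    for the learned attention score; formulas (7)-(8) otherwise."""
    R = np.stack(meta_infos)                 # shape (K, d)
    center = R.mean(axis=0)
    scores = R @ center                      # assumed relevance score per triple
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()              # attention weights (formula (7))
    return (alpha[:, None] * R).sum(axis=0)  # weighted sum (formula (8))
```

Outlier meta-information vectors receive lower scores under this scheme, which is the intended noise-suppression behavior of the attention module.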
3.2 Triple Modeling
After obtaining the generalized representations for few-shot relations, our next step is to construct matching models for triples to evaluate their plausibility. We utilize TransE as the KGE model. Its effectiveness stems from its elegant approach of representing relations as translational vectors between head and tail entity embeddings to determine triple plausibility.
Specifically, our first task is to leverage the TransE model in order to calculate matching scores for triples. This is done as follows:
where [TeX:] $$h_i \text{ and } t_i$$ denote the initialized entity embeddings, and [TeX:] $$\|x\|_2$$ denotes the [TeX:] $$L_2$$ normalization of vector x. The term [TeX:] $$\mathbf{R}_{T_r}$$ encapsulates the integrated representation of the few-shot relation under consideration. We formulate the corresponding objective function as follows:
where γ is a hyperparameter separating positive and negative samples, [TeX:] $$E\left(h_i, t_i\right)$$ is the matching score for positive entity pair [TeX:] $$\left(h_i, t_i\right), \text { and } E\left(h_i, t_i^{\prime}\right)$$ corresponds to the negative entity pair [TeX:] $$h_i, t_i^{\prime},$$ with [TeX:] $$\left(h_i, t_i^{\prime}\right) \notin \mathcal{G} .$$ Thus, we obtain the generalized representation [TeX:] $$\mathbf{R}_{T_r}$$ and the optimization objective for the reference set [TeX:] $$S_r .$$ The process of triple modeling procedure is detailed in Algorithm 1 (lines 6–8).
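The TransE matching score and the margin-based objective above can be sketched as follows; entity vectors are assumed to be pre-normalized, and the summation over the reference set is shown as a plain loop for clarity.

```python
import numpy as np

def transe_energy(h, R_T, t):
    """Matching score E(h, t) = ||h + R_T - t||_2 (lower is more plausible)."""
    return np.linalg.norm(h + R_T - t)

def margin_loss(pos_pairs, neg_pairs, R_T, gamma=5.0):
    """Hinge loss separating each positive pair from its corrupted
    counterpart by at least the margin gamma."""
    loss = 0.0
    for (h, t), (hn, tn) in zip(pos_pairs, neg_pairs):
        loss += max(0.0, gamma + transe_energy(h, R_T, t)
                         - transe_energy(hn, R_T, tn))
    return loss
```

The loss is zero only when every corrupted triple scores at least gamma worse than its positive counterpart, which matches the margin of 5.0 used in the experiments.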
3.3 Optimization and Testing
We adopt the model-agnostic meta-learning (MAML) strategy to optimize the model parameters using the loss of each individual training task [TeX:] $$T_r.$$ The support-set loss [TeX:] $$\mathcal{L}\left(S_r\right)$$ in Eq. (12) is used to update the representation of the few-shot relation r as follows:
where [TeX:] $$l_r$$ represents the learning rate for updating the few-shot relation r.
Having obtained the updated generalized representation of the few-shot relation, we transfer it to each entity pair [TeX:] $$\left(h_j, t_j\right)$$ in the query set [TeX:] $$Q_r$$ and compute its score and loss as follows:
where [TeX:] $$\left(h_j, t_j\right)$$ is the positive entity pair in the query set [TeX:] $$Q_r,$$ while [TeX:] $$\left(h_j, t^{\prime}_j\right)$$ denotes its corresponding negative pair, with [TeX:] $$\left(h_j, t_j^{\prime}\right) \notin \mathcal{G}.$$ [TeX:] $$\mathcal{L}\left(Q_r\right)$$ is the training objective of the whole model. The parameter optimization procedure corresponds to lines 9-12 of Algorithm 1.
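The inner-loop adaptation, one gradient step on the support loss before scoring the query set, can be sketched as below. For a single support pair the loss reduces to the TransE energy; the gradient is estimated by finite differences purely for illustration (the actual model backpropagates analytically).

```python
import numpy as np

def support_energy(R_T, h, t):
    """Support loss for one pair: the TransE energy ||h + R_T - t||."""
    return np.linalg.norm(h + R_T - t)

def adapt_relation(R_T, h, t, lr=0.1, eps=1e-6):
    """MAML-style inner update: one gradient step on the support loss
    yields the task-adapted relation representation used on queries."""
    grad = np.zeros_like(R_T)
    for i in range(R_T.size):           # central finite differences
        d = np.zeros_like(R_T); d[i] = eps
        grad[i] = (support_energy(R_T + d, h, t)
                   - support_energy(R_T - d, h, t)) / (2 * eps)
    return R_T - lr * grad
```

A single step suffices because the relation representation, not the full parameter set, is adapted per task; the outer loop then updates the shared parameters from the query loss.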
Regarding computational complexity, ATMR's profile is comparable to other meta-learning-based FKGC methods like MetaR. The primary cost is driven by the forward passes through the relational learner for each of the K support triples and the meta-optimization inherent to the MAML framework. Our novel attention mechanism introduces only a negligible overhead of O(K · d) for weight computation and aggregation, where K is the number of shots and d is the embedding dimension. The design choice to process each triple individually enables fine-grained noise filtering. Given the small value of K in few-shot settings (e.g., 1 or 5), the linear scaling with K does not pose a practical bottleneck, ensuring ATMR remains efficient for real-world deployment while delivering superior performance.
4. Experiments
4.1 Datasets
We assess our ATMR architecture against well-established methodologies through experimentation on two public knowledge repositories: NELL-One and Wiki-One. Our choice of these datasets is driven by their status as standard benchmarks for the FKGC task, as established by prior leading works including GMatching [12], MetaR [11], and FAAN [21]. This selection ensures that our results are directly and fairly comparable to the state-of-the-art. These datasets are specifically curated for the meta-learning setting of FKGC, providing a necessary partition of relations into training, validation, and test sets.
Following conventional experimental protocols, we treat relations with between 50 and 500 triples in each dataset as few-shot relations. The remaining relations and their triples form the background knowledge. Table 1 summarizes the statistics of the datasets. Following MetaR's partitioning, we organize the few-shot relations into two experimental configurations: Pre-Train (NELL-One: 51/5/11, Wiki-One: 133/16/34 for training/validation/testing) and In-Train, which incorporates the background KG (NELL-One: 321/5/11, Wiki-One: 589/16/34 for training/validation/testing).
Model effectiveness is quantified using two established evaluation metrics: MRR, which calculates the average inverse position of correct triples in ranked lists, and Hits@n (where n = 1, 5, 10), measuring the percentage of correct triples appearing within the top n positions. For both measurement criteria, elevated values indicate enhanced model performance.
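Given the rank assigned to each correct entity, the two metrics reduce to the following computation (a straightforward restatement of their definitions):

```python
def mrr_and_hits(ranks, n=10):
    """MRR: mean of 1/rank of the correct entity over all queries;
    Hits@n: fraction of queries whose correct entity ranks in the top n."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(1 for r in ranks if r <= n) / len(ranks)
    return mrr, hits
```

Both metrics lie in (0, 1], and higher values indicate better ranking performance, as noted above.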
Table 1. Statistical summary of the NELL-One and Wiki-One datasets
4.2 Baselines
We compare ATMR with two categories of baselines. The first comprises traditional KGC methods, including TransE, TransH, DistMult, and ComplEx, all implemented using publicly available source code. The second comprises FKGC methods, specifically GMatching, MetaR, FSRL, and FAAN. We report FAAN results from its original publication. For GMatching, we use its best configuration with ComplEx pre-trained embeddings. FSRL results are taken from [21], following implementation protocols consistent with the other methods. For MetaR, we report results under both the In-Train and Pre-Train configurations.
4.3 Implementation Details
Embedding initialization follows a hybrid protocol: entity vectors utilize Glorot normal distribution with relation-aware scaling, while relation embeddings employ TransE-derived projections enhanced with hyperbolic tangent transformations. Following previous works, we configured the model with embedding dimensions of 100 (NELL-One) and 50 (Wiki-One), a margin of 5.0, a batch size of 1024, and the Adam optimizer [24] initiated with a learning rate of 0.001.
4.4 Results
Table 2 compares the performance of all models on NELL-One and Wiki-One. The results show that our proposed ATMR model consistently outperforms the baseline methods.
Notably, our ATMR (In-Train) model achieves a Hits@10 score of 0.522 on the 5-shot task on NELL-One, a significant leap from the 0.437 of the strong MetaR (In-Train) baseline. This improvement of 8.5 percentage points underscores ATMR's capability in predicting challenging long-tail entities. Our model likewise shows clear advantages on the other metrics, with a Hits@5 of 0.428 (vs. 0.350 for MetaR) and a Hits@1 of 0.209 (vs. 0.168 for MetaR).
On Wiki-One, the corresponding improvements are 1.1%, 1.3%, 1.5%, and 2.5% on the same metrics. For the more challenging 1-shot KGC task, ATMR exhibits even more substantial advantages: gains on NELL-One reach 5.8% (MRR), 7.5% (Hits@10), 6.7% (Hits@5), and 4.0% (Hits@1), while on Wiki-One the model delivers consistent gains of 1.7%, 1.7%, 1.2%, and 1.7%. In particular, the 6.4% Hits@5 improvement supports our hypothesis that the attention mechanism suppresses the interference of noisy reference triples, thereby enhancing FKGC performance.
Table 2. Performance comparison of different models on NELL-One and Wiki-One
4.5 Ablation Study
We performed comprehensive comparative analyses across multiple model variants to evaluate the contribution of individual components within our proposed ATMR architecture. Table 3 presents a detailed examination of these architectural variations and their corresponding performance metrics.
1) MRL: When we substitute our relational learner with the one from MetaR (denoted as -MRL), we observe a notable drop in overall performance, particularly in precision-oriented metrics such as MRR, Hits@5, and Hits@1. Although Hits@10 increases slightly, the significant decline in the other key metrics demonstrates that our learner, designed to work in synergy with the attention mechanism, is more effective at identifying the most plausible candidates. This validates the superiority of our integrated design.
2) MRL-ATT: We substitute our designed meta-relation learner module with MetaR’s relational learner (-MRL) and remove the attention module (-ATT). With these modifications, the entire model becomes equivalent to MetaR. Performance metrics, sourced from the original MetaR publication, demonstrate the substantial contribution of our attention-based triple selection mechanism.
3) MAML: When the MAML module is removed, there is a notable decrease in model performance. This shows that the meta-learning training strategy of MAML is essential for learning to obtain a generalized few-shot relational representation. This conclusion is consistent with the ablation experiment results of the MetaR model.
Table 3. Performance comparisons of the ablation study for 5-shot KGC on Wiki-One
4.6 Factors That Affect ATMR’s Performance
The performance of ATMR is influenced by several critical factors beyond its architectural components. Entity sparsity emerges as a primary determinant: our experimental analysis reveals distinct optimal configurations for NELL-One and Wiki-One owing to their divergent entity sparsity. Specifically, the proportion of entities appearing in only a single training triple differs markedly: 37.1% for NELL-One versus 82.8% for Wiki-One. This disparity introduces embedding biases, which are particularly pronounced when an entity is seen in only one triple. On the sparser Wiki-One dataset, using pre-trained KG entity embeddings in the Pre-Train configuration effectively mitigates these biases, yielding superior performance compared to the In-Train setting.
The scale of training tasks represents another crucial factor affecting model performance. In the 5-shot KGC evaluation on NELL-One, training without background data (51 tasks) yields a Hits@10 score of 0.317, while incorporating the background data (321 tasks) significantly improves performance to 0.522. This suggests that expanding the training task pool not only enhances model performance but also helps address entity sparsity. These findings lead to two conclusions: (i) model performance correlates strongly with training task volume, and (ii) pre-trained entity embeddings provide substantial benefits, particularly under extreme data sparsity.
4.7 Case Study
To provide a fine-grained performance analysis, we conduct a case study comparing our ATMR framework with the strong MetaR baseline. Table 4 reports 5-shot KGC results on 11 distinct relations from NELL-One.
The results show that ATMR generally outperforms MetaR, showcasing its robust capability. ATMR’s strength is particularly evident in relations with consistent and well-defined patterns, such as 'producedBy' (company → product) and ‘teamCoach’ (team → coach). For these relations, our attention mechanism can effectively identify the most informative reference triples, filter out noise, and synthesize a highly accurate relation representation, leading to superior performance.
Conversely, MetaR shows a slight advantage on relations that are inherently noisy or represent broad one-to-many mappings. For example, in ‘athleteInjuredHisBodypart,’ reference triples can be highly diverse (e.g., (player A, r, knee), (player B, r, ankle)), making it difficult to identify a single representative pattern. Similarly, ‘sportSchoolIncountry’ is a one-to-many relation with a large number of potential candidates. In these cases, MetaR’s approach of learning a more generalized “average” relation representation may be more robust than ATMR’s strategy of prioritizing a few select examples. Despite these specific cases, the overall superiority of ATMR across a wide range of relations highlights its effectiveness and the value of its attention-based learning approach.
Table 4. Results of ATMR and MetaR for 11 relations on NELL-One
5. Conclusion
This paper presents an attention-based relational learning model for FKGC. We first design a relational learner dedicated to learning the general representation of reference entity pairs. We then introduce an attention module that selects the more relevant of these general representations. Finally, we apply TransE to score triples and update the entire model using a meta-learning training strategy. Extensive experiments on two public benchmarks establish the framework's performance advantages over contemporary methods, and ablation studies and targeted case analyses verify the contribution of each component of the ATMR architecture. Despite its demonstrated effectiveness, ATMR remains sensitive to high-noise data and struggles to capture relations with highly variable patterns. Future work will explore adaptive denoising filters and a meta-learning-based dynamic embedding framework to model such variable relations.
Conflict of Interest
The authors declare that they have no competing interests.
Funding
This work was supported in part by the 2012 School Level Special Project of Yulin Normal University (Grant No. 2012YJZX04, Research on big data storage technology of online shops in the traditional Chinese medicine industry), the 2025 Guangxi University Middle-aged and Young Teachers’ Fund Basic Capacity Enhancement Project (Grant No. 2025KY0676, Research on microbial drug-disease association prediction based on HGNN), and the Guangxi Science and Technology Project for Disease Prevention and Control in 2025 (Grant No. GXJKKJ2025ZC003, Research on cross-domain dynamic correlation prediction and application of microorganisms, drugs and diseases based on multimodal artificial intelligence).