## Kai Wang, Wei Pan and Xingzhi Chen

| Sampling date | No. of communities | Community number | No. of nodes | No. of edges |
|---|---|---|---|---|
| 1.15 | 10 | 105 | 2,629 | 30,156 |
|  |  | 46 | 1,772 | 9,758 |
| 2.15 | 15 | 74 | 3,577 | 24,405 |
|  |  | 13 | 768 | 5,607 |
| 3.15 | 12 | 273 | 3,008 | 16,721 |
|  |  | 144 | 2,794 | 18,630 |

Table 2.

| ID | Name | No. of labels | Environmental index | Leadership index | Influence index |
|---|---|---|---|---|---|
| 1541 | Tiny Kitty | 4,772 | 0.0672 | 0.1472 | 0.2144 |
| 2799 | Poetry and piece | 3,580 | 0.0359 | 0.1269 | 0.1628 |
| 582 | Whistleword | 6,403 | 0.0584 | 0.1037 | 0.1621 |
| 7890 | Tipsy sunshine V | 2,887 | 0.0617 | 0.0884 | 0.1501 |

According to the identified opinion leaders, the reader community IDs are mapped to the opinion-leader dataset. To address this need, the opinion leader label matrix is constructed, and the LDA topic model is implemented in the Python language. The calculations show that the model reaches its minimum perplexity when the number of topics is K=45. After 1,000 iterations, the topic-feature probability distribution of the labels of Community 105 is obtained. According to the parameter analysis in Section 4.2, when [TeX:] $$h l_{u_i}=1.75$$ and the second parameter is set to 0.6, the weights of the interest labels can be calculated accurately. Therefore, these parameters are substituted to calculate the interest-label weights of the top-n opinion leaders in the community at moment t, as shown in Table 3.
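The model-selection step above, choosing the topic number K at the point of minimum perplexity, can be sketched as follows. The toy document-term matrix, the candidate K values, and the use of scikit-learn are illustrative assumptions, not the authors' setup (their corpus yields K=45):

```python
# Sketch: pick the LDA topic number K by minimum perplexity.
# The corpus below is a random toy document-term count matrix,
# purely for illustration.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(60, 30))  # 60 "documents", 30 "label terms"

def best_k(X, candidates):
    """Fit LDA for each candidate K and return the K with lowest perplexity."""
    scores = {}
    for k in candidates:
        lda = LatentDirichletAllocation(
            n_components=k, max_iter=20, random_state=0
        ).fit(X)
        scores[k] = lda.perplexity(X)  # lower perplexity = less "confusion"
    return min(scores, key=scores.get), scores

k, scores = best_k(X, [5, 10, 15])
```

On the real label corpus the same grid search would simply scan a wider range of K and keep the minimum.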

The topic label matrix and interest label matrix of the opinion leaders are substituted to obtain the similarity of opinion leaders under different interest topics. Afterward, the similarity distances of interest labels across different communities can be identified. Table 4 shows partial results for Community 105.

Table 3.

| ID | Interest label | Label intensity | Label stability | Interest weight |
|---|---|---|---|---|
| 2799 | movie | 0.075 | 0.133 | 0.098 |
|  | novel | 0.071 | 0.126 | 0.093 |
| 346 | essays | 0.063 | 0.123 | 0.083 |
|  | politics | 0.051 | 0.114 | 0.076 |
| 1227 | literature | 0.082 | 0.145 | 0.107 |
|  | prose | 0.068 | 0.132 | 0.094 |

Table 4.

| Opinion leader ID | Topics with similarities |
|---|---|
| 2799 | topic2/0.855 topic1/0.749 topic16/0.712 topic32/0.629 topic36/0.445 |
| 582 | topic4/0.747 topic22/0.661 topic44/0.405 topic13/0.388 topic24/0.152 |
| 7890 | topic20/0.654 topic7/0.582 topic10/0.346 topic38/0.225 topic41/0.187 |
| 3341 | topic19/0.663 topic5/0.576 topic31/0.542 topic12/0.466 topic38/0.371 |
| 164 | topic26/0.901 topic4/0.753 topic37/0.694 topic32/0.433 topic22/0.378 |
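The similarity computation behind Table 4 is not spelled out in this section; one common choice for comparing an opinion leader's topic-label vector with an interest-label vector is cosine similarity, sketched below. The vectors are hypothetical placeholders, not values from the tables:

```python
# Sketch: cosine similarity between two label-weight vectors, one
# plausible metric for the "similarity distance" used in Table 4.
# The vectors here are made up for illustration.
import numpy as np

def cosine_sim(u, v):
    """Cosine similarity between two label-weight vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

topic_vec = np.array([0.30, 0.10, 0.45, 0.05, 0.10])     # hypothetical topic weights
interest_vec = np.array([0.25, 0.15, 0.40, 0.10, 0.10])  # hypothetical interest weights
sim = cosine_sim(topic_vec, interest_vec)
```

Identical vectors score 1.0 and orthogonal vectors score 0.0, so the topicN/score pairs in Table 4 would read directly as closeness on this scale.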

To verify the rationality of the UIT, the community topology datasets recorded over 5 consecutive weeks, based on the data sampled on January 15, are named [TeX:] $$D_{t_1}-D_{t_5}.$$ Using the five-fold cross-validation method, four of them are taken in turn as training datasets, and the remaining one is taken as the test dataset. At the same time, the results of the UIT are compared with those of TUPRP [3], KCCD [4], BILC [6], and UIM-LDA [7]. Afterward, the mean of the five results is taken as the final value. The comparison of the different algorithms on the F value is shown in Fig. 1.
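The evaluation protocol above can be sketched as follows; the `evaluate` callback is a stand-in for training a model on four weekly datasets and scoring its F value on the fifth:

```python
# Sketch: five-fold evaluation over the weekly datasets Dt1-Dt5.
# Each dataset serves once as the test set; the final score is the
# mean of the five F values. `evaluate` is a hypothetical callback.
def five_fold_f(datasets, evaluate):
    """Return the mean F value over leave-one-out splits of `datasets`."""
    scores = []
    for i, test in enumerate(datasets):
        train = [d for j, d in enumerate(datasets) if j != i]
        scores.append(evaluate(train, test))
    return sum(scores) / len(scores)
```

With the five weekly datasets, each model (UIT, TUPRP, KCCD, BILC, UIM-LDA) is passed through this loop and the averaged F values are what Fig. 1 compares.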

The following conclusions can be drawn from Fig. 1 in four aspects. First, although the TUPRP achieves high recognition quality for readers' new interests on the datasets ([TeX:] $$D_{t_1}, D_{t_2}$$) at the beginning of the time window, its overall F value is the lowest. The reason is that it does not take into account the social relationships and interest drift between readers. Second, the reader models based on multiple user features, including KCCD and BILC, outperform TUPRP on the F value, but the improvement is limited. The main reason is that although KCCD extracts readers' multidimensional topic characteristics to realize topic clustering, it does not integrate readers' behavioral features into CKM. BILC combines behavior labels with interest labels, which improves its modeling of behavioral interest features, but it ignores the dynamic change of readers' interests in the time dimension, so interest clusters are easily distorted. Third, UIM-LDA integrates topic characteristics with user interests, which improves the effectiveness of CKM; however, it ignores the dependence between user behavior and community structure, thus cutting off the intrinsic semantic association between different datasets. Finally, the UIT has the highest average F value (0.551) on the five datasets. The main reason is that the UIT jointly models community structure, reader influence, and interest topics in a smoothed time dimension by introducing model parameters such as reader parameters, the interest half-life, and interest intensity factors. Hence, the accuracy and timeliness of group modeling across adjacent time slices are improved.
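The "interest half-life" parameter mentioned above (hl = 1.75 in Section 4.2) can be read, purely as an illustration, as an interest weight that halves every `half_life` time units. The paper's exact decay formula is not reproduced here; this sketch only shows the half-life mechanism itself:

```python
# Purely illustrative half-life decay of an interest-label weight.
# Not the authors' exact formula: this just demonstrates what a
# half-life parameter (e.g., hl = 1.75) means for a weight over time.
def decayed_weight(w0: float, dt: float, half_life: float = 1.75) -> float:
    """Weight of an interest label dt time units after it was observed."""
    return w0 * 0.5 ** (dt / half_life)
```

For example, a label observed one half-life ago contributes half its original weight, and one observed two half-lives ago contributes a quarter, which is how older interests fade without vanishing abruptly between time slices.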

This paper proposes a reader CKM method (UIT) based on user interest labels, which extracts the topicality and timeliness of reader labels. The model combines the dynamic labeling of reader interests with the LDA topic model and obtains the interest-label set of the reader community by calculating reader topic similarity distances, thereby realizing knowledge discovery for the reader community.

In general, the following improvements are obtained in contrast to the state-of-the-art approaches. (1) The UIT combines the dynamic labeling of readers' interests with the LDA topic model to characterize the dependence between the implicit topic features of communities and readers' interests, so it can dynamically perceive changes in readers' interest intensity and interest stability and describe users' interest topics in the time dimension. (2) The UIT obtains the interest labels of the reader community by calculating topic similarity distances at different moments. On this basis, the influence of opinion leaders is used as a feature vector to quantify the environmental and dynamic attributes of readers. This helps predict the characteristics of the panoramic community structure for pan-scene knowledge graph construction, which is a central contribution of this study. Future research can improve the extensibility of the model in community differentiation modeling with respect to community location and communication modeling.

- 1 L. Liu, S. Wang, and Z. Hu, "A literature review on community profiling," *Library and Information Service*, vol. 63, no. 23, pp. 122-130, 2019.
- 2 L. Sun, "Approach of multi-source competitive intelligence fragments fusion based on intelligence element similarity," *Information Studies: Theory & Application*, vol. 41, no. 10, pp. 8-14, 2018.
- 3 Y. Cai and Q. Li, "Personalized search by tag-based user profile and resource profile in collaborative tagging systems," in *Proceedings of the 19th ACM International Conference on Information and Knowledge Management*, Toronto, Canada, 2010, pp. 969-978.
- 4 S. Ding, N. Wang, and J. C. Wu, "Hot topic detection of Weibo based on keyword co-occurrence and community discovery," *Modern Information*, vol. 38, no. 3, pp. 10-18, 2018.
- 5 A. Salehi, M. Ozer, and H. Davulcu, "Sentiment-driven community profiling and detection on social media," in *Proceedings of the 29th on Hypertext and Social Media*, Baltimore, MD, 2018, pp. 229-237.
- 6 R. Wang and W. Zhang, "Behavior and interest labeling construction and application of academic user portraits," *Modern Information*, vol. 39, no. 9, pp. 54-63, 2019.
- 7 X. Tang and L. Xie, "Construction and dynamic update of theme-based user interest model," *Information Studies: Theory & Application*, vol. 39, no. 2, pp. 116-123, 2016.
- 8 H. Li and Y. Liang, "Time series clustering method with label propagation based on centrality," *Control and Decision*, vol. 33, no. 11, pp. 1950-1958, 2018.
- 9 F. Chung and A. Tsiatas, "Finding and visualizing graph clusters using PageRank optimization," *Internet Mathematics*, vol. 8, no. 1-2, pp. 46-72, 2012.
- 10 J. Shi, M. Fan, and W. L. Li, "Topic analysis based on LDA model," *Acta Automatica Sinica*, vol. 35, no. 12, pp. 1586-1592, 2009.
- 11 J. Hu and G. Chen, "Mining and evolution of content topics based on dynamic LDA," *Library and Information Service*, vol. 58, no. 2, pp. 138-142, 2014.
- 12 J. A. Nunez, P. M. Cincotta, and F. C. Wachlin, "Information entropy," *Celestial Mechanics and Dynamical Astronomy*, vol. 64, pp. 43-53, 1996.
- 13 G. Zhu and L. Zhou, "Hybrid recommendation based on forgetting curve and domain nearest neighbor," *Journal of Management Sciences in China*, vol. 15, no. 5, pp. 55-64, 2012.
- 14 Y. Liu, K. Wang, and Y. Liu, "Online recognition approach for opinion leaders using influence heredity," *Information Studies: Theory & Application*, vol. 42, no. 7, pp. 126-131, 2019.
- 15 S. Zhu and X. Jiang, "Analysis of literature obsolescence for humanities and social sciences journals based on CSSCI data," *Journal of the China Society for Scientific and Technical Information*, vol. 36, no. 10, pp. 1031-1037, 2017.
- 16 W. Meng and J. Pang, "Application of Pajek in visualization of coauthored networks in information science," *Information Studies: Theory & Application*, vol. 31, no. 4, pp. 573-575, 2008.