
Ranking Sentences by Similarity Using Euclidean Distance Between Vectors


1. Introduction

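Checking the similarity of sentences and paragraphs is an important research topic in natural language processing. In the article 《一个快速的句子和段落相似查询方法》 (A Fast Method for Finding Similar Sentences and Paragraphs), we proposed a method for finding similar sentences in a large data set: it first splits a sentence into words and then matches sentences through combinations of those words. The results it returns follow the natural order of the searched text. This note builds on that work by ranking the resulting corpus by similarity.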


2. Similarity Ranking Algorithm

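There are many similarity ranking algorithms, and there is no need to go through them one by one here. This note uses the sklearn library directly. First, import the modules: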



from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import euclidean_distances

Then vectorize the corpus:



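vectorizer = CountVectorizer()
# Keep the sparse count matrix: euclidean_distances accepts it directly, and recent scikit-learn releases reject the np.matrix returned by .todense()
features = vectorizer.fit_transform(corpus)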
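To make the vectorization step concrete, here is a minimal sketch on a three-sentence toy corpus. The sentences are borrowed from the results in Section 3, and the names `toy_corpus`, `counts` and `vocab` are assumptions for illustration only. It prints the vocabulary that `CountVectorizer` learns with its default settings and the resulting count matrix, one row per sentence.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus for illustration only; the first sentence plays the role of the target.
toy_corpus = [
    "Joint Rock Slope Design and Analysis",
    "rock slope stability analysis and slope design",
    "Slope Stability In Surface Mining",
]

# Default settings lowercase the text and keep tokens of two or more word characters.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(toy_corpus)  # sparse matrix, one row per sentence

# vocabulary_ maps each word to its column index; list the words in column order.
vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
print(vocab)
print(counts.toarray())
```

Each column counts one word, so a word that appears twice in a sentence (such as "slope" in the second sentence) gets the value 2 in that row.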

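Next, compute the Euclidean distance between every sentence in the set and the first sentence, which is the target sentence. The smaller the sim value, the more similar the two sentences are.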





similarity = []
for f in features:
    # Distance between the target sentence (row 0) and the current sentence
    sim = euclidean_distances(features[0], f)[0][0]
    similarity.append(sim)
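For reference, the Euclidean distance between two count vectors x and y is the square root of the sum of (x_i - y_i)^2 over all vocabulary words. The sketch below is one possible vectorized form of the loop, not the author's code: on an assumed toy corpus it computes all distances to the first sentence in a single `euclidean_distances` call and cross-checks them against the definition using NumPy.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import euclidean_distances

# Toy corpus for illustration only; row 0 is the target sentence.
toy_corpus = [
    "Joint Rock Slope Design and Analysis",
    "rock slope stability analysis and slope design",
    "Slope Stability In Surface Mining",
]
features = CountVectorizer().fit_transform(toy_corpus)

# One call: distances from the target row to every row, result shape (1, n_sentences).
similarity = euclidean_distances(features[:1], features)[0]

# Cross-check against the definition: sqrt of the summed squared count differences.
dense = features.toarray()
manual = np.sqrt(((dense - dense[0]) ** 2).sum(axis=1))
assert np.allclose(similarity, manual)

print(np.round(similarity, 2))
```

The slice `features[:1]` keeps the target as a two-dimensional, single-row matrix, which is the shape `euclidean_distances` expects.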

Then sort the results by Euclidean distance in ascending order.


sort_sims = sorted(enumerate(similarity), key=lambda item: item[1])

Finally, print all the results. You can of course keep only the Top 5 or Top 10.
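The article does not show the output code, so the following is only a sketch of how the sorting and the optional Top N cut could be combined. The function name `format_ranking`, its `top_n` parameter and the toy inputs are hypothetical; the line format "(rank) distance sentence" simply imitates the listings in Section 3.

```python
def format_ranking(corpus, similarity, top_n=None):
    """Return lines like '(1) 0.0 sentence', ordered by increasing distance."""
    # Pair each distance with its sentence index, then sort by distance.
    sort_sims = sorted(enumerate(similarity), key=lambda item: item[1])
    if top_n is not None:
        sort_sims = sort_sims[:top_n]  # keep only the closest top_n sentences
    return [
        f"({rank}) {round(float(dist), 2)} {corpus[idx]}"
        for rank, (idx, dist) in enumerate(sort_sims, start=1)
    ]

# Toy inputs for illustration; in the note these come from the previous steps.
corpus = ["target sentence", "far sentence", "near sentence"]
similarity = [0.0, 3.0, 1.7320508]

for line in format_ranking(corpus, similarity, top_n=2):
    print(line)
# (1) 0.0 target sentence
# (2) 1.73 near sentence
```

Passing `top_n=None` prints every sentence, which corresponds to outputting all results.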


3. Experimental Results

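This example concerns rock slope stability. The corpus was produced with the algorithm from 《一个快速的句子和段落相似查询方法》. We want the similarity between "Joint Rock Slope Design and Analysis" and every sentence in this corpus. Running the code above and sorting by similarity gives the following results: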





















(1) 0.0 Joint Rock Slope Design and Analysis
(2) 1.73 rock slope stability analysis and slope design
(3) 2.83 General two-dimensional slope stability analysis
(4) 3.0 Slope Stability In Surface Mining
(5) 3.16 Step Path Rock Bridge Percentage for Analysis of Slope Stability
(6) 3.16 Rock Slope Stability in Open Pit Mining and Civil Engineering
(7) 3.16 how rock bridges may be incorporated into slope stability analysis
(8) 3.32 The effect of discontinuity Persistence on Rock Slope Stability
(9) 3.46 Prediction of step path failure geometry for slope stability analysis
(10) 3.46 A DEM analysis of step-path failure in jointed rock slopes
(11) 3.61 determine relative slope stability based simply on site observations
(12) 3.61 Application of SMR method to highway slope stability classification
(13) 3.74 Modifications to the RMR-SRM system for slope stability evaluation
(14) 3.87 several rock mass classification systems developed for rock slope stability assessment are evaluated
(15) 4.24 This paper discusses this issue with particular focus on limit equilibrium analysis of rock slope stability.
(16) 4.24 Methods for analysis of slope stability are described and are illustrated by examples in the appendixes
(17) 4.24 Incorporation of these data into various rock slope stability numerical modeling methods highlights a complex failure mechanism
(18) 4.58 Impacts of material property variability and spatial variability on slope stability were analyzed using Monte Carlo simulation
(19) 4.58 This study was made with mining situations in mind and showed the beneficial effects of curvature on slope stability
(20) 4.58 DEM analysis of rock bridges and the contribution to rock slope stability in the case of translational sliding failures
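The distance of 1.73 for the second entry can be reproduced by hand, which helps in reading the list. The sketch below does not use sklearn: the helper `count_words` is a hypothetical stand-in that approximates the default tokenization of `CountVectorizer` (lowercasing, tokens of two or more word characters). It lists the words whose counts differ between the target and the second sentence, then takes the square root of the sum of squared differences.

```python
import math
import re
from collections import Counter

def count_words(sentence):
    # Approximates CountVectorizer's defaults: lowercase, tokens of 2+ word characters.
    return Counter(re.findall(r"\b\w\w+\b", sentence.lower()))

target = count_words("Joint Rock Slope Design and Analysis")
other = count_words("rock slope stability analysis and slope design")

# Keep only the words whose counts differ between the two sentences.
diff = {
    word: target[word] - other[word]
    for word in sorted(set(target) | set(other))
    if target[word] != other[word]
}
print(diff)  # {'joint': 1, 'slope': -1, 'stability': -1}

distance = math.sqrt(sum(d * d for d in diff.values()))
print(round(distance, 2))  # 1.73
```

Only three counts differ ("joint" is missing from the second sentence, which in turn has one extra "slope" and one "stability"), so the distance is the square root of 3.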

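Next, we add the PFC2D slope stability analysis query results from the previous note to the corpus and change the target sentence to "DEM PFC2D slope stability". The results are shown below.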





















(1) 0.0 DEM PFC2D slope stability
(2) 2.24 Slope Stability In Surface Mining
(3) 2.45 General two-dimensional slope stability analysis
(4) 2.45 PFC2D Simulation on Stability of Loose Deposits Slope.
(5) 2.65 rock slope stability analysis and slope design
(6) 2.83 Joint Rock Slope Design and Analysis
(7) 2.83 loose deposit slope, pfc2d, seismic wave, microscopic, failure, simulation, stability
(8) 2.83 slope instability, pfc2d, numerical simulation, parallel bond model, stability analysis
(9) 3.0 determine relative slope stability based simply on site observations
(10) 3.0 The effect of discontinuity Persistence on Rock Slope Stability
(11) 3.0 Application of SMR method to highway slope stability classification
(12) 3.0 Strength Reduction Method for Rock Slope Stability Analysis Based on PFC2D
(13) 3.16 Prediction of step path failure geometry for slope stability analysis
(14) 3.16 Step Path Rock Bridge Percentage for Analysis of Slope Stability
(15) 3.16 Rock Slope Stability in Open Pit Mining and Civil Engineering
(16) 3.16 how rock bridges may be incorporated into slope stability analysis
(17) 3.16 Modifications to the RMR-SRM system for slope stability evaluation
(18) 3.16 rock slope step-path failure rock bridge slope stability PFC2D
(19) 3.16 rock slope;step-path failure;rock bridge;slope stability;PFC2D
(20) 3.16 PFC2D Simulation on Stability of Loose Deposits Slope in Highway Cutting Excavation


4. Conclusion

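This note briefly described how to compute the similarity between sentences with the sklearn library. Euclidean distance has some shortcomings as a similarity measure (on raw word counts it is sensitive to sentence length, for example), so other algorithms may be added later. Besides improving the similarity algorithm, another improvement is to add a loop that traverses all documents in a directory and then ranks them, which should be easy to do.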
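The note does not say which alternative algorithm is planned. As an assumption only, the sketch below compares Euclidean distance with cosine similarity, one common choice that sklearn also provides, on a deliberately contrived toy corpus. The second sentence repeats the wording of the target three times, while the third shares no words with it.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

# Contrived toy corpus for illustration only; row 0 is the target.
toy_corpus = [
    "slope stability",
    "slope stability slope stability slope stability",  # same words, repeated
    "rock bridge",                                       # no words in common
]
features = CountVectorizer().fit_transform(toy_corpus)

# Euclidean distance: smaller means more similar.
euclid = euclidean_distances(features[:1], features)[0]
# Cosine similarity: larger means more similar, and vector length is ignored.
cosine = cosine_similarity(features[:1], features)[0]

for sentence, e, c in zip(toy_corpus, euclid, cosine):
    print(f"euclidean={e:.2f} cosine={c:.2f} {sentence}")
```

On these inputs the Euclidean distance places the unrelated "rock bridge" closer to the target than the repeated sentence, while cosine similarity scores the repeated sentence as identical in direction and the unrelated one as zero. Whether cosine similarity suits the real corpus would need to be checked against it.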



Source: 计算岩土力学
Copyright belongs to the author. Sharing is welcome; reproduction without permission is prohibited.
First published: 2022-09-27