
Determining Sentence Similarity with Transformers


1 Introduction

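The previous article presented a way to determine sentence similarity with WMD (Word Mover's Distance). That method builds on vectors trained with Doc2Vec and performs better than plain cosine similarity. This article uses a different method, Transformers, to determine the similarity between sentences.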


2 How Transformers Work

Transformers first needs a model. Unlike WMD, this model is not trained on your own corpus; it is built on pretrained models.


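from sentence_transformers import SentenceTransformer
model = SentenceTransformer('my corpus-model')  # placeholder name; substitute one of the pretrained models listed below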

Trained on NLI data






bert-base-nli-mean-tokens
bert-large-nli-mean-tokens
roberta-base-nli-mean-tokens
roberta-large-nli-mean-tokens
distilbert-base-nli-mean-tokens

Trained on STS data






bert-base-nli-stsb-mean-tokens
bert-large-nli-stsb-mean-tokens
roberta-base-nli-stsb-mean-tokens
roberta-large-nli-stsb-mean-tokens
distilbert-base-nli-stsb-mean-tokens

These models must be downloaded before they can be used.
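The `mean-tokens` suffix in the model names above indicates that the sentence vector is obtained by averaging the per-token vectors produced by the underlying network (mean pooling). The following is a minimal sketch of that averaging step only, using hypothetical 3-dimensional token vectors; real models produce far larger vectors and also skip padding tokens, which this sketch does not model.

```python
def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one sentence vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / len(token_vectors)
            for i in range(dim)]

# Hypothetical token vectors for a three-token sentence (illustration only).
toy_tokens = [[1.0, 0.0, 2.0],
              [3.0, 2.0, 0.0],
              [2.0, 4.0, 1.0]]
sentence_vector = mean_pool(toy_tokens)
print(sentence_vector)  # [2.0, 2.0, 1.0]
```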

Read your own file:



with open('corpus-pfc.txt', 'r', encoding='utf-8') as infile:
    _c = infile.read()

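Convert the text into a list of sentences, keeping only non-empty lines with at least four space-separated tokens: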


corpus = [line for line in _c.split('\n') if line != '' and len(line.split(' ')) >= 4]
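The filtering step can be tried without the data file. The sketch below is a slight variation on the one-liner above: the helper name `build_corpus` and the `min_tokens` parameter are introduced here for illustration, and it strips whitespace and splits on any run of whitespace rather than on single spaces.

```python
def build_corpus(text, min_tokens=4):
    """Split raw text into lines, keeping non-empty lines with enough tokens."""
    corpus = []
    for line in text.split('\n'):
        line = line.strip()
        # Drop blank lines and lines with fewer than min_tokens words.
        if line and len(line.split()) >= min_tokens:
            corpus.append(line)
    return corpus

sample_text = (
    "PFC2D PFC3D slope stability simulation\n"
    "\n"
    "slope stability\n"
    "Fluid coupling in PFC2D and PFC3D\n"
)
sample_corpus = build_corpus(sample_text)
print(sample_corpus)
# ['PFC2D PFC3D slope stability simulation', 'Fluid coupling in PFC2D and PFC3D']
```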

Compute an embedding vector for every sentence:


corpus_embeddings = model.encode(corpus)

Compute the embedding vector for the query sentence:



queries = ['PFC2D PFC3D slope stability simulation']
query_embeddings = model.encode(queries)

Return the most similar sentences:




import scipy.spatial.distance

for query, query_embedding in zip(queries, query_embeddings):
    # Cosine distance between the query and every corpus sentence
    distances = scipy.spatial.distance.cdist(
        [query_embedding], corpus_embeddings, "cosine")[0]
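With the "cosine" metric, `cdist` returns cosine distances, that is, 1 minus the cosine similarity. The ranking step is not shown in the article, so the following is only a sketch of how the distances might be turned into a sorted top-k list. It is pure Python with toy 2-dimensional vectors; `cosine_similarity` and `top_k` are hypothetical helper names, and the assumption is that the reported "Similarity" equals 1 minus the distance.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, corpus_vecs, sentences, k=10):
    """Return the k (similarity, sentence) pairs with the highest similarity."""
    scored = [(cosine_similarity(query_vec, vec), sent)
              for vec, sent in zip(corpus_vecs, sentences)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy data standing in for corpus and corpus_embeddings (illustration only).
toy_sentences = ["sentence A", "sentence B", "sentence C"]
toy_vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
ranked = top_k([1.0, 0.0], toy_vectors, toy_sentences, k=2)
for score, sentence in ranked:
    print(f"{sentence} (Similarity: {score:.2f})")
# sentence A (Similarity: 1.00)
# sentence B (Similarity: 0.71)
```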


3 Transformers Results

The corpus is corpus-pfc.txt (in E:\Geotech\mydata), a refined PFC dataset produced in the previous article. The query sentence is the same as before:


query = 'PFC2D PFC3D slope stability simulation'

The top 10 most similar results are:

PFC2D PFC3D slope stability simulation (Similarity: 1.00)

PFC2D PFC3D slope stability (Similarity: 0.89)

slope instability, pfc2d, numerical simulation, parallel bond model, stability analysis (Similarity: 0.84)

PFC2D rock slopes stability simulation (Similarity: 0.81)

General two-dimensional slope stability analysis (Similarity: 0.81)

Then the particle discrete element software PFC2D is used to simulate the stability of slope excavation from the meso-mechanical level. (Similarity: 0.80)

"System reliability analysis of slope stability using generalized Subset Simulation". (Similarity: 0.80)

Application of distinct element analysis in slope stability problems (Similarity: 0.78)

Fluid coupling in PFC2D and PFC3D (Similarity: 0.78)

'Then the particle discrete element software PFC2D is used to simulate the stability of slope excavation from the meso-mechanical level.' (Similarity: 0.82) (Similarity: 0.77)


For comparison, the top 10 results of a WMD similarity query are:

PFC2D PFC3D slope stability simulation (Similarity: 1.00)

PFC2D Simulation on Stability of Loose Deposits Slope in Highway Cutting Excavation (Similarity: 0.99)

PFC2D PFC3D slope stability (Similarity: 0.98)

The PFC3D simulation platform was employed to calcaulate the single-hole blasting processes with different heights,buried depths and charge amounts in the open mine slope,and the slope stability after blasting was discussed. (Similarity: 0.98)

Simulation and analysis of the earthquake stability of the tailing reservoir based on PFC3D (Similarity: 0.98)

"Study on the similar materials simulation of the slope stability of the west-l zone in Luming Molybdenum Mine". (Similarity: 0.98)

NUMERICAL SIMULATION OF A FILLED SLOPE STABILITY ON SOFT SOIL ROADBED REINFORCED BY GRAVEL PILE USING PFC2D (Similarity: 0.98)

'DEM simulation pfc2d slope', (Similarity: 0.84) (Similarity: 0.98)

"The Numerical Simulation on the Stability of Steep Rock Slope by DDA". (Similarity: 0.97)

We show by simulation that the proposed robot model can walk down a slope passively and also verify the stability of this walking by calculating the eigenvalues of the Jacobian of the Poincare map. (Similarity: 0.97)


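Comparing the two, the Transformers results are somewhat better. Apart from the exact match, the WMD scores are all bunched between 0.97 and 0.99 and the list includes an unrelated sentence about a walking robot, while the Transformers scores range from 0.89 down to 0.77 and the results stay on the topic of slope stability.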


4 Clustering with Transformers

The sentence embeddings can also be clustered, here by importing KMeans from the sklearn module:


from sklearn.cluster import KMeans
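The article does not show the clustering call or the number of clusters. To illustrate what k-means does with the embeddings, here is a pure-Python sketch of Lloyd's algorithm on toy 2-dimensional vectors. It is a stand-in for `KMeans`, not its implementation; the function name `kmeans`, the fixed initialisation from the first k vectors, and the toy data are all assumptions made for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Lloyd's algorithm; the first k vectors serve as initial centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy embeddings forming two obvious groups (illustration only).
toy_embeddings = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]
labels = kmeans(toy_embeddings, k=2)
print(labels)  # [0, 1, 0, 1]
```

With scikit-learn, `KMeans(n_clusters=k).fit(corpus_embeddings).labels_` plays the same role as the `labels` list here; sentences sharing a label form one cluster.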

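Below is one of the resulting clusters. A word-frequency count shows that the theme of this cluster is "Failure". Clustering helps us focus on a particular group of topics.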

'3-D Granular Simulation on the Process of Slope Failure and Collapse', 

'Failure process simulation of sliding unstable rock based on PFC2D', 

'rock slope; step-path failure; rock bridge; slope stability; PFC2D', 

'Similar to slope stability failure', 

'The effect of discontinuity Persistence an Rock Slope Stability', 

'slope stability 1; wedge failure', 

'Jointed rock slope Step-path failure Rock bridges Slope stability PFC', 

'rock slope step-path failure rock bridge slope stability after blasting was discussed.'
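The word-frequency count mentioned above is not shown in the article. One way to sketch it is with `collections.Counter` over the cluster's sentences. The helper `top_terms` and the `common_terms` list are assumptions made here: terms that run through the whole corpus (the query words and "rock") are ignored, since otherwise they would outrank the cluster-specific term.

```python
import re
from collections import Counter

def top_terms(sentences, ignore=(), k=3):
    """Return the k most frequent lowercase words, skipping ignored terms."""
    ignored = {word.lower() for word in ignore}
    counts = Counter()
    for sentence in sentences:
        for word in re.findall(r"[a-z0-9]+", sentence.lower()):
            if word not in ignored:
                counts[word] += 1
    return counts.most_common(k)

# The cluster listed above.
cluster = [
    '3-D Granular Simulation on the Process of Slope Failure and Collapse',
    'Failure process simulation of sliding unstable rock based on PFC2D',
    'rock slope; step-path failure; rock bridge; slope stability; PFC2D',
    'Similar to slope stability failure',
    'The effect of discontinuity Persistence an Rock Slope Stability',
    'slope stability 1; wedge failure',
    'Jointed rock slope Step-path failure Rock bridges Slope stability PFC',
    'rock slope step-path failure rock bridge slope stability after blasting was discussed.',
]
# Assumed list of corpus-wide terms to skip (not given in the article).
common_terms = ["slope", "stability", "rock", "pfc2d", "pfc", "simulation"]
print(top_terms(cluster, ignore=common_terms, k=1))  # [('failure', 7)]
```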


5 Conclusion

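This article used Transformers to determine the similarity between sentences. The results were better than those obtained with WMD, and clustering the Transformers embeddings helps us focus on a particular group of topics. Future work will continue to explore what Transformers can do.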




Source: 计算岩土力学
Copyright belongs to the author. Sharing is welcome; reproduction without permission is not allowed.
First published: 2022-09-28