Deterministic and Stochastic Processes in Text Summary Generation

1. Introduction

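In an earlier note, [A Fast Method for Querying Similar Sentences and Paragraphs], we discussed how to check sentence similarity. To capture as much relevant information as possible, we combine the keywords of a query sentence and query the dataset with each combination separately. If a sentence has n keywords, a reasonably effective choice is C(n,3), i.e., three keywords per combination. Pairs of keywords constrain the search too loosely, and for short keyword lists they also produce more combinations to query (with n = 4, C(4,2) = 6 versus C(4,3) = 4), so queries take too long. Once the query results are in, we want to identify what is most worth reading, which brings us to text summarization. This note discusses some issues that arise when generating text summaries.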
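The combination step can be illustrated with Python's itertools.combinations. The earlier note's keyword extraction and search code are not reproduced here, so this sketch simply assumes the keywords are the whitespace-separated terms of the query; keyword_combinations is a hypothetical helper, and the search itself is left out.

```python
from itertools import combinations
from math import comb


def keyword_combinations(query: str, k: int = 3) -> list[str]:
    """Return all C(n, k) combinations of the query's n keywords, in input order."""
    keywords = query.split()  # assumption: keywords are the whitespace-separated terms
    return [" ".join(group) for group in combinations(keywords, k)]


if __name__ == "__main__":
    query = "pfc2d slope stability pfc3d"
    groups = keyword_combinations(query)
    print(f"{len(groups)} combinations (C(4, 3) = {comb(4, 3)}):")
    for group in groups:
        print("  query:", group)  # each combination would be sent as a separate search
```

With the four keywords of the example query this yields C(4,3) = 4 three-keyword queries, each of which would be run against the dataset separately.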


2. Methods for generating summaries

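This note covers only so-called extractive summarization. An extractive method does not change the structure of the original sentences; it builds the summary from the sentences with the highest relative weights in the text, following the TextRank algorithm, which adapts Google's PageRank web-page ranking to text. TextRank is an unsupervised algorithm; see Federico Barrios et al. (2015) [Variations of the Similarity Function of TextRank for Automated Summarization. Abstract: This article presents new alternatives to the similarity function for the TextRank algorithm for automatic summarization of texts. We describe the generalities of the algorithm and the different functions we propose. Some of these variants achieve a significative improvement using the same metrics and dataset as the original publication.] In this experiment we used two different modules, sumy and gensim. The test text is the result of the example query in "A Fast Method for Querying Similar Sentences and Paragraphs" with the keywords "pfc2d slope stability pfc3d", saved in the file pfctext.txt. For each method we generated a summary of 5 to 6 sentences.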
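The ranking idea can be sketched in plain Python. This is not the code of gensim or sumy, nor the author's; it is a minimal TextRank sketch that assumes whitespace tokenisation and uses the word-overlap similarity from the original TextRank paper (Mihalcea and Tarau, 2004) rather than the alternative similarity functions Barrios et al. propose. The function names are hypothetical. Each sentence is a graph node, the similarity between two sentences is the edge weight, a weighted PageRank iteration scores the nodes, and the top-scoring sentences are returned in document order.

```python
import math
from itertools import combinations


def similarity(s1: str, s2: str) -> float:
    """TextRank overlap similarity: shared words / (log|S1| + log|S2|)."""
    words1, words2 = s1.lower().split(), s2.lower().split()
    denominator = math.log(len(words1)) + math.log(len(words2)) if words1 and words2 else 0.0
    if denominator <= 0:  # empty or one-word sentences: no meaningful normalisation
        return 0.0
    return len(set(words1) & set(words2)) / denominator


def textrank_scores(sentences: list[str], d: float = 0.85, iterations: int = 50) -> list[float]:
    """Weighted PageRank over the undirected sentence-similarity graph."""
    n = len(sentences)
    weight = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weight[i][j] = weight[j][i] = similarity(sentences[i], sentences[j])
    out_weight = [sum(row) for row in weight]  # total edge weight leaving each node
    scores = [1.0] * n
    for _ in range(iterations):
        # WS(i) = (1 - d) + d * sum_j w_ji / out(j) * WS(j), using last iteration's scores
        scores = [
            (1 - d) + d * sum(
                weight[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences: list[str], count: int) -> list[str]:
    """Keep the `count` highest-scoring sentences, printed in their original order."""
    scores = textrank_scores(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:count]
    return [sentences[i] for i in sorted(best)]


if __name__ == "__main__":
    # Sentences copied from the query results, used only as sample input
    sample = [
        "PFC2D (particle flow code in two dimensions) is applied to simulate the stability of loose deposits slope during highway cutting excavation.",
        "In order to understand the effects of the different slope dips on the slope stability, we have made simulated slopes based on the PFC3D in accordance with the particle flow theory.",
        "Then the particle discrete element software PFC2D is used to simulate the stability of slope excavation from the meso-mechanical level.",
        "PFC2D Simulation on Stability of Loose Deposits Slope in Highway Cutting Excavation",
    ]
    for sentence in summarize(sample, count=2):
        print(sentence)
```

Asking for a fixed sentence count, as with sumy, or for a fraction of the text, as with gensim's ratio parameter, only changes how many of the ranked sentences are kept.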


2.1 Summary generated by gensim

We generated a summary with ratio=0.3, which in this case corresponds to six sentences. The gensim summary is:


(1) PFC2D Simulation on Stability of Loose Deposits Slope in Highway Cutting Excavation

(2) PFC2D (particle flow code in two dimensions) is applied to simulate the stability of loose deposits slope during highway cutting excavation.

(3) To understand the slope stability of steep inclined seam with horizontal fracture and the relationship between the internal damage and the external form, the process of spontaneous combustion are simulated based on PFC3D under particle flow theory.

(4) To understand the slope stability with steeply inclined seam and horizontal fracture in the process of earthquake,which is simulated on the basis of PFC3D according to particle flow theory.

(5) In order to understand the effects of the different slope dips on the slope stability, we have made simulated slopes based on the PFC3D in accordance with the particle flow theory.

(6) Then the particle discrete element software PFC2D is used to simulate the stability of slope excavation from the meso-mechanical level.


2.2 Summary generated by sumy

We generated a five-sentence summary with sumy:

(1) The paper is inclined to make a simulated study in hoping to identify the regularity and clarify the slope stability after the slope being exploded with a steep inclined seam or horizontal fracture,based on PFC3D of particle flying theory.

(2) To understand the slope stability of steep inclined seam with horizontal fracture and the relationship between the internal damage and the external form, the process of spontaneous combustion are simulated based on PFC3D under particle flow theory.

(3) To understand the slope stability with steeply inclined seam and horizontal fracture in the process of earthquake,which is simulated on the basis of PFC3D according to particle flow theory.

(4) The obtained factor of safety for the stability of dam slope from PFC2D analysis is higher than expectation mainly because the micro-parameters in PFC2D affect many macro-parameters simultaneously and the relation between micro-parameters and macro-parameters are nonlinear, which well fits engineering practice.

(5) Then the particle discrete element software PFC2D is used to simulate the stability of slope excavation from the meso-mechanical level.


2.3 Comparing the gensim and sumy results

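On the surface, the sumy result reads more like an abstract: it moves from "The paper is inclined to" at the start, through "To understand the slope stability" in the middle, to "Then" at the end, loosely following the usual structure of an abstract. sumy does incorporate some natural language processing techniques, and we will keep improving the algorithm in this module.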


3. Generating sentences randomly with a Markov chain

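Every sentence in an extractive summary is taken verbatim from the original text. A more interesting idea is to have the summary include some randomly generated sentences, which led us to Markov chains. A Markov chain moves randomly from one state to another within its state space, and the probability distribution of the next state depends only on the current state. The idea was to take the sumy summary above as the starting text and let the chain randomly generate five sentences.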

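Because a Markov chain is a random process, each run should give a different result, especially on a long text, where more word sequences are shared between sentences and the chain has more places to branch. We therefore used the longer, originally generated file pfctest.txt as the input.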
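A word-level generator of this kind can be sketched in a few lines. The article does not say which library or settings produced the runs below, so this is only a sketch under our own assumptions: a state is the last `order` words generated so far (with order=1 the state is simply the current word; two-word states are a common choice because they read better), successors are sampled in proportion to how often they followed that state in the input, and build_chain and generate_sentence are hypothetical names. Since the state is the tuple of recent words, the walk still has the Markov property described above: the next word depends only on the current state.

```python
import random
from collections import defaultdict

BEGIN, END = "<s>", "</s>"  # sentinel tokens marking sentence boundaries


def build_chain(sentences: list[str], order: int = 2) -> dict:
    """Map each state (the last `order` words) to every word observed right after it."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = [BEGIN] * order + sentence.split() + [END]
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
    return chain


def generate_sentence(chain: dict, order: int = 2, rng=None, max_words: int = 60) -> str:
    """Random walk from the start state; each step looks only at the current state."""
    rng = rng or random.Random()
    state = (BEGIN,) * order
    output = []
    while len(output) < max_words:
        # choosing from the list of observed successors samples by transition frequency
        word = rng.choice(chain[state])
        if word == END:
            break
        output.append(word)
        state = state[1:] + (word,)  # slide the window: drop the oldest word, add the newest
    return " ".join(output)


# Two sentences from the query results above, used as a tiny sample corpus
corpus = [
    "Then the particle discrete element software PFC2D is used to simulate the stability of slope excavation from the meso-mechanical level.",
    "PFC2D (particle flow code in two dimensions) is applied to simulate the stability of loose deposits slope during highway cutting excavation.",
]
chain = build_chain(corpus, order=2)

if __name__ == "__main__":
    seeded = random.Random(42)  # fixed seed: the same sentences on every run
    for _ in range(3):
        print(generate_sentence(chain, order=2, rng=seeded))
    print(generate_sentence(chain, order=2))  # unseeded: varies from run to run
```

With these two sentences and two-word states, the only state with more than one successor is "stability of", so a walk can start in one sentence and finish in the other. A long input such as the full query output shares many more word pairs between sentences, which is consistent with the spliced fragments in the runs below. Passing a seeded random.Random makes a run reproducible; without a seed each run differs.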

Output of run (1)

PFC3D-based research on the PFC3D in accordance with the particle discrete element software PFC2D is used to model certain slope Coal Mine.

To understand the slope stability with steeply inclined seam and horizontal fracture and the relation between micro-parameters and macro-parameters are nonlinear, which well fits engineering practice.

The obtained factor of safety for the stability of open-pit slope in residual coal spontaneous combustion

rock slope step-path failure rock bridge slope stability with steeply inclined seam with horizontal fracture in the process of spontaneous combustion are simulated based on PFC3D

PFC3D-based research on the PFC3D in accordance with the particle discrete element software PFC2D is used to model certain slope Coal Mine.



Output of run (2)

The obtained factor of safety for the stability of slope excavation from the meso-mechanical level.

rock slope step-path failure rock bridge slope stability after blasting was discussed.

To understand the influence of spontaneous combustion of the different slope dips on the PFC3D in accordance with the particle discrete element software PFC2D is used to simulate the stability of open-pit slope in residual coal in open-pit coal mine slope on slope stability, we have made simulated slopes based on the stability of slope excavation from the meso-mechanical level.

Strength Reduction Method for Rock Slope Stability Analysis Based on PFC3D of particle flying theory.

To understand the effects of the different slope dips on the slope stability after the slope stability after blasting was discussed.



Output of run (3)

Then the particle flow theory.

To understand the influence of spontaneous combustion are simulated based on PFC3D

Then the particle discrete element software PFC2D is used to model certain slope Coal Mine.

rock slope step-path failure rock bridge slope stability after blasting was discussed.

In order to understand the slope stability with steeply inclined seam with horizontal fracture in the open mine slope,and the slope stability after blasting was discussed.


4. Conclusion

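This note briefly discussed deterministic and stochastic processes for generating text summaries. The deterministic process used the TextRank algorithm, and the stochastic process used a Markov chain. Neither method yet produces satisfactory summaries, and both algorithms need further study and improvement.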

Source: 计算岩土力学 (Computational Geomechanics)
Copyright belongs to the author. Sharing is welcome; reproduction without permission is prohibited.
First published: 2022-09-27