The Keyword Analysis Method in Corpus Research and Its Extension.ppt

Uploader: 本田雅阁  Document ID: 3306049  Upload date: 2019-08-10  Format: PPT  Pages: 21  Size: 831.54 KB

The Keyword Analysis Method in Corpus Research and Its Extension
An extension to the keyword approach in corpus analysis
Liang Maocheng (梁茂成), National Research Centre for Foreign Language Education (中国外语教育研究中心)

Outline
  Keywords
  Applications of corpus comparison
  Limitations to the keyword approach
  Keywords+
  Demo

Keywords
  Definition: Keywords are words whose frequency is unusually high (or low) in comparison with some norm (Scott, 2003).
  Positive keywords: Words which occur more often than would be expected by chance in comparison with the reference corpus.
  Negative keywords: Words which occur less often than would be expected by chance in comparison with the reference corpus.
  Example (positive): In a corpus of business English, words such as business, profit and companies are likely to be positive keywords when the corpus is compared with a general corpus.
  Example (negative): In a corpus of academic English, words such as morning, afternoon and evening are likely to be negative keywords when the corpus is compared with a general corpus.
  Calculating keyness (Rayson et al. 2004, Oakes 1998):
    Chi-square
    Chi-square with Yates correction
    Log-likelihood
  Reference: http://ucrel.lancs.ac.uk/llwizard.html
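
The three keyness measures named above can be sketched in a few lines of Python. The slides' own formulas did not survive extraction, so this is a minimal sketch using the standard 2x2 contingency-table forms, with the two-term log-likelihood as described on the UCREL page referenced above. The function names and the a, b, c, d parameters (word frequency in the study corpus, word frequency in the reference corpus, size of the study corpus, size of the reference corpus) are assumptions made for this illustration, not taken from the slides.

```python
import math


def chi_square(a, b, c, d, yates=False):
    """Chi-square for the 2x2 table [[a, c - a], [b, d - b]].

    a, b: frequency of the word in the study / reference corpus
    c, d: total tokens in the study / reference corpus
    yates: apply Yates' continuity correction if True
    """
    n = c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction: shrink the difference by N/2, never below zero.
        diff = max(0.0, diff - n / 2)
    denominator = (a + b) * (n - a - b) * c * d
    if denominator == 0:
        return 0.0
    return n * diff ** 2 / denominator


def log_likelihood(a, b, c, d):
    """Two-term log-likelihood: 2 * (a*ln(a/E1) + b*ln(b/E2))."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def direction(a, b, c, d):
    """'positive' if the word is relatively more frequent in the study corpus."""
    return "positive" if a / c > b / d else "negative"


# Hypothetical counts: a word seen 50 times in a 10,000-token study corpus
# and 20 times in a 20,000-token reference corpus.
print(chi_square(50, 20, 10_000, 20_000))
print(chi_square(50, 20, 10_000, 20_000, yates=True))
print(log_likelihood(50, 20, 10_000, 20_000))
print(direction(50, 20, 10_000, 20_000))
```

A score above 3.84 corresponds to p < 0.05 at one degree of freedom. Because the score itself is unsigned, the relative frequencies are compared separately to decide whether a keyword is positive or negative.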

Keywords (continued)
  Previous research has revealed that log-likelihood is a better measure than chi-square when comparing word frequencies in corpora.
  Ways to find keywords:
    Top-down: corpus-based
    Bottom-up: corpus-driven

Applications of corpus comparison
  Comparison across users
  Comparison across genres
  Comparison across times
  Comparison across (varieties of) languages
  Compiling a specialized dictionary
  Detecting the topic
  Genre analysis
  Contrastive Interlanguage Analysis

Limitations to the keyword approach
  Do keywords have to be single words? Phraseology seems more interesting!
  Do keywords have to be lexical words? POS tag sequences may also be interesting.
  Can we bring together the bottom-up approach and the top-down approach?
  Top-down: the problem is that I do not yet know what may be interesting.
  Bottom-up: the problem is that I am given a long list of keywords, only some of which are interesting, buried among many others which do not seem interesting at all.

Keywords+
  Support multiword sequences
  Support online search
  Support POS tag sequences
  Support regex search

Demo

Thank you.
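
As a rough illustration of the multiword and regex support listed under Keywords+, the sketch below scores n-grams instead of single words and optionally keeps only those matching a regular expression. It is not the Keywords+ implementation, which the slides do not show. The names `ngrams` and `key_ngrams`, the parameters, and the 3.84 cut-off (p < 0.05) are all assumptions made for this example. The same function would work on POS tag sequences if the input lists held tags rather than words.

```python
import math
import re
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def log_likelihood(a, b, c, d):
    """Two-term log-likelihood for frequencies a, b in corpora of size c, d."""
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def key_ngrams(study_tokens, ref_tokens, n=2, pattern=None, threshold=3.84):
    """List (ngram, score, direction) tuples, highest score first.

    pattern: optional regex; only matching n-grams are scored (top-down filter).
    With pattern=None every n-gram is scored (bottom-up listing).
    """
    study = Counter(ngrams(study_tokens, n))
    ref = Counter(ngrams(ref_tokens, n))
    c, d = sum(study.values()), sum(ref.values())
    if c == 0 or d == 0:
        return []  # one corpus is too short to yield any n-gram
    regex = re.compile(pattern) if pattern else None
    results = []
    for gram in set(study) | set(ref):
        if regex and not regex.search(gram):
            continue
        a, b = study[gram], ref[gram]  # Counter returns 0 for unseen n-grams
        score = log_likelihood(a, b, c, d)
        if score >= threshold:
            sign = "positive" if a / c > b / d else "negative"
            results.append((gram, score, sign))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Tiny made-up corpora, for illustration only.
study = "net profit rose".split() * 50
reference = "good morning to you all".split() * 50
for gram, score, sign in key_ngrams(study, reference, n=2, pattern=r"^net "):
    print(gram, round(score, 2), sign)
```

Passing a pattern narrows the output to sequences the analyst already suspects are interesting, while omitting it returns the full list, so one function covers both the top-down and the bottom-up routes mentioned in the limitations.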
