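Corpus Annotation and the Extraction of Syntactic Structures (语料的标注与句法结构的提取.ppt)

Uploaded by: 本田雅阁 | Document ID: 2922973 | Uploaded: 2019-06-06 | Format: PPT | Pages: 29 | Size: 615.02 KB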

Corpus Annotation and the Extraction of Syntactic Structures
Wang Jinquan (王金铨)

Part I: Corpus annotation
Part II: Extraction of syntactic structures

Part I: Corpus annotation
1. What is annotation?
2. How to do it?

Annotation of corpora
Annotation: The process of making explicit linguistic categories implicit within a corpus text, for example, by adding layers of information on the grammatical classes of words, or on the classes of speech acts which have taken place in the course of the transcribed speech, or the classes of errors learners made in writing. (Edwards 1995: 20)

A. Part-of-speech tagging
B. Syntactic annotation
C. Semantic annotation
D. Discourse annotation
E. Pragmatic annotation

POS-Tagging
- also known as grammatical tagging
- divides words into categories, based on how they can be combined to form sentences
- the most commonly used form of corpus annotation

Nowadays , it is fashionable to speak of a generation gap . The parents complain that children are self-centered and do not show them proper respect and obedience , while children are complaining that parents do not understand them . How does the generation gap form ?

How to do it?
- manually
- computer-assisted
- fully automatic

Computer-assisted annotation
Annotool

Fully automatic annotation
CLAWS (Constituent Likelihood Automatic Word-tagging System)
- developed by UCREL (University Centre for Computer Corpus Research on Language) at Lancaster
- POS-tagger for English
- has existed since the early 1980s
- has several tagsets

Tagset variation

Fully automatic annotation
Go tagger

When_WRB we_PRP are_VBP born_VBN ,_, the_DT education_NN our_PRP$ parents_NNS give_VBP us_PRP is_VBZ to_TO learn_VB how_WRB to_TO speak_VB and_CC how_WRB to_TO recognize_VB them_PRP ._. It_PRP is_VBZ a_DT basic_JJ education_NN and_CC we_PRP start_VBP to_TO face_VB the_DT colorful_JJ world_NN ._. The_DT education_NN is_VBZ very_RB important_JJ which_WDT influences_NNS children_NNS s_POS nature_NN ._. According_VBG to_TO that_IN ,_, education_NN gives_VBZ the_DT first_JJ step_NN to_TO people_NNS and_CC influences_NNS them_PRP gradually_RB ._.

Part II: Extracting passive verb constructions
1. The concept of passive verb constructions
2. Extracting passive verb constructions

The concept of passive verb constructions (passive constructions of verbs)
Forms of the passive construction (LGSWE):
- long passive (with by)
- short passive (without by)

Corpus findings (LGSWE):
- Short passives (SP) are predominant in all syntactic positions in English.
- Be-passives differ sharply by register, with conversation and academic prose at the opposite poles.
- Long passives (LP) are most common in news and academic prose.

Extracting passive verb constructions
Research questions:
1. How do Chinese students use passive constructions in their written English, and how does their use differ from that of native English speakers?
2. How do Chinese students' written and spoken English differ in their use of passive constructions?
3. Do the passive constructions in Chinese students' writing change as their L2 proficiency increases?

To answer question 1: extract the passive constructions from Chinese students' writing and from native speakers' writing, then compare them.
To answer question 3: extract the passive constructions from the writing of Chinese students in years 1-4 and observe the developmental trend.

Practice: use CONCORD to extract a single passive construction
Verb + past participle passive (V+PP), for example:
1) be forced (to do)
2) be supported (by)
3) be discussed
Pattern code: * […]
What do the codes mean?
[…] stands for the verb be
[…] stands for the past participle of any verb
e.g. […] denotes the past participle been

Practice extraction:
Group 1: Chinese students' essays vs. native speakers' writing
Group 2: Chinese students' essays vs. Chinese students' speech

V+PP results (per 10,000 words):
In writing, Chinese students and American students differ greatly in their use of the passive voice.

V+PP results:
Chinese students use fewer passive constructions in speech than in writing; the distribution of passives across speech and writing is broadly reasonable.

V+PP results:
The overall trend is a year-by-year decrease, but with some variation.

V+PP results:
Foreign L2 students use more passives than Chinese students, but fewer than native English speakers.

[…] by, e.g. be affected by
[…] * […], e.g. be treated as

Practice: extract the passive with by on its own

Practice: batch-extract passive constructions:
* * * * * * * * *

Thank You
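
The CONCORD search codes used in the exercises did not survive in this copy of the slides, so the following is only a minimal Python sketch of the same idea, not the author's procedure. It assumes text tagged in the `word_TAG` format of the Go tagger sample above, with `VBN` marking past participles and `RB...` marking adverbs; the function names `parse_tagged`, `find_passives` and `rate_per_10k`, and the `max_adverbs` parameter, are hypothetical. A hit is a form of be, optionally followed by adverbs, followed by a `VBN` token; it counts as a long passive when the next word is by, and as a short passive otherwise.

```python
from typing import List, Tuple

# Surface forms of "be" (an assumption; contracted forms are not handled).
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}


def parse_tagged(text: str) -> List[Tuple[str, str]]:
    """Split 'word_TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("_")
        if sep:  # skip anything that carries no tag
            pairs.append((word, tag))
    return pairs


def find_passives(pairs: List[Tuple[str, str]],
                  max_adverbs: int = 2) -> List[Tuple[str, str]]:
    """Return (phrase, 'long' | 'short') for each be + past participle hit."""
    hits = []
    for i, (word, _tag) in enumerate(pairs):
        if word.lower() not in BE_FORMS:
            continue
        j = i + 1
        skipped = 0
        # Allow a few adverbs between be and the participle ("was strongly supported").
        while j < len(pairs) and pairs[j][1].startswith("RB") and skipped < max_adverbs:
            j += 1
            skipped += 1
        if j < len(pairs) and pairs[j][1] == "VBN":
            phrase = " ".join(w for w, _ in pairs[i:j + 1])
            has_by = j + 1 < len(pairs) and pairs[j + 1][0].lower() == "by"
            hits.append((phrase, "long" if has_by else "short"))
    return hits


def rate_per_10k(pairs: List[Tuple[str, str]]) -> float:
    """Passive hits per 10,000 word tokens (punctuation tokens are excluded)."""
    n_words = sum(1 for _, tag in pairs if tag[:1].isalpha())
    if n_words == 0:
        return 0.0
    return 10000.0 * len(find_passives(pairs)) / n_words


if __name__ == "__main__":
    sample = ("When_WRB we_PRP are_VBP born_VBN ,_, the_DT education_NN "
              "our_PRP$ parents_NNS give_VBP us_PRP is_VBZ to_TO learn_VB ._.")
    tagged = parse_tagged(sample)
    print(find_passives(tagged))
    print(round(rate_per_10k(tagged), 2))
```

Normalising to a rate per 10,000 words mirrors how the V+PP results are reported, and lets corpora of different sizes (learner essays, learner speech, native-speaker writing) be compared directly. The by check is deliberately naive: a by that follows the participle for another reason, as in "was parked by the river", would be counted as a long passive, so hits still need manual inspection.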
