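Static Representation and Dynamic Activation
Dong Zhendong (董振东)
WWW
Tsinghua, 2007-12

Outline
  Opening remarks - What is HowNet not?
  An overview of the HowNet system
  HowNet's innovations
  Conclusion

Opening remarks - What is HowNet not? (1)
  "For Chinese there is also already a resource similar to a wordnet, called HowNet (知网, http:/). What distinguishes HowNet's approach is that it goes its own way: it does not adopt the architecture of the English WordNet but uses an architecture of its own. It also first defines an ontology of world knowledge and then draws its distinctions inside that definition. This top-down method differs from the bottom-up method of the English and European wordnets, and it certainly has its merits. Unfortunately, because of the limits on resources and information at the time, it did not connect with related research in the rest of the world. In essence there is no architectural basis for linking it to the wordnets of other languages, and its upper-level knowledge classification rests on the discretionary judgment of its two authors. That classification cannot be called wrong, but it lacks a theoretical foundation and faces problems of interoperability (inter-operability) with other systems."

Opening remarks - What is HowNet not? (2)
  More recently, on another occasion, he also said:
  "HowNet is a database/network of semantic relationships among Chinese words. Conceptually it's similar to WordNet of English, but the author claims they differ substantially. For one thing, HowNet is NOT free. Well, they are making words A-D free for download, as a teaser."

Opening remarks - What is HowNet not? (3)
  HowNet is not a semantic dictionary, a thesaurus of semantic classes, a concept dictionary, or an English-Chinese bilingual dictionary
  HowNet is not a dictionary
  HowNet is not a sinicized WordNet, nor a Chinese substitute for WordNet
  HowNet is not a product of linguistic research

An overview of the HowNet system
  Data statistics
  System components

Data statistics
  Chinese character              7152
  Chinese word & expression     92159
  English word & expression     86141
  Chinese meaning              106591
  English meaning              106731
  Definition                    27877
  Record                       172097

HowNet's innovations
  Theoretical innovation
  Innovation in knowledge acquisition and representation
  HowNet's knowledge power

Theoretical innovation
  Theory of knowledge
  Relations among event-type concepts
  Two-axis theory

Innovation in knowledge acquisition and representation
  Acquisition and selection of sememes
  Organization of sememes and construction of the taxonomy
  Defining concepts in a structured language (KDML); the concepts defined are expressed by words of two languages

Acquisition and selection of sememes
  Sememes                              2090
  Entity                                150
    thing (physical, mental, fact)
    component (part, fitting)
    time
    space (direction, location)
  Event (relation, state; action)       810
  Attribute                             245
  AttributeValue                        885
  Secondary feature                     121

Organization of sememes and construction of the taxonomy
  Entity
  Event
  Attribute
  AttributeValue
  Secondary features
  Event roles
  Typical actors of event roles
  Axiomatic relations and role shifting
  Antonymous sememe pairs
  Converse sememe pairs

Concept definitions in HowNet (1)
  Concept definitions in HowNet: "buy"
    1. GiveAsGift|赠:manner=guilty|有罪, purpose=entice|勾引
    2. buy|买
  Cf. synset definitions in WordNet: "buy"
    1. buy, purchase (obtain by purchase;)
    2. bribe, corrupt, buy, make grease palm (make illegal payment)

Concept definitions in HowNet (2)
  Concept definition in HowNet: "buyer"
    human|人:domain=commerce|商业,buy|买:agent=
  Cf. synset definition in WordNet: "buyer"
    buyer, purchaser, emptor, vendee (a person who buys)
  Which "buy"? In WordNet it is ambiguous; in HowNet it is not.

HowNet's knowledge power - dynamic activation
  Examples of commonsense reasoning in HowNet
  Computing concept similarity
  Establishing relevance relations between concepts

Examples of commonsense reasoning in HowNet
  Can a doctor walk?
  How is the ellipsis in the following sentence resolved?
    "我在南京买了几本很好的词典,到家发现全都丢了。"
    (I bought several very good dictionaries in Nanjing, and when I got home I found they were all lost.)
    Who lost them? What was lost?

Can a doctor walk? (1)
  1. Definition of "doctor"
       DEF=human|人:HostOf=Occupation|职位, domain=medical|医,doctor|医治:agent=
  2. "entity" sememe taxonomy table
       AnimalHuman|动物    animate|生物:HostOf=Sex|性别,AlterLocation|变空间位置:agent=,StateMental|精神状态:experiencer=
       human|人            AnimalHuman|动物:HostOf=Name|姓名Wisdom|智慧Ability|能力,think|思考:agent=,speak|说:agent=

Can a doctor walk? (2)
  3. "event" sememe taxonomy table
       AlterLocation|变空间位置
       SelfMove|自移
       SelfMoveInManner|方式性自移
       roam|流浪
       walk|走

Axiomatic relations and role shifting - 1
  "我在南京买了几本很好的词典,到家发现全都丢了。"
  buy|买   obtain|得到   consequence;
    agent OF buy|买 = possessor OF obtain|得到;
    possession OF buy|买 = possession OF obtain|得到.
  obtain|得到   own|有   hypernym;
    possessor OF obtain|得到 = possessor OF own|有;
    possession OF obtain|得到 = possession OF own|有.

Axiomatic relations and role shifting - 2
  lose|失去   own|有   precondition;
    possessor OF lose|失去 = possessor OF own|有;
    possession OF lose|失去 = possession OF own|有.
  lose|失去   obtain|得到   mutual precondition;
    possessor OF lose|失去 = possessor OF obtain|得到;
    possession OF lose|失去 = possession OF obtain|得到.

Computing concept similarity
  贪官 (corrupt official)   学生 (student)             0.307692
  贪官 (corrupt official)   教师 (teacher)             0.355556
  贪官 (corrupt official)   校长 (school principal)    0.386667
  贪官 (corrupt official)   市长 (mayor)               0.454545
  walk                      run                        0.144444
  walk                      jump                       0.144444
  walk                      swim                       0.130159
  walk                      fly                        0.124444
  walk                      buy                        0.018605

Establishing relevance relations between concepts
  Compare with HowNet
  Comments on WordNet

Compare with HowNet
  Examples: buy, 床 (bed)

Comments on WordNet (1)
  On WordNet
  1. Jordan Boyd-Graber et al., Oct. 2005, Adding Dense, Weighted Connections to WordNet (Princeton paper)
  2. Rila Mandala et al., ACL W98, 1998, The Use of WordNet in Information Retrieval (TIT paper)

Comments on WordNet (2)
  Princeton paper reads:
  "1.1 Shortcomings of WordNet
     No cross-part-of-speech links: traffic (n), stop (v)
     Too few relations: chopsticks, Chinese restaurant
     No weighted arcs: run:jog; run:move"

Comments on WordNet (3)
  Princeton paper continues:
  "To address these shortcomings, we are working to enhance WordNet by adding a radically different kind of information. The idea is to add quantified, oriented arcs between pairs of synsets, e.g. from {car, auto} to {road, route}, from {buy, purchase} to {shop, store}, and also in the opposite direction. Each of these arcs will bear a number corresponding to the strength of the relationship. We chose to use the concept of evocation, how much one concept evokes or brings to mind the other, to model the relationships between synsets."

Conclusion
  Knowledge is a system of relations;
  HowNet is a knowledge system that describes the relations between concepts and the relations between the attributes of concepts;
  The relations HowNet describes are computable;
  HowNet is fundamentally different from WordNet;
  HowNet is still developing.

Thank you!
Welcome to

Appendix - Universal semantic mechanisms
  跳 (jump):
    跳河 - jump into a river (LocationFin)
    跳楼 - jump off a high building (LocationIni)
    跳墙 - jump over a wall (LocationThru)
  导 (guide): 导游 (tour guide) - 导购 (shopping guide) - 导诊 (patient guide in a hospital)
  托 (shill): 医托 (hospital shill) - 婚托 (marriage shill)
  野 (outdoor): 野餐 (picnic) - 野炊 (outdoor cooking) - 野营 (camping) - 野游 (outing) - 野泳 / 野浴 (open-water swimming / outdoor bathing)

Appendix - Basic data statistics
  Chinese: 06-04-07
    synset:    Set = 13700 (13692) (13463)    Word Form = 55180 (55150) (54312)
    antonym:   Set = 13154 (13145) (12777)
    converse:  Set = 6803 (6804) (6753)
  English:
    synset:    Set = 18622 (18610) (18575)    Word Form = 58622 (58588) (58488)
    antonym:   Set = 12269 (12268) (12032)
    converse:  Set = 6455 (6454) (6442)

Appendix - 1. Event frame (Verb frame)
  - event|事件
  static|静态              event|事件
  relation|关系            static|静态
  possession|领属关系      relation|关系
  own|有                   possession|领属关系:possessor=*,possession=*
  obtain|得到              own|有:possessor=*,possession=*,source=*
  act|行动                 event|事件:agent=*
  ActGeneral|泛动          act|行动:agent=*
  ActSpecific|实动         act|行动:agent=*
  AlterSpecific|实变       ActSpecific|实动:agent=*
  AlterRelation|变关系     AlterSpecific|实变:agent=*
  AlterPossession|变领属   AlterRelation|变关系:agent=*,possession=*
  take|取                  AlterPossession|变领属:agent=*,possession=*,source=*
  buy|买                   take|取:agent=*, possession=*, source=*, cost=*, beneficiary=*

Appendix - 2. Typical actors of event roles (VerbNet)
  buy|买   take|取:agent=human|人group|群体-, possession=artifact|人工物-, source=human|人InstitutePlace|场所, cost=money|货币, beneficiary=human|人group|群体-, domain=economy|经济

Appendix - Relation types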
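The "Can a doctor walk?" slides can be read as an inheritance check over two taxonomies. The sketch below is only an illustration under stated assumptions, not HowNet's actual inference engine: the dictionaries are a hand-copied toy fragment of the slides, the function names are hypothetical, and the nesting of the event sememes (walk|走 under SelfMoveInManner|方式性自移 under SelfMove|自移 under AlterLocation|变空间位置) is assumed from the order in which the slide lists them.

```python
# Toy fragment of the "entity" taxonomy from the slides: child -> hypernym.
ENTITY_PARENT = {
    "human|人": "AnimalHuman|动物",
    "AnimalHuman|动物": "animate|生物",
}

# Events for which an entity sememe is declared a possible agent
# (the "...:agent=" features in the slide's taxonomy table).
AGENT_OF = {
    "AnimalHuman|动物": {"AlterLocation|变空间位置"},
    "human|人": {"think|思考", "speak|说"},
}

# Toy fragment of the "event" taxonomy: child -> hypernym.
# ASSUMPTION: this nesting is inferred from the listing order on the slide.
EVENT_PARENT = {
    "walk|走": "SelfMoveInManner|方式性自移",
    "roam|流浪": "SelfMoveInManner|方式性自移",
    "SelfMoveInManner|方式性自移": "SelfMove|自移",
    "SelfMove|自移": "AlterLocation|变空间位置",
}

# Head sememe of each concept definition (doctor is defined as human|人:...).
CONCEPT_HEAD = {"doctor": "human|人"}


def ancestors(node, parent):
    """Yield the node itself, then each hypernym, nearest first."""
    while node is not None:
        yield node
        node = parent.get(node)


def can_be_agent(concept, event):
    """True if the concept inherits an agent role for the event or a hypernym of it."""
    event_chain = set(ancestors(event, EVENT_PARENT))
    for entity in ancestors(CONCEPT_HEAD[concept], ENTITY_PARENT):
        if AGENT_OF.get(entity, set()) & event_chain:
            return True
    return False


if __name__ == "__main__":
    print(can_be_agent("doctor", "walk|走"))
```

A doctor is a human|人, which is an AnimalHuman|动物, which can be the agent of AlterLocation|变空间位置; walk|走 is a kind of AlterLocation|变空间位置 under the assumed nesting, so the check succeeds. A negative answer from this toy only means the fact is not derivable from the fragment copied here.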

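The axiomatic relations listed for buy|买, obtain|得到, own|有 and lose|失去 suggest how the elided roles of "lost" in the dictionary sentence could be recovered. The following is a minimal sketch under assumptions: the tuple encoding of the axioms, the function names `derive` and `fill_ellipsis`, and the English role fillers are all hypothetical, and the slides do not say how HowNet itself implements role shifting.

```python
# Each axiom: (event A, event B, relation, role of A -> role of B).
# The four entries mirror the axioms on the slides; the encoding is an assumption.
AXIOMS = [
    ("buy|买", "obtain|得到", "consequence",
     {"agent": "possessor", "possession": "possession"}),
    ("obtain|得到", "own|有", "hypernym",
     {"possessor": "possessor", "possession": "possession"}),
    ("lose|失去", "own|有", "precondition",
     {"possessor": "possessor", "possession": "possession"}),
    ("lose|失去", "obtain|得到", "mutual precondition",
     {"possessor": "possessor", "possession": "possession"}),
]


def derive(event, roles):
    """Follow axioms forward from one event, collecting role fillers of every event reached."""
    facts = {event: dict(roles)}
    frontier = [event]
    while frontier:
        current = frontier.pop()
        for src, dst, _relation, mapping in AXIOMS:
            if src == current and dst not in facts:
                # Shift each known role of the source event onto the target event.
                facts[dst] = {mapping[r]: v
                              for r, v in facts[current].items() if r in mapping}
                frontier.append(dst)
    return facts


def fill_ellipsis(event, facts):
    """Fill the roles of an event whose arguments are elided, using a known precondition."""
    for src, dst, relation, mapping in AXIOMS:
        if src == event and relation == "precondition" and dst in facts:
            inverse = {b_role: a_role for a_role, b_role in mapping.items()}
            return {inverse[r]: v for r, v in facts[dst].items() if r in inverse}
    return {}


if __name__ == "__main__":
    facts = derive("buy|买", {"agent": "I", "possession": "dictionaries",
                              "location": "Nanjing"})
    print(fill_ellipsis("lose|失去", facts))
```

Buying makes the buyer obtain, and therefore own, the dictionaries; losing presupposes owning, so the possessor and possession of own|有 are shifted back onto lose|失去. That answers both questions on the slide: "I" lost, and what was lost is the dictionaries.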

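The event frame (Verb frame) appendix can be explored as a small data structure. In the rows shown there, each sememe appears to keep the roles of its hypernym and sometimes add new ones. The sketch below transcribes those rows into a dictionary and checks that observation; the layout and the helper names are hypothetical, and the monotonic-inheritance property is only verified for this fragment, not claimed for HowNet as a whole.

```python
# sememe -> (hypernym, roles), transcribed from the event-frame appendix.
FRAMES = {
    "event|事件": (None, ()),
    "static|静态": ("event|事件", ()),
    "relation|关系": ("static|静态", ()),
    "possession|领属关系": ("relation|关系", ()),
    "own|有": ("possession|领属关系", ("possessor", "possession")),
    "obtain|得到": ("own|有", ("possessor", "possession", "source")),
    "act|行动": ("event|事件", ("agent",)),
    "ActGeneral|泛动": ("act|行动", ("agent",)),
    "ActSpecific|实动": ("act|行动", ("agent",)),
    "AlterSpecific|实变": ("ActSpecific|实动", ("agent",)),
    "AlterRelation|变关系": ("AlterSpecific|实变", ("agent",)),
    "AlterPossession|变领属": ("AlterRelation|变关系", ("agent", "possession")),
    "take|取": ("AlterPossession|变领属", ("agent", "possession", "source")),
    "buy|买": ("take|取",
               ("agent", "possession", "source", "cost", "beneficiary")),
}


def hypernym_chain(sememe):
    """Return the sememe followed by its hypernyms up to the root."""
    chain = []
    while sememe is not None:
        chain.append(sememe)
        sememe = FRAMES[sememe][0]
    return chain


def added_roles(sememe):
    """Roles that a sememe introduces beyond those of its direct hypernym."""
    parent, roles = FRAMES[sememe]
    inherited = FRAMES[parent][1] if parent else ()
    return [role for role in roles if role not in inherited]


def frames_are_monotonic():
    """Check that every sememe keeps all roles of its direct hypernym."""
    return all(set(FRAMES[parent][1]) <= set(roles)
               for parent, roles in FRAMES.values() if parent)


if __name__ == "__main__":
    print(hypernym_chain("buy|买"))
    print(added_roles("buy|买"))
    print(frames_are_monotonic())
```

Read this way, buy|买 differs from take|取 only by the cost and beneficiary roles, and take|取 differs from AlterPossession|变领属 only by source.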
