Thesis on
Artificial Intelligence for Universal Networking Language (UNL)
(Perspective: Bengali Language)

By
Deen Islam Muslim (ID: 200720851)
Ariful Hoque Tuhin (ID: 200710698)
Shohanur Rahman (ID: 200720100)

Under the Supervision of
Md. Ahsan Arif, Sr. Lecturer
Dept. of Computer Science and Engineering
Asian University of Bangladesh

Artificial Intelligence for Universal Networking Language (UNL)
(Perspective: Bengali Language)

Deen Islam Muslim, Ariful Hoque Tuhin, Shohanur Rahman
Department of Computer Science and Engineering
Asian University of Bangladesh

Abstract:
In this paper we present the computational analysis of the complex case structure of Bengali, a member of the Indo Aryan family of languages, with a view toward interlingua-based machine translation (MT). Bengali is ranked 4th in the list of languages ordered by the size of the population that speaks them. Extremely interesting linguistic phenomena involving morphology, case structure, word order and word senses make the processing of Bengali a worthwhile and challenging proposition. A recently proposed scheme called the Universal Networking Language (UNL) has been used as the interlingua. The approach is adaptable to other members of the vast Indo Aryan language family. The parallel development of both the analyzer and the generator system puts an insightful intra-system verification process in place. Our approach is rule based: it makes use of authoritative treatises on Bengali grammar to develop rules for parts of the Bengali-to-UNL conversion process.

Introduction:
About 189 million people speak Bengali, which ranks 4th in the world in terms of the number of people speaking the language (ref: http://www.harpercollege.edu/mhealy/g101ilec/intro/clt/cltclt/top100.html). Like most languages in the Indo Aryan family, which descend from Sanskrit, Bengali has an SOV (subject-object-verb) structure with some typical characteristics. A motivating factor for creating a system for processing Bengali is the possibility of laying the framework for processing many other Indo Aryan languages too.

Work on Indian language processing abounds. Project Anubaad [1], for machine translation from English to Bengali in the newspaper domain, uses the direct translation approach. The AnglaBharati [2] system for English-Bengali machine translation is based on pattern-directed rules for English and generates a pseudo-target language applicable to a group of Indian languages. In MATRA [3], a web-based MT system for English to Bengali in the newspaper domain, the input text is transformed into case-frame-like structures, and parameterized templates generate the target language. The MANTRA MT system for official documents uses Tree Adjoining Grammar (TAG) to achieve English-Bengali MT (ref: http://[...]).

Project Anusaaraka [4] is a language accessor system rather than an MT system, and it addresses multiple Indian languages. Interlingua-based MT for English, Bengali and Marathi [5, 6], which uses the UNL, transforms the source text into the UNL representation and generates the target text from this intermediate representation. References to most of these works can also be found at http://www.tdil.mit.gov.in/mat/ach-mat.htm. Other well-known MT systems are Pivot [7], Atlas [8], Kant [9], Aries [10], Geta [11], SysTran [12], etc.

The Universal Networking Language (UNL) (http://www.unl.ias.unu.edu) has been defined as a digital metalanguage for describing, summarizing, refining, storing and disseminating information in a machine-independent and human-language-neutral form. The information in a document is represented sentence by sentence. Each sentence is converted into a directed hypergraph that has concepts as nodes and relations as arcs. Knowledge within a document is expressed in three dimensions:

1. Word knowledge is expressed by Universal Words (UWs), which are language independent. These UWs are tagged with restrictions describing the sense of the word in the current context. For example, drink(icl>liquor) denotes the noun sense of drink, restricting the sense to a type of liquor. Here, icl stands for inclusion and forms an is-a relationship, as in semantic nets.

2. Conceptual knowledge is captured by relating UWs through a set of UNL relations [14]. For example, "Humans affect the environment" is described in the UNL as:

    agt(affect(icl>do).@present.@entry, human(icl>animal).@pl)
    obj(affect(icl>do).@present.@entry, environment(icl>abstract thing).@pl)

Here agt means the agent and obj the object. affect(icl>do), human(icl>animal) and environment(icl>abstract thing) are the UWs denoting concepts.

3. The speaker's view, aspect, time of event, etc. are captured by UNL attributes. For instance, in the above example, the attribute @entry denotes the main predicate of the sentence, @present the present tense and @pl the plural number.

The above discussion can be summarized using the example below:

    John, who is the chairman of the company, has arranged a meeting at his residence.

The UNL for the sentence is:

    mod(chairman(icl>post).@present.@def, company(icl>institution).@def)
    aoj(chairman(icl>post).@present.@def, John(icl>person))
    agt(arrange(icl>do).@entry.@complete, John(icl>person))
    pos(residence(icl>shelter), John(icl>person))
    obj(arrange(icl>do).@entry.@complete, meeting(icl>event).@indef)
    plc(arrange(icl>do).@entry.@complete, residence(icl>shelter))

In the expressions above, agt denotes the agent relation, obj the object relation, plc the place relation, pos the possessor relation, mod the modifier relation, and aoj the attribute-of-the-object relation (used to express constructs like "A is B"). The detailed specification of the Universal Networking Language can be found at http://www.unl.ias.unu.edu/unlsys.

Our work is based on an authoritative treatise on Bengali grammar. The strategies for the analysis and generation of linguistic phenomena have been guided by rigorous grammatical principles.

Universal Networking Language Based Analysis and Generation for Bengali

Bangla-English Dictionary
A Bangla-to-English dictionary is the source for building a Bangla-to-UNL dictionary, since UNL mandates that universal words be English words. Such dictionaries also provide all the attributes of a word along with its meaning. Every entry in the dictionary is put in the following format:

    HW ID "UW" (ATTRIBUTE1, ATTRIBUTE2, ...) <FLG, FRE, PRI>

Here,
    HW          Head Word (Bangla word)
    ID          Identification of Head Word (optional)
    UW          Universal Word
    ATTRIBUTE   Attribute of the HW
    FLG         Language Flag
    FRE         Frequency of Head Word
    PRI         [...]

For example:

    [...]region)" (N, PLACE)
    prochur "huge(icl>big)" (ADJ)

Among these attributes, N stands for noun, PLACE for place and ADJ for adjective. The FLG field entry is B, which stands for Bangla.

A universal knowledge base is defined in the UNL specification. This knowledge base is language independent, and each native-language word should be referenced to it. The knowledge base of universal words is a hierarchy of concepts.

En-Converter and De-Converter Machines
The En-Converter (henceforth EnCo) is a language-independent parser: a multi-headed Turing machine that provides a framework for synchronous morphological, syntactic and semantic analysis using the UW dictionary and analysis rules. The structure of the machine is shown in Figure 1.

Fig. 1. The EnCo machine

The machine has two types of heads: processing heads and context heads. The two processing heads are called Analysis Windows (AW), and the context heads are called Condition Windows (CW). The machine traverses the sentence back and forth and retrieves the relevant universal words from the lexicon. Depending on the attributes of the nodes under the AWs and under the surrounding CWs, it generates semantic relations between the UWs and/or attaches speech-act attributes to them. The final output is a set of UNL expressions equivalent to a UNL graph.

The De-Converter (henceforth DeCo) [18] is a language-independent generator that produces sentences from UNL graphs (Figure 2).

Fig. 2. The DeCo machine

Like EnCo, DeCo is also a multi-headed Turing machine. It performs syntactic and morphological generation synchronously, using the lexicon and the set of generation rules.

Existing Problems:

1) Spell Checking:
The current UNL system does not include a spell checker. As a result, wrong output can arise during the en-conversion or de-conversion process. An example of such a situation follows. Here is a simple English sentence, correctly spelled:

    I live in Bangladesh

According to [13], the final UNL expression is as follows:

    aoj(live(icl>inhabit>be,aoj>living_thing,plc>place).@entry.@present, i(icl>person))
    plc(live(icl>inhabit>be,aoj>living_thing,plc>place).@entry.@present, bangladesh(iof>asian_country>thing))

Here "Bangladesh" is assigned the UW restriction "asian_country", which tells the de-converter to look up the word "Bangladesh" in the "Asian_Country" category. However, if we misspell "Bangladesh" as "Banladesh", the word is converted in this form:

    aoj(live(icl>inhabit>be,aoj>living_thing,plc>place).@entry.@present, i(icl>person))
    plc(live(icl>inhabit>be,aoj>living_thing,plc>place).@entry.@present, banladesh)

No UW restriction is defined for "banladesh", so the conversion goes wrong.
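This failure can be detected mechanically: in the EnCo output above, "banladesh" is the only UW without a restriction list. The Python code below is a minimal sketch of a spell checker for this case. It is an illustration, not part of the UNL system. It scans binary UNL relation lines, flags UWs that carry no restriction, and suggests the closest dictionary headword by Levenshtein edit distance. The three-entry dictionary, all function names and the distance threshold of 2 are hypothetical. The parser also assumes the simple `relation[:scope](uw1, uw2)` lines shown in this section, not full UNL syntax.

```python
import re
from typing import Iterable, Iterator, List, Optional

# Hypothetical three-entry UW dictionary (headword -> UW); a real
# Bangla/English-UNL dictionary would hold thousands of entries.
UW_DICTIONARY = {
    "i": "i(icl>person)",
    "live": "live(icl>inhabit>be,aoj>living_thing,plc>place)",
    "bangladesh": "bangladesh(iof>asian_country>thing)",
}

# Matches "rel(args)" or "rel:01(args)"; assumes one relation per line.
RELATION = re.compile(r"^(\w+)(?::\d+)?\((.*)\)$")


def split_args(body: str) -> List[str]:
    """Split 'uw1, uw2' at the comma that is not inside any parentheses."""
    depth = 0
    for idx, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return [body[:idx].strip(), body[idx + 1:].strip()]
    raise ValueError(f"not a binary relation: {body!r}")


def headword(uw: str) -> str:
    """Drop restrictions, scope ids and @attributes: 'i(icl>person):01' -> 'i'."""
    return re.split(r"[(:.]", uw, maxsplit=1)[0].lower()


def bare_uws(unl_lines: Iterable[str]) -> Iterator[str]:
    """Yield the headword of every UW argument that has no restriction list."""
    for line in unl_lines:
        match = RELATION.match(line.strip())
        if match is None:
            continue
        for arg in split_args(match.group(2)):
            hw = headword(arg)
            # Scope references such as ':01' have an empty headword; skip them.
            if hw and "(" not in arg:
                yield hw


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def suggest(word: str, max_distance: int = 2) -> Optional[str]:
    """Return the nearest dictionary headword within max_distance, else None."""
    best = min(UW_DICTIONARY, key=lambda hw: edit_distance(word, hw))
    return best if edit_distance(word, best) <= max_distance else None


if __name__ == "__main__":
    enco_output = [
        "aoj(live(icl>inhabit>be,aoj>living_thing,plc>place).@entry.@present, i(icl>person))",
        "plc(live(icl>inhabit>be,aoj>living_thing,plc>place).@entry.@present, banladesh)",
    ]
    for hw in bare_uws(enco_output):
        # prints: 'banladesh' has no UW restriction; did you mean 'bangladesh'?
        print(f"{hw!r} has no UW restriction; did you mean {suggest(hw)!r}?")
```

In practice, such a check would probably be more useful on the input text before en-conversion, so that EnCo receives "Bangladesh" and retrieves its UW. The threshold trades missed corrections against false ones: "banladesh" is one insertion away from "bangladesh", but a larger threshold would start proposing unrelated short headwords.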

2) Maintaining Exact Grammatical Pattern:
According to [13], the system works fine for a single sentence and for some compound sentences. However, we have found some crucial problems when en-converting and de-converting multiple sentences. For example, take this sentence:

    I like rice and I play football.

En-converting this sentence into UNL form gives:

    aoj:01(like(icl>please>be,equ>enjoy,obj>uw,aoj>person).@entry.@present, i(icl>person):01)
    obj:01(like(icl>please>be,equ>enjoy,obj>uw,aoj>person).@entry.@present, rice(icl>grain>thing))
    agt:02(play(icl>compete>do,agt>thing,obj>uw,ptn>thing).@entry.@present, i(icl>person):02)
    obj:02(play(icl>compete>do,agt>thing,obj>uw,ptn>thing).@entry.@present, football(icl>field_game>thing))
    and(:02, :01)

After de-converting this UNL form into English:

    I like rice and I play football.

But a problem occurs with multiple sentences such as:

    We play football. We try to win every match.

En-converting these sentences into UNL form gives:

    agt(try(icl>attempt>do,agt>person,obj>uw).@entry.@present, we(icl>group):01.@pl)
    pos:01(match(icl>contest>thing), we(icl>group):02)
    fictit(try(icl>attempt>do,agt>person,obj>uw).@entry.@present, every(icl>quantity,per>thing))
    obj:01(win(icl>prize>do,agt>thing,obj>thing,scn>thing).@entry, match(icl>contest>thing))
    obj(try(icl>attempt>do,agt>person,obj>uw).@entry.@present, :01)

The first sentence, "We play football", does not appear in this output at all. After de-converting this UNL form back to English:

    We try win we match.

A further example:

    I am rahim. i am a student of asian university of bangladesh.

En-converting these sentences into UNL form gives:

    aoj(rahim.@entry.@present, i(icl>person):01)
    fictit(rahim.@entry.@present, i(icl>person):02)
    aoj(student(icl>university_student>person,obj>knowledge_domain).@indef.@present, i(icl>person):02)
    mod(university(icl>body>thing), asian(icl>adj,com>asia))
    obj(student(icl>university_student>person,obj>knowledge_domain).@indef.@present, university(icl>body>thing))
    obj(university(icl>body>thing), bangladesh(iof>asian_country>thing))

After de-converting this UNL form back to English:

    I BE rahim. I a student of an Asian university of Bangladesh

3) Lack of Exact Rules and Absence of AI:
Let's try some tricks with UNL:

    My name is Casper and I don't like to play football.

According to [13], en-conversion represents this sentence in UNL form as follows:

    pos(name(icl>language_unit>thing,pos>thing), i(icl>person):01)
    aoj(casper(iof>city>thing).@entry.@present, name(icl>language_unit>thing,pos>thing))
    and(:01, casper(iof>city>thing).@entry.@present)
    aoj:01(like(icl>please>be,equ>enjoy,obj>uw,aoj>person).@entry.@not.@present, i(icl>person):02)
    obj:01(like(icl>please>be,equ>enjoy,obj>uw,aoj>person).@entry.@not.@present, play(icl>compete>do,agt>thing,obj>uw,ptn>thing))
    obj:01(play(icl>compete>do,agt>thing,obj>uw,ptn>thing), football(icl>field_game>thing))

Here the name "Casper" is categorized under "City", not as a person's name. Similarly, for Bangla-to-UNL conversion we can give an example like:

    Golapi ekhon aar train-e uthena. ("Golapi no longer gets on the train.")

During en-conversion, we may face the problem of identifying whether Golapi is a person's name or a color name (golapi also means "pink").

4) Verb Representation Problem:
Let us take a look at this example:

    I am a good boy.

En-converted UNL form:

    aoj(boy(icl>child>person,ant>girl).@entry.@indef.@present, i(icl>person))
    mod(boy(icl>child>person,ant>girl).@entry.@indef.@present, good(icl>adj,ant>bad))

De-converted English:

    I Be a good boy.

Another example:

    My name is Karim and I like flowers.

En-converted UNL form:

    pos(name(icl>language_unit>thing,pos>thing), i(icl>person):01)
    aoj(kerim(icl>name,iof>person,com>male).@entry.@present, name(icl>language_unit>thing,pos>thing))
    and(i(icl>person):02, karim(icl>name,iof>person,com>male).@entry.@present)
    man(kerim(icl>name,iof>person,com>male).@entry.@present, like(icl>how,obj>thing))
    man(i(icl>person):02, like(icl>how,obj>thing))
    obj(like(icl>how,obj>thing), flower(icl>angiosperm>thing).@pl)

De-converted English:

    The name of I like is Karim and I this like the flowers.

Recommended Solutions:
1) Implementation of a good spell checker
2) Rearranging the sentence in the exact grammatical pattern
3) Implementation of AI and exact rules
4) Determining the exact form of verb representation

Conclusions
Systematic analysis of the case structure forms the foundation for any natural language processing system. In this paper, we have described a system for the computational analysis of the Bengali case structure for the purpose of interlingua-based MT using UNL. The complementary generator system has also been implemented, and it provides the platform for intra-system verification. Verification via cross-system generation is being done using the Bengali generation system (also under development). Apart from the case structure, computational analysis based on authoritative grammatical treatises is under way, addressing complex phenomena involving verbs, adjectives and adverbs.

References:
[1] Dey, K.: Project Anubaad: an English-Bengali MT system. Jadavpur University, Kolkata (2001)
[2] Sinha, R.: Machine translation: The Indian context. AKSHARA94, New Delhi (1994)
[3] Rao

经营许可证编号:宁ICP备18001539号-1