Internationalization & Localization

Copyright 2000, SAS Institute Inc. All rights reserved.
Jane Xue, GTD2, China R&D, APRD
June 23, 2005

Objectives of the Presentation

What This Presentation IS
- It provides definitions of terminology.
- It gives background information about internationalization and localization.
- It identifies the high-risk areas for NLS testing.
- It gives an I18N and L10N checklist.

What This Presentation Is NOT
- It is not an in-depth discussion of all possible aspects.
- It does not provide solutions to very specific problems.

Questions about internationalization and localization

Q1: I only provide English software, so I need not do internationalization?
A1: No. Even if you do not provide localized software, you still need to internationalize your software to guarantee that it runs successfully on a non-English OS.
Q2: Localization is simple; you only need to translate the UI into the local language?
A2: No. Translation is only part of localization.
Q3: Localization testing only tests whether the UI translation is correct?
A3: No,
    we do all unit tests as well as system tests to guarantee that all functions work correctly on a DBCS OS.

Part I: Basic concepts
- Internationalization & Localization
- Locale & Encoding
- Character Sets & Font
- Transcoding & Translate Table
- Background of Unicode

NLS Terminology

Internationalization (I18N): Internationalization is abbreviated as i18n because there are 18 letters between the "i" and the "n". I18N is the process of designing an application so that the feature design and code design do not make assumptions based on a single language or locale. Internationalization also assumes that the source code base simplifies the creation of different language editions of a program.
- One goal of internationalization is to ensure that international conventions (including rules for sorting strings and for formatting dates, times, numbers, and currencies) are supported.
- Another goal is to design the product in such a way that users will have a consistent look, feel, and functionality across different language editions of a product.

NLS Terminology (Cont)

Localization (L10N): Localization is abbreviated as l10n because there are 10 letters between the "l" and the "n". Localization is the process of adapting software for a particular geographical region (or locale). Translation of the user interface, system messages, and documentation is a large part of localization, but not all of it.

NLS Terminology (Cont)

Locale: A locale is more than a language. A locale reflects the local conventions, language, and culture for a particular geographical region. For example, Portuguese is spoken in Brazil as well as in Portugal, but the cultures are different. Therefore, they are considered two different locales. Additionally, a country may have more than one official language. For example, Canada has two official languages: English and French. Thus, Canada has two distinct locales: Canadian-English and Canadian-French. Some locale-dependent categories include the formatting of dates and the display format for monetary values.

NLS Terminology (Cont)

Character Set: The repertoire of characters and symbols that are used by a language or group of languages. Every character set includes:
- National characters (characters specific to a particular nation or group of nations)
- Punctuation marks
- English alphabet
- Numbers
- Control characters (needed by the computer)

NLS Terminology (Cont)

Encoding: An encoding maps each character in a character set to a unique numeric representation.
A particular character can have different numeric representations in different encodings. For example, the Å is represented as 91 in the Danish EBCDIC encoding and as 197 in the Windows Latin1 encoding.

NLS Terminology (Cont)

Single-byte character set (SBCS): A character encoding where each character is represented by one byte. Single-byte character sets are limited to 256 characters. Groups of languages are represented in an SBCS character set. Examples of SBCS character sets are the ASCII Latin1 and Latin2 character sets, which represent the characters of Western and Central Europe, respectively.

Double-byte character set (DBCS): At SAS, we refer to the Asian character sets (Japanese, Korean, Simplified Chinese, and Traditional Chinese) as double-byte character sets (DBCS). DBCS is a misnomer, since individual characters can take one or two bytes to represent. A more appropriate term is multi-byte character set (MBCS), which is used as a synonym for DBCS. It takes a special build of the SAS System to handle DBCS user data.

NLS Terminology (Cont)

Unicode: A multilingual character set that can represent characters from more than one language group. The Java language uses Unicode encodings. The MVA system uses either an SBCS encoding or a DBCS encoding.
UCS2: A fixed-length, 2-byte form of Unicode.
UTF16: A form of 2-byte Unicode that makes allowances for some characters to be 4 bytes wide. A 4-byte character in UTF16 is encoded as a surrogate pair.
UTF8: A varying-length form of Unicode in which ASCII characters are all represented in 1 byte. Other characters take from 2 to 4 bytes to represent.
UCS4: A fixed-length, 4-byte form of Unicode.

NLS Terminology (Cont)

Transcoding: The process of moving data from one encoding to another. For example, the Latin1 encoding for the $ is 24 hex. The Danish EBCDIC encoding for $ is 67 hex. If data is properly transcoded from ASCII Latin1 to Danish EBCDIC (CP1142), the value 24 is converted to 67. Transcoding can take a string from ASCII to EBCDIC or from one DBCS encoding to another.

Trantab (translate table): A table used to transcode data from one encoding to another. SAS language elements that control locale values and encoding properties automatically invoke the appropriate translation table. Translation tables are specific to the operating environment.

NLS Terminology (Cont)

Font: A font defines the graphical representation of a character. There is a 1:n relationship between a character and its representation in different fonts. A font is a set of printable or displayable text characters in a specific style and size. The type design for a set of fonts is the typeface, and variations of this design form the typeface family. In practice, font and typeface (print font) are often used without much precision, sometimes interchangeably.

Quote

“Instead of wasting your time to
teach French to your computers, you'd better teach English to all your personnel.”
A manager of a university computing centre (in the stone age of computing).

I18n goals
- One goal of internationalization is to ensure that international conventions (including rules for sorting strings and for formatting dates, times, numbers, and currencies) are supported.
- Another goal is to design the product in such a way that users will have a consistent look, feel, and functionality across different language editions of a product.

I18n challenges
- Software needs to function properly in every global market.
- Applications need to conform to local conventions.
- Data needs to be processed successfully in native languages and environments.
- The meaning of characters must be retained across environments, regardless of the encoding used.

Languages and Encodings
- Different languages use different characters in writing. Very few use only the 26 characters A through Z of the Latin alphabet.
- A character is represented in a computer by associating it with a number, or binary code. These numbers are unique within a set.
- The whole set of such mappings forms a “character set” or “encoding”.

Languages and Encodings (Cont)
- Hence, all character-based data that is stored, transmitted, or processed is in an encoding.
- Many encodings have been developed to address the requirements of different languages and computing environments.
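
The point that one character maps to different numbers in different encodings can be tried out with a short sketch. This is only an illustration in Python using its standard codecs, not anything taken from SAS: the helper names encode_in and transcode are made up for this example, and cp037 (a US EBCDIC code page) stands in for EBCDIC here as an assumption, because the Danish EBCDIC code page mentioned in the terminology slides may not be available as a Python codec.

```python
def encode_in(text, encodings):
    """Return {encoding name: bytes} for the same text under each encoding."""
    return {name: text.encode(name) for name in encodings}


def transcode(data, source, target):
    """Transcode bytes: decode from the source encoding, re-encode in the target."""
    return data.decode(source).encode(target)


if __name__ == "__main__":
    # The same character has a different numeric representation per encoding.
    for name, raw in encode_in("Å", ["latin-1", "cp037", "utf-8", "utf-16-be"]).items():
        print(name, raw.hex())

    # Proper transcoding keeps the meaning of the characters while the bytes change.
    latin1_bytes = "Å$".encode("latin-1")
    print(transcode(latin1_bytes, "latin-1", "cp037").hex())

    # Reading bytes with the wrong encoding does not raise an error here;
    # it silently produces different text, which is how data gets corrupted.
    misread = "Å".encode("utf-8").decode("latin-1")
    print(misread == "Å")
```

The last check is the case worth testing for: nothing fails visibly, yet the text that comes back is no longer the text that was stored.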

Problems with data transfer
- Although it is easy to transfer data within an environment using the same encoding (character set), it remains difficult to transfer and display data between environments using different encodings (character sets).

Problems with data transfer (Cont)
- Encoding systems conflict with one another. That is, two encodings can use different positions for the same character, or use the same code position for two different characters.
- As a result, whenever data is passed between different encodings or platforms, that data runs the risk of corruption.

Quote

“In hindsight, what we have done is invented a computer Communications Tower of Babel.”
Edwin Hart, ASCII and EBCDIC Character Set and Code Issues in Systems Application Architecture, 1989.

Unicode
- Unicode is changing all that!
- Unicode basically provides a unique position for every character, no matter what the platform, no matter what the program, no matter what the language.

Unicode
- Unicode began as a 16-bit encoding that can represent more than 65,000 characters; later versions extend beyond 16 bits, which is why UTF16 uses surrogate pairs.
- It solves two major
  problems, common in multilingual computer programs:
  - the existence of several inconsistent character encodings
  - font availability for different character encodings

When the world wants to talk, it speaks Unicode
Når verden vil tale, taler den Unicode.
Când lumea vrea să comunice, vorbește Unicode.
Kiam la mondo volas paroli, ĝi parolas Unicode.
当世界需要沟通时，请使用 Unicode!

Questions?

Part II: I18N Testing
- Why I18N testing is important
- Top ten I18N tips
- Major I18N Issues
- Common I18N Mistakes
- High-Risk I18N operations
- Known I18N problems in SAS products
- I18N Check List
- L10N Check List

Why I18N testing is important
- Can't sell a product that does not work!
- Not one new SAS product has been fully functional outside of the U.S. and Western Europe on initial release.

SAS World Wide Revenue

Why I18N testing is important (Cont)
- Imagine our revenue potential when all SAS products work in all markets!
- I18N is important for our company's growth.
- As testers, we should treat I18N bugs as seriously as functional bugs,
  use the strategies and tools that are available, and verify that our software is ready for international customers. If we do this, we will expand our market and increase our product revenue.

Top ten I18N tips
1. Don't assume all letters of the alphabet fall between A and Z
2. Don't hardcode strings
3. Don't hardcode fonts
4. Don't concatenate strings
5. Reading and writing is fundamental (encoding)
6. Sort using the Collator class
7. Displaying complex text (easily turns into garbage)
8. Use ComponentOrientation for GUI layout (right-to-left writing)
9. Use DateFormat to display dates
10. Use NumberFormat to format numbers

Major I18N Issues
- Text
- Messages
- Formats/Conventions
- Numbers
- Currencies
- Date/Time
- Postal Codes
- Addresses
Some issues are simple.

Major I18N Issues (Cont)
- Symbols and Allowed Usages
- Icons
- Meaning of Graphics
- Meaning of Colors
- Writing Systems, Characters
- Character Conversion
- Locale customs (for example, humor, idioms, allusions)
- Political Issues
That is the reason why we say localization ≠ translation.
Some issues are not so simple.

Common I18N Mistakes
- Assumption of text length:
  - hardcoded text length
  - hardcoded display length
- Assumption of fonts:
  - hardcoded font names
- Assumption of language structure:
  - using English-dependent language syntax (of, in, to, with, etc.)
  - concatenated strings
- Assumption of text position:
  - hardcoded display position

Common I18N Mistakes (Cont)
- Assumption of valid character code range:
  - hardcoded range
  - use of 8-bit character constants
- Assumption of default value:
  - hardcoded default value
- Insufficient internal buffer:
  - hardcoded buffer length
- Use of MACRO variable reference:
  - hardcoded text

Common I18N Mistakes (Cont)
- Storing externalized messages in application data:
  - language-dependent application
  - hardcoded message in application
- Only externalizing partial items:
  - hardcoded text by mistake
  - hardcoded keywords
- Misuse of externalized items:
  - sharing the same text for display and as a program constant
  - using the same text in different contexts

High-Risk I18N operations
From an internationalization testing perspective, several system functions and operations represent a higher risk for internationalization errors. That your code supports these areas does not mean that you will have problems, only that there is potential. You may have internationalization problems if your code supports the following:

High-Risk I18N operations (Cont)
Your code supports NLS character types
- Single-byte encoding (SBCS)
- Multibyte encoding (MBCS/DBCS)
- Wide character support (WCS)
- Unicode support
- BIDI text support (bi-directional)

High-Risk I18N operations (Cont)
Your code accepts multiple character types
- National characters
- Mathematical and Technical Symbols
- Geometric Shapes
- Publishing characters
- Basic dingbats, such as smiley faces used in email
- Punctuation marks
- Long variable character string

High-Risk I18N operations (Cont)
Your code supports sorting, collation, or comparison
- Results should be predictable, such as descending ordering
- Results should be in a culturally expected order
Your code supports fonts
- Font definition is affected by locale.
- Read from alternate encoding into session encoding
- Read from other platforms and applications
- Application encodings are consistent with SAS encodings; for example, databases generated in Oracle Latin2 are consistent with SAS Latin2 code pages

High-Risk I18N operations (Cont)
Your code supports reading external files
Use of files generated in other applications and code pages.
- Read from alternate encoding into session encoding
- Read from other platforms and applications
- Application encodings are consistent with SAS encodings

High-Risk I18N operations (Cont)
Your code supports writing external files
Files created in alternate encodings for storage and use by other applications on other hosts and operating systems.
- Save files in alternate encodings.
- Save files using filenames that contain national characters
- Support hierarchical encoding precedence
- Create files in client/server environments

High-Risk I18N operations (Cont)
Your code supports data transfer
Sharing of data between two different SAS sessions or applications.
- Client/server environment, such as SAS/Share and Connect in an MVS/Windows environment.
- Locale-to-locale environment, such as two different SAS sessions with different locale settings.
- Transfer data and applications from one platform to another, such as writing a SAS application with Chinese characters and expecting it to run in a Japanese SAS session.
- Cut and paste national characters between applications

High-Risk I18N operations (Cont)
Your code displays text
Problems occur when:
- String lengths are hardcoded
- Display lengths are hardcoded
- Display positions are hardcoded
- Keyword use and dependency is hardcoded
- Internal buffer storage is insufficient
- MACRO variables are used (same as hardcoded text)
- Text is shared for multiple purposes
- Language and sentence structure are not flexible
- Assumptions of valid character code page ranges are made

High-Risk I18N operations (Cont)
Your code displays data through a GUI interface
- Check boxes and menus support language scripts
- System auto-wrappers support splitting multi-byte characters correctly
- Display operating system strings with special
  characters correctly
- Line feed position consideration for splitting words, especially double-byte characters
- Consistent user interface look and feel
- Window scroll bar availability for use with long text display
- Spacing considerations for boxes and radio buttons to avoid widget overlaps

High-Risk I18N operations (Cont)
Your code supports features susceptible to culture definition and appropriateness
- System measurements, such as metrics
- Calendar
- Date/time formats and informats
- Address formats
- Telephone number format
- Numbers
- Currency
- Paper size

High-Risk I18N operations (Cont)
Your product was developed with SAS and needs to operate when the locale dynamically changes
For example, changing the regional settings in a Windows environment and testing the results in an application generated by AppDev Studio.
Testing SAS applications needs to consider:
- Language
- Keyboard layout
- Output text layout
- Character classification
- Sorting and collation rules
- Default page and line sizes
- Locale-sensitive parameters, for example address format
- System operations
  such as beeps and measurements

Known I18N problems in SAS products
Mnemonic problems
- Do not separate a mnemonic as a single property item
- Do not add the mnemonic underbar sign at the first occurrence of a specified character in a string
- Do not add the mnemonic underbar sign at a specific position
- Do not share a property item between a UI string having a mnemonic and a UI string
  without a mnemonic

Known I18N problems in SAS products (Cont)
Layout problems
- Do not use Autoflow in line feed (example 1, example 2)
- Do not design a dialog assuming a specific font
- Do not decide a component's size assuming the original English message

Known I18N problems in SAS products (Cont)
Invalid property file
- Do not use Latin1-specific characters in
  Java property files
- Do not use Unicode-escape characters in Java property files
- Do not externalize untranslatable strings, such as SAS options, to Java property files
- Translation comments are not clear in Java property files

Known I18N problems in SAS products (Cont)
Locale-dependent problems
- NLDATE format must be selectable
- Date-time display in the UI must be localized as far as possible
- Calendar must be localizable
- Do not compose UI text in the program

Known I18N problems in SAS products (Cont)
Encoding-dependent problems
- Add the correct encoding tag to XML
- Add the correct encoding to the data set
- Add the correct encoding to the log file

I18N check list
- Good I18N is key to reducing the cost of L10N and time to market
- With I18N testing, we should m