Chapter 4: Resource Management
Gong Bin, School of Computer Science and Technology, Shandong University; Shandong High Performance Computing Center

Globus and the Resource Specification Language (RSL)

Resource Management in Globus
  Globus RMS

Globus Components In Action
  [Figure. Labels: Local Machine, mpirun, globusrun, grid-proxy-init, RSL parser, RSL single request, RSL multi-request, DUROC, GASS Server; two Remote Machines (AIX with PBS, Solaris with Unix Fork), each showing GRAM Gatekeeper, GSI, GRAM Job Manager, GASS Client, App, Nexus, MPI]

GRAM (Globus Resource Allocation Manager) Overview
  Position: the lowest layer of resource management
  Function: run jobs remotely; submit, monitor, and terminate jobs through the provided API
  Specific responsibilities of GRAM:
    Process job requests expressed in the Resource Specification Language (RSL)
    Remotely monitor and manage the jobs it creates
    Update the information held in MDS

Globus Pre-WS Component Interaction Diagram
  [Figure] From IBM Redbook SG24-6895-01 (2003): Intro to Grid Computing
  GRAM: Grid Resource Allocation Manager
  GASS: Global Access to Secondary Storage
  MDS: Monitoring and Discovery Service
  GRIS: Grid Resource Information Service
  GIIS: Grid Index Information Service

GRAM
  A service that provides remote execution and status management of requests.
  When a client submits a job, the request is sent to the remote host and handled by the gatekeeper daemon running there.
  The gatekeeper then creates a job manager to start and monitor the job.
  When the job finishes, the job manager sends the status information back to the client and terminates.

GRAM Architecture
  [Figure] From IBM Redbook SG24-6895-01 (2003): Intro to Grid Computing

GRAM Elements
  Clients
  Gatekeeper daemon
  Job Manager
  Global Access to Secondary Storage (GASS)
  Dynamically-Updated Request Online Coallocator (DUROC)
  User Resource Specification Language (RSL)

GRAM Clients
  Three clients:
    globusrun
    globus-job-run
    globus-job-submit

GRAM management flow
  [Figure. Labels: Client API, Gatekeeper, Job Manager, Scheduler Specific Plugin, Job Process, Job Request, Job cancel, state change callback, fork/su/exec, fork/exec/wait, spsubmit/spq, condor, lsf]

The role of the gatekeeper
  gatekeeper: a process, running as root, which begins the handling of an allocation request by
    performing mutual authentication of user and resource,
    determining a local user name for the remote user,
    starting a job manager, which executes as that local user and actually handles the request.
  In order to start the job manager, the gatekeeper must run as a privileged program.

Glossary
  Resource: an entity capable of running one or more processes on behalf of a user.
  Client: the process that is using the resource allocation client-side API.
  Job: a process or set of processes resulting from a job request.
  Job Request: a request to the gatekeeper to create one or more job processes, expressed in the supplied Resource Specification Language.
  Job Manager: one job manager is created by the gatekeeper to fulfill each request submitted to the gatekeeper.

GRAM scheduling and state transition model
  [Figure: job state diagram]
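A state model of this kind can be sketched as a small state machine. The block below is a minimal illustration, assuming the eight GRAM job states (Unsubmitted, StageIn, Pending, Active, Suspended, StageOut, Done, Failed); the transition table and the names `JobState`, `ALLOWED`, and `Job` are assumptions made for the example, not taken from the Globus source. The optional callback mirrors the "state change callback" a GRAM client can receive.

```python
from enum import Enum


class JobState(Enum):
    UNSUBMITTED = "Unsubmitted"
    STAGE_IN = "StageIn"
    PENDING = "Pending"
    ACTIVE = "Active"
    SUSPENDED = "Suspended"
    STAGE_OUT = "StageOut"
    DONE = "Done"
    FAILED = "Failed"


# ASSUMPTION: a plausible transition table, chosen for illustration only.
ALLOWED = {
    JobState.UNSUBMITTED: {JobState.STAGE_IN, JobState.PENDING, JobState.FAILED},
    JobState.STAGE_IN: {JobState.PENDING, JobState.FAILED},
    JobState.PENDING: {JobState.ACTIVE, JobState.FAILED},
    JobState.ACTIVE: {JobState.SUSPENDED, JobState.STAGE_OUT,
                      JobState.DONE, JobState.FAILED},
    JobState.SUSPENDED: {JobState.ACTIVE, JobState.FAILED},
    JobState.STAGE_OUT: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),    # terminal
    JobState.FAILED: set(),  # terminal
}


class Job:
    """Tracks one job's state and notifies an optional callback on change."""

    def __init__(self, on_change=None):
        self.state = JobState.UNSUBMITTED
        self.on_change = on_change

    def advance(self, new_state):
        # Reject any transition that the table does not list.
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.value} -> {new_state.value}")
        old, self.state = self.state, new_state
        if self.on_change is not None:
            self.on_change(old, new_state)
```

A client would typically register a callback and react only to the terminal states, Done and Failed.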

Explanation of each state
  Unsubmitted: the job has not yet been submitted to the scheduler.
  StageIn: the job manager is staging executable, input, or data files to the job.
  Pending: the job has been submitted to the scheduler, but resources have not yet been allocated for it.
  Active: the job has received all of its resources, and the application is executing.
  Suspended: the job has been stopped temporarily by the scheduler.
  StageOut: the job manager is staging output files from the job manager host to remote storage.
  Done: the job completed successfully.
  Failed: the job terminated before completion, as a result of an error or of a user or system cancel.

GRAM Components
  [Figure. Labels: Client; Globus Security Infrastructure; Gatekeeper; Job Manager; RSL Library; Local Resource Manager; Process (x3); MDS: Grid Index Info Server; MDS: Grid Resource Info Server; Site boundary. Arrows: MDS client API calls to locate resources; MDS client API calls to get resource info; Query current status of resource; GRAM client API calls to request resource allocation and process creation; GRAM client API state change callbacks; Create; Parse; Request; Allocate & create processes; Monitor & control]

DUROC (Dynamically-Updated Request Online Co-allocator)
  Simultaneous allocation of a resource set
  Handled via optimistic co-allocation, based on free nodes or queue prediction
  Advance reservations will also be supported
  globusrun will co-allocate specific multi-requests using DUROC

GRAM Examples
  The globus-job-run client is a sample GRAM client that uses command-line arguments rather than RSL.
    % globus-job-run pitcairn.mcs.anl.gov /bin/ls
    % globus-job-run pitcairn.mcs.anl.gov -s myprog
    % globus-job-run pitcairn.mcs.anl.gov -s myprog -stdin -s in.txt -stdout -s out.txt

GRAM Examples
  The globusrun client is a more involved prototype that allows complicated RSL expressions.
    % globusrun -r pitcairn.mcs.anl.gov -f myjob.rsl
    % globusrun -r pitcairn.mcs.anl.gov '&(executable=myprog)'

Resource Management APIs
  The Globus Toolkit has APIs for RSL, GRAM, and DUROC:
    globus_rsl
    globus_gram_client
    globus_gram_myjob
    globus_duroc_control
    globus_duroc_runtime

Resource Specification Language
  A common language for stating job requirements
  RSL is a core part of GRAM: it gives components a way to exchange information, for example between an application and a resource broker, or between resource co-allocation and resource management
  Form: (attribute=value)
  GRAM must understand the attributes that are used
  Globus provides an API for working with RSL
  It can also be used in settings beyond those listed above

Some RSL attributes
  (executable=string)            Program to run; a file path (absolute or relative) or URL
  (directory=string)             Directory in which to run (default is $HOME)
  (arguments=arg1 arg2 arg3...)  List of string arguments to the program
  (environment=(E1 v1)(E2 v2))   List of environment variable name/value pairs

Some RSL attributes (continued)
  (stdin=string)       Stdin for program; a file path (absolute or relative) or URL
  (stdout=string)      Stdout for program; a file path (absolute or relative) or URL
  (stderr=string)      Stderr for program; a file path (absolute or relative) or URL
  (count=integer)      Number of processes to run (default is 1)
  (hostCount=integer)  On SMP multi-computers, number of nodes to distribute the "count" processes across
  (project=string)     Project (account) against which to charge
  (queue=string)       Queue into which to submit the job

Some RSL attributes (continued)
  (maxTime=integer)      Maximum wall clock or CPU runtime (scheduler's choice) in minutes
  (maxWallTime=integer)  Maximum wall clock runtime in minutes
  (maxCpuTime=integer)   Maximum CPU runtime in minutes
  (maxMemory=integer)    Maximum amount of memory for each process in megabytes
  (minMemory=integer)    Minimum amount of memory for each process in megabytes

RSL Attributes for GRAM
  (jobType=value)  Value is one of "mpi", "single", "multiple", or "condor"
    mpi: run the program using "mpirun -np <count>"
    single: run only a single instance of the program, and let the program start the other count-1 processes
    multiple: start <count> instances of the program using the appropriate scheduler mechanism
    condor: start Condor processes running in the "standard universe"

RSL Attributes for GRAM (continued)
  (gramMyjob=value)  Value is one of "collective", "independent"; defines how the globus_gram_myjob library will operate on the processes
    collective: treat all processes as part of a single job
    independent: treat each of the processes as an independent uniprocessor job
  (dryRun=true)  Do not actually run the job

RSL substitutions
  RSL supports simple variable substitutions
  Substitutions are declared using a list of pairs:
    (rslSubstitution=(SUB1 val1)(SUB2 val2))
  A substitution is invoked with $(SUB)
  Processing order:
    Within a scope, processed left to right
    Outer scope processed before inner scope
  A variable definition can reference previously defined variables

Substitution example
  This
    &(rslSubstitution=(URLBASE "ftp://host:1234"))
     (rslSubstitution=(URLDIR $(URLBASE)/dir))
     (executable=$(URLDIR)/myfile)
  is equivalent to this
    &(executable=ftp://host:1234/dir/myfile)

GRAM Defined RSL Substitutions
  GRAM defines a set of RSL substitutions before processing the job request
  Machine Information: GLOBUS
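The substitution rules described in this section (definitions processed left to right, later definitions allowed to reference earlier ones, `$(NAME)` expanded on use) can be sketched as follows. This is an illustration of the rules only, not the Globus RSL parser: the function names `expand` and `resolve` are hypothetical, the definitions are passed as plain Python pairs instead of being parsed from RSL text, and scoping is ignored.

```python
import re

# Matches a substitution reference such as $(URLBASE).
_REF = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand(text, env):
    """Replace every $(NAME) in text with env[NAME]."""
    def lookup(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"undefined RSL substitution: {name}")
        return env[name]
    return _REF.sub(lookup, text)


def resolve(definitions):
    """Process (name, value) pairs left to right.

    Each value is expanded against the variables defined so far, so a
    definition may reference only earlier definitions.
    """
    env = {}
    for name, raw_value in definitions:
        env[name] = expand(raw_value, env)
    return env


# The substitution example from the slides, written as Python pairs.
definitions = [
    ("URLBASE", "ftp://host:1234"),
    ("URLDIR", "$(URLBASE)/dir"),
]
env = resolve(definitions)
executable = expand("$(URLDIR)/myfile", env)
# executable is now "ftp://host:1234/dir/myfile"
```

Referencing a variable before it is defined raises `KeyError`, which is one simple way to enforce the left-to-right ordering.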


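  [...] barrier mechanism; values are "strict-barrier", "loose-barrier", "no-barrier"
  (subjobCommsType=value)  Values are "blocking-join" and "independent"; if the value is set to "independent", the subjob won't be seen by the other subjobs during inter-subjob communication
  (label=string)  Identifier for this subjob
  (resourceManagerContact=string)
  (resourceManagerName=string)  Resource manager to which to submit a subjob

Example (single resource for now)
    $ globusrun -r chi/jobmanager-pbs '&(executable="/home/abose/test.exe")
        (host_count=2)(count=4)
        (arguments="-t 100 -f out.dat")
        (email_address="abose@umich.edu")
        (queue="cac")
        (pbs_stagein="morpheus:/home/abose/test.exe")
        (pbs_stageout="morpheus:/home/abose/out.dat")
        (pbs_stdout="/tmp/stdout")
        (pbs_stderr="/tmp/stderr")
        (maxwalltime=10)(jobtype="mpi")'
  "Get test.exe from morpheus and run it on hypnos": submitted by the Globus gatekeeper on chi using the PBS job manager.

RSL Example
  Resulting PBS submission script on hypnos:
    #! /bin/sh
    # PBS batch job script built by Globus job manager
    #
    #PBS -S /bin/sh
    #PBS -M abose@umich.edu
    #PBS -m n
    #PBS -q cac
    #PBS -W stagein=/home/abose/test.exe@morpheus.engin.umich.edu:/home/abose/test.exe
    #PBS -W stageout=/home/abose/out.dat@morpheus.engin.umich.edu:/home/abose/out.dat
    #PBS -l walltime=10:00
    #PBS -o hypnos:/tmp/stdout
    #PBS -e hypnos:/tmp/stderr
    #PBS -l nodes=2
    #PBS -v X509_USER_PROXY=/home/abose/.globus/.gass_cache/local/md5/1c/fd/d3/753b9028dfec2ddd6df84cd06c/md5/0a/4b/1d/599dac54863d650c2531cb92fc/data,GLOBUS_LOCATION=/usr/grid,GLOBUS_GRAM_JOB_CONTACT=https://chi.grid.umich.edu:58963/575/1047861360/,GLOBUS_GRAM_MYJOB_CONTACT=URLx-nexus://chi.grid.umich.edu:58964/,HOME=/home/abose,LOGNAME=abose,LD_LIBRARY_PATH=
    #Change to directory requested by user
    cd /home/abose
    /usr/gmpi.pgi/bin/mpirun -np 4 /home/abose/test.exe -t 100 -f out.dat
  Slides taken from NPACI Training, 2003

Programming with Globus API
  Command-line program names: grid-* or globus-*
  Function calls/APIs start with globus_*
  Library binaries start with libglobus_*.a
  Includes: a common header defines the most frequently used data structures; further headers depend on which modules and functions the program calls.
  Module activation/deactivation:
    Functions are arranged in several modules. The corresponding module must be activated before any of its functions is called:
      globus_module_activate(MODULE_NAME)
      globus_module_deactivate(MODULE_NAME)
      globus_module_deactivate_all()
    GLOBUS_SUCCESS (0) is returned if successful.
  Example module names:
    GLOBUS_GRAM_CLIENT_MODULE
    GLOBUS_IO_MODULE
    GLOBUS_GASS_COPY_MODULE
  Dependencies among module activations exist. Read the API documentation.

Evaluation
  Strengths:
    Adds a description of the resources a job needs
    Defines many attributes and supports several resource management modes, such as GRAM and DUROC
  Weaknesses:
    It still concentrates on describing compute resources and resource requests, so its scope is narrow
    Poor extensibility
    Currently used only in Globus; not supported by other Grid projects

Web Services Description Language (WSDL)

WSDL
  Web Service Description Language: describes the technical invocation syntax of a Web service.
  WSDL defines an XML-based grammar that describes a Web service as a collection of service access points capable of exchanging messages.
  A WSDL service definition gives a distributed system machine-readable SDK documentation, and can describe the details involved in automating application communication.
  The current version of WSDL is 1.1; the specification is available at http://www.w3.org/TR/wsdl.

WSDL
  WSDL was proposed by vendors including Ariba, Intel, IBM, and Microsoft.
  It defines the operations and messages that a given Web service sends and receives in an abstract way, independent of any particular language.
  WSDL stays protocol-neutral, but it does have built-in support for binding to SOAP, which ties it closely to SOAP.

The WSDL information model
  The WSDL information model relies on separating an abstract specification from its concrete implementation, that is, separating the service interface definition (abstract interface) from the service implementation definition (concrete endpoint).
  The abstract interface specification describes what an endpoint can do; in WSDL it is expressed as a portType. The binding mechanism, expressed in WSDL as the binding element, maps the abstract definition of a Web service to a particular implementation using a specific communication protocol, data encoding model, and underlying transport. When a binding is combined with the address at which the implementation can be reached, the abstract endpoint becomes a concrete endpoint that service requesters can invoke; the WSDL port element represents this combination.
  An abstract interface can support any number of operations. An operation is defined by a set of messages, which define its interaction pattern. The concrete realization of the abstract messages and operations is specified by the binding element. As in other XML applications, the WSDL schema defines several top-level (major) elements.

Basic properties that WSDL describes
  What the service does: the operations (methods) the service provides.
  How to access the service: details of the data formats and the protocols needed to reach the service's operations.
  Where the service is located: a protocol-specific network address, such as a URL.

  [Figure slides: Meaning of the basic WSDL elements; WSDL information model; WSDL object structure diagram; WSDL document types; WSDL document structure]

WSDL tools
  Omniopera: a GUI editor for WSDL, XML, and XSD.
  Microsoft SOAP Toolkit: includes a wizard that creates COM interfaces from WSDL definitions and a wizard that creates WSDL from COM interfaces.
  IBM Web Services Toolkit: includes wizards that generate WSDL and SOAP deployment descriptors.

The Resource Description Framework (RDF)

RDF
  Resource Description Framework, RDF
  The purpose of the W3C's Resource Description Framework (RDF) is to provide a standard for accessing metadata about network resources; it therefore also provides a standard protocol for describing the content of a particular resource.
  The W3C's recommended standard for applying metadata
  A model, with one or more syntaxes
  When used on the Web, RDF is usually encoded in XML
  The foundation of the Semantic Web
  W3C - Resource Description Framework (RDF): http://www.w3.org/RDF/

RDF
  A language for expressing information about resources on the World Wide Web.
  Intended specifically for metadata about Web resources, such as the title, author, and modification time of a Web page, the copyright and licensing information of a Web document, or the availability schedule of a shared resource.

Why use RDF?
  RDF provides a model for sharing metadata
  Shared meaning
  Metadata can be shared between applications that know little or nothing about each other
  For example, an RDF-based bibliographic application can take in metadata from an RDF-based geospatial application and understand something of its meaning.
  With (X)HTML and XML markup alone, a software application must be able to understand complex encodings

The basic idea of RDF
  Identify things with Web identifiers (Uniform Resource Identifiers, URIs), and describe resources with simple properties and property values. This lets RDF represent one or more simple statements about resources as a graph of nodes and arcs, in which the nodes and arcs stand for resources, properties, or property values.

Example
  There is a person identified by http://www.w3.org/People/EM/contact#me, whose name is Eric Miller, whose e-mail address is em@w3.org, and whose title is Dr.
  [Listing: RDF description of Eric Miller, Dr.]

Uniform Resource Identifiers (URI)

URI
  Uniform Resource Identifiers
  A URI is a simple, extensible way of designating a resource

Uniformity of URIs
  Although different resources may be accessed through different mechanisms, URIs allow different types of resource identifier to be used in the same context
  URIs allow a common syntactic convention to be interpreted with uniform semantics across different types of resource identifier
  URIs allow new types of identifier to be introduced without disturbing existing identifier systems
  URIs allow the same identifier to be reused in many different contexts
  URIs allow new applications or protocols to adopt existing, widely used resource identifiers

URI examples
  ftp://ftp.is.co.za/rfc/rfc1808.txt
  gopher://spinaltap.micro.umn.edu/00/Weather/Californian/LosAngeles
  http://
  mailto:
  news:comp.infosystem.www.servers.unix
  telnet://

URI = URL + URN
  URL (Uniform Resource Locator)
  URN (Uniform Resource Name)
  They identify a resource from different points of view

URL
  General form: <scheme>:<scheme-specific-part>
  scheme: ftp, http, gopher, mailto, news, nntp, telnet, wais, file, prospero
  //<user>:<password>@<host>:<port>/<url-path>

URN
  <URN> ::= "urn:" <NID> ":" <NSS>
    <NID>: namespace identifier
    <NSS>: a string that conforms to the namespace's rules
  <NID> ::= <let-num> [ 1,31<let-num-hyp> ]
  <let-num-hyp> ::= <upper> | <lower> | <number> | "-"
  <let-num> ::= <upper> | <lower> | <number>
    upper: uppercase letters; lower: lowercase letters; number: digits

URN (continued)
  <NSS> ::= 1*<URN chars>
  <URN chars> ::= <trans> | "%" <hex> <hex>
  <trans> ::= <upper> | <lower> | <number> | <other> | <reserved>
  <hex> ::= <number> | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b" | "c" | "d" | "e" | "f"
  <other> ::= "(" | ")" | "+" | "," | "-" | "." | ":" | "=" | "@" | ";" | "$" | "_" | "!" | "*" | "'"

Resource description in LDAP

LDAP
  Entries are stored as a series of attribute pairs; each pair consists of a type and a value.

Example
  dn: cn=My Computer, ou=devices, dc=sdu, dc=
  cn: FB Computer
  usage: computing
  resource: 866MHZ
  resource: 512M memory
  resource: 60GB storage
  resource: Linux OS

Resource naming

Purpose and role of resource naming
  A resource name abstracts the resource further, separating the identity of a resource from its location
  A naming mechanism can establish a virtual space, enlarging or shrinking the user's space
  Resources can be accessed by name, which is convenient for users
  Resource names take different forms:
    Logical name: convenient for users, easy to remember
    Physical name: the actual name
    Internal name: used inside the system
    External name: offered to users outside the system
  Naming rules:
    Prevent conflicts
    Consistent style
    Globally unique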
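Separating a resource's identity from its location amounts to keeping a lookup table from logical names to physical names, with a rule that rejects conflicting registrations. The sketch below illustrates that idea only; the class `NameRegistry`, its methods, and the example names and addresses (under example.org) are all hypothetical and do not correspond to any Globus or LDAP API.

```python
class NameRegistry:
    """Maps unique logical resource names to physical locations."""

    def __init__(self):
        self._bindings = {}

    def register(self, logical_name, physical_name):
        # Naming rule: prevent conflicts, each logical name is bound once.
        if logical_name in self._bindings:
            raise ValueError(f"name already registered: {logical_name}")
        self._bindings[logical_name] = physical_name

    def rebind(self, logical_name, physical_name):
        # The resource moved; its logical name stays the same.
        if logical_name not in self._bindings:
            raise KeyError(logical_name)
        self._bindings[logical_name] = physical_name

    def resolve(self, logical_name):
        # Access by name: the user never handles the physical location.
        return self._bindings[logical_name]


registry = NameRegistry()
registry.register("sdu/devices/my-computer", "node01.example.org:2119")

before = registry.resolve("sdu/devices/my-computer")
registry.rebind("sdu/devices/my-computer", "node07.example.org:2119")
after = registry.resolve("sdu/devices/my-computer")
```

Because clients hold only the logical name, the move from `node01` to `node07` requires no change on their side.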


