cms

CMS-related contrib package. https://home.cern/about/experiments/cms

Task `CrabWorkflow`

class CrabWorkflow(*args, **kwargs)

Bases: BaseRemoteWorkflow

workflow_proxy_cls: alias of CrabWorkflowProxy

crab_workflow_run_context(): Hook to provide a context manager in which the workflow run implementation is placed. This can be helpful in situations where resources should be acquired before and released after running a workflow.
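A minimal sketch of such a hook, written as a `contextlib.contextmanager` generator. It assumes a plain class in place of an actual CrabWorkflow subclass, so law is not imported. The `ExampleWorkflow` class and its `events` list, which stands in for a real resource, are hypothetical. The sketch also assumes the hook is used like an ordinary `with` block around the run.

```python
from contextlib import contextmanager


class ExampleWorkflow:
    """Hypothetical stand-in for a CrabWorkflow subclass (law is not imported)."""

    def __init__(self):
        # records acquire/run/release steps so the ordering can be inspected
        self.events = []

    @contextmanager
    def crab_workflow_run_context(self):
        # acquire the (hypothetical) resource before the workflow runs
        self.events.append("acquire")
        try:
            yield
        finally:
            # release it afterwards, even if the run raised an exception
            self.events.append("release")


wf = ExampleWorkflow()
with wf.crab_workflow_run_context():
    wf.events.append("run")  # placeholder for the actual run implementation
```

The `try`/`finally` around the `yield` guarantees the release step also happens when the run fails.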

abstract crab_stageout_location()

Hook to define both the “Site.storageSite” and “Data.outLFNDirBase” settings in a 2-tuple, i.e., the name of the storage site to use and the base directory for crab’s own output staging. An example would be ("T2_DE_DESY", "/store/user/...").

In case crab’s own output staging is not used, the choice of the output base has no effect, but it is still required for crab’s job submission to work.
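The hook only has to return a 2-tuple of strings. This is a minimal sketch that assumes a plain class in place of an actual CrabWorkflow subclass. The user directory `someuser/law_outputs` and the `check_stageout_location` helper are hypothetical. The "/store/" prefix check only reflects the usual CMS LFN convention and is not a documented law requirement.

```python
class ExampleWorkflow:
    """Hypothetical stand-in for a CrabWorkflow subclass (law is not imported)."""

    def crab_stageout_location(self):
        # storage site name and a hypothetical user directory below /store/user/
        return ("T2_DE_DESY", "/store/user/someuser/law_outputs")


def check_stageout_location(location):
    """Hypothetical helper: validate the (storage site, LFN base) 2-tuple."""
    site, lfn_base = location  # raises ValueError if not exactly two items
    if not isinstance(site, str) or not isinstance(lfn_base, str):
        raise TypeError("expected a 2-tuple of strings")
    if not lfn_base.startswith("/store/"):
        raise ValueError("LFN base should start with /store/")
    return site, lfn_base


site, lfn_base = check_stageout_location(ExampleWorkflow().crab_stageout_location())
```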

abstract crab_output_directory(): Hook to define the location of submission output files, such as the json files containing job data. This method should return a FileSystemDirectoryTarget.

crab_request_name(submit_jobs): Returns a name for a request, i.e., the project directory inside the crab job working area.

crab_work_area(): Returns the location of the crab working area, defaulting to the value of crab_output_directory() in case it refers to a local directory. When None, the value of the “job.crab_work_area” configuration option is used.
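The two-step fallback described above can be written out as two small functions. This sketch models the described behavior only. It assumes plain strings for directories and a boolean for whether the output directory is local. `crab_work_area_default` and `effective_work_area` are hypothetical names and are not law functions.

```python
def crab_work_area_default(output_dir, output_dir_is_local):
    """Hypothetical sketch of the hook default: reuse the output directory
    only when it refers to a local directory, otherwise return None."""
    return output_dir if output_dir_is_local else None


def effective_work_area(hook_value, configured_work_area):
    """Hypothetical sketch: a None from the hook means that the value of the
    "job.crab_work_area" configuration option is used instead."""
    return configured_work_area if hook_value is None else hook_value
```

For example, a local output directory `/tmp/out` becomes the work area itself. A remote output location yields None, so the configured value wins.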

crab_job_file(): Hook to return the location of the job file that is executed on job nodes.

crab_bootstrap_file(): Hook to define the location of an optional, so-called bootstrap file that is sent alongside jobs and called prior to the actual job payload. It is meant to run a custom setup routine in order for the payload to run successfully (e.g. software setup, data retrieval).

crab_stageout_file(): Hook to define the location of an optional, so-called stageout file that is sent alongside jobs and called after the actual job payload. It is meant to run a custom output stageout routine if required by your workflow or target storage element.

crab_workflow_requires(): Hook to define requirements for the workflow itself that need to be resolved before any submission can happen.

crab_output_postfix(): Hook to define the postfix of outputs, for instance such that workflows with different parameters do not write their intermediate job status information into the same json file.
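A sketch of how a postfix can keep differently parameterized workflows apart. The `dataset` and `version` parameters and the `crab_submission{postfix}.json` naming scheme are hypothetical and only illustrate the idea. They are not law's actual file names.

```python
class ExampleWorkflow:
    """Hypothetical stand-in; dataset and version are illustrative parameters."""

    def __init__(self, dataset, version):
        self.dataset = dataset
        self.version = version

    def crab_output_postfix(self):
        # encode the distinguishing parameters in the postfix
        return "_{}_{}".format(self.dataset, self.version)


def submission_file_name(workflow):
    """Hypothetical naming scheme for the json file holding job data."""
    return "crab_submission{}.json".format(workflow.crab_output_postfix())
```

Two workflows that differ only in `version` therefore end up with different submission files.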

crab_output_uri(): Hook to return the URI of the remote crab output directory.

crab_job_resources(job_num, branches): Hook to define resources for a specific job with number job_num, processing branches. This method should return a dictionary.

crab_job_manager_cls(): Hook to define a custom job manager class to use.

crab_create_job_manager(**kwargs): Hook to configure how the underlying job manager is instantiated and configured.

crab_job_file_factory_cls(): Hook to define a custom job file factory class to use.

crab_create_job_file_factory(**kwargs): Hook to configure how the underlying job file factory is instantiated and configured.

crab_job_config(config, job_num, branches): Hook to inject custom settings into the job config, which is an instance of the Config class defined inside the job manager.

crab_dump_intermediate_job_data(): Whether to dump intermediate job data to the job submission file while jobs are being submitted.

crab_use_local_scheduler(): Whether remote jobs should use a local scheduler.

crab_post_submit_delay(): Configurable delay in seconds to wait after submitting jobs and before starting the status polling.

crab_check_job_completeness(): Hook to define whether the completeness of job outputs should be checked after jobs report successful completion.

crab_check_job_completeness_delay(): Grace period before crab_check_job_completeness() is called to ensure that output files are accessible. Especially useful on distributed file systems with possibly asynchronous behavior.

crab_poll_callback(poll_data)

Configurable callback that is called after each job status query and before potential resubmission. It receives the variable polling attributes poll_data (PollData) that can be changed within this method.

If False is returned, the polling loop is gracefully terminated. Returning any other value does not have any effect.

crab_post_poll_callback(success, duration): Configurable callback that is called after the polling loop has ended. It receives a boolean success that indicates whether the job polling was successful, and the duration of the job polling in seconds.
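The contract of both callbacks can be illustrated with a self-contained loop: only an explicit False ends polling, and the post-poll callback receives success and duration. Everything here is a hypothetical stand-in. `PollData` models a single `interval` attribute. `query_status` is any function that returns True once all jobs are done. The resubmission step of the real loop is omitted.

```python
import time


class PollData:
    """Hypothetical stand-in loosely modeled on law's PollData (one attribute only)."""

    def __init__(self, interval=0.0):
        # seconds to sleep between two status queries
        self.interval = interval


def poll_jobs(query_status, poll_callback, post_poll_callback, max_polls=10):
    """Sketch of a polling loop honoring the documented callback contract."""
    poll_data = PollData()
    start = time.time()
    success = False
    for _ in range(max_polls):
        done = query_status()
        # called after every status query; the callback may modify poll_data,
        # and only an explicit False (not None or 0) ends the loop gracefully
        if poll_callback(poll_data) is False:
            break
        if done:
            success = True
            break
        time.sleep(poll_data.interval)
    # called once after the loop, whatever the reason it ended
    post_poll_callback(success, time.time() - start)
    return success
```

The `is False` comparison is deliberate. A callback without a return statement yields None, which must not stop the polling.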

crab_cmdline_args(): Hook to add additional CLI parameters to “law run” commands executed on job nodes.

crab_destination_info(info): Hook to add additional information after each job status query line by extending an info dictionary whose values will be shown separated by commas.
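A sketch of the described display behavior, assuming the hook extends the dictionary in place and returns it. The `site` key, its value, and the `format_status_line` helper are hypothetical. The exact status line layout law prints is not reproduced here.

```python
def crab_destination_info(info):
    # hypothetical hook body: extend the dict in place and return it
    info["site"] = "T2_DE_DESY"
    return info


def format_status_line(job_num, status, info):
    """Hypothetical sketch: append the info values, separated by commas."""
    line = "{}: {}".format(job_num, status)
    if info:
        line += ", " + ", ".join(str(value) for value in info.values())
    return line


line = format_status_line(3, "running", crab_destination_info({}))
```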

Task `BundleCMSSW`

class BundleCMSSW(*args, **kwargs)

Bases: Task

task_namespace = 'law.cms'

This value can be overridden to set the namespace that will be used. (See Task.namespaces_famlies_and_ids) If it’s not specified and you try to read this value anyway, it will return garbage. Please use get_task_namespace() to read the namespace.

Note that setting this value with @property will not work, because this is a class level value.

output()

The output that this Task produces.

The output of the Task determines if the Task needs to be run: the task is considered finished if and only if the outputs all exist. Subclasses should override this method to return a single Target or a list of Target instances.

Implementation note: If running multiple workers, the output must be a resource that is accessible by all workers, such as a DFS or database. Otherwise, workers might compute the same output since they don’t see the work done by other workers.

See Task.output
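The completeness rule stated above can be reproduced for local files with plain paths instead of Target objects. `outputs_exist` is a hypothetical helper. Returning False for a task without any outputs mirrors luigi's default `complete()` as we understand it.

```python
import os


def outputs_exist(paths):
    """Hypothetical sketch: a task counts as complete only if every output exists."""
    if not paths:
        # luigi's default treats a task without outputs as incomplete
        return False
    return all(os.path.exists(path) for path in paths)
```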

run()

The task run method, to be overridden in a subclass.

See Task.run

Class `CrabJobManager`

class CrabJobManager(sandbox_name=None, proxy_file=None, myproxy_username=None, instance=None, threads=1)

Bases: BaseJobManager

class JobId(crab_num, task_name, proj_dir)

Bases: tuple

crab_num: Alias for field number 0

task_name: Alias for field number 1

proj_dir: Alias for field number 2

classmethod cast_job_id(job_id): Converts a job_id, for instance after json deserialization, into a JobId object.

group_job_ids(job_ids): Hook that needs to be implemented if the job manager supports grouping of jobs, i.e., when job_grouping_submit, job_grouping_query, etc. are True, and potentially used during status queries, job cancellation and removal. If so, it should take a sequence of job_ids and return a dictionary mapping ids of group jobs (used for queries, etc.) to the corresponding lists of original job ids, with an arbitrary grouping mechanism.
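JobId is a plain named tuple, so it becomes a list when serialized to json, and cast_job_id() undoes that conversion. The sketch below re-creates the documented field order with `collections.namedtuple` and pairs it with one possible grouping. Grouping by proj_dir is an assumption, since any grouping is allowed. It was chosen here because cancel(), cleanup() and query() all take a proj_dir argument. The function bodies are illustrative and are not law's code.

```python
from collections import namedtuple

# field order mirrors the documented JobId: crab_num (0), task_name (1), proj_dir (2)
JobId = namedtuple("JobId", ["crab_num", "task_name", "proj_dir"])


def cast_job_id(job_id):
    """Sketch: turn a json-deserialized list (or plain tuple) back into a JobId."""
    if isinstance(job_id, JobId):
        return job_id
    return JobId(*job_id)


def group_job_ids(job_ids):
    """Sketch of one possible grouping: jobs sharing a crab project directory
    form one group, keyed by that directory."""
    groups = {}
    for job_id in map(cast_job_id, job_ids):
        groups.setdefault(job_id.proj_dir, []).append(job_id)
    return groups
```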

submit(job_file, job_files=None, proxy_file=None, myproxy_username=None, instance=None, retries=0, retry_delay=3, silent=False, _processes=None): Abstract atomic or group job submission. Can throw exceptions. Should return a list of job ids.

cancel(proj_dir, job_ids=None, proxy_file=None, myproxy_username=None, instance=None, silent=False, _processes=None): Abstract atomic or group job cancellation. Can throw exceptions. Should return a dictionary mapping job ids to per-job return values.

cleanup(proj_dir, job_ids=None, proxy_file=None, myproxy_username=None, instance=None, silent=False, _processes=None): Abstract atomic or group job cleanup. Can throw exceptions. Should return a dictionary mapping job ids to per-job return values.

query(proj_dir, job_ids=None, proxy_file=None, myproxy_username=None, instance=None, skip_transfers=None, silent=False, _processes=None): Abstract atomic or group job status query. Can throw exceptions. Should return a dictionary mapping job ids to per-job return values.

Class `CrabJobFileFactory`

class CrabJobFileFactory(file_name='crab_job.py', executable=None, arguments=None, work_area=None, request_name=None, input_files=None, output_files=None, storage_site=None, output_lfn_base=None, vo_group=None, vo_role=None, custom_content=None, absolute_paths=False, **kwargs)

Bases: BaseJobFileFactory

create(**kwargs): Abstract job file creation method that must be implemented by inheriting classes.

Class `CMSSWSandbox`

class CMSSWSandbox(*args, **kwargs)

Bases: BashSandbox

Class `CMSJobDashboard`

class CMSJobDashboard(task, cms_user, voms_user, apmon_config=None, log_level='WARNING', max_rate=20, task_type='analysis', site=None, executable='law', application=None, application_version=None, submission_tool='law', submission_type='direct', submission_ui=None, init_timestamp=None)

Bases: BaseJobDashboard

This CMS job dashboard interface requires apmon to be installed on your system. See http://monalisa.caltech.edu/monalisa__Documentation__ApMon_User_Guide__apmon_ug_py.html and https://twiki.cern.ch/twiki/bin/view/ArdaGrid/CMSJobMonitoringCollector.

classmethod map_status(job_status, event)

Maps the job_status (see law.job.base.BaseJobManager) for a particular event to the status name that is accepted by the implemented job dashboard. Possible events are:

action.submit

action.cancel

status.pending

status.running

status.finished

status.retry

status.failed

remote_hook_file(): This method can return the path to a file that is considered as an input file to remote jobs. This file can contain bash functions, environment variables, etc., that are necessary to communicate with the implemented job dashboard. When None is returned, no file is sent.

remote_hook_data(job_num, attempt): This method can return a dictionary that is sent with remote jobs in the format key1=value1 key2=value2 .... The returned dictionary should (but does not have to) include the job number job_num and the retry attempt.
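The key=value format can be produced by a one-line serializer. `remote_hook_data` below is a hypothetical hook body that includes the two recommended entries. `encode_hook_data` is a hypothetical helper, and sorting the keys is a choice made here for reproducible output. The sketch assumes keys and values without spaces, since the space-separated format has no quoting.

```python
def remote_hook_data(job_num, attempt):
    # hypothetical hook body including the recommended entries
    return {"job_num": job_num, "attempt": attempt}


def encode_hook_data(data):
    """Hypothetical sketch: serialize to 'key1=value1 key2=value2 ...'."""
    return " ".join("{}={}".format(key, data[key]) for key in sorted(data))
```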

create_tracking_url(): This method can return a tracking URL that refers to a web page that visualizes jobs. When set, the URL is shown in the central luigi scheduler.

publish(*args, **kwargs): Publishes the status of a job to the implemented job dashboard. job_data is a dictionary that contains a job_id and a status string (see law.workflow.remote.StatusData.job_data()).

Class `Site`

class Site(name=None)

Bases: object

Helper class that provides site-related data, mostly via simple properties. When name is None, the name of the site on which the instance is created is used. Example:

site = Site()  # executed on T2_DE_RWTH
print(site.name)        # "T2_DE_RWTH"
print(site.country)     # "DE"
print(site.redirector)  # "xrootd-cms.infn.it"

site = Site("T1_US_FNAL")
print(site.name)        # "T1_US_FNAL"
print(site.country)     # "US"
print(site.redirector)  # "cmsxrootd.fnal.gov"

classattribute redirectors

type: dict

A mapping of country codes to redirectors.

name

type: string

The name of the site, e.g. T2_DE_RWTH. This is either the name provided in the constructor or it is determined for the current site by reading environment variables.

classmethod get_name_from_env(): Tries to extract the local site name from the environment. Returns the name on success and None otherwise.

property info: Tier, country and locality information in a 3-tuple, e.g. ("T2", "DE", "RWTH").

property tier: The tier of the site, e.g. T2.

property country: The country of the site, e.g. DE.

property locality: The locality of the site, e.g. RWTH.

property redirector: The XRD redirector that should be used on this site. For more information on XRD, see this link.
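The info, tier, country and locality properties all derive from the site name. This sketch of the split assumes names follow the TIER_COUNTRY_LOCALITY pattern. `parse_site_name` is a hypothetical helper. Keeping extra underscores in the locality (maxsplit=2) is a choice made here and is not documented law behavior.

```python
def parse_site_name(name):
    """Hypothetical sketch: split e.g. 'T2_DE_RWTH' into (tier, country, locality)."""
    # maxsplit=2 keeps localities that themselves contain underscores intact
    tier, country, locality = name.split("_", 2)
    return tier, country, locality
```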

Functions

lfn_to_pfn(lfn, redirector='global'): Converts a logical file name lfn to a physical file name pfn using a redirector. Valid values for redirector are defined by Site.redirectors.
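This is a minimal sketch of the conversion. It assumes the usual xrootd URL form root://<host>/<lfn>, which produces the characteristic double slash for LFNs starting with /store/. The `REDIRECTORS` table and its keys are hypothetical stand-ins for Site.redirectors. The "eu" and "us" hosts come from the Site example above, and cms-xrd-global.cern.ch is the CMS global redirector.

```python
# hypothetical stand-in for Site.redirectors; the keys are assumptions
REDIRECTORS = {
    "global": "cms-xrd-global.cern.ch",
    "eu": "xrootd-cms.infn.it",
    "us": "cmsxrootd.fnal.gov",
}


def lfn_to_pfn(lfn, redirector="global"):
    """Sketch: prefix a logical file name with an xrootd redirector URL."""
    host = REDIRECTORS[redirector]  # KeyError for unknown redirector names
    return "root://{}/{}".format(host, lfn)
```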

renew_vomsproxy(**kwargs): Renews a VOMS proxy in exactly the same way as law.wlcg.renew_vomsproxy(), but with the vo argument defaulting to the environment variable LAW_CMS_VO, or "cms" when empty.

delegate_myproxy(**kwargs): Delegates an X509 proxy to a myproxy server in exactly the same way as law.wlcg.delegate_myproxy(), but with the vo argument defaulting to the environment variable LAW_CMS_VO, or "cms" when empty.
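The documented default for the vo argument fits in a single expression. `default_cms_vo` is a hypothetical helper that only mirrors the described rule. It does not call into law. Using `or` covers both an unset and an empty LAW_CMS_VO.

```python
import os


def default_cms_vo():
    """Hypothetical sketch: LAW_CMS_VO if set and non-empty, otherwise "cms"."""
    return os.getenv("LAW_CMS_VO") or "cms"
```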
