Jobs
The API offers methods to retrieve the list of jobs and their status, so that they can be monitored. Additionally, new jobs can be created to build datasets.
Reading the jobs' status
The list of all jobs, finished or not, can be fetched with the list_jobs() method. For example, to retrieve the jobs that failed on or after a given date:
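import datetime

date = '2015/09/24'
# Milliseconds since the epoch; the date is interpreted in the local time zone.
# timestamp() replaces strftime('%s'), which is not available on every platform.
date_as_timestamp = int(datetime.datetime.strptime(date, "%Y/%m/%d").timestamp() * 1000)
project = client.get_project('TEST_PROJECT')
jobs = project.list_jobs()
failed_jobs = [job for job in jobs if job['state'] == 'FAILED' and job['def']['initiationTimestamp'] >= date_as_timestamp]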
The list_jobs() method returns the full information for each job as a JSON object (a dictionary in Python). The most important fields are:
{
    'def': {
        'id': 'build_cat_train_hdfs_NP_2015-09-28T09-17-37.455',  # the identifier for the job
        'initiationTimestamp': 1443431857455,  # timestamp of when the job was submitted
        'initiator': 'API (aa)',
        'mailNotification': False,
        'name': 'build_cat_train_hdfs_NP',
        'outputs': [{
            'targetDataset': 'cat_train_hdfs',  # the dataset(s) built by the job
            'targetDatasetProjectKey': 'IMPALA',
            'targetPartition': 'NP',
            'type': 'DATASET'}],
        'projectKey': 'IMPALA',
        'refreshHiveMetastore': False,
        'refreshIntermediateMirrors': True,
        'refreshTargetMirrors': True,
        'triggeredFrom': 'API',
        'type': 'NON_RECURSIVE_FORCED_BUILD'},
    'endTime': 0,
    'stableState': True,
    'startTime': 0,
    'state': 'ABORTED',  # the stable state of the job
    'warningsCount': 0}
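Because each entry is a plain dictionary, the job list can be summarized with standard Python tools. The following is a minimal sketch that relies only on the 'state' field shown in the structure above; the helper name count_jobs_by_state and the sample data are illustrative and not part of the API.

```python
from collections import Counter


def count_jobs_by_state(jobs):
    """Count job dictionaries (shaped like list_jobs() entries) per 'state' value."""
    return Counter(job['state'] for job in jobs)


# Illustrative sample data that mimics the structure of list_jobs() entries
sample_jobs = [
    {'def': {'id': 'job_a'}, 'state': 'DONE', 'stableState': True},
    {'def': {'id': 'job_b'}, 'state': 'FAILED', 'stableState': True},
    {'def': {'id': 'job_c'}, 'state': 'DONE', 'stableState': True},
]
counts = count_jobs_by_state(sample_jobs)
print(dict(counts))
```

With a real project, the same helper could presumably be called as count_jobs_by_state(project.list_jobs()).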
The id field is needed to get a handle on the job and call abort() or get_log() on it.
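As an illustration of using the id to obtain a handle, the sketch below collects the logs of failed jobs. It assumes a project object exposing list_jobs() and get_job(), and job handles exposing get_log(), as used elsewhere on this page; the helper name collect_failed_job_logs and its since_ms parameter are hypothetical.

```python
def collect_failed_job_logs(project, since_ms=0):
    """Return {job id: log text} for failed jobs submitted at or after since_ms.

    since_ms is a timestamp in milliseconds, comparable to 'initiationTimestamp'.
    """
    logs = {}
    for job in project.list_jobs():
        definition = job['def']
        if job['state'] == 'FAILED' and definition['initiationTimestamp'] >= since_ms:
            # The id gives access to a job handle, on which get_log() is called
            logs[definition['id']] = project.get_job(definition['id']).get_log()
    return logs
```

Fetching a log is one request per job, so filtering by state and date before calling get_job() keeps the number of requests small.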
Starting new jobs
Datasets are built by creating a job that has them as its outputs. A job is created by building a job definition and starting it. For a simple non-partitioned dataset, this is done with:
import time

project = client.get_project('TEST_PROJECT')
definition = {
    "type": "NON_RECURSIVE_FORCED_BUILD",
    "outputs": [{
        "id": "dataset_to_build",
        "partition": "NP"
    }]
}
job = project.start_job(definition)
# Poll once per second until the job reaches a terminal state
state = ''
while state not in ('DONE', 'FAILED', 'ABORTED'):
    time.sleep(1)
    state = job.get_status()['baseStatus']['state']
# done!
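The polling loop above never gives up if the job stays in a non-terminal state. A possible variant with a timeout is sketched below. It assumes only that the job handle exposes get_status() with the 'baseStatus' / 'state' structure used above; the helper name wait_for_job and its timeout and poll_interval parameters are illustrative, not part of the API.

```python
import time

TERMINAL_STATES = ('DONE', 'FAILED', 'ABORTED')


def wait_for_job(job, timeout=3600.0, poll_interval=1.0):
    """Poll job.get_status() until the job reaches a terminal state.

    Returns the final state, or raises TimeoutError if `timeout` seconds
    elapse while the job is still in a non-terminal state.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = job.get_status()['baseStatus']['state']
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError('job still in state %r after %s seconds' % (state, timeout))
        time.sleep(poll_interval)
```

A caller could then write state = wait_for_job(project.start_job(definition), timeout=600) and decide whether to abort the job when TimeoutError is raised.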
The start_job() method returns a job handle that can be used to later abort the job. Other jobs can be aborted once their id is known. For example, to abort all jobs currently being processed:
project = client.get_project('TEST_PROJECT')
for job in project.list_jobs():
    # A job whose state is not stable is still being processed
    if not job['stableState']:
        project.get_job(job['def']['id']).abort()
Reference documentation
..autoclass:: dataikuapi.dss.project.JobDefinitionBuilder
class dataikuapi.dss.job.DSSJob(client, project_key, id)
    A job on the DSS instance

    abort()
        Aborts the job

    get_status()
        Get the current status of the job
        Returns:
            the state of the job, as a JSON object

    get_log(activity=None)
        Get the logs of the job
        Args:
            activity: (optional) the name of the activity in the job whose log is requested
        Returns:
            the log, as a string