
Jobs - short-lived processes

Jobs in CGC are perfect for running tasks that have a clear beginning and end. Unlike regular resources that run continuously, jobs execute their task and then stop. This guide covers everything you need to know about creating and managing jobs.

  • One-time execution workloads
  • Configurable TTL (Time To Live) for automatic cleanup
  • Support for parallel processing
  • Retry policies for failed executions
  • Can wait in the Pending state when resources are not available within the CGC instance

Creating jobs

Info: using the job module is very similar to using the resource module.

import cgc.sdk.job as job

# Create a simple job
response = job.job_create(
    name="hello-world-job",
    image_name="busybox:latest",
    startup_command="echo 'Hello, World!'",
    cpu=1,
    memory=1
)

if response['code'] == 200:
    print("Job created successfully!")
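Every SDK call in this guide returns a dictionary with a `code` field, so the success check can be centralized. The following is a minimal sketch, not part of the CGC SDK: `JobRequestError` and `ensure_ok` are hypothetical names, and the sketch assumes that failed responses carry a `message` field, as the deletion example in this guide suggests.

```python
class JobRequestError(RuntimeError):
    """Raised when a response dictionary reports a non-200 code."""


def ensure_ok(response, action="request"):
    """Return the response unchanged if its code is 200, otherwise raise."""
    code = response.get("code")
    if code != 200:
        # 'message' is assumed to be present on failures; fall back if it is not
        message = response.get("message", "no message provided")
        raise JobRequestError(f"{action} failed with code {code}: {message}")
    return response


# Hand-written dictionary shaped like the responses used in this guide
ok = ensure_ok({"code": 200, "details": {}}, action="job_create")
print(ok["code"])
```

Wrapping a call as `ensure_ok(job.job_create(...), action="job_create")` would turn a silent failure into an exception that names the failing step.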

TTL (Time To Live) management

Understanding TTL

TTL controls the ttlSecondsAfterFinished field in Kubernetes Jobs, enabling automatic garbage collection.

A completed job is kept so that its logs remain available. The TTL setting removes a finished job automatically, whether it succeeded or failed, once the configured time has passed.
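In Kubernetes, a job becomes eligible for removal once `ttlSecondsAfterFinished` seconds have elapsed since it finished, and a value of 0 makes it eligible immediately. The sketch below only illustrates that timing rule with plain datetime arithmetic; `cleanup_time` and `is_eligible_for_cleanup` are hypothetical helper names, and the sketch does not show how CGC accepts the TTL value.

```python
from datetime import datetime, timedelta, timezone


def cleanup_time(finished_at, ttl_seconds):
    """Return the moment a finished job becomes eligible for removal."""
    if ttl_seconds < 0:
        raise ValueError("ttl_seconds must be non-negative")
    return finished_at + timedelta(seconds=ttl_seconds)


def is_eligible_for_cleanup(finished_at, ttl_seconds, now):
    """True once at least ttl_seconds have elapsed since the job finished."""
    return now >= cleanup_time(finished_at, ttl_seconds)


# Illustrative timestamps: a job that finished at 14:30 UTC with a one-hour TTL
finished = datetime(2024, 1, 15, 14, 30, 0, tzinfo=timezone.utc)
print(cleanup_time(finished, 3600))
print(is_eligible_for_cleanup(finished, 3600, finished + timedelta(minutes=30)))
print(is_eligible_for_cleanup(finished, 3600, finished + timedelta(hours=2)))
```

A longer TTL keeps logs around for inspection at the cost of more finished jobs in the listing; a TTL of 0 trades that history for immediate cleanup.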

Listing jobs

Get all jobs

import cgc.sdk.job as job

# List all jobs
jobs_response = job.job_list()

if jobs_response['code'] == 200:
    job_list = jobs_response['details'].get('job_list', [])

    for j in job_list:
        name = j.get('name', 'unknown')
        status = j.get('status', {}).get('phase', 'unknown')
        created = j.get('created_at', 'unknown')
        print(f"Job: {name}")
        print(f" Status: {status}")
        print(f" Created: {created}")
        print()

Filter jobs by status

def get_jobs_by_status(target_status):
    """Get jobs with specific status"""
    response = job.job_list()

    if response['code'] != 200:
        return []

    matching_jobs = []
    for j in response['details'].get('job_list', []):
        if j.get('status', {}).get('phase') == target_status:
            matching_jobs.append(j)

    return matching_jobs

# Get all running jobs
running = get_jobs_by_status('Running')
print(f"Found {len(running)} running jobs")

# Get all succeeded jobs
succeeded = get_jobs_by_status('Succeeded')
print(f"Found {len(succeeded)} succeeded jobs")
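Filtering one status at a time issues a separate list request per status. As an alternative, a single response can be summarized across all phases. This is a minimal sketch: `count_jobs_by_phase` is a hypothetical helper, and it operates on a dictionary shaped like the `job.job_list()` response used above, so the sample data is hand-written rather than real output.

```python
from collections import Counter


def count_jobs_by_phase(list_response):
    """Count jobs per phase from a job_list()-shaped response dictionary."""
    if list_response.get("code") != 200:
        return Counter()
    jobs = list_response.get("details", {}).get("job_list", [])
    # Jobs without a status or phase are grouped under 'unknown'
    return Counter(j.get("status", {}).get("phase", "unknown") for j in jobs)


# Hand-written sample; names and phases are illustrative only
sample = {
    "code": 200,
    "details": {
        "job_list": [
            {"name": "job-a", "status": {"phase": "Running"}},
            {"name": "job-b", "status": {"phase": "Succeeded"}},
            {"name": "job-c", "status": {"phase": "Succeeded"}},
            {"name": "job-d"},
        ]
    },
}

for phase, count in sorted(count_jobs_by_phase(sample).items()):
    print(f"{phase}: {count}")
```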

Deleting jobs

Basic deletion

import cgc.sdk.job as job

# Delete a specific job
response = job.job_delete("my-job-name")

if response['code'] == 200:
    print("Job deleted successfully")
else:
    print(f"Failed to delete job: {response['message']}")

Batch deletion

def delete_completed_jobs():
    """Delete all jobs that completed successfully"""
    response = job.job_list()

    if response['code'] != 200:
        print("Failed to list jobs")
        return

    deleted_count = 0
    for j in response['details'].get('job_list', []):
        job_phase = j.get('status', {}).get('phase')
        # Only jobs in the Succeeded phase are removed; failed jobs are kept
        if job_phase == 'Succeeded':
            delete_response = job.job_delete(j['name'])
            if delete_response['code'] == 200:
                deleted_count += 1
                print(f"Deleted: {j['name']}")

    print(f"Total deleted: {deleted_count}")

# Clean up completed jobs
delete_completed_jobs()
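The cleanup function above removes only jobs in the Succeeded phase. To review what would be deleted before calling `job.job_delete`, the selection step can be separated from the deletion step. The sketch below is a dry run under assumptions: `select_job_names` is a hypothetical helper, and the `Failed` phase name is assumed from Kubernetes conventions rather than confirmed for CGC.

```python
def select_job_names(job_list, phases=("Succeeded", "Failed")):
    """Return names of jobs whose phase is in `phases`, skipping unnamed entries."""
    wanted = set(phases)
    names = []
    for j in job_list:
        phase = j.get("status", {}).get("phase")
        name = j.get("name")
        if name and phase in wanted:
            names.append(name)
    return names


# Hand-written entries shaped like items of the 'job_list' field
jobs = [
    {"name": "etl-1", "status": {"phase": "Succeeded"}},
    {"name": "etl-2", "status": {"phase": "Failed"}},
    {"name": "etl-3", "status": {"phase": "Running"}},
    {"status": {"phase": "Succeeded"}},  # no name, so it is skipped
]

print(select_job_names(jobs))
print(select_job_names(jobs, phases=("Succeeded",)))
```

Printing the selection first makes it possible to confirm the list, then pass each name to `job.job_delete` in a separate step.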

Best practices

Naming conventions

from datetime import datetime

def generate_job_name(prefix, suffix=None):
    """Generate unique job name with timestamp"""
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    if suffix:
        return f"{prefix}-{suffix}-{timestamp}"
    return f"{prefix}-{timestamp}"

# Use it
job_name = generate_job_name("backup", "postgres")
# Result: "backup-postgres-20240115-143022"
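Generated names may still contain characters that a Kubernetes-backed platform rejects. The sketch below assumes that CGC job names follow rules similar to Kubernetes DNS labels (lowercase letters, digits, and hyphens, at most 63 characters, starting and ending with a letter or digit). That is an assumption, since this guide does not state the CGC naming rules, and `normalize_job_name` and `is_valid_job_name` are hypothetical helpers.

```python
import re

MAX_NAME_LENGTH = 63  # assumed limit, borrowed from Kubernetes DNS label names
_VALID_NAME = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def normalize_job_name(raw):
    """Lowercase, replace unsupported characters with '-', and trim to the limit."""
    name = re.sub(r"[^a-z0-9-]+", "-", raw.lower())
    name = re.sub(r"-{2,}", "-", name)  # collapse runs of hyphens
    name = name[:MAX_NAME_LENGTH].strip("-")
    if not name:
        raise ValueError(f"cannot derive a valid job name from {raw!r}")
    return name


def is_valid_job_name(name):
    """Check a name against the assumed DNS-label style rules."""
    return len(name) <= MAX_NAME_LENGTH and _VALID_NAME.fullmatch(name) is not None


candidate = normalize_job_name("Backup_Postgres 2024")
print(candidate, is_valid_job_name(candidate))
```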

Troubleshooting common issues

Job stays in Pending state

  • Check resource availability
  • Verify image can be pulled

Job fails immediately

  • Check startup command syntax
  • Verify environment variables
  • Check image entrypoint

Job runs but doesn't complete

  • Job might be waiting for input
  • Process might be hanging
  • Check for infinite loops
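A job that never completes is easier to spot when the caller waits with a deadline instead of indefinitely. The following is a minimal polling sketch, not an SDK feature: `wait_for_phase` is a hypothetical helper, `fetch_phase` is any caller-supplied function that returns the job's current phase, and the `Failed` phase name is assumed from Kubernetes conventions.

```python
import time


def wait_for_phase(fetch_phase, done_phases=("Succeeded", "Failed"),
                   timeout=300.0, interval=5.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_phase() until it returns a phase in done_phases.

    Returns the final phase, or raises TimeoutError naming the last phase seen.
    """
    deadline = clock() + timeout
    while True:
        phase = fetch_phase()
        if phase in done_phases:
            return phase
        if clock() >= deadline:
            raise TimeoutError(
                f"job still in phase {phase!r} after {timeout} seconds"
            )
        sleep(interval)


# Simulated phase sequence standing in for repeated status lookups
phases = iter(["Pending", "Running", "Succeeded"])
print(wait_for_phase(lambda: next(phases), interval=0))
```

In practice, `fetch_phase` could look up the job by name in the `job.job_list()` response and return its `status.phase` value. A timeout while the phase is still `Pending` points toward resource availability or image pulls, while a timeout in `Running` points toward a hanging process.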