Jobs - short-lived processes
Jobs in CGC are perfect for running tasks that have a clear beginning and end. Unlike regular resources that run continuously, jobs execute their task and then stop. This guide covers everything you need to know about creating and managing jobs.
- One-time execution workloads
- Configurable TTL (Time To Live) for automatic cleanup
- Support for parallel processing
- Retry policies for failed executions
- Can wait in the Pending state when resources are not available within the CGC instance
Creating jobs
Info: using the job module is very similar to using the resource module.
import cgc.sdk.job as job

# Create a simple job
response = job.job_create(
    name="hello-world-job",
    image_name="busybox:latest",
    startup_command="echo 'Hello, World!'",
    cpu=1,
    memory=1
)

if response['code'] == 200:
    print("Job created successfully!")
TTL (Time To Live) management
Understanding TTL
TTL controls the ttlSecondsAfterFinished field in Kubernetes Jobs, enabling automatic garbage collection. A completed job is kept so that its logs remain available. Setting this parameter removes a finished job automatically once the given number of seconds has passed, whether the job succeeded or failed.
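To make the timing concrete, the sketch below computes when a finished job becomes eligible for automatic removal. It only illustrates the ttlSecondsAfterFinished semantics: the helper name `cleanup_time` is hypothetical and not part of the CGC SDK, and the finish time is made-up sample data.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def cleanup_time(finished_at: datetime, ttl_seconds: Optional[int]) -> Optional[datetime]:
    """Return when a finished job becomes eligible for removal.

    Hypothetical helper, not part of the CGC SDK. None means no TTL is set,
    so the job is kept until it is deleted manually.
    """
    if ttl_seconds is None:
        return None
    if ttl_seconds < 0:
        raise ValueError("ttl_seconds must be zero or positive")
    return finished_at + timedelta(seconds=ttl_seconds)


# Made-up finish time for illustration
finished = datetime(2024, 1, 15, 14, 30, 22, tzinfo=timezone.utc)
print(cleanup_time(finished, 3600))  # one hour after the job finished
print(cleanup_time(finished, 0))     # eligible for removal immediately
print(cleanup_time(finished, None))  # no automatic removal
```

A short TTL keeps the job list tidy, while a longer one leaves more time to read the logs of a failed run.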
Listing jobs
Get all jobs
import cgc.sdk.job as job

# List all jobs
jobs_response = job.job_list()

if jobs_response['code'] == 200:
    job_list = jobs_response['details'].get('job_list', [])
    for j in job_list:
        name = j.get('name', 'unknown')
        status = j.get('status', {}).get('phase', 'unknown')
        created = j.get('created_at', 'unknown')
        print(f"Job: {name}")
        print(f"  Status: {status}")
        print(f"  Created: {created}")
        print()
Filter jobs by status
import cgc.sdk.job as job

def get_jobs_by_status(target_status):
    """Get jobs with specific status"""
    response = job.job_list()
    if response['code'] != 200:
        return []

    matching_jobs = []
    for j in response['details'].get('job_list', []):
        if j.get('status', {}).get('phase') == target_status:
            matching_jobs.append(j)
    return matching_jobs

# Get all running jobs
running = get_jobs_by_status('Running')
print(f"Found {len(running)} running jobs")

# Get all succeeded jobs
succeeded = get_jobs_by_status('Succeeded')
print(f"Found {len(succeeded)} succeeded jobs")
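When several statuses matter at once, a single pass over the list avoids calling job_list() once per status. This is a minimal sketch, assuming the response shape used above (details, then job_list, then status and phase). `count_jobs_by_phase` is a hypothetical helper, and the sample response is made-up data for illustration.

```python
from collections import Counter


def count_jobs_by_phase(response: dict) -> Counter:
    """Count jobs per phase in a job_list()-style response (hypothetical helper)."""
    if response.get('code') != 200:
        return Counter()
    job_list = response.get('details', {}).get('job_list', [])
    return Counter(j.get('status', {}).get('phase', 'unknown') for j in job_list)


# Made-up response for illustration; in practice pass the result of job.job_list()
sample_response = {
    'code': 200,
    'details': {'job_list': [
        {'name': 'a', 'status': {'phase': 'Running'}},
        {'name': 'b', 'status': {'phase': 'Succeeded'}},
        {'name': 'c', 'status': {'phase': 'Succeeded'}},
        {'name': 'd'},  # no status reported yet
    ]},
}

counts = count_jobs_by_phase(sample_response)
for phase, count in sorted(counts.items()):
    print(f"{phase}: {count}")
```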
Deleting jobs
Basic deletion
import cgc.sdk.job as job

# Delete a specific job
response = job.job_delete("my-job-name")

if response['code'] == 200:
    print("Job deleted successfully")
else:
    print(f"Failed to delete job: {response['message']}")
Batch deletion
import cgc.sdk.job as job

def delete_completed_jobs():
    """Delete all jobs that completed successfully"""
    response = job.job_list()
    if response['code'] != 200:
        print("Failed to list jobs")
        return

    deleted_count = 0
    for j in response['details'].get('job_list', []):
        job_phase = j.get('status', {}).get('phase')
        # Only jobs in the Succeeded phase are removed
        if job_phase == 'Succeeded':
            delete_response = job.job_delete(j['name'])
            if delete_response['code'] == 200:
                deleted_count += 1
                print(f"Deleted: {j['name']}")

    print(f"Total deleted: {deleted_count}")

# Clean up completed jobs
delete_completed_jobs()
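The function above removes only jobs in the Succeeded phase. If failed jobs should be cleaned up as well, the selection step can be separated from the deletion step, so the list can be checked before anything is removed. This is a sketch: `select_job_names` is a hypothetical helper, the 'Failed' phase name is an assumption borrowed from Kubernetes phase naming, and the sample response is made-up data.

```python
def select_job_names(response: dict, phases=('Succeeded', 'Failed')) -> list:
    """Return names of jobs whose phase is in `phases` (hypothetical helper)."""
    if response.get('code') != 200:
        return []
    names = []
    for j in response.get('details', {}).get('job_list', []):
        if j.get('status', {}).get('phase') in phases and 'name' in j:
            names.append(j['name'])
    return names


# Made-up response for illustration; in practice pass the result of job.job_list()
sample_response = {
    'code': 200,
    'details': {'job_list': [
        {'name': 'etl-1', 'status': {'phase': 'Succeeded'}},
        {'name': 'etl-2', 'status': {'phase': 'Running'}},
        {'name': 'etl-3', 'status': {'phase': 'Failed'}},
    ]},
}

to_delete = select_job_names(sample_response)
print(to_delete)
# Each selected name could then be passed to job.job_delete(name)
```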
Best practices
Naming conventions
from datetime import datetime

def generate_job_name(prefix, suffix=None):
    """Generate unique job name with timestamp"""
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    if suffix:
        return f"{prefix}-{suffix}-{timestamp}"
    return f"{prefix}-{timestamp}"

# Use it
job_name = generate_job_name("backup", "postgres")
# Result: "backup-postgres-20240115-143022"
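Because jobs map to Kubernetes Jobs, it may help to keep names within Kubernetes naming rules. The sketch below is a hypothetical variant of the helper above that lowercases the prefix, replaces other characters with hyphens, and caps the length. Both `safe_job_name` and the 63-character limit are assumptions based on Kubernetes DNS label rules, not documented CGC behavior.

```python
import re
from datetime import datetime


def safe_job_name(prefix: str, max_length: int = 63) -> str:
    """Build a timestamped name using only lowercase letters, digits and '-'.

    Hypothetical helper; the default limit of 63 characters is an assumption
    based on Kubernetes DNS label rules, not a documented CGC limit.
    """
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    # Room left for the prefix after the hyphen and the timestamp
    room = max_length - len(timestamp) - 1
    if room < 1:
        raise ValueError("max_length is too small to fit the timestamp")
    # Lowercase, then collapse every run of other characters into one hyphen
    cleaned = re.sub(r'[^a-z0-9]+', '-', prefix.lower()).strip('-')
    if not cleaned:
        raise ValueError("prefix must contain at least one letter or digit")
    cleaned = cleaned[:room].rstrip('-')
    return f"{cleaned}-{timestamp}"


name = safe_job_name("Backup_Postgres DB")
print(name)
```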
Troubleshooting common issues
Job stays in Pending state
- Check resource availability
- Verify image can be pulled
Job fails immediately
- Check startup command syntax
- Verify environment variables
- Check image entrypoint
Job runs but doesn't complete
- Job might be waiting for input
- Process might be hanging
- Check for infinite loops
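A job that runs but never completes is easier to spot with a deadline. The sketch below polls a job's phase until it reaches a final phase or a timeout passes. `wait_for_completion` and the `get_phase` callable are hypothetical: `get_phase` would typically wrap job.job_list() and return the phase of one job, and the 'Failed' phase name is an assumption.

```python
import time


def wait_for_completion(get_phase, timeout=300.0, interval=5.0,
                        final_phases=('Succeeded', 'Failed')):
    """Poll get_phase() until a final phase is seen or the timeout passes.

    Hypothetical helper. Returns the final phase, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        phase = get_phase()
        if phase in final_phases:
            return phase
        if time.monotonic() + interval > deadline:
            return None  # still not finished: the job may be hanging
        time.sleep(interval)


# Simulated phases for illustration; a real get_phase would query the job
phases = iter(['Pending', 'Running', 'Running', 'Succeeded'])
result = wait_for_completion(lambda: next(phases), timeout=5, interval=0.01)
print(result)
```

A None result does not prove the job is stuck, only that it did not finish within the chosen timeout, so the timeout should reflect how long the task normally takes.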