Debugging Guide
This guide covers practical techniques for understanding what your Ratchet jobs are doing at runtime, from event tracing to database-level inspection. Every job exposes JobContext.logger(), which is backed by JBoss Logging by default; persisting those log lines to the database is optional.
Using JobContext.logger()
Every job has access to a JobContext that provides a job-scoped logger. The default logger writes to the runtime logging backend and publishes JobLogLine events. If your application observes and persists those events, the entries can be stored in scheduler_job_log, giving you a structured execution trace tied to each job ID.
@ApplicationScoped
public class DataImportService {

    public void importData() {
        JobContext ctx = JobContext.current();
        JobLogger log = ctx.logger();

        log.info("Starting import for job " + ctx.jobId());

        String batchSize = ctx.param("batchSize", "100");
        log.info("Using batch size: " + batchSize);

        try {
            int rows = processRecords(Integer.parseInt(batchSize));
            log.info("Imported " + rows + " rows successfully");
        } catch (Exception e) {
            log.error("Import failed: " + e.getMessage());
            throw e;
        }
    }
}
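These log lines only reach scheduler_job_log if something persists them. One option is a CDI observer that inserts each JobLogLine as it is published. The sketch below is illustrative only: the accessor names (getJobId(), getTimestamp(), getLevel(), getMessage()) and the datasource lookup name are assumptions, so adjust them to the actual JobLogLine API and to the datasource that owns the scheduler tables.

@ApplicationScoped
public class JobLogPersister {

    // Hypothetical datasource name -- point this at the scheduler database
    @Resource(lookup = "java:app/jdbc/SchedulerDS")
    DataSource ds;

    public void onLogLine(@Observes JobLogLine line) {
        String sql = "INSERT INTO scheduler_job_log (job_id, ts, level, message) VALUES (?, ?, ?, ?)";
        try (Connection con = ds.getConnection();
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setObject(1, line.getJobId());      // assumed accessor
            ps.setObject(2, line.getTimestamp());  // assumed accessor
            ps.setString(3, line.getLevel());      // assumed accessor
            ps.setString(4, line.getMessage());    // assumed accessor
            ps.executeUpdate();
        } catch (SQLException e) {
            // A log-persistence failure should never fail the job itself
        }
    }
}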
Querying Job Logs
If your application persists JobLogLine events, you can retrieve them after a job runs:
-- Get all log entries for a specific job, ordered chronologically
SELECT ts, level, message
FROM scheduler_job_log
WHERE job_id = '01902c4e-c4f3-7b8a-9d3e-fedcba987654'
ORDER BY ts ASC;
-- Find jobs that logged errors in the last hour
SELECT DISTINCT jl.job_id, j.status, j.last_error, jl.message
FROM scheduler_job_log jl
JOIN scheduler_job j ON j.job_id = jl.job_id
WHERE jl.level = 'ERROR'
AND jl.ts >= NOW() - INTERVAL '1 hour'
ORDER BY jl.ts DESC;
Using Job Parameters for Debug Context
Attach parameters at submission time to carry context through to execution:
scheduler.enqueue(importService::importData)
    .withParam("batchSize", "500")
    .withParam("source", "quarterly-report")
    .withParam("requestedBy", currentUser.getName())
    .submit();
These parameters are accessible via JobContext.current().param("key") and stored in the params column of scheduler_job:
-- Find jobs by parameter values
SELECT job_id, status, params::jsonb ->> 'source' as source
FROM scheduler_job
WHERE params::jsonb ->> 'source' = 'quarterly-report';
Event Listeners for Tracing Job Lifecycle
For real-time debugging, register an event listener that logs all lifecycle transitions:
@ApplicationScoped
public class LifecycleTracer {

    private static final Logger log = Logger.getLogger(LifecycleTracer.class.getName());

    @Inject
    JobSchedulerService scheduler;

    private Consumer<Object> listener;

    public void startTracing() {
        listener = event -> {
            switch (event) {
                case JobStartedEvent e ->
                    log.info("[TRACE] Job " + e.getJobId() + " STARTED on " + e.getNodeId()
                        + " [type=" + e.getJobType() + ", priority=" + e.getPriority() + "]");
                case JobCompletedEvent e ->
                    log.info("[TRACE] Job " + e.getJobId() + " COMPLETED in "
                        + e.getExecutionTimeMs() + " ms");
                case JobRetryingEvent e ->
                    log.warning("[TRACE] Job " + e.getJobId() + " RETRYING (attempt "
                        + e.getAttempt() + "), next at " + e.getNextScheduledTime()
                        + ", error: " + e.getErrorMessage());
                case JobDlqEvent e ->
                    log.severe("[TRACE] Job " + e.getJobId() + " -> DLQ after "
                        + e.getAttempt() + " attempts: " + e.getErrorMessage());
                case JobCancelledEvent e ->
                    log.info("[TRACE] Job " + e.getJobId() + " CANCELLED from "
                        + e.getPreviousStatus() + " after " + e.getExecutionTimeMs() + " ms");
                case BatchCompletedEvent e ->
                    log.info("[TRACE] Batch " + e.getJobId() + " COMPLETED");
                case ChainFailedEvent e ->
                    log.severe("[TRACE] Chain " + e.getJobId() + " FAILED");
                default -> log.fine("[TRACE] " + event.getClass().getSimpleName()
                    + " for job " + ((AbstractJobSchedulerEvent) event).getJobId());
            }
        };
        scheduler.addEventListener(listener);
    }

    public void stopTracing() {
        if (listener != null) {
            scheduler.removeEventListener(listener);
        }
    }
}
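Nothing starts the tracer automatically, so wire startTracing() and stopTracing() to your application lifecycle. One straightforward option, sketched below, uses the standard CDI application-scope lifecycle events:

@ApplicationScoped
public class TracingBootstrap {

    @Inject
    LifecycleTracer tracer;

    // Start tracing once the application scope is initialized
    void onStart(@Observes @Initialized(ApplicationScoped.class) Object event) {
        tracer.startTracing();
    }

    // Deregister the listener on shutdown
    void onStop(@Observes @BeforeDestroyed(ApplicationScoped.class) Object event) {
        tracer.stopTracing();
    }
}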
CDI @Observes for Event Monitoring
CDI observers provide type-safe event handling. Use specific event types to narrow what you observe:
@ApplicationScoped
public class SchedulerMonitor {

    private static final Logger log = Logger.getLogger(SchedulerMonitor.class.getName());

    /**
     * Track slow jobs. Fires after every successful completion.
     */
    public void onJobCompleted(@Observes JobCompletedEvent event) {
        if (event.getExecutionTimeMs() > 60_000) {
            log.warning("Slow job detected: " + event.getJobId()
                + " took " + event.getExecutionTimeMs() + " ms"
                + " [type=" + event.getJobType() + "]");
        }
    }

    /**
     * Alert on DLQ entries. Fires when a job exhausts all retries.
     */
    public void onDlq(@Observes JobDlqEvent event) {
        log.severe("DLQ ALERT: Job " + event.getJobId()
            + " failed permanently after " + event.getRetryAttempt() + " attempts"
            + " - " + event.getErrorMessage());
        // Send alert to monitoring system, Slack, etc.
    }

    /**
     * Monitor retry patterns. Fires each time a job is rescheduled for retry.
     */
    public void onRetry(@Observes JobRetryingEvent event) {
        log.info("Retry: Job " + event.getJobId()
            + " attempt " + event.getRetryAttempt()
            + " scheduled for " + event.getScheduledTime());
    }
}
Database Queries to Inspect Job State
Find a Specific Job
-- Full job details by ID
SELECT job_id, status, job_type, priority,
attempts, max_retries, backoff_policy,
scheduled_time, picked_by, picked_at,
execution_start_time, execution_end_time, execution_duration_ms,
last_error, business_key, resource_name,
payload::jsonb ->> 'target' as target_class,
payload::jsonb ->> 'method' as method_name
FROM scheduler_job
WHERE job_id = '01902c4e-c4f3-7b8a-9d3e-fedcba987654';
Find Failed Jobs by Error Pattern
-- Find jobs that failed with a specific exception type
SELECT job_id, status, attempts, max_retries, last_error,
payload::jsonb ->> 'target' as target_class,
payload::jsonb ->> 'method' as method_name,
created_at, updated_at
FROM scheduler_job
WHERE status = 'FAILED'
AND last_error LIKE '%ConnectionTimeout%'
ORDER BY updated_at DESC
LIMIT 20;
View Execution History for a Job
Each job execution attempt is recorded in scheduler_job_execution:
-- See all attempts for a job
SELECT id, attempt, node_id, started_at, ended_at, status,
duration_ms, error_message, error_class
FROM scheduler_job_execution
WHERE job_id = '01902c4e-c4f3-7b8a-9d3e-fedcba987654'
ORDER BY attempt ASC;
Monitor Job Throughput
-- Jobs completed per hour over the last 24 hours
SELECT date_trunc('hour', execution_end_time) as hour,
COUNT(*) as completed,
AVG(execution_duration_ms) as avg_ms,
MAX(execution_duration_ms) as max_ms
FROM scheduler_job
WHERE status = 'SUCCEEDED'
AND execution_end_time >= NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour DESC;
Check Node Health
-- See which nodes are active and when they last checked in
SELECT node_id, heartbeat_ts, started_at,
EXTRACT(EPOCH FROM (NOW() - heartbeat_ts)) as seconds_since_heartbeat
FROM scheduler_node
ORDER BY heartbeat_ts DESC;
A seconds_since_heartbeat value greater than the orphan grace period (60 seconds by default) indicates a dead or unresponsive node; the OrphanRecoveryTimer recovers jobs from stale nodes automatically.
Inspect Batch Progress
-- Batch completion status
SELECT b.batch_id, j.status as parent_status, j.business_key,
b.total_items, b.completed_items, b.failed_items,
b.completion_processed,
ROUND(100.0 * (b.completed_items + b.failed_items) / GREATEST(b.total_items, 1), 1) as pct_done
FROM scheduler_batch b
JOIN scheduler_job j ON j.job_id = b.batch_id
WHERE j.status IN ('PENDING', 'RUNNING')
ORDER BY b.batch_id DESC;
Find Orphaned Jobs
-- Jobs stuck in RUNNING on nodes that have gone stale
SELECT j.job_id, j.picked_by, j.picked_at,
EXTRACT(EPOCH FROM (NOW() - j.picked_at)) / 60 as stuck_minutes,
n.heartbeat_ts
FROM scheduler_job j
LEFT JOIN scheduler_node n ON n.node_id = j.picked_by
WHERE j.status = 'RUNNING'
AND (n.node_id IS NULL OR n.heartbeat_ts < NOW() - INTERVAL '60 seconds')
ORDER BY j.picked_at ASC;
Enabling Debug Logging
By Component
Set fine-grained log levels to focus on the subsystem you are debugging:
| What you are debugging | Logger to set to FINE/DEBUG |
|---|---|
| Job not being picked up | run.ratchet.ri.core.Poller |
| Job execution failures | run.ratchet.ri.core.JobTask |
| Retry/backoff decisions | run.ratchet.ri.core.JobTask |
| Security/ClassPolicy rejections | run.ratchet.ri.security |
| Circuit breaker behavior | run.ratchet.ri.resilience |
| Thread pool saturation | run.ratchet.ri.core.ThreadPoolManager |
| Orphan recovery | run.ratchet.ri.core.OrphanRecoveryTimer |
| Timeout enforcement | run.ratchet.ri.core.JobTimeoutHandler |
| CDI bean resolution | run.ratchet.ri.cdi |
| Recurring job scheduling | run.ratchet.ri.core.RecurringScheduler |
| Batch orchestration | run.ratchet.ri.core.BatchService |
Full Debug Mode
To enable debug logging for all of Ratchet (generates significant output):
WildFly CLI:
/subsystem=logging/logger=run.ratchet:add(level=FINE)
Payara asadmin:
asadmin set-log-levels run.ratchet=FINE
Open Liberty server.xml:
<logging traceSpecification="run.ratchet.*=fine"/>
Reading the Log Output
Ratchet logs follow a consistent pattern. Here is what a healthy job execution looks like:
INFO JobTask - Job 12345 starting execution [type=SINGLE, priority=NORMAL, attempt=1/4, payload=MyService.processData]
INFO JobTask - Job 12345 resolving target: com.myapp.MyService.processData (static=false)
INFO JobTask - Job 12345 succeeded in 234 ms
INFO JobTask - Job 12345 execution complete - cleaning up context
A failed job with retry:
SEVERE JobTask - Job 12345 failed with java.net.ConnectException: Connection refused
WARNING JobTask - Job 12345 retrying in 2000 ms (attempt 1/3) due to: java.net.ConnectException: Connection refused
A job that exhausted retries:
SEVERE JobTask - Job 12345 failed with java.net.ConnectException: Connection refused
SEVERE JobTask - Job 12345 moved to DLQ after 4 attempts
Debugging Checklist
When a job is not behaving as expected, work through this checklist:
- Check the job status: SELECT status, last_error FROM scheduler_job WHERE job_id = ?
- Check execution history: SELECT * FROM scheduler_job_execution WHERE job_id = ? ORDER BY attempt
- Check per-job logs (if enabled): SELECT * FROM scheduler_job_log WHERE job_id = ? ORDER BY ts
- Check if the node is alive: SELECT * FROM scheduler_node WHERE node_id = ?
- Check the payload: SELECT payload FROM scheduler_job WHERE job_id = ? -- verify the target class and method are correct
- Enable debug logging for the relevant subsystem (see table above)
- Register an event listener to watch lifecycle events in real time
- Check for resource contention: SELECT * FROM scheduler_resource_permit WHERE job_id = ?