Fatigue Risk Scoring Models

A fatigue risk score answers a question the hard limits cannot: this pairing is legal, but is it safe to fly? Cumulative caps and flight duty period tables tell a scheduler whether a roster breaches a statutory boundary, but they say nothing about the crew member who is inside every limit yet reporting for a fourth consecutive back-of-the-clock sector through the window of circadian low. Fatigue scoring closes that gap. It is the forward-looking layer of the Duty Time Validation & Rule Engines domain: a deterministic function that converts circadian phase, accumulated sleep debt, and duty history into a numeric penalty an optimizer can act on before a roster is ever published. This page covers how that scoring layer is modelled, which regulatory provisions require it, how it is implemented in Python, and where it plugs into the rest of the validation pipeline.

The Scoring Problem Behind Legal-but-Fatiguing Rosters

The scoped engineering challenge is precise: given a proposed pairing and each crew member’s recent sleep and duty history, produce a reproducible fatigue score — and do it fast enough to sit inside a pairing optimizer’s inner loop rather than a nightly report. Three properties make this genuinely hard, and none of them are solved by the hard-limit checks that live elsewhere in the engine.

First, fatigue is a function of biological time, not clock time. A duty that starts at 06:00 local carries a very different alertness cost depending on whether the crew member is acclimatised to that zone, has just crossed six time zones eastbound, or is on the third day of a westbound rotation. The same wall-clock report can put one pilot at the peak of their circadian rhythm and another deep in the window of circadian low (WOCL), the 02:00–05:59 band where alertness bottoms out.

Second, fatigue is path-dependent and accumulates non-linearly. A single short rest is recoverable; three short rests in a row are not. Sleep debt compounds, and the recovery value of a rest period depends on where in the circadian cycle it falls — eight hours of rest layered over daytime yields far less restorative sleep than eight hours at night. A scoring model that treats each duty in isolation will systematically under-count risk on exactly the multi-day rotations that hurt crews most.
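
The path dependence described above can be illustrated with a running sleep-debt trace. This is a minimal sketch, not the calibrated model: the 8-hour nightly need, the `Rest` type, and the linear carry-forward are all assumptions made for illustration, and a production model would likely use a non-linear recovery curve.

```python
from dataclasses import dataclass

NIGHTLY_NEED_H = 8.0  # assumed sleep need per rest period (illustrative)


@dataclass(frozen=True)
class Rest:
    """Hypothetical rest record: raw duration plus a circadian discount."""
    duration_hours: float
    restorative_fraction: float  # 0..1, lower when the rest falls in daytime


def running_sleep_debt(rests: list[Rest]) -> list[float]:
    """Debt after each rest: shortfalls carry forward, surpluses repay debt."""
    debt = 0.0
    trace = []
    for rest in rests:
        effective_sleep = rest.duration_hours * rest.restorative_fraction
        # Debt never goes negative: extra sleep cannot be banked in this sketch.
        debt = max(0.0, debt + NIGHTLY_NEED_H - effective_sleep)
        trace.append(debt)
    return trace


# Three short night rests in a row compound instead of resetting.
short_rests = running_sleep_debt([Rest(6.0, 1.0)] * 3)

# The same 8 hours is worth less by day than by night.
night_rest = running_sleep_debt([Rest(8.0, 1.0)])
day_rest = running_sleep_debt([Rest(8.0, 0.5)])
```

A scorer that evaluated each of the three short rests in isolation would see the same 2-hour shortfall every time; the running trace is what exposes the growing debt.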

Third, the score must be deterministic and explainable. A fatigue penalty that nudges the optimizer away from a pairing, or flags a roster for a fatigue risk management committee, becomes part of the operational record. Given the same inputs and the same model version, it must produce byte-identical output, and every score must decompose into the factors that produced it. A black-box neural estimate that cannot be reproduced or explained six months later is unusable in front of an auditor.
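
One way to make the reproducibility requirement concrete is to carry the factor breakdown with the score and derive a digest from a canonical serialisation. The sketch below assumes a hypothetical `ScoreBreakdown` type and three illustrative factors; the field names and the choice of SHA-256 over canonical JSON are assumptions, not the engine's actual format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ScoreBreakdown:
    """Hypothetical explainable score: every factor is kept, not just the sum."""
    model_version: str
    homeostatic: float
    circadian: float
    sector_load: float

    @property
    def composite(self) -> float:
        total = self.homeostatic + self.circadian + self.sector_load
        return round(min(total, 100.0), 2)

    def canonical_bytes(self) -> bytes:
        # Sorted keys and fixed separators make the serialisation stable,
        # so equal inputs always serialise to identical bytes.
        payload = asdict(self) | {"composite": self.composite}
        return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")

    def digest(self) -> str:
        return hashlib.sha256(self.canonical_bytes()).hexdigest()


first = ScoreBreakdown("2024.1", homeostatic=24.0, circadian=12.5, sector_load=3.0)
replay = ScoreBreakdown("2024.1", homeostatic=24.0, circadian=12.5, sector_load=3.0)
recalibrated = ScoreBreakdown("2024.2", homeostatic=24.0, circadian=12.5, sector_load=3.0)
```

Including the model version in the hashed payload means a replay under a different revision is detectably a different record, even when the numbers happen to match.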

The consequence is that fatigue scoring is not a machine-learning afterthought bolted onto the scheduler. It is a first-class, versioned, biomathematical model that consumes the same normalized event stream the hard-limit checks use, and emits a structured, signed score alongside the pass/fail verdict.

Schema and Data Structure Design

The data model separates three concerns that naïve implementations collapse into one flat table: the raw sleep and duty history that drives the homeostatic estimate, the derived circadian state of each crew member at each instant, and the versioned model parameters that turn those into a score. Keeping the model coefficients in their own effective-dated table is what lets a recalibration be diffed and rolled back without touching scheduling code.

The core entities are the crew_member and their time-varying circadian_state (acclimatised reference zone plus current circadian phase); the sleep_opportunity rows derived from rest periods, each carrying a start instant, duration, and an estimated restorative fraction; the duty_segment rows that consume alertness; the fatigue_score produced per pairing evaluation, holding the composite value and its factor breakdown; and the effective-dated scoring_model that supplies the weighting coefficients, WOCL band definition, and threshold tiers keyed by revision date. Every temporal column is stored in UTC with the originating IANA zone as metadata, because the circadian calculation is done in the crew member’s acclimatised local time while the cumulative windows are summed in UTC.

The fatigue-scoring schema: a crew_member owns their circadian_state, sleep_opportunity and duty_segment history; a pairing evaluation scores that history against a versioned, effective-dated scoring_model to emit one fatigue_score carrying both a composite value and its factor breakdown.
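
The effective-dated scoring_model lookup can be sketched in a few lines. The field names and coefficient values below are hypothetical placeholders; only the idea of selecting the latest revision in effect on the evaluation date comes from the schema described above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ScoringModel:
    """Hypothetical row of the effective-dated scoring_model table."""
    revision: str
    effective_from: date
    w_h: float  # homeostatic weight (placeholder value)
    w_c: float  # circadian weight (placeholder value)


def model_in_effect(models: list[ScoringModel], on: date) -> ScoringModel:
    """Return the latest revision whose effective_from is on or before `on`."""
    eligible = [m for m in models if m.effective_from <= on]
    if not eligible:
        raise LookupError(f"no scoring model in effect on {on.isoformat()}")
    return max(eligible, key=lambda m: m.effective_from)


models = [
    ScoringModel("r1", date(2023, 1, 1), w_h=3.0, w_c=10.0),
    ScoringModel("r2", date(2024, 6, 1), w_h=3.5, w_c=9.0),
]
historic = model_in_effect(models, date(2024, 5, 31))
current = model_in_effect(models, date(2024, 6, 1))
```

Because the lookup is keyed by date rather than by "latest", re-scoring a historic pairing picks up the coefficients that applied at the time, which is what makes a rollback a data change rather than a code change.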

Field names, unit conventions, and the classification of what counts as a sector, a positioning segment, or a rest opportunity all follow the shared crew duty time taxonomy. Anchoring the schema to that taxonomy is what prevents the scorer from mistaking a deadhead segment for restorative time, or a short callable standby for genuine sleep opportunity — the two most common ways a fatigue model silently under- or over-scores.

Regulatory Mapping

Fatigue scoring is not merely good practice; it is where the regulations explicitly require operators to reason beyond the hard tables. The mapping below is the ground truth the model encodes. The authoritative US wording lives in FAA Part 117, and the European counterpart under EASA FTL compliance derives from Regulation (EU) No 965/2012, Annex III, Subpart FTL.

§117.3 — Window of circadian low. Defines the WOCL as the period between 02:00 and 05:59, in the crew member’s home-base time when acclimated, or theatre time when acclimated to a new zone. This definition is the anchor for the circadian penalty; a duty overlapping the WOCL carries the model’s heaviest circadian weight.
§117.5 — Fitness for duty. Requires that no crew member report, and no certificate holder assign, a flight if fatigue is likely to compromise safety — an affirmative, forward-looking obligation that a score operationalises rather than replaces.
§117.7 — Fatigue risk management system. Permits an operator to exceed specific prescriptive limits only under an FAA-approved FRMS supported by data. The scoring model is the quantitative backbone of that data case, so its outputs must be auditable to the same standard as a compliance verdict.
§117.13 and §117.17 — Flight duty period tables. The maximum FDP for unaugmented (§117.13, Table B) and augmented (§117.17, Table C) operations is banded by report time precisely because early-morning and overnight reports encroach on the WOCL; the score reuses the same report-time bands so that its circadian penalty tracks the regulatory rationale rather than diverging from it.
§117.25 — Rest period. Guarantees a minimum 10-hour rest period that includes an 8-hour uninterrupted sleep opportunity. The model’s homeostatic recovery term is bounded by the restorative fraction of that opportunity, not its raw duration, so a nominally compliant rest scheduled entirely across daytime still scores as partial recovery.
EASA ORO.FTL.120 — Fatigue risk management. Where the certification specifications require it, obliges operators to run a data-driven FRM process alongside the prescriptive scheme, and CS FTL.1.235 with its associated AMC/GM provides the WOCL and reduced-rest interpretations the European branch of the model encodes.
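
The WOCL definition above translates into an overlap test between a duty interval and the 02:00–05:59 band in the crew member's reference zone. The following is a sketch under stated assumptions: the function name is hypothetical, the band is treated as the half-open interval 02:00 to 06:00 local, and days on which a daylight-saving transition falls inside the band are not given special handling.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

UTC = timezone.utc


def wocl_overlap_minutes(start_utc: datetime, end_utc: datetime, reference_zone: str) -> float:
    """Minutes of the duty [start_utc, end_utc) that fall inside the WOCL band."""
    zone = ZoneInfo(reference_zone)
    day = start_utc.astimezone(zone).date()
    last_day = end_utc.astimezone(zone).date()
    total = timedelta(0)
    while day <= last_day:
        # Build the band in local time, then compare on the UTC timeline.
        band_start = datetime.combine(day, time(2, 0), tzinfo=zone).astimezone(UTC)
        band_end = datetime.combine(day, time(6, 0), tzinfo=zone).astimezone(UTC)
        overlap = min(end_utc, band_end) - max(start_utc, band_start)
        if overlap > timedelta(0):
            total += overlap
        day += timedelta(days=1)
    return total.total_seconds() / 60.0


duty_start = datetime(2024, 1, 15, 9, 0, tzinfo=UTC)
duty_end = datetime(2024, 1, 15, 13, 0, tzinfo=UTC)

# Same duty, two reference zones: 04:00-08:00 in New York, 09:00-13:00 in UTC.
home_new_york = wocl_overlap_minutes(duty_start, duty_end, "America/New_York")
home_utc = wocl_overlap_minutes(duty_start, duty_end, "UTC")
```

The same UTC duty encroaches on the band for one reference zone and not the other, which is why the reference zone has to be an explicit input rather than inferred from the departure airport.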

The distinction between when these fatigue provisions merely inform a soft warning and when they compress a hard FDP ceiling is worked through against the underlying tables in FAA Part 117 rule schema design; the scoring layer here consumes those ceilings, it does not redefine them.

Python Implementation Walkthrough

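Production implementations express the scoring inputs as typed models so a malformed payload fails at the boundary rather than deep inside the biomathematical core. Using pydantic for the contract lets validators enforce that report times are timezone-aware, sleep opportunities are ordered, and the acclimatisation reference zone is present before evaluation begins. The contract below shows the required fields and the UTC check on the report time; an ordering check on prior_sleep would be a further validator of the same shape.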

from datetime import datetime, timezone
from enum import Enum
from pydantic import BaseModel, field_validator


class Acclimatisation(str, Enum):
    HOME = "home"        # WOCL keyed to home-base time
    THEATRE = "theatre"  # acclimated to a new zone
    UNKNOWN = "unknown"  # state cannot be determined


class SleepOpportunity(BaseModel):
    start_utc: datetime
    duration_hours: float
    restorative_fraction: float  # 0..1, discounted for daytime rest


class FatigueInput(BaseModel):
    crew_id: str
    report_time_utc: datetime
    reference_zone: str          # IANA zone for the WOCL calculation
    state: Acclimatisation
    sectors: int
    prior_sleep: list[SleepOpportunity]

    @field_validator("report_time_utc")
    @classmethod
    def must_be_utc(cls, v: datetime) -> datetime:
        if v.tzinfo is None or v.utcoffset() != timezone.utc.utcoffset(None):
            raise ValueError("report_time_utc must be timezone-aware UTC")
        return v

The score itself follows a simplified two-process formulation: a homeostatic term that rises with time awake and falls with restorative sleep, and a circadian term that peaks in the afternoon and troughs across the WOCL. Keeping the scorer a pure function of (FatigueInput, ScoringModel) is what makes every score reproducible from the audit log months later. Alertness effectiveness E is modelled as

E(t) = 100 - w_h \cdot S(t) - w_c \cdot \bigl(1 - \cos\!\tfrac{2\pi (h(t) - \phi)}{24}\bigr)

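where S(t) is accumulated homeostatic pressure, h(t) is the local circadian hour, φ is the acrophase (the local hour of peak alertness), and w_h, w_c are model weights. The reported fatigue score is 100 − E, so a higher number means more risk; the implementation below also adds a multi-sector load term and a conservative uplift when the acclimatisation state is unknown, then caps the result at 100. Timezone handling relies on the standard library zoneinfo and datetime modules documented in the Python datetime documentation.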
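
A few spot values of the circadian term make its shape easier to check. They assume φ = 17, the acrophase used in the implementation on this page, which should be read as an illustrative setting rather than a calibrated one:

```latex
\begin{aligned}
h = 17 &: \quad w_c\bigl(1 - \cos 0\bigr) = 0 \\
h = 11 &: \quad w_c\bigl(1 - \cos(-\tfrac{\pi}{2})\bigr) = w_c \\
h = 5  &: \quad w_c\bigl(1 - \cos(-\pi)\bigr) = 2\,w_c \\
h = 3  &: \quad w_c\bigl(1 - \cos(-\tfrac{7\pi}{6})\bigr) = w_c\bigl(1 + \tfrac{\sqrt{3}}{2}\bigr) \approx 1.87\,w_c
\end{aligned}
```

The term is zero at the alertness peak and largest at 05:00 local, twelve hours later, so the heaviest penalty falls inside the 02:00–05:59 band.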

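import math
from datetime import datetime
from zoneinfo import ZoneInfo

# SleepOpportunity, FatigueInput and Acclimatisation are the contract models defined above.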


def circadian_penalty(report_utc: datetime, zone: str, w_c: float) -> float:
    """Circadian cost peaks across the WOCL (02:00-05:59 local)."""
    local = report_utc.astimezone(ZoneInfo(zone))
    hour = local.hour + local.minute / 60.0
    acrophase = 17.0  # alertness peak ~17:00 local
    return w_c * (1 - math.cos(2 * math.pi * (hour - acrophase) / 24))


def homeostatic_pressure(prior_sleep: list[SleepOpportunity]) -> float:
    """Sleep debt: shortfall against an 8h/night restorative baseline."""
    restored = sum(s.duration_hours * s.restorative_fraction for s in prior_sleep)
    baseline = 8.0 * max(1, len(prior_sleep))
    return max(0.0, baseline - restored)


def fatigue_score(inp: FatigueInput, w_h: float, w_c: float) -> float:
    s = homeostatic_pressure(inp.prior_sleep)
    circadian = circadian_penalty(inp.report_time_utc, inp.reference_zone, w_c)
    sector_load = 1.5 * max(0, inp.sectors - 2)   # multi-sector duty penalty
    raw = w_h * s + circadian + sector_load
    if inp.state is Acclimatisation.UNKNOWN:
        raw *= 1.15                               # conservative uplift
    return round(min(raw, 100.0), 2)

That score is not a verdict. It is fed to the optimizer as a soft penalty coefficient — pushing the solver toward lower-fatigue alternatives that still satisfy every hard limit — and, when it crosses a configured boundary, it routes a pairing to review. The routing boundaries themselves live in Threshold Tuning & Alerting, which owns the tiered escalation logic so the scoring model stays a pure numeric producer.

Fatigue scoring as a pre-validation overlay: weighted physiological factors collapse into a single score that becomes a penalty coefficient steering the pairing optimizer, while the same score is tested against a threshold to flag encroaching pairings for review.
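
The soft-penalty and routing behaviour can be sketched as a selection over candidate pairings. Everything named here is hypothetical: the `Candidate` type, the linear penalty, the weight, and the review threshold are illustrative assumptions, and a real solver would fold the penalty into its own objective rather than take a minimum over a list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate pairing as seen by the selection step."""
    pairing_id: str
    base_cost: float      # cost from the optimizer's own objective
    fatigue_score: float  # 0..100, higher means more risk
    legal: bool           # verdict from the hard-limit checks


def penalised_cost(c: Candidate, penalty_weight: float) -> float:
    """Base cost plus a linear fatigue penalty (assumed form)."""
    return c.base_cost + penalty_weight * c.fatigue_score


def choose(candidates: list[Candidate], penalty_weight: float,
           review_threshold: float) -> tuple[Candidate, bool]:
    """Pick the cheapest legal candidate and say whether it needs review."""
    legal = [c for c in candidates if c.legal]
    if not legal:
        raise ValueError("no candidate satisfies the hard limits")
    best = min(legal, key=lambda c: penalised_cost(c, penalty_weight))
    return best, best.fatigue_score >= review_threshold


candidates = [
    Candidate("A", base_cost=1000.0, fatigue_score=70.0, legal=True),
    Candidate("B", base_cost=1100.0, fatigue_score=20.0, legal=True),
    Candidate("C", base_cost=900.0, fatigue_score=10.0, legal=False),
]
with_penalty = choose(candidates, penalty_weight=5.0, review_threshold=60.0)
without_penalty = choose(candidates, penalty_weight=0.0, review_threshold=60.0)
```

The illegal candidate is filtered out before any penalty is applied, which mirrors the point that the score steers the choice among legal pairings and never overrides a hard limit.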

Rolling Window and Temporal Aggregation

The single-report score above is stateless, but the signals that feed it — cumulative sleep debt and the count of consecutive early or WOCL-encroaching duties — are inherently windowed. This is where fatigue scoring rejoins the same rolling-accumulator machinery the cumulative caps use, and where timezone drift is the dominant source of error. Every input is stamped in ISO 8601 UTC before it enters the aggregation layer. Where the data lives in PostgreSQL, frame-bounded window functions express the running counts directly; see the PostgreSQL window functions reference for the frame semantics.

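-- WOCL-encroaching duties per crew member in the trailing 96 hours
-- (a windowed count; the duties need not be strictly consecutive).
-- RANGE with an interval offset requires PostgreSQL 11 or later.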
SELECT
    crew_id,
    duty_start_utc,
    SUM(CASE WHEN encroaches_wocl THEN 1 ELSE 0 END) OVER (
        PARTITION BY crew_id
        ORDER BY duty_start_utc
        RANGE BETWEEN INTERVAL '96 hours' PRECEDING AND CURRENT ROW
    ) AS wocl_duties_96h
FROM duty_segment
ORDER BY crew_id, duty_start_utc;

For pre-flight validation inside the optimizer, the same logic runs in memory over Polars or Pandas frames, where a rolling group-by keyed on the duty timestamp reproduces the SQL frame without a database round trip. Whichever engine is used, the window boundary must be inclusive of the exact regulatory span, because an off-by-one on the trailing window silently drops a consecutive-early-start from the count and lets the score drift optimistic — the direction that matters least to a scheduler and most to a safety board. The precise reduction of block and duty times that feeds these windows is handled upstream by the Flight Time Calculation Algorithms; getting that reduction wrong poisons the sleep-debt estimate before the scorer ever runs.
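
As an engine-neutral illustration of the inclusive boundary, the trailing count can be written with the standard library alone. This sketch is not the Polars or pandas implementation mentioned above; the function name is hypothetical, and it assumes duty start times are distinct for a given crew member.

```python
from collections import deque
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(hours=96)


def trailing_wocl_counts(duties: list[tuple[datetime, bool]]) -> list[int]:
    """For each duty, count WOCL-encroaching duties starting in [start - 96h, start]."""
    window = deque()  # UTC starts of encroaching duties still inside the window
    counts = []
    for start_utc, encroaches in sorted(duties, key=lambda d: d[0]):
        if encroaches:
            window.append(start_utc)
        # Strict '<' keeps a duty that started exactly 96 hours ago (inclusive bound).
        while window and window[0] < start_utc - WINDOW:
            window.popleft()
        counts.append(len(window))
    return counts


t0 = datetime(2024, 1, 10, 3, 0, tzinfo=timezone.utc)
history = [
    (t0, True),
    (t0 + timedelta(hours=48), True),
    (t0 + timedelta(hours=96), True),    # first duty is exactly on the boundary
    (t0 + timedelta(hours=97), False),
    (t0 + timedelta(hours=200), True),
]
counts = trailing_wocl_counts(history)
```

Changing the comparison to `<=` would drop the duty that sits exactly on the 96-hour boundary, which is the off-by-one that makes the score drift optimistic.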

Integration Points

Fatigue scoring is one stage in a longer chain and depends on clean contracts with its neighbours:

Upstream — flight data ingestion. Roster changes, actual block times, and disruption events arrive through an event-driven pipeline that publishes state changes to an evaluation queue. Each event triggers recalculation of the affected crew member’s sleep debt and consecutive-duty counts. Events without an explicit zone are rejected at this boundary, never coerced to a default offset.
Vocabulary — crew duty time taxonomy. Supplies the canonical classification of sectors, positioning, standby, and rest so the restorative-fraction estimate is computed against the same event semantics everywhere.
Sibling — Rest Period Compliance Checks. Determines whether a rest period is legal; the scorer reuses its restorative-fraction logic to weight how much of that legal rest actually recovers sleep debt, so the two never disagree about what a rest was worth.
Downstream — Threshold Tuning & Alerting. Consumes the numeric score and applies the configurable tiers that decide whether it becomes an informational flag, a dispatcher warning, or an escalation to a fatigue risk management committee.
Perimeter — system security and access boundaries. Every score and every override of a soft-threshold breach is a signed, hash-chained audit event attributed to an identity, so the FRMS data case is tamper-evident. When the live model store is unreachable, the engine falls back to a cached, signed model snapshot and marks each score as provisional; queued evaluations reconcile once the authoritative store returns, and the fallback path is itself a signed audit event.
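
The tamper-evident property of the audit trail can be sketched with a hash chain in which each event commits to its predecessor. This is an illustration only: the event layout and function names are hypothetical, and HMAC-SHA256 with a shared key stands in for whatever signature scheme the real perimeter uses.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first event


def _body(actor: str, payload: dict, prev: str) -> bytes:
    """Canonical bytes that are both hashed and signed."""
    record = {"actor": actor, "payload": payload, "prev": prev}
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def append_event(chain: list[dict], payload: dict, actor: str, key: bytes) -> dict:
    """Append an event that links to the hash of the previous one."""
    prev = chain[-1]["hash"] if chain else GENESIS
    body = _body(actor, payload, prev)
    event = {
        "actor": actor,
        "payload": payload,
        "prev": prev,
        "hash": hashlib.sha256(body).hexdigest(),
        "sig": hmac.new(key, body, hashlib.sha256).hexdigest(),
    }
    chain.append(event)
    return event


def verify_chain(chain: list[dict], key: bytes) -> bool:
    """Recompute every link; any edited, dropped, or reordered event fails."""
    prev = GENESIS
    for event in chain:
        body = _body(event["actor"], event["payload"], event["prev"])
        expected_sig = hmac.new(key, body, hashlib.sha256).hexdigest()
        if event["prev"] != prev:
            return False
        if event["hash"] != hashlib.sha256(body).hexdigest():
            return False
        if not hmac.compare_digest(event["sig"], expected_sig):
            return False
        prev = event["hash"]
    return True


key = b"demo-key-not-for-production"
chain: list[dict] = []
append_event(chain, {"pairing": "P1", "score": 41.5, "provisional": False}, "scorer", key)
append_event(chain, {"pairing": "P1", "override": "accepted"}, "dispatcher-7", key)
intact = verify_chain(chain, key)
```

Marking a score as provisional during fallback is just another field in the payload, so the fallback path is covered by the same chain as every other event.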

Testing and Edge Cases

Because every score depends on multi-day history and on time-zone arithmetic, example-based tests miss the cases that matter. Property-based testing with hypothesis generates thousands of synthetic histories and asserts invariants: for instance, that adding a restorative sleep opportunity never increases the score, and that two identical inputs always produce identical scores.

from hypothesis import given, strategies as st

# sleep_opportunity_strategy, fatigue_input_strategy, build_input,
# full_night_sleep, W_H and W_C are project test helpers and fixtures
# that are not shown here.


@given(sleep=st.lists(sleep_opportunity_strategy(), min_size=1, max_size=20))
def test_more_restorative_sleep_never_raises_score(sleep):
    base = build_input(prior_sleep=sleep)
    extra = build_input(prior_sleep=sleep + [full_night_sleep()])
    assert fatigue_score(extra, W_H, W_C) <= fatigue_score(base, W_H, W_C)


@given(inp=fatigue_input_strategy())
def test_scorer_is_deterministic(inp):
    assert fatigue_score(inp, W_H, W_C) == fatigue_score(inp, W_H, W_C)

The boundary conditions that most often break fatigue implementations are specific:

WOCL encroachment at the band edge. A report at exactly 05:59 versus 06:00 local sits on either side of the window of circadian low; the circadian penalty must be selected from local wall-clock time at report, and tests must pin the behaviour on both sides of the boundary.
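Daylight-saving transitions. A rest period that runs from 22:00 to 22:00 on the local clock across a spring-forward transition lasts 23 elapsed hours, not 24; the restorative-fraction calculation must be driven by the elapsed time between UTC instants, not by a local-time subtraction that would credit an hour of sleep that never happened. In Python that trap includes subtracting two aware datetimes that share the same zoneinfo tzinfo object, which returns the wall-clock difference, so both instants should be converted to UTC before subtracting.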
Acclimatisation reset mid-rotation. After crossing enough time zones the WOCL reference shifts from home base to theatre time, so an otherwise identical report scores differently before and after the reset; tests must cover the transition, not just the endpoints.
Date-line crossing. An eastbound pairing that crosses the International Date Line makes UTC ordering and local ordering diverge; the consecutive-duty count must be computed on UTC-ordered events so a day is never double-counted or skipped.
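
The daylight-saving case in the list above can be pinned with a short check. The helper name is hypothetical; the behaviour it guards against is documented datetime semantics, where subtracting two aware datetimes that carry the same tzinfo ignores the zone and compares wall-clock values.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def elapsed_hours(start: datetime, end: datetime) -> float:
    """Elapsed time between two aware instants, measured on the UTC timeline."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("both instants must be timezone-aware")
    delta = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
    return delta.total_seconds() / 3600.0


zone = ZoneInfo("America/New_York")
# US clocks moved forward at 02:00 local on 10 March 2024.
rest_start = datetime(2024, 3, 9, 22, 0, tzinfo=zone)
rest_end = datetime(2024, 3, 10, 22, 0, tzinfo=zone)

# Same tzinfo on both sides: Python subtracts the wall-clock values.
wall_clock = rest_end - rest_start

# Converting to UTC first yields the time that actually passed.
real_hours = elapsed_hours(rest_start, rest_end)
```

A regression test that asserts on both values documents the trap and fails if the duration helper is ever replaced by a direct subtraction.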

Model coefficients are version-controlled and validated against the operator’s FRMS reporting data before every deployment, so a recalibration is a reviewable diff rather than a silent behaviour change — the same discipline the hard-limit rule sets follow.

Explore This Topic in Depth

The scoring model sits between the hard-limit checks and the alerting layer, and each of its neighbours is treated in its own detail:

Threshold Tuning & Alerting — the configurable tiers that turn a raw fatigue score into informational, warning, and escalation notifications routed to schedulers, dispatchers, and FRMS committees.
Rest Period Compliance Checks — the split-rest and compensatory-rest logic that determines how much of a rest opportunity the scorer may count as restorative.
Flight Time Calculation Algorithms — the block-to-block reductions that feed the sleep-debt and duty-load inputs the model consumes.

FAA Part 117 rule schema design — the hard FDP tables and WOCL bands the score tracks.
EASA FTL compliance frameworks — the European FRM counterpart for dual-jurisdiction carriers.
