# Grading Framework
The Gradaris Governance Score (GGS) is a composite A–F grade derived from three tiers of assessment covering 12 criteria. Every score is accompanied by a cryptographic integrity hash, a per-tier confidence rating, and a direct mapping to EU AI Act articles.
## Three-Tier Methodology
The GGS methodology is tiered deliberately. Each tier has a different evidence type, confidence level, and failure behavior. The tiers are not averaged equally — they are hierarchical. A critical failure in Tier 1 caps the maximum achievable score regardless of Tier 2 and Tier 3 performance.
Tier 1 criteria are non-negotiable. Any single failure in this tier forces a Grade D or lower regardless of Tier 2 and Tier 3 performance.
**Audit log integrity**
All agent runs produce an immutable, tamper-evident log entry. SHA-256 hash verified on read. Log chain continuity confirmed.
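As a sketch of how such a tamper-evident chain can work (the field names here are illustrative, not the Gradaris log schema): each entry's hash covers both the event payload and the previous entry's hash, so editing any historical entry invalidates every hash after it.

```python
import hashlib
import json


def append_entry(chain: list, event: dict) -> dict:
    """Append an entry whose hash covers the event and the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps({"event": event, "prev_hash": prev_hash}, sort_keys=True)
    entry = {
        "event": event,
        "prev_hash": prev_hash,
        "hash": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
    }
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited entry breaks the chain from that point on."""
    prev_hash = "0" * 64
    for entry in chain:
        payload = json.dumps({"event": entry["event"], "prev_hash": prev_hash}, sort_keys=True)
        if entry["prev_hash"] != prev_hash:
            return False
        if hashlib.sha256(payload.encode("utf-8")).hexdigest() != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True
```

Verification on read then reduces to recomputing the chain and comparing digests, which is what makes the log tamper-evident rather than merely append-only.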
**Human override capability**
A verified mechanism exists to halt, override, or modify agent outputs. Override event is logged and attributed.
**Data lineage traceable**
Training data sources and versions are documented. Input data schema is captured per run. Sufficient for Article 10 review.
**Version control active**
Model version, prompt version, and configuration are pinned and logged per run. Rollback capability confirmed.
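A minimal sketch of what a pinned per-run record could look like. All field names and values here are illustrative assumptions, not the Gradaris schema:

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)  # frozen: the record cannot be mutated after logging
class PinnedRunRecord:
    """Per-run pins sufficient to reproduce or roll back a configuration.
    Field names and values are illustrative assumptions."""
    model_version: str
    prompt_version: str
    config_digest: str   # hash of the full resolved configuration
    run_timestamp: str   # ISO 8601, UTC


record = PinnedRunRecord(
    model_version="model-2025-01-a",
    prompt_version="prompt-v14",
    config_digest="sha256:...",
    run_timestamp="2025-01-07T12:00:00Z",
)
```

Pinning the configuration digest alongside the model and prompt versions is what makes rollback verifiable: restoring an earlier configuration can be confirmed by recomputing its digest.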
Tier 2 results are stable and independently verifiable. Benchmark suites are versioned and published. Results are not subject to assessor interpretation.
**Bias & fairness benchmark**
Performance parity tested across protected attribute groups using the published Gradaris Fairness Suite v2. Disparate impact ratio calculated.
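The disparate impact ratio itself is a standard fairness metric: the ratio of the lowest group's favorable-outcome rate to the highest. A minimal sketch, with hypothetical group data (the Fairness Suite's exact aggregation is not specified here):

```python
def disparate_impact_ratio(outcomes: dict) -> float:
    """outcomes maps group name -> (favorable_count, total_count).
    Returns min group rate / max group rate; 1.0 is perfect parity."""
    rates = [favorable / total for favorable, total in outcomes.values()]
    return min(rates) / max(rates)


# The widely used "four-fifths rule" flags ratios below 0.8 as
# potential disparate impact.
```

A ratio of 1.0 means identical favorable-outcome rates across all protected attribute groups; lower values indicate increasing disparity.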
**Robustness under distribution shift**
Agent performance tested against out-of-distribution inputs from the Gradaris OOD benchmark set. Performance degradation beyond acceptable thresholds results in a failing result.
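One way to express such a threshold check, using relative performance drop. The 15% maximum drop below is an illustrative assumption, not a published Gradaris value:

```python
def relative_degradation(in_dist_score: float, ood_score: float) -> float:
    """Relative performance drop when moving from in-distribution
    to out-of-distribution inputs."""
    return (in_dist_score - ood_score) / in_dist_score


def passes_ood_check(in_dist_score: float, ood_score: float,
                     max_relative_drop: float = 0.15) -> bool:
    # 0.15 is an illustrative threshold, not a published Gradaris value.
    return relative_degradation(in_dist_score, ood_score) <= max_relative_drop
```

Expressing the threshold as a relative drop, rather than an absolute score floor, lets the same check apply to benchmarks with different score ranges.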
**Adversarial input resistance**
The agent is tested against common adversarial prompt patterns and injection attempts from the Gradaris Red Teaming Suite. A minimum pass rate is required; the threshold is calibrated to the agent's risk classification.
**Output calibration**
Confidence scores are assessed for calibration quality. Overconfident outputs in high-stakes decisions are flagged.
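Calibration quality is commonly measured with expected calibration error (ECE): predictions are binned by confidence, and the metric is the weighted average gap between each bin's mean confidence and its accuracy. The document does not specify Gradaris's exact calibration metric; ECE is one standard choice, sketched here:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """confidences: list of predicted confidences in [0, 1].
    correct: list of 0/1 outcomes. Returns the weighted mean
    |avg confidence - accuracy| across confidence bins."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are half-open (lo, hi]; confidence 0.0 falls in the first bin.
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(avg_conf - accuracy)
    return ece
```

A perfectly calibrated model scores 0; a model that is confident and wrong scores high, which is exactly the overconfidence pattern flagged in high-stakes decisions.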
Tier 3 is the most interpretive tier but remains structured. The rubric is fixed and versioned. Assessor decisions are documented with rationale at each sub-criterion.
**Risk management documentation**
Quality of risk identification, residual risk analysis, and mitigation documentation. Mapped to Article 9 of the EU AI Act.
**Transparency and explainability**
Quality of user-facing disclosures, decision explanations, and capability limitation notices. Mapped to Article 13.
**Human oversight arrangements**
Documented oversight procedures, escalation paths, and human review trigger conditions. Mapped to Article 14.
**Incident response readiness**
Incident detection, reporting, and remediation procedures. Regulator notification capability.
## Grade Definitions
The composite GGS score is calculated as a weighted sum of all three tiers, subject to the Tier 1 cap rule. The lowest grades reflect either a Tier 1 failure or severe gaps across multiple tiers.
| Grade | Score | Tier 1 status | Regulatory interpretation |
|---|---|---|---|
| A | 90–100 | All controls verified | Exemplary governance. Audit-ready evidence package. Suitable for proactive regulatory submission. |
| B | 75–89 | All controls verified | Good standing. Minor documentation or benchmark gaps. Remediation recommended within 90 days. |
| C | 60–74 | All controls verified | Acceptable baseline. Identified improvements required before regulatory submission. 30-day remediation plan expected. |
| D | 45–59 | ≥1 control failure | At risk. Tier 1 failure detected or significant multi-tier gaps. Urgent remediation required. Not suitable for regulated deployment. |
| F | 0–44 | Critical failure | Non-compliant. Multiple critical failures. Deployment should be suspended pending full remediation and re-assessment. |
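The cap rule can be made concrete in a few lines. The tier weights below are illustrative assumptions (they are not published in this section); only the cap behavior and the grade bands come from the table above.

```python
GRADE_BANDS = (("A", 90), ("B", 75), ("C", 60), ("D", 45))


def composite_grade(tier_scores: dict, tier1_all_pass: bool,
                    weights: dict = None) -> tuple:
    """Weighted sum of per-tier scores (each 0-100), subject to the
    Tier 1 cap rule: any Tier 1 control failure caps the composite
    at 59, forcing Grade D or lower."""
    weights = weights or {"t1": 0.40, "t2": 0.35, "t3": 0.25}  # illustrative
    score = sum(weights[t] * tier_scores[t] for t in weights)
    if not tier1_all_pass:
        score = min(score, 59.0)
    for grade, floor in GRADE_BANDS:
        if score >= floor:
            return round(score), grade
    return round(score), "F"
```

Note that the cap is applied before grade lookup, so perfect Tier 2 and Tier 3 results cannot lift an agent with a Tier 1 failure above Grade D.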
A Grade D or F does not necessarily mean an agent is non-functional — it means governance evidence is insufficient for audit purposes. Many agents operate without scoring at all. A Grade D is still a significant improvement over zero visibility.
## Cryptographic Integrity
Every GGS assessment report carries a SHA-256 integrity hash. This hash is computed over a canonical JSON object containing the assessment methodology version, all input signals, criterion scores, tier weights, and the final composite score.
If any element of the assessment changes — including methodology version — the hash changes. This makes every report tamper-evident by design.
### What the hash covers
- Methodology version identifier
- Agent identifier and assessment timestamp
- All Tier 1 control results (binary)
- All Tier 2 benchmark results (numerical)
- All Tier 3 sub-criterion scores and assessor rationale hashes
- Tier weights used in composite calculation
- Final composite score and grade
### Example integrity hash
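A minimal sketch of canonical-JSON hashing. The field names below are illustrative; the actual canonical object contains the fields listed above:

```python
import hashlib
import json


def assessment_hash(assessment: dict) -> str:
    """Serialize the assessment to canonical JSON (sorted keys, no
    insignificant whitespace) and hash it. Any change to any field
    yields a different digest."""
    canonical = json.dumps(assessment, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because keys are sorted before hashing, two reports with the same content always produce the same digest regardless of field order, while changing a single score (or the methodology version) produces a completely different one.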
The hash is included in every PDF evidence package and can be independently verified by any party with access to the Gradaris Verification API — including regulators, auditors, and your legal team.
## EU AI Act Article Mapping
Each GGS criterion is mapped to one or more EU AI Act articles. When you download an evidence package, the relevant article references are included alongside the criterion result, making it straightforward to present evidence to a regulator by article number.
| Criterion | EU AI Act article | Obligation summary |
|---|---|---|
| T1.1 Audit log integrity | Article 12 | Automatic recording of events throughout the lifecycle of high-risk AI systems |
| T1.2 Human override | Article 14 | Human oversight measures — ability to intervene in or halt operation |
| T1.3 Data lineage | Article 10 | Data and data governance — training, validation, testing data requirements |
| T1.4 Version control | Articles 9, 17 | Risk management system; quality management system requirements |
| T2.1 Bias benchmark | Articles 10, 15 | Data governance; accuracy, robustness, and cybersecurity |
| T3.1 Risk documentation | Article 9 | Risk management system — identification, analysis, estimation of risk |
| T3.2 Transparency | Article 13 | Transparency and provision of information to deployers |
| T3.3 Human oversight | Article 14 | Human oversight — design and operational measures |