Validating AI Tools in ISO/IEC 17025 Calibration Labs

Jun 5

Artificial intelligence tools have moved from experimental to operational use in accredited calibration laboratories. The applications are practical: uncertainty contributor identification, anomaly detection in calibration history, draft documentation review, and recall scheduling. All of them produce outputs that, directly or indirectly, influence calibration decisions. Under ISO/IEC 17025:2017, that fact triggers a specific set of requirements.

This article walks through the validation process for AI software in an accredited calibration environment. The framing is deliberately conservative: AI is a tool, not a substitute for metrologist judgment, and the accreditation framework treats it as such. The piece builds on Tra-Cal's approach to AI in calibration operations and addresses the validation depth not covered in the introductory article.

Why AI tools are software systems under ISO/IEC 17025:2017

ISO/IEC 17025:2017 clause 7.11 requires laboratories to control the data and information management systems used in laboratory activities. The standard does not distinguish between traditional deterministic software and AI systems. A spreadsheet that calculates measurement uncertainty falls under clause 7.11. A large language model that drafts an uncertainty budget review falls under the same clause. The control requirements scale with the tool’s effect on calibration decisions.

The practical implication is that any AI tool whose output reaches a calibration certificate, an uncertainty calculation, an interval decision, or a record review must be validated, documented, and change-controlled. The fact that the underlying technology is novel does not exempt it from the requirements that have always applied to laboratory software.

Two other clauses apply when AI is present. Clause 6.4 covers equipment, including software used with measurement equipment. Clause 6.2 covers personnel competence, which extends to the metrologist using and reviewing AI outputs. For a discussion of where AI’s known limitations affect calibration workflows, see AI limitations in calibration workflows.

Validation requirements: scope, intended use, performance criteria

Validation under clause 7.11 begins with a defined scope. The scope statement identifies what the AI tool will do, what calibration parameters or workflows it will affect, and what it will not be used for. Scope creep is the most common source of validation failure, because it produces outputs the validation did not cover.

The intended use statement is narrower than the scope. It describes the specific tasks the tool will perform, the inputs it will receive, the outputs it will produce, and the metrologist’s role in reviewing each output. For example: “The AI tool will draft proposed uncertainty contributor lists from calibration parameter descriptions. The metrologist will review each contributor list, modify as required, and approve before the budget is used.”

Performance criteria define acceptable behavior. For deterministic software, performance criteria are typically expressed as input-output correctness. For AI systems, performance criteria include accuracy on a validation test set, false positive and false negative rates where the tool flags exceptions, behavior at the boundary of the validated input range, and behavior when inputs are out of scope.
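For a tool that flags exceptions, the accuracy and false positive and false negative rates above can be computed from a labeled validation test set. The following is a minimal sketch, assuming the metrologist has already confirmed the correct flag for each test case. The names score_flags and meets_criteria are illustrative, and the acceptance thresholds are left as parameters because the laboratory sets them in its own validation plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlagMetrics:
    accuracy: float
    false_positive_rate: float
    false_negative_rate: float


def score_flags(expected: list[bool], predicted: list[bool]) -> FlagMetrics:
    """Compare the tool's exception flags with metrologist-confirmed labels."""
    if not expected or len(expected) != len(predicted):
        raise ValueError("expected and predicted must be non-empty and equal length")
    pairs = list(zip(expected, predicted))
    tp = sum(1 for e, p in pairs if e and p)          # real exception, flagged
    tn = sum(1 for e, p in pairs if not e and not p)  # no exception, not flagged
    fp = sum(1 for e, p in pairs if not e and p)      # no exception, flagged
    fn = sum(1 for e, p in pairs if e and not p)      # real exception, missed
    # A rate is undefined when its class is absent from the test set. This
    # sketch reports 0.0 in that case (an assumption); a real validation set
    # should contain both classes.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return FlagMetrics((tp + tn) / len(pairs), fpr, fnr)


def meets_criteria(m: FlagMetrics, min_accuracy: float,
                   max_fpr: float, max_fnr: float) -> bool:
    """Check measured metrics against the laboratory's acceptance limits."""
    return (m.accuracy >= min_accuracy
            and m.false_positive_rate <= max_fpr
            and m.false_negative_rate <= max_fnr)
```

Keeping the thresholds outside the scoring function means the same test set can be re-scored unchanged after a model or prompt update, with the acceptance limits recorded separately in the validation plan.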

The FDA’s General Principles of Software Validation guidance is a useful reference framework even outside FDA-regulated contexts. The ISPE GAMP 5 risk-based approach to compliant computerized systems is also widely used in regulated environments. Both predate current AI tooling, but their validation methodology applies cleanly when adapted to AI-specific performance dimensions.

Documenting AI tool limitations and operating boundaries

A validated AI tool has documented boundaries. The boundary documentation includes the input ranges the tool was tested against; the output conditions verified during validation; the input conditions where the tool should not be used, with explicit reasons; and the output conditions that require additional metrologist review beyond the default workflow.

For example, an AI tool validated to draft uncertainty contributor lists for pressure calibrations between 0 and 10,000 psi has documented behavior in that range. If a customer submits a 20,000 psi calibration, the tool’s behavior is undocumented. The validation record must identify this case and define the fallback: human-only contributor identification, or revalidation against the extended range before the AI tool is used.
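The fallback rule in this example can be enforced in software before a request ever reaches the AI tool. The sketch below assumes a single validated range per parameter and a request described by its lowest and highest calibration points. ValidatedRange, Route, and route_request are hypothetical names, not part of any calibration data system.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AI_ASSISTED = "ai_assisted"
    HUMAN_ONLY = "human_only"


@dataclass(frozen=True)
class ValidatedRange:
    parameter: str
    low: float
    high: float
    unit: str


# The range from the example above: pressure, 0 to 10,000 psi.
PRESSURE_RANGE = ValidatedRange("pressure", 0.0, 10_000.0, "psi")


def route_request(parameter: str, unit: str, low_point: float,
                  high_point: float, validated: ValidatedRange) -> Route:
    """Send a request to the AI-assisted workflow only inside the validated range."""
    # A different parameter or unit is outside what validation covered.
    if parameter != validated.parameter or unit != validated.unit:
        return Route.HUMAN_ONLY
    # Every calibration point must fall inside the validated range.
    if low_point < validated.low or high_point > validated.high:
        return Route.HUMAN_ONLY
    return Route.AI_ASSISTED
```

With this check, route_request("pressure", "psi", 0.0, 20_000.0, PRESSURE_RANGE) returns Route.HUMAN_ONLY, which matches the documented fallback until the tool is revalidated against the extended range.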

Operating boundary documentation also addresses the AI’s known limitations. AI tools can produce plausible-sounding but incorrect outputs. They can fail silently when the input is similar to but materially different from training data. They can produce inconsistent outputs across nominally identical inputs. These limitations are not failures of validation; they are characteristics of the technology that the documented operating procedure must accommodate.
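Inconsistency across nominally identical inputs can be measured during validation by submitting the same input several times and comparing the outputs. The sketch below is one simple way to do that, assuming the tool can be wrapped as a function that takes text and returns text. The name repeatability and the exact-match comparison are illustrative assumptions; a laboratory may prefer a comparison that tolerates harmless wording differences.

```python
from collections import Counter
from typing import Callable


def repeatability(tool: Callable[[str], str], prompt: str, runs: int = 5) -> float:
    """Return the fraction of runs that agree with the most common output."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Call the tool repeatedly with an identical input.
    outputs = [tool(prompt).strip() for _ in range(runs)]
    # Count how many runs produced the single most frequent output.
    most_common_count = Counter(outputs).most_common(1)[0][1]
    return most_common_count / runs
```

A result of 1.0 means every run returned the same text. A lower value does not by itself make the tool unusable, but it is the kind of characteristic the operating procedure has to record and accommodate.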

The boundary documentation is the artifact that lets an accreditation assessor understand what the AI tool does, what it does not do, and what the metrologist is responsible for catching. Without this documentation, the AI tool is unvalidated software regardless of how well it appears to perform in routine use.

Change control when models, prompts, or training data change

AI tools change. The vendor releases a new model version. The internal prompt engineering team adjusts the instructions. Training data updates. The host system gets patched. Each of these changes can affect the tool’s behavior, and each requires change control.

Change control for laboratory software falls under ISO/IEC 17025:2017 clause 7.11.2, which requires changes to be authorized, documented, and validated before implementation. In practice, the workflow identifies the change, assesses its impact, implements it in a controlled way, and verifies that it did not introduce defects. For AI tools, the verification step is more involved than for deterministic software because the change may affect behavior in ways that are not apparent from the change description alone.

A practical change control standard for AI tools includes the following:

Model version updates. Treat as a major change. Re-run the validation test set, document any behavioral differences, and re-authorize the tool for accredited use before the new version is deployed.
Prompt or instruction updates. Treat as a minor change if the update narrows the tool’s behavior, a major change if it expands the scope or changes the output format. Validate before deployment.
Training data updates. Major change for any custom-trained model. For vendor models, treat as a major change if the vendor identifies the update as material; otherwise verify against a subset of the validation test set.
Integration changes. Any change that affects how the AI tool receives input from or returns output to the calibration data system. Treat as a major change.
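
The classification rules above can be written down as a small decision function so that every change request is triaged the same way. This is a sketch only: the enum and parameter names are hypothetical, and two choices are assumptions rather than rules from the list. A prompt update that neither narrows nor expands behavior is treated as major, and a vendor training data update that is not flagged as material is labeled minor, with verification against a subset of the validation test set.

```python
from enum import Enum


class ChangeType(Enum):
    MODEL_VERSION = "model_version"
    PROMPT = "prompt"
    TRAINING_DATA = "training_data"
    INTEGRATION = "integration"


class Severity(Enum):
    MINOR = "minor"
    MAJOR = "major"


def classify_change(change_type: ChangeType, *,
                    narrows_behavior: bool = False,
                    expands_scope: bool = False,
                    changes_output_format: bool = False,
                    custom_model: bool = False,
                    vendor_flagged_material: bool = False) -> Severity:
    """Map a proposed change to the severity used for revalidation planning."""
    if change_type in (ChangeType.MODEL_VERSION, ChangeType.INTEGRATION):
        return Severity.MAJOR
    if change_type is ChangeType.PROMPT:
        if expands_scope or changes_output_format:
            return Severity.MAJOR
        # Assumption: anything not clearly a narrowing is handled as major.
        return Severity.MINOR if narrows_behavior else Severity.MAJOR
    if change_type is ChangeType.TRAINING_DATA:
        if custom_model or vendor_flagged_material:
            return Severity.MAJOR
        # Assumption: non-material vendor update, verified against a subset
        # of the validation test set, is recorded here as minor.
        return Severity.MINOR
    raise ValueError(f"unknown change type: {change_type!r}")
```

Defaulting to the more conservative answer when a change is ambiguous keeps the function aligned with the article's framing: an unclassified change should never reach accredited use without validation.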

The change control record becomes part of the AI tool’s validation documentation. Over time, the record establishes the tool’s behavior history, which is what an assessor will examine when reviewing AI-supported calibration decisions.

Audit trail requirements for AI-supported calibration decisions

The audit trail is the single most important documentation deliverable for AI-supported calibration. Without it, the laboratory cannot reconstruct the basis for a calibration decision if an assessor or customer asks.

A defensible audit trail records the inputs provided to the AI tool, including timestamps; the outputs the tool produced, in the form delivered to the metrologist; the metrologist’s review notes, including any modifications, rejections, or escalations; the metrologist’s independent judgment where it differs from the AI output; and the final decision and signature.
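
The elements listed above map naturally onto a structured record. The following sketch shows one possible shape, assuming records are stored as JSON. The field names are illustrative, and the SHA-256 digest is an added assumption for detecting later edits to a stored record, not something the standard prescribes.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


def utc_timestamp() -> str:
    """Current time as an ISO 8601 string in UTC."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class AuditRecord:
    input_text: str            # what was provided to the AI tool
    input_timestamp: str       # when it was provided (ISO 8601, UTC)
    output_text: str           # tool output as delivered to the metrologist
    review_notes: str          # modifications, rejections, or escalations
    independent_judgment: str  # where the metrologist differed from the tool
    final_decision: str
    signed_by: str

    def to_json(self) -> str:
        """Serialize with sorted keys so the same record always yields the same text."""
        return json.dumps(asdict(self), sort_keys=True)

    def digest(self) -> str:
        """SHA-256 of the serialized record (hypothetical tamper-evidence aid)."""
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()
```

Making the record immutable (frozen=True) reflects the intent of an audit trail: a correction is a new record that references the old one, not an edit in place.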

This level of detail aligns with NIST guidance on software in measurement systems and with NCSLI guidance on calibration software. It is more detailed than what many laboratories produce for traditional software, because AI outputs require explicit metrologist judgment to be defensible, and the audit trail must show that judgment occurred.

Retention follows the laboratory’s existing record retention policy under clause 8.4. For laboratories supporting regulated industries, retention should match the longest applicable customer or regulatory requirement, typically the lifetime of the affected instrument plus a defined period.

AI tools used responsibly under a validated, documented, change-controlled, audit-traced framework can meaningfully accelerate calibration work without compromising accreditation defensibility. The validation discipline described above is the prerequisite, not an optional layer.

Tra-Cal Laboratories operates ISO/IEC 17025:2017 accredited calibration services under a quality system that treats AI tools as software requiring validation. For organizations considering AI integration in their own calibration programs, the validation framework above is the starting point.

Frequently Asked Questions

Do AI tools require validation under ISO/IEC 17025:2017?

Yes. AI tools that influence calibration decisions are software systems under ISO/IEC 17025:2017 clause 7.11, which requires control of data and information management systems used in laboratory activities. Validation includes documenting the intended use, defining performance criteria, recording acceptance testing results, and establishing change control before the tool can support accredited calibration work.

What ISO/IEC 17025 clauses apply to AI software in a calibration lab?

Clause 7.11 covers control of data and information management systems, which includes any software that processes calibration data, manages records, or supports decisions. Clause 6.4 covers equipment requirements, applicable when AI is embedded in measurement equipment. Clause 6.2 covers personnel competence for the metrologists who use and review AI outputs. Clause 8.7 covers corrective action when AI tool errors are detected.

How do you document AI tool limitations for accreditation purposes?

Document the operating boundaries: the input ranges the tool was tested across, the output conditions where the tool’s performance was verified, and the conditions where the tool should not be used or where outputs require additional review. Include the validation test results, the date of validation, and the metrologist or quality manager who authorized the tool for accredited work. This documentation forms part of the management system records under clause 8.

What change control is required for AI tools in calibration?

Any change that could affect the AI tool’s behavior triggers a change control review: model updates, prompt or instruction changes, training data updates, host system or environment changes, and integration changes with the calibration data system. Each change requires impact assessment, revalidation appropriate to the change scope, and updated documentation before the tool returns to accredited use.

What audit trail is required for AI-supported calibration decisions?

The audit trail should record what input was provided to the AI tool, what output the tool produced, what the metrologist reviewed, what the metrologist’s independent judgment was, and what decision was signed off. The record must be sufficient to reconstruct the basis for any calibration decision when an accreditation assessor or regulated customer requests it. Retention follows the laboratory’s existing record retention policy under clause 8.4.


Joe DiMarino