NIST AI RMF Principle: Measure

The “Measure” function of the NIST AI RMF guides organizations in defining specific metrics and thresholds to track AI risks effectively. Measuring and monitoring these risks is critical to ensuring AI systems perform reliably, safely, and fairly across their lifecycle. 

Let’s see how you can set these metrics, with examples, practical steps, and some tools commonly used to make it happen.

Define AI Risk Metrics and Thresholds

Defining AI risk metrics and thresholds involves establishing specific criteria and measurement standards to evaluate the potential risks associated with AI systems. Here’s a breakdown of what this entails:

1. List Key Risk Areas Across the AI Lifecycle

Pinpoint the key risks your AI system may encounter at different stages of its lifecycle. These can range from bias in decision-making to security vulnerabilities and even environmental impact. Common risk areas include:

  • Bias and Fairness. Likely to arise during data preparation (from unbalanced datasets) and model training (when algorithms reinforce stereotypes) and must be monitored throughout for drift over time.
  • Explainability. Critical in model development (as complex models can become black boxes), validation (to build stakeholder trust), and deployment (where decisions need to be transparent to users).
  • Robustness and Security. Issues may surface during training (with models failing to generalize) and deployment (through adversarial attacks), requiring ongoing monitoring to mitigate new threats.
  • Environmental Impact. Found primarily in training (with resource-heavy models) and deployment (especially for systems requiring real-time inference), with implications for sustainability.

2. Develop Specific Metrics for Each Risk Area

Once you know what to monitor, it’s time to create concrete, measurable metrics for each area. Here’s what that could look like:

  • Bias and Fairness. Use demographic parity (ensuring equal performance across groups) and equalized odds (similar false positive/negative rates across demographics). The disparate impact ratio compares favorable outcomes across groups. Tools like IBM’s AI Fairness 360 and Google’s What-If Tool help detect and address bias (a minimal metric sketch follows this list).
  • Explainability. Measure explanation satisfaction scores based on user feedback. Tools such as LIME and SHAP provide interpretable model outputs, enhancing transparency and user trust.
  • Performance Reliability. Track predictive quality with metrics such as precision and recall, and monitor the frequency of human overrides (where operators correct AI errors). TensorFlow Model Analysis and Amazon SageMaker Clarify support monitoring these metrics.
  • Environmental Impact. Use CO2 emissions per model training and energy consumption during inference to assess sustainability. Tools like CodeCarbon and Experiment Impact Tracker quantify AI’s environmental footprint.
  • Security Metrics. Monitor incident rate (number of breaches) and Mean Time to Detection (MTTD) to measure how quickly security issues are identified and addressed.
  • Data Privacy Metrics. Use data minimization metrics to ensure only necessary data is collected and stored. These metrics help align AI systems with privacy regulations and user expectations.
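
As a minimal sketch of the fairness metrics described above, the calculations can be done directly with pandas on a scored dataset (the `predictions` DataFrame and its values here are purely illustrative; dedicated toolkits like AI Fairness 360 wrap the same ideas):

```python
import pandas as pd

# Hypothetical scored dataset: one row per case, with the protected
# attribute ("group"), the true label, and the model's prediction.
predictions = pd.DataFrame({
    "group":     ["A", "A", "A", "B", "B", "B", "B", "A"],
    "actual":    [1, 0, 1, 1, 0, 0, 1, 0],
    "predicted": [1, 0, 1, 0, 0, 1, 1, 0],
})

# Demographic parity: rate of favorable (positive) predictions per group.
selection_rates = predictions.groupby("group")["predicted"].mean()

# Disparate impact ratio: lowest selection rate divided by the highest.
# A common rule of thumb flags values below 0.8 for review.
disparate_impact = selection_rates.min() / selection_rates.max()

# One ingredient of equalized odds: false positive rate per group.
negatives = predictions[predictions["actual"] == 0]
false_positive_rates = negatives.groupby("group")["predicted"].mean()

print(f"Selection rates:\n{selection_rates}")
print(f"Disparate impact ratio: {disparate_impact:.2f}")
print(f"False positive rates:\n{false_positive_rates}")
```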

3. Set Measurable Baselines and Thresholds

Establishing clear baselines and thresholds is essential for managing risks effectively. Baselines represent expected performance and act as reference points for tracking deviations. For example, a model’s accuracy baseline could be set at 90% based on initial testing. 

Both baselines and thresholds should align with your organization’s policies, values, and industry standards (e.g., a fairness metric ensuring no group receives less than 95% of opportunities compared to others).

Thresholds define when action is needed. For instance, if model accuracy drops below 85%, or response time exceeds 500ms, alerts trigger corrective measures to prevent further degradation. Regular reviews ensure metrics stay relevant as the system evolves and external conditions change.
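
As a rough sketch of how such baselines and thresholds might be encoded and checked in practice (the metric names, values, and the `current_metrics` input are illustrative assumptions, not prescribed by the framework):

```python
# Illustrative baselines and alert thresholds, mirroring the examples above.
THRESHOLDS = {
    "accuracy":       {"baseline": 0.90, "alert_below": 0.85},
    "fairness_ratio": {"baseline": 1.00, "alert_below": 0.95},
    "response_ms":    {"baseline": 300,  "alert_above": 500},
}

def check_thresholds(current_metrics: dict) -> list[str]:
    """Return a human-readable alert for every breached threshold."""
    alerts = []
    for name, limits in THRESHOLDS.items():
        value = current_metrics.get(name)
        if value is None:
            continue
        if "alert_below" in limits and value < limits["alert_below"]:
            alerts.append(f"{name}={value} fell below {limits['alert_below']}")
        if "alert_above" in limits and value > limits["alert_above"]:
            alerts.append(f"{name}={value} exceeded {limits['alert_above']}")
    return alerts

# Example: accuracy and latency both breach their thresholds and trigger alerts.
print(check_thresholds({"accuracy": 0.84, "fairness_ratio": 0.97, "response_ms": 620}))
```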

4. Document Processes and Adjust as Needed

Comprehensive documentation ensures transparency and continuous improvement. Record how baselines were established, the rationale behind thresholds, and any changes made based on monitoring results. This enables traceability, facilitates audits, and ensures smooth handovers. 

Establish a routine for periodic reviews (e.g., quarterly or after significant system updates) to assess whether baselines and thresholds remain relevant. When adjustments are made, document the impact of these changes to create a feedback loop for refining future decisions. Clear ownership of the review process ensures accountability and proactive risk management.

Implement Monitoring and Testing

Monitoring and testing are critical for ensuring your AI systems are running smoothly and ethically. As these technologies evolve, they can face unexpected challenges and risks. 

That’s why it’s important to have a clear framework for keeping tabs on their performance. Regularly evaluating how your AI is functioning allows you to catch any problems before they escalate.

1. Establish a Monitoring Framework that Fits

The first thing you’ll want to do is set up a framework to track your AI system’s performance. But it’s not a one-size-fits-all solution. 

Depending on the type of AI system you’re running, you’ll have different metrics to track. For example, if your AI system helps in hiring decisions, you’d likely want to monitor not just accuracy but also how well it handles decisions across different demographic groups.

Tailor your monitoring framework to the nature of your AI development and use. 
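
One lightweight way to capture this tailoring is a simple configuration that maps each use case to the metrics worth collecting. A sketch (the use-case names and metric lists are illustrative only):

```python
# Illustrative mapping of AI use cases to the metrics their monitoring should track.
MONITORING_PLANS = {
    "hiring_screening": [
        "accuracy",
        "selection_rate_by_group",   # fairness across demographic groups
        "human_override_rate",
    ],
    "customer_chatbot": [
        "response_latency_ms",
        "fallback_rate",             # how often the bot hands off to a human
        "user_satisfaction_score",
    ],
}

def metrics_for(use_case: str) -> list[str]:
    """Look up which metrics the monitoring framework should collect."""
    return MONITORING_PLANS.get(use_case, ["accuracy"])

print(metrics_for("hiring_screening"))
```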

2. Implement Real-Time Monitoring Tools

While having metrics is crucial, you’ll need real-time tools to track how well the system is holding up. These tools can give you immediate feedback and even alert you when something seems off.

Use tools like Amazon CloudWatch or Azure Monitor to track performance in real time. These tools can send an alert if a key metric, such as accuracy, dips below a defined threshold, allowing you to investigate before things spiral.
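
As one possible sketch using Amazon CloudWatch via boto3 (the namespace, metric name, model name, and SNS topic ARN are hypothetical placeholders; Azure Monitor offers equivalent capabilities):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Publish the model's latest evaluated accuracy as a custom metric.
cloudwatch.put_metric_data(
    Namespace="MyCompany/AIModels",  # hypothetical namespace
    MetricData=[{
        "MetricName": "ModelAccuracy",
        "Dimensions": [{"Name": "ModelName", "Value": "credit-scoring-v2"}],
        "Value": 0.91,
    }],
)

# Alarm when average accuracy over a 5-minute period drops below 0.85.
cloudwatch.put_metric_alarm(
    AlarmName="model-accuracy-below-threshold",
    Namespace="MyCompany/AIModels",
    MetricName="ModelAccuracy",
    Dimensions=[{"Name": "ModelName", "Value": "credit-scoring-v2"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=1,
    Threshold=0.85,
    ComparisonOperator="LessThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ai-risk-alerts"],  # hypothetical SNS topic
)
```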

3. Conduct Regular Performance Testing 

Beyond real-time monitoring, regular performance testing is a must. AI systems can degrade over time, especially if the data you’re working with changes. Regular tests can help keep the system in check.

To do this, you can incorporate different types of testing:

  • Stress testing. How does the system behave under heavy traffic or complex input?
  • A/B testing. If you’ve made updates, compare how the new version performs versus the old one.
  • Regression testing. Ensure that new updates don’t accidentally break existing functionality (see the sketch after this list).
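
For instance, a regression test can pin model quality to the previously recorded baseline so a degraded candidate never ships. A minimal pytest-style sketch (the `load_model` and `load_holdout_data` helpers and the baseline value are hypothetical stand-ins for your own evaluation harness):

```python
from sklearn.metrics import accuracy_score

# Baseline recorded for the currently deployed model version (illustrative value).
BASELINE_ACCURACY = 0.90
TOLERANCE = 0.02  # allow small fluctuation before failing the build

def test_new_model_does_not_regress():
    model = load_model("candidate")             # hypothetical helper
    X_holdout, y_holdout = load_holdout_data()  # hypothetical fixed holdout set
    accuracy = accuracy_score(y_holdout, model.predict(X_holdout))
    assert accuracy >= BASELINE_ACCURACY - TOLERANCE, (
        f"Candidate accuracy {accuracy:.3f} regressed below baseline "
        f"{BASELINE_ACCURACY:.2f} (tolerance {TOLERANCE})"
    )
```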

4. Continuous Learning for a Dynamic World

Your AI system can’t stay static. It needs to learn and adapt based on new data, trends, and user feedback. That’s where continuous learning comes in, helping your system remain relevant and useful over time.

Retrain your model with new data periodically. A customer service chatbot, for instance, can be updated with recent user interactions to handle evolving queries more effectively. 

Techniques like transfer learning can also help here, allowing your model to build on previous learning and adapt to new challenges.
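
A rough sketch of periodic incremental retraining using scikit-learn’s `partial_fit` (the synthetic arrays and batch cadence are illustrative; transfer learning in deep learning frameworks follows the same idea of starting from existing weights rather than from scratch):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Initial training on historical data (illustrative synthetic arrays).
rng = np.random.default_rng(0)
X_hist, y_hist = rng.normal(size=(1000, 5)), rng.integers(0, 2, size=1000)

model = SGDClassifier(random_state=0)
model.partial_fit(X_hist, y_hist, classes=np.array([0, 1]))

# Later: fold in a fresh batch of recent data without retraining from scratch.
X_new, y_new = rng.normal(size=(200, 5)), rng.integers(0, 2, size=200)
model.partial_fit(X_new, y_new)
```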

5. Engage Stakeholders for Meaningful Feedback

It’s one thing to rely on numbers and metrics, but human feedback, whether from end users or from communities impacted by the AI, is invaluable. Engaging these groups provides insights that pure data might not capture, such as user trust or satisfaction.

Gather feedback from different groups through surveys, focus groups, or user interviews. 

6. Regular Audits Keep Everything in Check

AI systems don’t operate in a vacuum—they need to be regularly audited to ensure they’re staying true to your organization’s values and meeting regulatory requirements. Audits can help catch risks before they become issues.

Schedule audits (maybe quarterly or annually) to look at how your AI system stacks up against performance and ethical standards.

7. Document the Process

Transparency is key. Not just for building trust, but also for improving your system over time. Keep detailed records of every monitoring session, test, and feedback loop, so you can spot patterns or recurring issues.

Analyze and Report Risk Data

Once risk data is gathered from monitoring and testing AI systems, the next step is analyzing and reporting this data. This step ensures that organizations don’t just gather information but turn it into actionable insights. 

The goal is to assess how the AI system is performing, identify potential risks, and document everything to ensure transparency, accountability, and continual improvement.

Let’s break down how to effectively analyze and report risk data, step by step.

1. Categorize and Prioritize Risk Data

The first step in analyzing risk data is to categorize it based on the trustworthiness characteristics you’re measuring, such as accuracy, bias, explainability, or security. Then, prioritize these risks by their impact and likelihood.
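
A simple way to do this is to score each finding on impact and likelihood and sort by the product. A sketch (the risk entries and 1-to-5 scoring scale are illustrative only):

```python
from dataclasses import dataclass

@dataclass
class RiskFinding:
    name: str
    category: str      # e.g., "bias", "security", "explainability"
    impact: int        # 1 (low) to 5 (severe)
    likelihood: int    # 1 (rare) to 5 (frequent)

    @property
    def priority(self) -> int:
        return self.impact * self.likelihood

findings = [
    RiskFinding("Accuracy drift on new region data", "accuracy", impact=4, likelihood=3),
    RiskFinding("Higher false positives for group B", "bias", impact=5, likelihood=2),
    RiskFinding("Prompt injection in support bot", "security", impact=5, likelihood=4),
]

# Highest-priority risks first.
for f in sorted(findings, key=lambda f: f.priority, reverse=True):
    print(f"{f.priority:>2}  [{f.category}] {f.name}")
```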

2. Perform Trend Analysis for Context

A trend analysis helps you see how certain risks, such as a drop in model accuracy or an increase in bias, have evolved over time and whether they are improving or worsening.
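
A sketch of a simple trend analysis with pandas, smoothing a weekly accuracy series to see whether it is drifting downward (the dates and values are made up for illustration):

```python
import pandas as pd

# Weekly accuracy measurements collected from monitoring (illustrative values).
accuracy = pd.Series(
    [0.91, 0.90, 0.91, 0.89, 0.88, 0.88, 0.87, 0.86],
    index=pd.date_range("2024-01-07", periods=8, freq="W"),
)

# A rolling mean smooths week-to-week noise; the difference between its first
# and last values shows the direction of travel.
rolling = accuracy.rolling(window=4).mean().dropna()
trend = rolling.iloc[-1] - rolling.iloc[0]

print(rolling)
print(f"Change in 4-week rolling accuracy over the period: {trend:+.3f}")
```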

3. Correlate Risk Data With Operational Goals

A key part of analyzing risk data is connecting it to the operational goals and values of your organization. Map risk data against the organization’s goals using dashboards or matrix tools. 

In a healthcare setting, for example, if bias emerges in diagnostic recommendations for specific patient groups, that data should be flagged immediately for intervention.

4. Apply Sensitivity Analysis for Deeper Insights

Sensitivity analysis reveals how small changes in data or model parameters affect the AI system’s overall performance, providing insights into its robustness. It helps assess the system’s ability to handle unexpected situations, adversarial conditions, or data shifts. 

Testing different variables—such as altering input datasets, tweaking model parameters, or adjusting thresholds—can uncover vulnerabilities and determine which factors have the most significant impact on outcomes. 

This process supports proactive risk management by identifying areas where the system may need adjustments to maintain performance and reliability.
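
One lightweight way to probe this is to perturb the inputs by increasing amounts and watch how far the metric moves. A sketch under those assumptions (the model, evaluation data, and noise scales are hypothetical placeholders):

```python
import numpy as np
from sklearn.metrics import accuracy_score

def sensitivity_sweep(model, X, y, noise_scales=(0.0, 0.01, 0.05, 0.1)):
    """Measure accuracy as Gaussian noise of increasing scale is added to the inputs."""
    rng = np.random.default_rng(42)
    results = {}
    for scale in noise_scales:
        X_perturbed = X + rng.normal(0.0, scale, size=X.shape)
        results[scale] = accuracy_score(y, model.predict(X_perturbed))
    return results

# Example usage with a hypothetical fitted model and evaluation set:
# for scale, acc in sensitivity_sweep(model, X_eval, y_eval).items():
#     print(f"noise={scale:<5} accuracy={acc:.3f}")
```

A steep drop in accuracy at small noise scales suggests the system is fragile and may need adjustments before it can be relied on under shifting conditions.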

5. Develop Clear and Actionable Risk Reports

Once the data has been analyzed, it’s important to document and communicate findings in a way that decision-makers can understand. Reports should be clear, concise, and focused on actionable insights.

Make sure your report answers key questions such as:

  • Is the AI system meeting its defined trustworthiness goals?
  • What risks have emerged, and how severe are they?
  • What immediate actions are needed to mitigate these risks?

Frequently asked questions

What are the NIST requirements for AI?

The NIST AI RMF outlines requirements for developing and deploying trustworthy AI systems, focusing on reliability, safety, security, transparency, accountability, and fairness. Organizations must also establish governance frameworks to ensure compliance with ethical and regulatory standards for effective AI risk management.

Which US agency is responsible for the AI risk management framework?

The National Institute of Standards and Technology (NIST), an agency of the U.S. Department of Commerce, is responsible for the AI Risk Management Framework (AI RMF). NIST develops and promotes measurement standards and technology to enhance innovation and industrial competitiveness. The agency collaborates with various stakeholders to ensure the framework’s relevance and applicability across different sectors.

When did NIST release the AI risk management framework?

NIST released the AI Risk Management Framework (AI RMF) on January 26, 2023.

Does NIST AI RMF have a certification?

Currently, the NIST AI RMF does not offer a formal certification. Instead, it serves as a guideline and best practices framework for organizations to align their AI risk management practices with. However, organizations can demonstrate compliance and adherence to the framework through self-assessments, third-party audits, and by implementing the recommended practices.

Who can perform NIST AI assessments?

NIST AI assessments can be performed by qualified internal teams, third-party auditors, or consultants with expertise in AI risk management and the NIST AI RMF. I.S. Partners offers a complete package of services to help organizations implement the AI RMF standards according to their industry requirements.

