How to verify AI safety?
By assessing whether an AI system behaves reliably, predictably, and in alignment with human intentions
1- Behavioral Testing
- Create test cases (normal, edge, and adversarial inputs).
- See if the AI system behaves as expected.
2- Robustness Evaluation
- Test how well the system handles perturbed, noisy, or adversarial data.
3- Formal Verification
- Use logic and math to prove safety properties of simpler or constrained systems.
4- Alignment Checks
- Evaluate whether the model’s behavior matches human values or instructions.
5- Transparency and Interpretability
- Understanding how and why the AI makes decisions:
6- Monitoring & Oversight
- Put the AI under human supervision (human-in-the-loop or human-on-the-loop).