Advent of 2024, Day 24 – Microsoft Azure AI – Evaluation in Azure AI Foundry
This article is originally published at https://tomaztsql.wordpress.com
In this Microsoft Azure AI series:
- Dec 01: Microsoft Azure AI – What is Foundry?
- Dec 02: Microsoft Azure AI – Working with Azure AI Foundry
- Dec 03: Microsoft Azure AI – Creating project in Azure AI Foundry
- Dec 04: Microsoft Azure AI – Deployment in Azure AI Foundry
- Dec 05: Microsoft Azure AI – Deployment parameters in Azure AI Foundry
- Dec 06: Microsoft Azure AI – AI Services in Azure AI Foundry
- Dec 07: Microsoft Azure AI – Speech service in AI Services
- Dec 08: Microsoft Azure AI – Speech Studio in Azure with AI Services
- Dec 09: Microsoft Azure AI – Speech SDK with Python
- Dec 10: Microsoft Azure AI – Language and Translation in Azure AI Foundry
- Dec 11: Microsoft Azure AI – Language and Translation Python SDK
- Dec 12: Microsoft Azure AI – Vision and Document AI Service
- Dec 13: Microsoft Azure AI – Vision and Document Python SDK
- Dec 14: Microsoft Azure AI – Content safety AI service
- Dec 15: Microsoft Azure AI – Content safety Python SDK
- Dec 16: Microsoft Azure AI – Fine-tuning a model
- Dec 17: Microsoft Azure AI – Azure OpenAI service
- Dec 18: Microsoft Azure AI – Azure AI Hub and Azure AI Project
- Dec 19: Microsoft Azure AI – Azure AI Foundry management center
- Dec 20: Microsoft Azure AI – Models and endpoints in Azure AI Foundry
- Dec 21: Microsoft Azure AI – Prompt flow in Azure AI Foundry
- Dec 22: Microsoft Azure AI – Prompt flow using VS Code and Python
- Dec 23: Microsoft Azure AI – Tracing in Azure AI Foundry
Evaluation lets you perform iterative, systematic assessments with the right evaluators, so you can measure and address potential response quality, safety, or security concerns throughout the AI development lifecycle, from initial model selection through post-production monitoring.
Evaluation in Azure AI Foundry supports the GenAI Ops lifecycle all the way to production. It also lets you assess the frequency and severity of content risks or undesirable behaviour in AI responses.
For each model, you can consider and evaluate:
- accuracy and quality of the model-generated responses
- performance on specific tasks, such as handling the prompts and content required for particular use cases
- bias and ethical concerns, to avoid harmful stereotypes or promoting any other bias
- risk and safety, to track and evaluate unsafe or malicious content
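To make the "risk and safety" dimension concrete, here is a minimal, purely illustrative sketch of a custom evaluator in the style the Azure AI evaluation tooling supports, where an evaluator can be any callable that returns a dictionary of metric names to scores. The deny-list and scoring scheme below are hypothetical examples, not built-in Foundry evaluators.

```python
# Hypothetical deny-list; a real deployment would use Foundry's
# built-in risk and safety evaluators instead of a word list.
BLOCKED_TERMS = {"badword", "malicious"}

def safety_evaluator(*, response: str) -> dict:
    """Flag responses containing any blocked term (1 = unsafe, 0 = safe)."""
    # Normalise tokens by stripping trailing punctuation and lowercasing.
    words = {w.strip(".,!?").lower() for w in response.split()}
    hits = sorted(words & BLOCKED_TERMS)
    return {"unsafe": 1 if hits else 0, "matched_terms": hits}

print(safety_evaluator(response="This reply is perfectly fine."))
print(safety_evaluator(response="A malicious answer."))
```

The same callable pattern scales to the other dimensions (accuracy, task performance, bias) by returning additional keys in the result dictionary.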
Models can be evaluated at two stages:
- pre-production (testing datasets, finding edge cases, assessing robustness and measuring key metrics)
- post-production (finding edge cases and reporting them back, tracking and monitoring performance, analysing inappropriate outputs)
In the Foundry portal you choose what to evaluate; for a given model and prompt, you can select from many different metrics.
You can start the evaluation once the deployment is ready, all relevant metrics are selected, prompts are added, and the data integration (input and output) is defined.
The evaluation results are then shown in the portal.
You can also analyse the results with the Python SDK. In code, you get the standard overview of the files, runs, logs, and metric definitions, and you can display the results.
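As a rough sketch of that post-hoc analysis, the snippet below assumes the evaluation output has been exported as rows of JSON records with per-metric scores; the `inputs.*`/`outputs.*` field names are hypothetical placeholders, not the SDK's exact schema.

```python
from statistics import mean

# Hypothetical exported evaluation rows (two questions, two metrics each).
rows = [
    {"inputs.question": "What is Foundry?", "outputs.relevance": 4, "outputs.groundedness": 5},
    {"inputs.question": "Deploy a model",   "outputs.relevance": 3, "outputs.groundedness": 4},
]

def summarize(rows: list[dict]) -> dict:
    """Average every numeric 'outputs.*' metric across all evaluated rows."""
    metrics: dict[str, list[float]] = {}
    for row in rows:
        for key, value in row.items():
            if key.startswith("outputs.") and isinstance(value, (int, float)):
                metrics.setdefault(key, []).append(value)
    return {name: mean(values) for name, values in metrics.items()}

print(summarize(rows))  # per-metric averages across the run
```

This mirrors what the portal's results view gives you: aggregate scores per metric, with the raw rows still available for drilling into individual responses.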
Tomorrow we will look into documentation for Azure AI Foundry.
All of the code samples will be available on my GitHub.