Developing reliable and effective AI systems requires following general best practices for software systems while also accounting for considerations specific to machine learning. General practices include designing features with appropriate disclosures and considering augmentation and assistance to improve the user experience. Machine learning-specific considerations include using multiple metrics to assess training and monitoring, understanding the limitations of the dataset and model, and borrowing testing practices from software engineering so that AI systems work as intended and can be trusted.
The following are some recommendations for the responsible development of AI.
1. Use a human-centered design approach:
The true impact of your system's predictions, recommendations, and decisions can only be assessed through the way actual users experience it. Design features with clarity and control in mind, and provide appropriate disclosures. A single answer may work in some cases, but in others offering a few options is the better choice. Anticipate potential adverse feedback early in the design process, and engage with a diverse set of users and use-case scenarios throughout project development so that a broad range of perspectives is incorporated.
2. Use multiple metrics for training and monitoring:
To have a comprehensive understanding of AI performance, use multiple metrics to assess training and monitoring instead of relying on a single metric. These metrics should consider a variety of factors, including feedback from user surveys, overall system performance, and short- and long-term product health. False positive and false negative rates, analyzed across different subgroups, are also important metrics to consider. Ensure that the selected metrics are appropriate for the context and goals of the system; for instance, a fire alarm system should prioritize high recall, even if that means some false alarms.
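The subgroup analysis described above can be sketched as follows. This is a minimal illustration, assuming binary labels where 1 is the positive class; the function name `subgroup_error_rates` and the toy data are hypothetical and not taken from any particular library.

```python
from collections import defaultdict


def subgroup_error_rates(y_true, y_pred, groups):
    """Return {group: {"fpr": ..., "fnr": ...}} for binary labels (1 = positive)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "fp" if pred == 1 else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]
        positives = c["tp"] + c["fn"]
        rates[group] = {
            # None marks a rate that is undefined for this subgroup.
            "fpr": c["fp"] / negatives if negatives else None,
            "fnr": c["fn"] / positives if positives else None,
        }
    return rates


# Toy, made-up data for illustration only.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
rates = subgroup_error_rates(y_true, y_pred, groups)
```

Recall for a subgroup is one minus its false negative rate, so a system like the fire alarm would watch `fnr` most closely. In this toy data the two groups have very different error profiles even though an overall accuracy figure would hide that.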
3. Evaluate raw data:
ML models reflect the data they are trained on, so examine your raw data carefully to make sure you understand it. If you cannot access your raw data due to privacy concerns, compute aggregate and anonymized summaries instead. Check your data for errors, inaccuracies, and sampling bias to ensure that it is representative of your users and the real-world setting. Training-serving skew, the difference between performance during training and performance during serving, is a persistent challenge: try to identify potential skews during training and address them by adjusting your training data or objective function. Use the simplest model that meets your performance goals, and in supervised systems consider the relationship between the data labels you have and the items you are trying to predict. Finally, address data bias by following AI and fairness best practices.
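One way to check for sampling bias using only aggregate counts is to compare each group's share of the dataset with the share you expect among real users. The sketch below is a simple illustration under that assumption; `representation_gaps`, the group names, and the expected shares are all hypothetical.

```python
from collections import Counter


def representation_gaps(sample_groups, expected_shares):
    """Return observed share minus expected share for each expected group."""
    total = len(sample_groups)
    if total == 0:
        raise ValueError("sample_groups is empty")
    observed = Counter(sample_groups)
    return {
        group: observed.get(group, 0) / total - share
        for group, share in expected_shares.items()
    }


# Made-up example: the dataset is mostly desktop traffic,
# while the (assumed) user base is split evenly.
sample = ["mobile"] * 2 + ["desktop"] * 8
expected = {"mobile": 0.5, "desktop": 0.5}
gaps = representation_gaps(sample, expected)
```

A negative gap means the group is underrepresented in the data relative to the expectation. How large a gap is acceptable depends on the application and is not something this sketch decides.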
4. Be aware of the constraints of the dataset and model:
It is crucial to understand the constraints of both the dataset and the model in use. For instance, a model trained to detect correlations should not be used to draw causal inferences. A model can identify patterns in its training data, but it remains limited by the scope and coverage of that data. It is therefore essential to communicate the model's capabilities and limitations to users. For example, a shoe detector trained on stock photos may not perform well on user-generated cellphone photos. Educating users about such limitations can lead to better feedback and to improvements in the feature or application.
5. Continuous testing:
Incorporate software engineering best practices for testing and quality assurance to ensure reliable and trustworthy AI systems. Perform thorough unit tests to examine individual components of the system in isolation. Conduct integration tests to observe the interaction between different parts of the AI system. Detect input drift early on by regularly testing input statistics and ensuring they remain consistent.
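A check on input statistics can be as simple as comparing the mean of a recent batch of a numeric feature against a reference sample. The sketch below is a rough heuristic rather than a formal statistical test; the function names and the default threshold of 3.0 are assumptions chosen for illustration.

```python
import math
import statistics


def mean_shift_in_std_units(reference, current):
    """Absolute shift of the current mean, in reference standard deviations."""
    ref_mean = statistics.mean(reference)
    ref_std = statistics.pstdev(reference)
    cur_mean = statistics.mean(current)
    if ref_std == 0:
        # A constant reference feature: any change at all counts as infinite shift.
        return 0.0 if cur_mean == ref_mean else math.inf
    return abs(cur_mean - ref_mean) / ref_std


def has_drifted(reference, current, threshold=3.0):
    """Flag drift when the mean moves more than `threshold` reference std devs."""
    return mean_shift_in_std_units(reference, current) > threshold


# Made-up feature values for illustration only.
reference = [10.0, 11.0, 9.0, 10.0]
stable_batch = [10.0, 10.5, 9.5]
shifted_batch = [20.0, 21.0, 19.0]
```

Here `has_drifted(reference, stable_batch)` is false and `has_drifted(reference, shifted_batch)` is true. A check like this could run as a scheduled test per feature; in practice you would likely also compare spread, missing-value rates, and category frequencies.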
Create a standard test dataset to verify that the AI system continues to perform as expected. Update this dataset regularly to reflect evolving user needs and use cases and to reduce the risk of overfitting to it. Conduct iterative user testing to include diverse perspectives and requirements during development.
Implement quality engineering principles to incorporate quality checks in the system and respond to any unintended failures immediately.
6. Post-deployment monitoring and updates:
It is important to continuously monitor and update the system even after deployment to ensure optimal performance and user satisfaction. Real-world performance and user feedback should be taken into account when monitoring the system. Issues are inevitable, and time should be allocated in the product roadmap to address them. Short- and long-term solutions should be considered, with a balance between quick fixes and more sustainable solutions. Before updating a deployed model, it is crucial to analyze the differences between the candidate and deployed models, as well as the potential impact on system quality and user experience.
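The comparison between a candidate and a deployed model can be sketched as a small report over a shared evaluation set. This is an illustrative outline, assuming each model is exposed as a plain prediction function; `compare_models` and the toy models below are hypothetical.

```python
def compare_models(deployed_predict, candidate_predict, examples, labels):
    """Summarize how a candidate model differs from the deployed one."""
    n = len(examples)
    if n == 0:
        raise ValueError("examples is empty")
    deployed = [deployed_predict(x) for x in examples]
    candidate = [candidate_predict(x) for x in examples]
    rows = list(zip(deployed, candidate, labels))
    return {
        "deployed_accuracy": sum(d == y for d, _, y in rows) / n,
        "candidate_accuracy": sum(c == y for _, c, y in rows) / n,
        # Share of examples where users would see a different output.
        "disagreement_rate": sum(d != c for d, c, _ in rows) / n,
        # Examples the candidate gets right that the deployed model got wrong.
        "fixed": sum(d != y and c == y for d, c, y in rows),
        # Examples the candidate gets wrong that the deployed model got right.
        "broken": sum(d == y and c != y for d, c, y in rows),
    }


# Toy stand-ins for real models, for illustration only.
examples = [1, 2, 3, 4, 5, 6]
labels = [0, 1, 0, 1, 0, 1]
report = compare_models(
    lambda x: 1 if x > 3 else 0,
    lambda x: 1 if x % 2 == 0 or x == 1 else 0,
    examples,
    labels,
)
```

Reporting `fixed` and `broken` separately matters because a candidate with higher overall accuracy can still regress on cases that currently work, which affects user experience even when the headline metric improves.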
To build dependable and efficient AI systems that serve users well, design them in line with standard best practices for software systems, combined with approaches that address machine learning-specific factors. By following these practices, developers can build AI systems that are both effective and ethical, and that improve the lives of people around the world.