- New documentation standards in machine learning can enable responsible technology.
- These risk management strategies highlight how organizations can be compliant while protecting their valuable intellectual property.
The growing use of machine learning (ML), within a global drive toward digital acceleration, raises the question, “Who’s minding the store?”
We’ve seen many examples of both the power and pitfalls of ML models. Organizations are increasingly aware of the downsides, although our research shows many are still behind the curve, and they find themselves trying to manage these risks without a set of agreed-upon tools or standards. Meanwhile, they’re keenly aware of the competitive environment and are rationally protecting their proprietary methods.
Governments in some regions have offered early guardrails by beginning to regulate the underlying model data; however, a global standard has not been established, and regulation for ML models has yet to catch up.
Two key components for using ML responsibly provide a prudent “start here” for organizations: model explainability and data transparency. The inability to explain why a model arrived at a particular result presents a level of risk in nearly every industry. In some areas, like healthcare, the stakes are particularly high when a model could be presenting a recommendation for patient care. In financial services, regulators may need to know why a lender is making a loan. Data transparency can help ensure there is no unfair or unintended bias in the training data sets used to build the model, since such bias can lead to disparate impact for protected classes. Consumers also have what is increasingly a legally protected right to know how their data is being used.
But how can organizations developing ML models enforce explainability and transparency standards when doing so might mean sharing with the public the very features, data sets, and model frameworks that represent that organization’s proprietary intellectual property (IP)?
Managing risk through documentation
Given machine learning’s complexity and interdisciplinary nature, executives should employ a wide variety of approaches to manage the associated risks, which include building risk management into model development and applying holistic risk frameworks that leverage and adapt principles used in managing other types of enterprise risk. Executives must also simply step back regularly to consider the broad implications for employees and society when using ML.
One horizontal practice underlying these approaches is documentation, which can serve as a mechanism for both explainability and transparency while providing a proxy for aspects of the model and its contents without exposing the underlying data or model features.
Whereas standard technical documentation is created to help practitioners implement a model, documentation focused on explainability and transparency informs consumers, regulators, and others about why and how a model or data set is being used. Such documentation includes a high-level overview of the model itself, including: its intended purpose, performance, and provenance; information about the training data set and training process; known issues or tradeoffs with the model; identified risk mitigation strategies; and any other information that can help contextualize the technology. It can be delivered in the form of model fact sheets, score cards, algorithmic impact assessments, and/or data set labels.
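To make the contents of such a document concrete, here is a minimal sketch of a model fact sheet as a small Python data structure. The class name, field names, and example values are our own illustrative assumptions, not a published standard; they simply mirror the items listed above (purpose, performance, provenance, training data and process, known issues, and mitigations).

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ModelFactSheet:
    """Public-facing summary of a model. All field names are illustrative."""

    name: str
    intended_purpose: str
    performance_summary: str
    provenance: str
    training_data_summary: str
    training_process_summary: str
    known_issues: List[str] = field(default_factory=list)
    risk_mitigations: List[str] = field(default_factory=list)
    additional_context: str = ""

    def missing_sections(self) -> List[str]:
        """Return the names of required sections that are still empty."""
        optional = {"additional_context"}
        return [
            key
            for key, value in asdict(self).items()
            if key not in optional and not value
        ]

    def to_public_json(self) -> str:
        """Serialize the summary only; no features or raw data are included."""
        return json.dumps(asdict(self), indent=2)


# Hypothetical example values, for illustration only.
sheet = ModelFactSheet(
    name="loan-review-ranker (hypothetical)",
    intended_purpose="Prioritize consumer loan applications for manual review.",
    performance_summary="",  # deliberately left blank to show the gap check
    provenance="Built in-house by the credit analytics team.",
    training_data_summary="Historical applications, described at a high level.",
    training_process_summary="Supervised training with a held-out test set.",
    known_issues=["Less reliable for applicants with short credit histories."],
    risk_mitigations=["Human review of every declined application."],
)

print(sheet.missing_sections())
print(sheet.to_public_json())
```

The point of the design is that the fact sheet carries descriptions rather than the model’s features or data, so it can be published without exposing either.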
Listing the ingredients without revealing the recipe
A good analogy may be drawn from considering a cookie sold by a food manufacturer (an analogy inspired by the Data Nutrition Project effort, where one of our authors is a contributor in a personal capacity). While the company making such a cookie is not eager to share the proprietary recipe, food regulators require a nutrition label on the package with a list of ingredients for consumers. Consumers don’t need to know the exact recipe to make informed choices about the cookie; they only need an overview of the ingredients and a macronutrient profile presented in an understandable way.
Similarly, model documentation can become the proxy for sharing the model and its features and data sets with the world as opposed to sharing the actual “cookie recipe.” This gives others the ability to check the “health” and rationale of the model without needing to know, or being able to derive, how to recreate the model. No trade secrets are leaked.
Importantly, documentation can help organizations identify critical issues early on. In a timely example, one state government building a model to help manage COVID-19 vaccine allocation realized, through the creation of model documentation, that several regions were underrepresented and quickly recalibrated the model to correct the issue.
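As a rough illustration of the kind of check that documentation can surface, the sketch below compares each region’s share of a training data set with its share of the population and flags regions that fall well short. The function name, the 50% tolerance, and all of the numbers are hypothetical assumptions for illustration; they do not describe the state government’s actual method.

```python
from collections import Counter


def underrepresented_groups(records, group_key, population_share, tolerance=0.5):
    """Flag groups whose share of the records is below
    tolerance * their share of the population.

    Returns {group: (observed_share, expected_share)}.
    """
    counts = Counter(record[group_key] for record in records)
    total = sum(counts.values())
    flagged = {}
    for group, expected in population_share.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if observed < expected * tolerance:
            flagged[group] = (observed, expected)
    return flagged


# Hypothetical data: 100 training records across three regions.
records = (
    [{"region": "north"}] * 70
    + [{"region": "south"}] * 25
    + [{"region": "east"}] * 5
)
# Hypothetical population shares for the same regions.
population_share = {"north": 0.5, "south": 0.3, "east": 0.2}

flagged = underrepresented_groups(records, "region", population_share)
print(flagged)
```

Here only the "east" region is flagged, because it makes up 5% of the records against 20% of the population. Recording a table like this in the documentation is what makes the gap visible to reviewers.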
Documentation provides other benefits to an organization. It can, for example, help companies respond to the market. In the case of cookies, this is comparable to the blinking red light that goes off for nutritionists when a cookie is high in trans fats. Once consumers realize the harm from an excess amount of trans fats and decide instead to purchase a cookie without them, the food producer may determine independently to make a healthier recipe in order to match consumer need.
Putting documentation into practice
Organizations looking to operationalize the practice could do so with a bottom-up approach, where practitioners and business stakeholders decide what needs to be recorded about the model contents and processes. An ML or data set developer could then be provided with a checklist of questions aligned to selected categories. For a data set, for example, we’ve seen the emergence of categories such as description, composition, provenance, collection, and management.
The process of building the documentation would also require cross-functional support. Additional team members will likely include subject-matter experts (to understand implications of model and data use cases), a legal/regulatory expert, and those who helped curate and manage the data. Teams may also include a member of any group affected by the AI systems to ensure that the documentation is actually understood by those who will experience the impact.
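A checklist of this kind can be sketched in a few lines of Python. The five categories come from the text above; the questions themselves, the `CHECKLIST` name, and the `unanswered` helper are hypothetical examples that a team would replace with its own.

```python
# Categories are those named above; the questions are illustrative only.
CHECKLIST = {
    "description": ["What is the data set and why was it created?"],
    "composition": ["What does each record represent, and which groups are covered?"],
    "provenance": ["Where did the data originate, and who owns it?"],
    "collection": ["How and when was the data collected?"],
    "management": ["Who maintains the data set, and how is it updated?"],
}


def unanswered(answers):
    """Return {category: [questions]} for every question left blank."""
    gaps = {}
    for category, questions in CHECKLIST.items():
        given = answers.get(category, {})
        missing = [q for q in questions if not given.get(q, "").strip()]
        if missing:
            gaps[category] = missing
    return gaps


# A developer has answered only the description question so far.
answers = {
    "description": {
        "What is the data set and why was it created?":
            "Survey responses gathered to forecast regional demand.",
    }
}

gaps = unanswered(answers)
print(sorted(gaps))
```

Running the check before release gives practitioners and business stakeholders a simple, shared view of which categories still need attention.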
Not all models require the same level of risk management. Documentation processes can take time and effort, so we recommend building this practice into a risk-assessment framework. High-stakes models are likely a good place to start. Over time, as aspects of documentation become automatable, lower-risk use cases may also become good candidates for more diligent documentation.
Don’t wait for a universal documentation standard
Nutrition labels are useful because they are standardized, enabling consumers to know where to check for certain ingredients. Likewise, model and data documentation is more powerful when universally applied and understood, or at least interoperable. Several ML documentation initiatives are in the works, but there is no winning contender yet.
Governments will further regulate in this space, and we are not advocating for specific laws or policy changes. But beginning to engage in the practice of documentation itself can act as a forcing function for collaboration and further research, while protecting your IP.