Artificial intelligence (AI) systems have enabled people around the world to experiment with new capabilities, and they are now widely used in many areas, ranging from recommending books and TV shows to more complex tasks such as screening job candidates, and even more sensitive ones such as predicting diseases.
AI systems require large amounts of data to improve the accuracy of the tasks they perform. After collection and storage, this data is processed by machine learning algorithms so that the system learns from its patterns and characteristics, gradually developing the machine's ability to accomplish the tasks for which it was created.
AI bias can be defined as a systematic deviation in the results of machine learning algorithms. It may be caused by biased assumptions made while developing the algorithm, reflect society's racism and prejudice against a particular group, or result from bias in the training data fed into the AI system.
As AI applications spread across different sectors and societies, the impact of their bias has grown, and with it the need for fair systems capable of making, or helping to make, decisions free from racism and prejudice.
Practices to reduce AI bias:
Developing fair and impartial AI systems is not an easy task, for several reasons. Perhaps the most important is that machine learning models are fed data collected from the real world, so even the most accurate systems can learn or amplify biases already present in that data, whether based on race, gender, religion, or other characteristics. A system can also reveal unintended blind spots after launch: problems may arise before, during, or after development as a result of prejudgments or structural disparities in society, even with rigorous training and testing. For example, an AI system trained to recognize adult voices may be fair and comprehensive within that scope, yet fail to recognize new words or slang used by adolescents.
There is no single definition of fairness, whether decisions are made by humans or machines. Determining the appropriate fairness standards for a system requires taking into account user experience and cultural, social, historical, political, legal, and ethical considerations, which differ from one person to another.
In an effort to address the problem of AI bias, the data engineering company Innodata published five reliable practices for reducing it and building more equitable and inclusive machine learning models:
- Dataset Selection
While mitigating bias in AI and machine learning can be challenging, there are preventive techniques that help. The biggest challenge in identifying bias is understanding how machine learning algorithms generalize from their training data, which is why the data used to train a model must be comprehensive.
- Diverse Teams
Building a diverse team contributes greatly to eliminating prejudice. Team diversity can have a positive impact on machine learning models by producing representative, balanced datasets. It also helps mitigate adverse bias in how datasets are structured and how labels and classifications are applied to that data.
- Reduce Exclusion Bias
Feature selection is essential to reducing exclusion bias in AI. It is the process of reducing the number of variables fed into AI models in order to improve predictive performance, and it excludes data elements that do not contain enough variation to influence results.
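The idea can be illustrated with scikit-learn's variance-threshold selector. This is only a minimal sketch: the column names, values, and threshold are illustrative assumptions, not a recommended configuration.

```python
# Minimal sketch: variance-based feature selection with scikit-learn.
# Column names, values, and the threshold are illustrative assumptions.
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical feature table for a prediction model.
features = pd.DataFrame({
    "age":            [34, 51, 29, 42, 38],
    "income":         [40_000, 72_000, 38_000, 65_000, 54_000],
    "constant_flag":  [1, 1, 1, 1, 1],   # no variation: cannot influence predictions
    "rare_indicator": [0, 0, 0, 0, 1],   # very little variation
})

# Drop features whose variance falls below the chosen threshold.
selector = VarianceThreshold(threshold=0.05)
selector.fit(features)

kept = features.columns[selector.get_support()]
print("Features kept for the model:", list(kept))
```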
- Algorithms Alone are not Enough
Another way to begin addressing AI bias is to not rely on algorithms alone, but to keep the people developing the AI informed about the system so that they can effectively identify patterns of unintended bias. This can reduce flaws in the system and produce a more neutral machine learning model. Organizations should also develop guidelines and procedures for identifying and mitigating potential bias in datasets. Documenting biases when they occur, and deciding how to find and discuss them, helps ensure they are not repeated.
- Representative Data
Institutions must understand what representative data looks like before collecting the data on which a machine learning model will be trained, and the nature and characteristics of the data used must contain minimal bias. Besides identifying potential bias in datasets, institutions should document their methods of data selection and cleaning in order to address the causes of bias at the root.
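As a minimal illustration of that kind of check, the sketch below compares a training set's group proportions against an assumed reference population; the column name, groups, and figures are all invented for the example.

```python
# Minimal sketch: comparing a dataset's group proportions against a
# reference population before training. All names and figures are
# illustrative assumptions, not real census or project data.
import pandas as pd

training_data = pd.DataFrame({
    "gender": ["F", "M", "M", "M", "F", "M", "M", "M", "F", "M"],
})

# Hypothetical reference proportions for the population the model will serve.
reference = {"F": 0.50, "M": 0.50}

observed = training_data["gender"].value_counts(normalize=True)
for group, expected_share in reference.items():
    gap = observed.get(group, 0.0) - expected_share
    print(f"{group}: dataset {observed.get(group, 0.0):.0%}, "
          f"reference {expected_share:.0%}, gap {gap:+.0%}")
```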
Blind taste tests and AI
In addition to the practices above, it is also possible to rely on what is known as the blind taste test, which has been used for decades. The test became famous in the mid-1970s, when a soda company launched a challenge in which people tasted two beverages without knowing which company produced them, the labels having been removed from the packaging. The result went in favor of the company that launched the challenge: the majority preferred its product over the best-selling competitor, even though the competitor's label created a bias toward its product in the market.
The experiment showed that removing the product's identifying information (the label) also removed the bias, so people relied on taste alone. The same idea can be applied to machines through a blind taste test: an algorithm can simply be denied information that may cause biased output, so that the machine learning model makes predictions that are "blind" to that information.
For example, Allegheny County, Pennsylvania, uses a tool called the Allegheny Family Screening Tool (AFST) to predict whether children may be exposed to abuse. The tool relies on AI and uses data from the county's Department of Human Services, including public agency records related to child welfare, alcohol and drug abuse services, housing, and more. Caseworkers use reports of potential child abuse, together with any publicly available data on the family in question, to run the machine learning model, which predicts a risk level from 1 to 20; an investigation is opened if the system predicts a high level.
But the AFST system has incorporated human biases into its AI model. One of the biggest is that the system takes into account calls about families made by health care providers to the community services hotline. Some evidence suggests that such calls are three times more likely to concern Black or biracial families than white families. Although many of these calls are ultimately screened out, the system relies on them to determine the degree of risk, leading to racially biased investigations if hotline callers are more likely to report Black families.
In this case, the "blind taste test" works as follows: train the model on all the data that can be used to predict possible child abuse, including calls to the community services hotline, then retrain the model on all the data except that factor. If the model's predictions are equally good without the call data, the model is making blind predictions. If the predictions differ when the calls are included, the calls are an influential explanatory variable in the model, or there may be a bias in the data that needs further examination before the algorithm can be relied upon.
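A minimal sketch of this comparison, using scikit-learn on synthetic data, might look like the following; the column names (including the hotline-call feature) and the synthetic outcome are illustrative assumptions, not AFST's actual features or model.

```python
# Minimal sketch of the "blind taste test": train one model with a suspect
# feature and one without it, then compare predictive performance.
# The data is synthetic and the column names are illustrative placeholders.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2_000
df = pd.DataFrame({
    "prior_reports": rng.poisson(1.0, n),
    "housing_flag":  rng.integers(0, 2, n),
    "hotline_calls": rng.poisson(0.5, n),   # the factor under suspicion
})
# Synthetic outcome that does not truly depend on hotline_calls.
risk = 0.8 * df["prior_reports"] + 0.5 * df["housing_flag"]
df["label"] = (risk + rng.normal(0, 1, n) > 1.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="label"), df["label"], random_state=0)

def auc_for(columns):
    model = LogisticRegression(max_iter=1_000).fit(X_train[columns], y_train)
    return roc_auc_score(y_test, model.predict_proba(X_test[columns])[:, 1])

with_calls = auc_for(["prior_reports", "housing_flag", "hotline_calls"])
without_calls = auc_for(["prior_reports", "housing_flag"])
print(f"AUC with hotline calls:    {with_calls:.3f}")
print(f"AUC without hotline calls: {without_calls:.3f}")
# A large gap suggests the calls carry real signal (or encoded bias)
# and deserve closer scrutiny before the model is trusted.
```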
Tools to overcome AI bias:
- IBM AI Fairness 360: IBM Research released the AI Fairness 360 (AIF360) toolkit in 2018, a comprehensive open-source set of metrics for checking for unwanted bias in datasets and machine learning models, together with state-of-the-art algorithms that can mitigate such bias. The initial release, available as a Python package, contained nine different bias-mitigation algorithms. AIF360 is not only a set of tools; it also includes an interactive experience that provides a gentle introduction to the package's concepts and capabilities and helps users figure out which metrics and algorithms are most appropriate for a given case. It was designed as open source to encourage researchers around the world to contribute their own metrics and algorithms. The team behind the package was diverse in ethnicity, scientific background, gender identity, years of experience, and a range of other characteristics.
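A minimal sketch of how AIF360 is typically used, with toy data and an assumed choice of privileged and unprivileged groups:

```python
# Minimal sketch of AIF360 usage: wrap a labelled table, measure bias,
# and apply the Reweighing pre-processing algorithm. The toy data and
# the choice of privileged/unprivileged groups are illustrative.
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

df = pd.DataFrame({
    "sex":   [1, 1, 0, 0, 1, 0, 0, 1],   # 1 = privileged group (assumption)
    "score": [0.9, 0.4, 0.6, 0.2, 0.8, 0.3, 0.5, 0.7],
    "label": [1, 0, 1, 0, 1, 0, 0, 1],
})
dataset = BinaryLabelDataset(df=df, label_names=["label"],
                             protected_attribute_names=["sex"])

privileged = [{"sex": 1}]
unprivileged = [{"sex": 0}]

metric = BinaryLabelDatasetMetric(dataset,
                                  privileged_groups=privileged,
                                  unprivileged_groups=unprivileged)
print("Statistical parity difference:", metric.statistical_parity_difference())

# Reweighing assigns instance weights that balance outcomes across groups.
reweighted = Reweighing(unprivileged_groups=unprivileged,
                        privileged_groups=privileged).fit_transform(dataset)
print("Instance weights after reweighing:", reweighted.instance_weights[:4])
```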
- Fairlearn: An open-source toolkit from Microsoft that enables data scientists and developers to assess and improve the fairness of their AI systems. Fairlearn has two components: an interactive visualization dashboard and bias-mitigation algorithms. The project also aims to provide a Python library for assessing and improving AI fairness (fairness metrics, bias-mitigation algorithms, etc.) and educational resources covering the organizational and technical processes needed to reduce AI bias (a comprehensive user guide, detailed case studies, technical reports, etc.). The development of Fairlearn is grounded in the view that fairness in AI systems is a sociotechnical challenge: there are many complex sources of bias, some societal and some technical. The package was therefore created as open source so that the whole community, from data scientists, developers, and business decision-makers to people whose lives may be affected by AI predictions, can take part in assessing bias harms, review the effects of mitigation strategies, and adapt them to their own scenarios.
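A minimal sketch of a Fairlearn assessment with MetricFrame, using toy labels, predictions, and a made-up sensitive feature:

```python
# Minimal sketch of a Fairlearn fairness assessment with MetricFrame.
# The labels, predictions, and sensitive feature values are toy data.
from fairlearn.metrics import MetricFrame, selection_rate
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
sex    = ["F", "F", "M", "F", "M", "M", "F", "M"]

frame = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y_true, y_pred=y_pred, sensitive_features=sex)

print(frame.by_group)       # per-group accuracy and selection rate
print(frame.difference())   # largest gap between groups for each metric
```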
- FairLens: An open-source Python library used to automatically detect bias and measure fairness in data. FairLens can identify bias quickly and provides multiple fairness measures across a range of legally protected characteristics such as age, race, and gender. Its basic features can be summarized in four points, followed below by a small illustrative check:
- Measuring the extent of bias: The tool contains metrics and tests that determine the extent and significance of bias using statistical measures and distances.
- Detecting protected attributes: The tool provides ways to detect legally protected attributes and lets the user measure hidden relationships between these attributes and others.
- Visualization tools: FairLens provides plots of different types of variables in subsets of sensitive data, offering an easy way to see and understand trends and patterns in the data.
- Fairness assessment: A simplified way to assess the fairness of an arbitrary dataset and to create reports highlighting biases and hidden relationships.
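For illustration, the sketch below performs the kind of distributional comparison FairLens automates, using plain pandas and SciPy rather than the FairLens API itself; the data and column names are invented.

```python
# Illustrative sketch (plain pandas/scipy, not the FairLens API) of the kind
# of check FairLens automates: comparing an outcome's distribution across a
# legally protected attribute. Data and column names are invented.
import pandas as pd
from scipy.stats import ks_2samp

df = pd.DataFrame({
    "age_group": ["under_40", "over_40", "under_40", "over_40",
                  "under_40", "over_40", "under_40", "over_40"],
    "score":     [0.82, 0.55, 0.74, 0.48, 0.91, 0.62, 0.68, 0.51],
})

groups = [g["score"] for _, g in df.groupby("age_group")]
statistic, p_value = ks_2samp(groups[0], groups[1])
print(f"KS distance between groups: {statistic:.2f} (p = {p_value:.3f})")
# A large distance with a small p-value flags a disparity worth investigating.
```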
- Aequitas: A flexible open-source toolkit for auditing AI bias, developed by the Center for Data Science and Public Policy at the University of Chicago. It can be used to audit the predictions of risk assessment tools used in criminal justice, education, public health, workforce development, and machine-learning-based social services, in order to understand different types of bias and make informed decisions about developing and deploying such systems. The toolkit can detect two types of bias in a risk assessment system (a usage sketch follows the list):
- Biased actions or interventions that are not allocated in a way representative of the entire population.
- Biased outputs resulting from a system that errs with respect to certain groups of people.
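A sketch of an Aequitas audit, assuming the Group and Bias interfaces described in the project's documentation and following the library's expected "score"/"label_value" column convention; the data is entirely invented.

```python
# Sketch of an Aequitas bias audit, assuming the Group/Bias interface from
# the project's documentation; the toy data follows the library's expected
# 'score' / 'label_value' column convention and is entirely invented.
import pandas as pd
from aequitas.group import Group
from aequitas.bias import Bias

df = pd.DataFrame({
    "score":       [1, 0, 1, 1, 0, 1, 0, 0],   # model decisions
    "label_value": [1, 0, 1, 0, 0, 1, 1, 0],   # observed outcomes
    "race":        ["white", "white", "black", "black",
                    "white", "black", "black", "white"],
})

group = Group()
crosstab, _ = group.get_crosstabs(df)   # per-group confusion metrics

bias = Bias()
disparities = bias.get_disparity_major_group(crosstab, original_df=df)
print(disparities.head())               # disparity ratios relative to the largest group
```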
- TCAV: A system announced by Google CEO Sundar Pichai during the 2019 Google I/O conference, stemming from a research initiative called Testing with Concept Activation Vectors (TCAV) for detecting bias in machine learning models. The system can examine models to identify elements that may lead to bias based on race, income, location, and so on. TCAV learns "concepts" mainly from examples.
- Google What-If Tool: Google researchers and designers created the What-If Tool as a practical resource for developers of machine learning systems. It tries to answer one of the most difficult questions about AI systems: what kind of fairness do users want? This interactive open-source tool lets the user probe machine learning models visually. As part of the open-source TensorBoard tooling, the What-If Tool can analyze datasets to show how machine learning models behave under different scenarios and build rich visualizations that explain model performance. It also allows the user to manually modify examples from the dataset and study the impact of those changes on the accompanying machine learning model, and its built-in algorithmic fairness analysis can reveal patterns of bias that were not previously identifiable. The What-If Tool makes it easy for all users, not only programmers, to explore and test machine learning models and investigate their problems through a clear and simple graphical user interface.
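A sketch of launching the What-If Tool in a Jupyter notebook, assuming the WitConfigBuilder and WitWidget interface from the witwidget package; the model and prediction wrapper are toy assumptions.

```python
# Sketch only: launching the What-If Tool in a Jupyter notebook, assuming the
# WitConfigBuilder / WitWidget interface from the witwidget package. The model
# and the prediction wrapper are toy assumptions for illustration.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

data = load_breast_cancer()
clf = LogisticRegression(max_iter=5000).fit(data.data, data.target)

def predict_fn(examples):
    # The tool passes a list of feature rows and expects class probabilities.
    return clf.predict_proba(np.array(examples)).tolist()

examples = data.data[:200].tolist()
config = (WitConfigBuilder(examples, list(data.feature_names))
          .set_custom_predict_fn(predict_fn))
WitWidget(config, height=600)   # renders the interactive UI inside the notebook
```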
- Skater: An initiative by Oracle, Skater is a Python library for demystifying "black box" models, which are learned directly from data by algorithms in such a way that it is not possible to see how variables are combined to make predictions. Skater helps build interpretable machine learning systems that can be used in practice.
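A sketch of inspecting a "black box" with Skater, assuming the Interpretation and InMemoryModel interface from the project's documentation; the model and data are toy examples trained purely for illustration.

```python
# Sketch of model interpretation with Skater, assuming the Interpretation /
# InMemoryModel interface from the project's documentation. The model and
# data are toy examples trained here purely for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from skater.core.explanations import Interpretation
from skater.model import InMemoryModel

data = load_breast_cancer()
clf = RandomForestClassifier(random_state=0).fit(data.data, data.target)

interpreter = Interpretation(data.data, feature_names=data.feature_names)
model = InMemoryModel(clf.predict_proba, examples=data.data[:100])

# Rank the features the "black box" actually relies on for its predictions.
importances = interpreter.feature_importance.feature_importance(model)
print(importances.sort_values(ascending=False).head())
```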