Essential terms and explanations for AI governance
Published: June 2023
The field of artificial intelligence is rapidly evolving across different sectors and disparate industries, leaving business, technology and government professionals without a common lexicon and shared understanding of terms and phrases used in AI governance. Even a search to define “artificial intelligence” returns a range of definitions and examples. From the cinematic, like HAL 9000 from "2001: A Space Odyssey," to the creative, like Midjourney and DALL-E generative art, to the common, like email autocorrect and mobile maps, the use cases and applications of AI continue to grow and expand into all aspects of life.
This glossary was developed with reference to numerous materials and designed to provide succinct, but nuanced, definitions and explanations for some of the most common terms related to AI today. The explanations aim to present both policy and technical perspectives and add to the robust discourse on AI governance. Although there are some shared terms and definitions, this glossary is separate from the official IAPP Glossary of Privacy Terms.
Accountability
Definition
The obligation and responsibility of the creators, operators and regulators of an AI system to ensure the system operates in a manner that is ethical, fair, transparent and compliant with applicable rules and regulations (see also fairness and transparency). Accountability ensures that actions, decisions and outcomes of an AI system can be traced back to the entity responsible for it.
Active learning
Definition
A subfield of AI and machine learning where an algorithm can select some of the data it learns from. Instead of learning from all the data it is given, an active learning model requests the additional data points that will help it learn best.
Also called query learning.
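One common way to decide which data points to request is uncertainty sampling: the model asks for a label on the example it is least sure about. The sketch below is illustrative only and assumes a binary classifier that returns a probability; `toy_model` and `most_uncertain` are hypothetical names, not part of any particular library.

```python
def most_uncertain(unlabeled, predict_proba):
    """Pick the unlabeled point whose predicted probability is closest to 0.5."""
    return min(unlabeled, key=lambda x: abs(predict_proba(x) - 0.5))

def toy_model(x):
    """Hypothetical stand-in for a trained classifier: P(class = 1) for input x."""
    return min(max(x / 10.0, 0.0), 1.0)

pool = [0.5, 2.0, 4.8, 7.5, 9.9]          # unlabeled candidates
query = most_uncertain(pool, toy_model)   # the point the model would ask a human to label
print(query)  # 4.8
```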
AI governance
Definition
A system of policies, practices and processes organizations implement to manage and oversee their use of AI technology and associated risks to ensure the AI aligns with an organization's objectives, is developed and used responsibly and ethically, and complies with applicable legal requirements.
Algorithm
Definition
A computational procedure or set of instructions and rules designed to perform a specific task, solve a particular problem, or produce a machine learning or AI model (see also machine learning model).
Artificial general intelligence
Definition
AI that is considered to have human-level intelligence and strong generalization capability to achieve goals and carry out a variety of tasks in different contexts and environments. AGI is still considered a theoretical field of research and contrasted with "narrow" AI, which is used for specific tasks or problems.
Acronym
AGI
Artificial intelligence
Definition
Artificial intelligence is a broad term used to describe an engineered system where machines learn from experience, adjusting to new inputs (see also input data), and potentially performing tasks previously done by humans. More specifically, it is a field of computer science dedicated to simulating intelligent behavior in computers. It may include automated decision-making.
Acronym
AI
Automated decision-making
Definition
The process of making a decision by technological means without human involvement.
Bias
Definition
There are several types of bias within the AI field. Computational bias is a systematic error or deviation from the true value of a prediction that originates from a model's assumptions or the data itself (see also input data). Cognitive bias refers to inaccurate individual judgment or distorted thinking, while societal bias leads to systemic prejudice, favoritism, and/or discrimination in favor of or against an individual or group. Bias can impact outcomes and pose a risk to individual rights and liberties.
Bootstrap aggregating
Definition
A machine learning method that aggregates multiple versions of a model (see also machine learning model) trained on random subsets of a data set. This method aims to make a model more stable and accurate.
Sometimes referred to as bagging.
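A minimal sketch of the idea, under simplifying assumptions: each "model" is a one-feature threshold classifier (a decision stump), each is trained on a bootstrap sample drawn with replacement, and predictions are aggregated by majority vote. The function names and toy data are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

def train_stump(sample):
    """Fit a threshold classifier: the midpoint between the two class means."""
    zeros = [x for x, y in sample if y == 0]
    ones = [x for x, y in sample if y == 1]
    if not zeros or not ones:  # degenerate sample: fall back to the overall mean
        return sum(x for x, _ in sample) / len(sample)
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def bagging_fit(data, n_models=25, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(data) for _ in data]
        models.append(train_stump(bootstrap))
    return models

def bagging_predict(thresholds, x):
    """Aggregate the individual stump predictions by majority vote."""
    votes = Counter(1 if x > t else 0 for t in thresholds)
    return votes.most_common(1)[0][0]

data = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0),
        (6.0, 1), (6.5, 1), (7.0, 1), (7.5, 1)]
thresholds = bagging_fit(data)
print(bagging_predict(thresholds, 1.2), bagging_predict(thresholds, 7.2))
```

Because each stump sees a slightly different sample, individual thresholds vary, and the vote smooths out that variation.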
Chatbot
Definition
A form of AI designed to simulate human-like conversations and interactions that uses natural language processing to understand and respond to text or other media. Because chatbots are often used for customer service and other personal help applications, chatbots often ingest users’ personal information.
Classification model (Classifiers)
Definition
A type of model (see also machine learning model) used in machine learning that is designed to take input data and sort it into different categories or classes.
Clustering (or clustering algorithms)
Definition
An unsupervised machine learning method where patterns in the data are identified and evaluated, and data points are grouped accordingly into clusters based on their similarity.
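As an illustration, k-means is one widely used clustering algorithm; it is used here as an example and is not the only approach. The sketch below assumes one-dimensional data and hand-picked starting centers, and the function name `kmeans_1d` is hypothetical.

```python
def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-averaging."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(clusters)  # [[1.0, 1.2, 0.8], [8.0, 8.3, 7.9]]
```

No labels are supplied; the grouping emerges only from how close the points are to one another.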
Computer vision
Definition
A field of AI that enables computers to process and analyze images, videos and other visual inputs.
Conformity assessment
Definition
An analysis, often performed by a third-party body, on an AI system to determine whether requirements, such as establishing a risk management system, data governance, record-keeping, transparency and cybersecurity practices, have been met.
Contestability
Definition
The principle of ensuring that AI systems and their decision-making processes can be questioned or challenged. This ability to contest or challenge the outcomes, outputs and/or actions of AI systems can help promote transparency (see also transparency) and accountability (see also accountability) within AI governance (see also AI governance).
Also called redress.
Corpus
Definition
A large collection of texts or data that a computer uses to find patterns, make predictions or generate specific outcomes. The corpus may include structured or unstructured data and cover a specific topic or a variety of topics.
Decision tree
Definition
A type of supervised learning model used in machine learning (see also machine learning model) that represents decisions and their potential consequences in a branching structure.
Deep learning
Definition
A subfield of AI and machine learning that uses artificial neural networks. Deep learning is especially useful in fields where raw data needs to be processed, like image recognition, natural language processing and speech recognition.
Discriminative model
Definition
A type of model (see also machine learning model) used in machine learning that directly maps input features to class labels and analyzes for patterns that can help distinguish between different classes. It is often used for text classification tasks, like identifying the language of a piece of text. Examples are traditional neural networks, decision trees and random forest.
Entropy
Definition
The measure of unpredictability or randomness in a set of data used in machine learning. A higher entropy signifies greater uncertainty in predicting outcomes.
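In practice, entropy is usually computed with Shannon's formula, H = -Σ p·log2(p), where p is the proportion of each class in the data. The sketch below assumes that formula; the function name `shannon_entropy` is illustrative.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Return the Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

print(shannon_entropy(["spam"] * 8))                # 0.0 (one class, fully predictable)
print(shannon_entropy(["spam"] * 4 + ["ham"] * 4))  # 1.0 (even split, maximally uncertain)
```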
Expert system
Definition
A form of AI that draws inferences from a knowledge base to replicate the decision-making abilities of a human expert within a specific field, like a medical diagnosis.
Explainability
Definition
The ability to describe or provide sufficient information about how an AI system generates a specific output or arrives at a decision in a specific context to a predetermined addressee. Explainable AI (XAI) is important in maintaining transparency and trust in AI.
Acronym
XAI
Exploratory data analysis
Definition
Data discovery process techniques that take place before training a machine learning model in order to gain preliminary insights into a data set, such as identifying patterns, outliers, and anomalies and finding relationships among variables.
Fairness
Definition
An attribute of an AI system that ensures equal and unbiased treatment of individuals or groups in its decisions and actions in a consistent, accurate manner. It means the AI system's decisions should not be affected by certain sensitive attributes like race, gender or religion.
Federated learning
Definition
A machine learning method that allows models (see also machine learning model) to be trained on the local data of multiple edge devices or servers. Only the updates of the local model, not the training data itself, are sent to a central location where they are aggregated into a global model. This process is iterated until the global model is fully trained.
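The loop described above can be sketched as follows. This is a simplified illustration, not a production protocol: it assumes a one-parameter model y = w·x, plain gradient descent on each device, and a size-weighted average of the updates on the server. All names and data are hypothetical.

```python
def local_update(w, data, lr=0.01, steps=5):
    """Train on one device's private data and return only the weight change."""
    local_w = w
    for _ in range(steps):
        grad = sum(2 * (local_w * x - y) * x for x, y in data) / len(data)
        local_w -= lr * grad
    return local_w - w  # the raw data never leaves the device

def federated_round(w, clients):
    """Aggregate client updates, weighted by how much data each client holds."""
    total = sum(len(d) for d in clients)
    update = sum(local_update(w, d) * len(d) for d in clients) / total
    return w + update

clients = [
    [(1.0, 2.0), (2.0, 4.0)],                 # device A's local data
    [(3.0, 6.0), (4.0, 8.0), (5.0, 10.0)],    # device B's local data
]
w = 0.0
for _ in range(50):  # iterate until the global model has settled
    w = federated_round(w, clients)
print(round(w, 3))
```

Every data point here satisfies y = 2x, so the global weight should settle near 2 even though the server only ever sees weight changes.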
Foundation model
Definition
A large-scale, pretrained model for AI capabilities, such as language (see also large language model), vision, robotics, reasoning, search or human interaction, that can function as the base for other applications. The model is trained on extensive and diverse data sets.
Generalization
Definition
The ability of a model (see also machine learning model) to understand the underlying patterns and trends in its training data and apply what it has learned to make predictions or decisions about new, unseen data.
Generative AI
Definition
A field of AI that uses machine learning models trained on large data sets to create new content, such as written text, code, images, music, simulations and videos. These models are capable of generating novel outputs based on input data or user prompts.
Greedy algorithms
Definition
A type of algorithm that makes the optimal choice to achieve an immediate objective at a particular step or decision point, based on the available information and without regard for the longer-term optimal solution.
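A classic illustration is making change by always taking the largest coin that still fits. The sketch below uses a hypothetical coin set chosen to show that the locally best choice does not always give the best overall result.

```python
def greedy_change(amount, coins):
    """Make change by repeatedly taking the largest coin that still fits."""
    used = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            used.append(coin)
    return used

print(greedy_change(6, [1, 3, 4]))  # [4, 1, 1], although [3, 3] uses fewer coins
```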
Hallucinations
Definition
Instances where a generative AI model creates content that either contradicts the source or creates factually incorrect output under the appearance of fact.
Input data
Definition
Data provided to or directly acquired by a learning algorithm or model (see also machine learning model) for the purpose of producing an output. It forms the basis upon which the machine learning model will learn, make predictions and/or carry out tasks.
Large language model
Definition
A form of AI that utilizes deep learning algorithms to create models (see also machine learning model) trained on massive text data sets to analyze and learn patterns and relationships among characters, words and phrases. There are generally two types of LLMs: generative models that make text predictions based on the probabilities of word sequences learned from their training data (see also generative AI) and discriminative models that make classification predictions based on probabilities of data features and weights learned from their training data (see also discriminative model). The term "large" generally refers to the model's capacity measured by the number of parameters.
Acronym
LLM
Machine learning
Definition
A subfield of AI involving algorithms that enable computer systems to iteratively learn from and then make decisions, inferences or predictions based on data (see also input data). These algorithms build a model from training data to perform a specific task on new data without being explicitly programmed to do so.
Machine learning implements various algorithms that learn and improve by experience in a problem-solving process that includes data cleansing, feature selection, training, testing and validation. Companies and government agencies deploy machine learning algorithms for tasks such as fraud detection, recommender systems, customer inquiries, natural language processing, health care, or transport and logistics.
Acronym
ML
Machine learning model
Definition
A learned representation of underlying patterns and relationships in data, created by applying an AI algorithm to a training data set. The model can then be used to make predictions or perform tasks on new, unseen data.
Multimodal models
Definition
A type of model used in machine learning (see also machine learning model) that can process more than one type of input or output data, or "modality," at the same time. For example, a multimodal model can take both an image and text caption as input and then produce a unimodal output in the form of a score indicating how well the text caption describes the image. These models are highly versatile and useful in a variety of tasks, like image captioning and speech recognition.
Natural language processing
Definition
A subfield of AI that helps computers understand, interpret and manipulate human language by transforming information into content. It enables machines to read text or spoken language, interpret its meaning, measure sentiment, and determine which parts are important for understanding.
Neural networks
Definition
A type of model (see also machine learning model) used in machine learning that mimics the way neurons in the brain interact with multiple processing layers, including at least one hidden layer. This layered approach enables neural networks to model complex nonlinear relationships and patterns within data. Artificial neural networks have a range of applications, such as image recognition and medical diagnosis.
Overfitting
Definition
A concept in machine learning in which a model (see also machine learning model) becomes too specific to the training data and cannot generalize to unseen data, which means it can fail to make accurate predictions on new data sets.
Oversight
Definition
The process of effectively monitoring and supervising an AI system to minimize risks, ensure regulatory compliance and uphold responsible practices. Oversight is important for effective AI governance, and mechanisms may include certification processes, conformity assessments and regulatory authorities responsible for enforcement.
Post processing
Definition
Steps performed after a machine learning model has been run to adjust the output of that model. This can include adjusting a model's outputs and/or using a holdout data set—data not used in the training of the model—to create a function that is run on the model's predictions to improve fairness or meet business requirements.
Preprocessing
Definition
Steps taken to prepare data for a machine learning model, which can include cleaning the data, handling missing values, normalization, feature extraction and encoding categorical variables. Data preprocessing can play a crucial role in improving data quality, mitigating bias, addressing algorithmic fairness concerns, and enhancing the performance and reliability of machine learning algorithms.
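Three of the steps named above can be sketched in a few lines. The helpers below are illustrative assumptions (mean imputation for missing values, min-max scaling for normalization, one-hot encoding for categorical variables); real pipelines often use other choices.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def min_max_normalize(values):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(categories):
    """Encode each category as a 0/1 vector over the sorted set of categories."""
    vocab = sorted(set(categories))
    return [[1 if c == v else 0 for v in vocab] for c in categories]

ages = min_max_normalize(impute_mean([20, None, 40, 60]))
colors = one_hot(["red", "blue", "red"])
print(ages)    # [0.0, 0.5, 0.5, 1.0]
print(colors)  # [[0, 1], [1, 0], [0, 1]]
```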
Random forest
Definition
A supervised machine learning (see also supervised learning) algorithm that builds multiple decision trees and merges them together to get a more accurate and stable prediction. Each decision tree is built with a random subset of the training data (see also bootstrap aggregating), hence the name "random forest." Random forests are helpful with data sets that have missing values or are very complex.
Reinforcement learning
Definition
A machine learning method that trains a model to optimize its actions within a given environment to achieve a specific goal, guided by feedback mechanisms of rewards and penalties. This training is often conducted through trial-and-error interactions or simulated experiences that do not require external data. For example, an algorithm can be trained to earn a high score in a video game by having its efforts evaluated and rated according to success toward the goal.
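To make the reward-driven loop concrete, the sketch below uses tabular Q-learning, one common reinforcement learning algorithm chosen here for illustration. It assumes a hypothetical five-state corridor in which the agent earns a reward only on reaching the rightmost state, and it explores with purely random actions to keep the example short.

```python
import random

N_STATES, GOAL = 5, 4   # states 0..4; reaching state 4 ends the episode
ACTIONS = (-1, +1)      # index 0 moves left, index 1 moves right

def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values from trial-and-error experience alone."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            a = rng.randrange(2)                         # try a random action
            nxt = min(max(state + ACTIONS[a], 0), GOAL)  # walls at both ends
            reward = 1.0 if nxt == GOAL else 0.0         # feedback from the environment
            best_next = 0.0 if nxt == GOAL else max(q[nxt])
            # Nudge the estimate toward reward plus discounted future value.
            q[state][a] += alpha * (reward + gamma * best_next - q[state][a])
            state = nxt
    return q

q = train()
policy = ["left" if row[0] > row[1] else "right" for row in q[:GOAL]]
print(policy)
```

No external data set is involved: the agent learns that moving right is the better action in every state purely from the rewards it encounters.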
Reliability
Definition
An attribute of an AI system that ensures it behaves as expected and performs its intended function consistently and accurately, even with new data that it has not been trained on.
Robotics
Definition
A multidisciplinary field that encompasses the design, construction, operation and programming of robots. Robotics allows AI systems and software to interact with the physical world.
Robustness
Definition
An attribute of an AI system that ensures a resilient system that maintains its functionality and performs accurately in a variety of environments and circumstances, even when faced with changed inputs or an adversarial attack.
Safety
Definition
The development of AI systems that are designed to minimize potential harm to individuals, society, property and the environment.
Supervised learning
Definition
A subset of machine learning where the model (see also machine learning model) is trained on input data with known desired outputs. These two groups of data are sometimes called predictors and targets, or independent and dependent variables, respectively. This type of learning is useful for training an AI to group data into specific categories or to make predictions by understanding the relationship between two variables.
Synthetic data
Definition
Data generated by a system or model (see also machine learning model) that can mimic and resemble the structure and statistical properties of real data. It is often used for testing or training machine learning models, particularly in cases where real-world data is limited, unavailable or too sensitive to use.
Testing data
Definition
A subset of the data set used to provide an unbiased evaluation of a final model (see also machine learning model). It is used to test the performance of the machine learning model with new data at the very end of the model development process.
Training data
Definition
A subset of the data set that is used to train a model (see also machine learning model) until it can accurately predict outcomes, find patterns or identify structures within the training data.
Transfer learning model
Definition
A type of model (see also machine learning model) used in machine learning in which an algorithm learns to perform one task, such as recognizing cats, and then uses that learned knowledge as a basis when learning a different but related task, such as recognizing dogs.
Transparency
Definition
The extent to which information regarding an AI system is made available to stakeholders, including whether one is used and an explanation of how it works. It implies openness, comprehensibility and accountability in the way AI algorithms function and make decisions.
Trustworthy AI
Definition
In most cases used interchangeably with the terms responsible AI and ethical AI, which all refer to principle-based AI development and governance (see also AI governance), including the principles of security, safety, transparency, explainability, accountability, privacy, nondiscrimination/nonbias (see also bias), among others.
Turing test
Definition
A test of a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human. Alan Turing (1912-1954) originally conceived of the test as a measure of a machine's ability to converse through written text.
Underfitting
Definition
A concept in machine learning in which a model (see also machine learning model) fails to fully capture the complexity of the training data. This may result in poor predictive ability and/or inaccurate outputs. Factors leading to underfitting may include too few model parameters or epochs, having too high a regularization rate, or using an inappropriate or insufficient set of features in the training data.
Unsupervised learning
Definition
A subset of machine learning where the model is trained by looking for patterns in an unclassified data set with minimal human supervision. The AI is provided with preexisting data sets and then analyzes those data sets for patterns. This type of learning is useful for training an AI for techniques such as clustering data (outlier detection, etc.) and dimensionality reduction (feature learning, principal component analysis, etc.).
Validation data
Definition
A subset of the data set used to assess the performance of the model (see also machine learning model) during the training phase. Validation data is used to fine-tune the parameters of a model and prevent overfitting before the final evaluation using the test data set.
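The three subsets described in this glossary (training, validation and testing data) are typically produced by shuffling a data set once and slicing it. The sketch below assumes a 70/15/15 split, which is a common but arbitrary choice; the function name `split_dataset` is hypothetical.

```python
import random

def split_dataset(records, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle once, then slice into training, validation and test subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]                 # used to fit the model
    val = shuffled[n_train:n_train + n_val]    # used to tune it during training
    test = shuffled[n_train + n_val:]          # held back for the final evaluation
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

Keeping the subsets disjoint is what allows the test data to give an unbiased picture of performance on new data.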
Variables
Definition
In the context of machine learning, a variable is a measurable attribute, characteristic or unit that can take on different values. Variables can be numerical/quantitative or categorical/qualitative.
Variance
Definition
A statistical measure that reflects how far a set of numbers is spread out from its average value. A high variance indicates that the data points are spread widely around the mean. A low variance indicates the data points are close to the mean. In machine learning, higher variance can lead to overfitting. The trade-off between variance and bias is a fundamental concept in machine learning: increasing model complexity tends to reduce bias but increase variance, while decreasing complexity tends to reduce variance but increase bias.
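The statistical measure itself can be illustrated with a short calculation. The sketch below computes the population variance (the mean of squared deviations from the mean) for two made-up data sets; it illustrates only the statistic, not the bias-variance trade-off in models.

```python
def population_variance(values):
    """Average squared distance of each value from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

tight = [9, 10, 10, 11]   # values close to the mean of 10
wide = [1, 5, 15, 19]     # same mean, spread much further out
print(population_variance(tight))  # 0.5
print(population_variance(wide))   # 53.0
```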