Ethical and social risks of harm from Language Models


This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary literature from computer science, linguistics, and social sciences. The paper outlines six specific risk areas: I. Discrimination, Exclusion and Toxicity; II. Information Hazards; III. Misinformation Harms; IV. Malicious Uses; V. Human-Computer Interaction Harms; and VI. Automation, Access, and Environmental Harms.

The first risk area discusses fairness and toxicity risks in large-scale language models. This includes four distinct risks. First, LMs can create unfair discrimination and representational and material harm by perpetuating stereotypes and social biases, i.e. harmful associations of specific traits with social identities. Second, social norms and categories can exclude or marginalise those who exist outside them. Where an LM perpetuates such norms (e.g. that people called “Max” are “male”, or that “families” always consist of a father, mother and child), such narrow category use can deny or burden identities that differ. Third, toxic language can incite hate or violence or cause offence. Finally, an LM that performs more poorly for some social groups than others can create harm for disadvantaged groups, for example where such models underpin technologies that affect these groups. These risks stem in large part from choosing training corpora that include harmful language and overrepresent some social identities.

The second risk area includes risks from private data leaks or from LMs correctly inferring private or other sensitive information. These risks stem from private data that is present in the training corpus and from advanced inference capabilities of LMs.

The third risk area comprises risks associated with LMs providing false or misleading information. This includes the risk of creating less well-informed users and of eroding trust in shared information. Misinformation can cause harm in sensitive domains, for example through bad legal or medical advice. Poor or false information may also lead users to perform unethical or illegal actions that they would otherwise not have performed. Misinformation risks stem in part from the processes by which LMs learn to represent language: the underlying statistical methods are not well-positioned to distinguish between factually correct and incorrect information.

The fourth risk area spans risks from users or product developers who try to use LMs to cause harm. This includes using LMs to increase the efficacy of disinformation campaigns, to create personalised scams or fraud at scale, or to develop computer code for viruses or weapon systems.

The fifth risk area focuses on risks from the specific use case of a “conversational agent” that directly interacts with human users. This includes risks from presenting the system as “human-like”, possibly leading users to overestimate its capabilities and use it in unsafe ways. Another risk is that conversation with such agents may create new avenues to manipulate or extract private information from users. LM-based conversational agents may pose risks that are already known from voice assistants, such as perpetuating stereotypes by self-presenting e.g. as a “female assistant”. These risks stem in part from LM training objectives underlying such conversational agents and from product design decisions.

The sixth risk area includes risks that apply to LMs and Artificial Intelligence (AI) systems more broadly. Training and operating LMs can incur high environmental costs. LM-based applications may benefit some groups more than others, and the LMs themselves are inaccessible to many. Lastly, LM-based automation may affect the quality of some jobs and undermine parts of the creative economy. These risks manifest particularly as LMs are widely used in the economy, and as benefits and risks from LMs are unevenly distributed globally.

In total, we present 21 risks. We then discuss the points of origin of different risks and point to potential risk mitigation approaches. The point of origin of a harm may indicate appropriate mitigations: for example, the risk of leaking private data originates from this data being present in the training dataset. It can be mitigated at the point of origin, by better redaction or curation of training data. However, other mitigation approaches may also be applicable and ensure more robust mitigation overall. For example, algorithmic tools applied during training, such as differential privacy methods, or product decisions, such as constraining access and use cases of the LM, are additional mitigation approaches that can be pursued in parallel. Risk mitigation approaches range from social or public policy interventions, to technical solutions and research management, to participatory projects and product design decisions.

Lastly, we discuss organisational responsibilities in implementing such mitigations, and the role of collaboration. Measuring and mitigating ethical and social risks effectively requires a wide range of expertise, and fair inclusion of affected communities. It is critical to implement mitigations with a broad view of the landscape of risks, to ensure that mitigating one risk of harm does not aggravate another. Otherwise, for example, mitigation approaches to toxic speech can inadvertently lead to lower LM performance for some social groups.

We highlight directions for further research, particularly on expanding the toolkit for assessing and evaluating the outlined risks in LMs, and the need for inclusive participatory methods. Finally, we conclude by showing how the present work of structuring the risk landscape is the first step in a broader framework of responsible innovation.