The piece of text below is a shortened, hashed representation of this content. It helps ensure the content has not been tampered with, since a single modification would produce a totally different value.
Value:
fff3867c4d3a619e17124bc6efdb75ed7fbc17da02c0a5c97bf6ad08c4ecb301
Source:
{"body":{"en":"<xml><dl class=\"decidim_awesome-custom_fields\" data-generator=\"decidim_awesome\" data-version=\"0.12.6\">\n<dt name=\"textarea-1772188078816-0\">Team name</dt>\n<dd id=\"textarea-1772188078816-0\" name=\"textarea\"><div>Lexicon Lab</div></dd>\n<dt name=\"textarea-1772188112772-0\">Team members (First name, LAST NAME, University)</dt>\n<dd id=\"textarea-1772188112772-0\" name=\"textarea\"><div>Chloe, Park, Yale University\nSadra, Aliakbarpour, Yale University\nZoya, Hussain, Yale University\nGabriel, Mena, Yale University\nSophia, Eno, Yale University</div></dd>\n<dt name=\"radio-group-1772188319073-0\">What area does your use case primarily fall under?</dt>\n<dd id=\"radio-group-1772188319073-0\" name=\"radio-group\"><div alt=\"training\">Training / education / pedagogy</div></dd>\n<dt name=\"textarea-1772792126695-0\">The AI use case you are working on</dt>\n<dd id=\"textarea-1772792126695-0\" name=\"textarea\"><div>In higher education, students and faculty increasingly interact with LLMs for explanations, writing, and research assistance. By comparing outputs across languages, we analyze differences in reasoning, content, and usefulness, and collect international user experiences in academic contexts. With this information, we assess potential disparities in performance and examine whether AI imposes standardized ideas across linguistic contexts.</div></dd>\n<dt name=\"textarea-1772792488518-0\">Why this use case matters</dt>\n<dd id=\"textarea-1772792488518-0\" name=\"textarea\"><div>This issue raises concerns about unequal access to quality support from AI in higher education. If LLMs perform better in certain languages than in others, students and faculty working in other languages may receive less accurate, nuanced, or helpful outputs for academic development. The LLM may also impose dominant rhetorical norms that overlook diverse cultural and intellectual traditions. 
Rather than being a neutral tool, AI may actively shape cognitive outcomes differently based on the language used for inquiry, raising concerns about an unforeseen standardization of knowledge and the narrowing of intellectual diversity. As AI becomes further integrated into institutions, these biases may disadvantage members of certain linguistic communities and influence what knowledge and opinions are produced and disseminated. </div></dd>\n<dt name=\"textarea-1772792380575-0\">Your team's motivation and learning objectives</dt>\n<dd id=\"textarea-1772792380575-0\" name=\"textarea\"><div>As a collectively multilingual and multicultural team, we have grown up with a heightened awareness of how the use of a language is often accompanied by a variety of cultural, educational, and social characteristics. With the onset of AI, we have also personally noticed differences in how LLMs respond across languages. We want to critically examine these differences to understand how algorithmic bias and automated assumptions may affect access to knowledge, quality of learning, and academic expression. Through this project, we hope to better understand the influence of an inquiry’s language on LLM-generated outputs and explore how AI can better support diverse linguistic communities in higher education. \n</div></dd>\n<dt name=\"textarea-1772792857176-0\">Your initial contribution</dt>\n<dd id=\"textarea-1772792857176-0\" name=\"textarea\"><div>A student in Chicago asks an LLM for help with an English paper. Across the world, a student in Dakar makes the same request in French. In principle, both students should receive similar levels of support, but in reality, the English-speaking student receives better support, every single time, across millions of queries. LLMs are increasingly being used worldwide, but most are trained on predominantly English data. 
This means students and faculty operating in other languages are getting systematically worse educational support, whether for assistance with research papers, academic feedback, tutoring, or writing guidance, all without knowing it. This disparity is invisible, and neither students nor institutions can tell it’s happening without models flagging it to the user.\n\nWe find the impacts to be just as diverse as the affected actors. Limited training data causes worse and more uniform outputs, which is an issue on its own for educational equality, and it can also lead to the homogenization of LLM-assisted submissions, such as research papers or student academic papers. Furthermore, loss of accuracy can compound. For instance, a student asking for assistance based on an image of their Korean handwriting will be disadvantaged at every step of the way: handwriting recognition is worse for non-Latin scripts; then interpretation is degraded, since most models internally translate the scanned text to English; after that, the retrieved knowledge is dominated by English training data; and finally, output generation is generally weaker for non-English languages in many models. These impacts pose challenges for academic institutions that adopt AI models without requirements to test performance across their student body’s linguistic profile.\n\nIn deliberating the structure of our contribution, the first issue we considered was the educational area to focus on. We discussed exploring how quality might differ across various inputs, such as reading PDFs, text recognition from images, handwriting recognition, math problems, text editing, and passage generation, depending on the language of the prompt. 
We then decided to explore response generation to various prompts across different languages, because it encapsulates multiple layers of AI interaction (such as comprehension of context, linguistic and cultural nuance, and coherence of generated text), which makes it a robust proxy for evaluating AI educational support. The training data, internal translation of tokens, and internal structure of the response all influence the quality of the generated output. By analyzing these factors, we can identify specific ways AI may either support or hinder learning, such as reinforcing accurate knowledge or shaping how students engage with multilingual content. In this way, we felt that focusing on this area would both be concrete and highlight diverse areas of intervention. The implications of LLM output for education are significant, since AI-generated passages could directly impact students’ comprehension and critical thinking; understanding the limitations and strengths across languages helps ensure more equitable and effective learning tools.\n\nWe plan to survey students and faculty in academia across French, Korean, Farsi, Spanish, and Chinese. Members of our university and other contacts in academia will receive an anonymous survey and be asked to send it to others they know. The survey will ask participants what languages they speak, whether they are native speakers, their academic status, and the contexts in which they employ AI tools. We will then have them evaluate AI-generated materials given in their reported language(s). They will receive one to three real outputs from OpenAI’s ChatGPT, Google’s Gemini, and Anthropic’s Claude and be asked to rate their satisfaction with each response. The text will be presented in random order, and they will be blind to which LLM generated it. Upon collecting this data, we will analyze the average satisfaction level and standard deviation of ratings for the generated text in each language. 
\n\nAddressing this issue necessitates greater transparency from AI developers. We propose that this be achieved through two methods. First, companies should disclose the linguistic composition of their training data and clearly indicate it (in a one-sentence summary on the screen, for example). Second, when the LLM performs a translation in the process of generating an output, this should also be made clear to the user (in a phrase describing the current step of the generation process, for example). From the user side, this lets users gauge relative output quality and understand the limitations of the tools they rely on. From the developer side, transparency requirements naturally incentivize companies to improve multilingual training. We acknowledge the difficulty of the latter method: the moment of translation is difficult to discern from a programmer’s perspective due to the inherent “black-box” structure of LLMs. Thus, we will work with developers to pinpoint this moment relative to the LLMs’ “thinking” process to offer users an additional layer of transparency.\n\nWhile no solution is without trade-offs, prioritizing linguistic equity and transparency is an essential first step toward ameliorating existing imbalances. The invisible AI bias that privileges some demographics and disadvantages others is an overlooked fissure that must be addressed.</div></dd>\n</dl></xml>"},"title":{"en":"Unequal by Design: Addressing Linguistic Bias in AI-Powered Education"}}
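The submission proposes analyzing the average satisfaction level and standard deviation of ratings for each language. A minimal sketch of that per-language summary, using Python's standard library; the ratings below are illustrative placeholders, not survey results:

```python
from statistics import mean, stdev

# Hypothetical 1-5 satisfaction ratings per language; the real values
# would come from the anonymous survey described in the submission.
ratings = {
    "French":  [4, 3, 4, 5, 3],
    "Korean":  [3, 2, 4, 3, 3],
    "Farsi":   [3, 3, 2, 4, 2],
    "Spanish": [4, 4, 5, 3, 4],
    "Chinese": [3, 4, 3, 3, 4],
}

# Report the mean satisfaction and its spread for each language.
for language, scores in ratings.items():
    print(f"{language}: mean={mean(scores):.2f}, stdev={stdev(scores):.2f}")
```

Comparing the means indicates which languages receive weaker support on average, while the standard deviation shows how consistent that experience is across respondents.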
This fingerprint is calculated using the SHA256 hashing algorithm. To replicate it yourself, you can use a SHA256 calculator online and copy-paste the source data.
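Equivalently, the fingerprint can be reproduced locally; a minimal sketch with Python's standard-library hashlib, assuming the source is hashed as a UTF-8 encoded string:

```python
import hashlib

def fingerprint(source: str) -> str:
    """Return the SHA256 hex digest of the UTF-8 encoded source string."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()

# The digest is a 64-character hexadecimal string; any single-character
# change to the source yields a completely different value.
print(fingerprint('{"body": {"en": "..."}}'))
```

Passing the exact source data shown above (byte for byte, including whitespace) should reproduce the published value.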