T-TWICE: Mathematical Cognitive Reasoning Engine

Candis Medjom

Team name
T-TWICE team
Team members (First name, LAST NAME, University)
- Candis MEDJOM, Gustave Eiffel University
- Johanna FOKUI, Aivancity School
- William KANKEU, Oteria Cyber School
What area does your use case primarily fall under?
Training / education / pedagogy
The AI use case you are working on
Many university math students now use AI to get instant answers to proofs, and the effort of working through the reasoning is fading. Our team is exploring an AI tutor designed to guide rather than answer. It reads students' reasoning, detects where the logic breaks, and responds with questions. At the same time, it surfaces recurring reasoning errors across a class, helping professors uncover hidden learning difficulties.
Why this use case matters
AI tools are increasingly present in higher education and are changing how students approach learning. In mathematics, where understanding and reasoning are central, instant-answer systems can sometimes reduce opportunities for students to engage deeply with problems. At the same time, professors often face large classes and limited time, making it difficult to identify how students reason and where misunderstandings occur. This use case explores whether AI could instead support learning by encouraging reasoning rather than simply providing solutions, while also helping teachers better understand common difficulties within their classes. However, it also raises important questions about the role of AI in education: how student reasoning data should be used, how to avoid excessive monitoring, and how to balance pedagogical benefits with broader concerns such as fairness, sustainability, and the evolving relationship between students, teachers, and technology.
Your team's motivation and learning objectives
We are the students this challenge is about. One of us studies AI and builds the systems. Another studies mathematics and struggles with the proofs. The third studies cybersecurity and data protection law, and is the one who asks whether we even should, before we build. We have seen all three sides: students misusing ChatGPT for homework, professors overwhelmed by copies they cannot analyze, and privacy questions that most ed-tech projects leave for later. We want to explore how AI could help students think mathematically without giving answers, and we quickly hit questions we cannot answer alone: dependency, the line between pedagogical analytics and surveillance, and environmental cost. This challenge is our chance to confront these tensions with experts, professors, and peers. We want to come out with better questions, not just better software.
Your initial contribution
The idea for this project came from a lived experience. Candis, during her early years studying mathematics at university, relied heavily on AI to understand exercises and courses. The answers seemed clear, and she felt she was progressing. But her exam results told a different story. She had been following solutions without building her own reasoning. The turning point came when she started working with a former mathematics student who refused to give answers, instead asking questions, pushing her to write down her thinking even when it felt wrong, and helping her see where her reasoning broke down. It was slow and sometimes frustrating, but it was the first time she truly progressed. Later, as a private tutor herself, she observed the same pattern in her students: they used AI to get solutions fast but struggled to reason on their own.

This experience is where T-Twice (Think Twice) was born. We are three students from different disciplines, Johanna (AI engineering), Candis (mathematics and actuarial science), and Andelson (data protection law), and we set out to explore whether AI could be redesigned to support mathematical reasoning without replacing the thinking process.

The situation we are examining

University mathematics students increasingly rely on generative AI to complete assignments. They get answers instantly and learn nothing. This creates a paradox: the tool designed to help students think can actually prevent them from thinking. Meanwhile, professors grading hundreds of papers can see individual mistakes but cannot detect that many students in their class make the same type of error again and again. These patterns remain invisible until final exams, too late to act.

No widely available tool addresses this gap. ChatGPT gives answers. Formal proof assistants are too complex for undergraduates.
Learning analytics platforms track activity but rarely analyze the quality of reasoning. And most raise serious, unanswered questions about data privacy and compliance with the EU AI Act.

Our critical analysis

To ground our analysis in reality, we built a functional prototype and tested it with real students. At every stage, the feedback we received changed our thinking.

The prototype works as follows: the student writes their reasoning, the AI detects which of 13 reasoning-error types identified by mathematics education researchers (Weber, 2001; Selden and Selden, 2003; Harel and Sowder, 1998) is present, and responds with a guiding question to help the student find the answer themselves. It never gives the answer directly. It includes a system that identifies each student's pattern of mistakes, four levels of help from detailed guidance to full autonomy, GDPR-compliant data management, and carbon footprint tracking per session.

But the prototype is not our contribution. It is our method of investigation. By building and testing, we discovered things that reading research alone could not teach us.

Johanna noticed during testing that some students quickly started sending bare answers without showing their reasoning, waiting for the AI to do the thinking. This led us to redesign the system so that it refuses to validate any answer without explicit justification, even a correct one. It convinced us that any AI tutoring tool must be designed to become less helpful over time, not more.

Candis tested the system on problems she knew well and found that while major mathematical errors were rare, the AI occasionally made smaller mistakes, such as misidentifying the precise type of reasoning error. This convinced us that confidence indicators and human oversight are not optional features but ethical requirements for any AI used in education.
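The guide-don't-answer loop described above can be sketched in a few lines. This is a minimal illustration with made-up error categories and question templates of our own naming, not the prototype's actual classifier or its full 13-type taxonomy:

```python
from typing import Optional

# Placeholder categories standing in for the research-based taxonomy;
# a real classifier would map the reasoning to one of the 13 documented types.
GUIDING_QUESTIONS = {
    "unjustified_step": "Which rule or theorem lets you move from the previous line to this one?",
    "quantifier_misuse": "Does your claim hold for every element, or only the one you picked?",
    "circular_reasoning": "Does this step quietly assume the statement you are trying to prove?",
}

def respond(reasoning: str, detected_error: Optional[str]) -> str:
    """Return a guiding question -- never the answer itself."""
    if not reasoning.strip():
        # Refuse bare answers: even a correct result needs explicit justification.
        return "Write out your reasoning first; I can only respond to justified steps."
    if detected_error is None:
        return "Your reasoning holds so far. What is your next step?"
    return GUIDING_QUESTIONS.get(
        detected_error,
        "Something in this step is off. Which assumption are you relying on here?",
    )
```

The design property this toy version makes visible is that every return path is a question or a request for justification; no branch can emit a solution.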
Andelson, reviewing the system from a legal perspective, raised concerns about how student data was exposed in our first version, which directly shaped the privacy architecture we describe below.

But the most important feedback came from the students themselves. Through early informal testing with a small group of students Candis tutors privately, they told us something we had not anticipated: they wanted the AI to match their professor's expectations. They were not just looking for correct guidance; they wanted guidance calibrated to what their specific professor considers important, uses as notation, and expects on an exam. A generic tutor, however accurate, was not enough.

Students also told us they sometimes doubted the AI's feedback and wished their professor could step in to confirm, correct, or nuance what the AI said. This was the moment we realized that building an AI tool is not enough. A professor will not trust a system they cannot oversee. And nobody is more skeptical than a professor, rightly so.

The perspectives within our team

These findings shaped our most important team debate: how much should the professor see and control?

Initially, Andelson had designed strict privacy protections: the professor could only access aggregated statistics, never individual conversations. His reasoning was sound under the GDPR: a student who feels observed will self-censor, and self-censoring kills learning.

But Candis brought the perspective of a teacher. Students doing assigned exercises want to be followed. They want their professor to see their effort, correct the AI when it is wrong, and comment on their reasoning. An assigned exercise is a digital copy, not a private diary. Students themselves asked for this.

We resolved this by putting the choice in the student's hands.
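That resolution, the student choosing per exercise what the professor may see, reduces to a simple visibility rule. The mode names and fields below are illustrative, not the prototype's actual data model:

```python
from dataclasses import dataclass

PRIVATE = "private"  # free practice: the professor sees nothing
SHARED = "shared"    # assigned exercise: the professor can follow and respond

@dataclass
class Session:
    student_id: str
    exercise_id: str
    mode: str = PRIVATE  # privacy by default; sharing is an explicit, per-exercise opt-in

def professor_can_view(session: Session) -> bool:
    """Consent is specific to each exercise: only shared sessions are visible."""
    return session.mode == SHARED
```

Defaulting to the private mode puts the burden of action on the side of disclosure: a student must actively opt in before any individual conversation becomes visible.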
Two modes: private mode for free practice, where the professor sees nothing, and shared mode for assigned exercises, where the professor can follow the work and respond. The student always knows which mode they are in. Consent is free, informed, and specific to each exercise.

On the choice of AI model, we navigated the tension between performance and sovereignty. The most accurate model for mathematical reasoning is not European. Candis was clear: for a tool diagnosing reasoning errors, accuracy is an ethical obligation. We chose the best model available but designed the architecture so that migration to any alternative takes seconds, and all data processing stays within Europe.

On ethics, Andelson pushed us beyond discussion into implementation. Informed consent at signup. A page showing each student exactly what the system knows and what the professor can see. One-click data deletion and export. Cognitive profiles never used for automated decisions or grading, in line with the EU AI Act's requirements for high-risk AI systems in education. A risk assessment for data protection built into the app as a visible page, not a buried document.

What we propose

Our contribution is a set of evidence-based recommendations for introducing AI tutoring responsibly in higher education, grounded in what we learned by building and testing a prototype that Johanna developed from the ground up and that the team then tested with real students.

Our core conviction: AI in education should be designed as a space where the student writes their reasoning freely, makes mistakes, and learns from errors. The AI analyzes the reasoning after the student produces it, identifies the type of error, and asks a question to guide the student toward finding the answer themselves. It never gives the answer directly. The professor calibrates the AI to their pedagogy, follows assigned work, and corrects the AI when it is wrong.
The AI proposes. The professor decides. This is not a limitation. It is the design.

Based on what we learned, we propose five recommendations for universities and policymakers:

First, cognitive autonomy by design. Any AI tutoring system should reduce its own helpfulness over time. The goal is a student who reasons well without AI, not one who performs well with it.

Second, governance for cognitive data. This data should never be used for grading, selection, or institutional decisions, in compliance with the EU AI Act. Students should see exactly what is collected, control their data, and be able to delete everything.

Third, environmental accountability. Universities procuring AI tools should require transparency about energy sources, infrastructure efficiency, and carbon cost per interaction.

Fourth, mandatory teacher preparation. Professors must understand what AI tools show and what they do not show before any deployment. Without training, the best tool can be misused.

Fifth, evidence before scale. No educational AI tool should be deployed widely without controlled trials measuring actual learning outcomes.

Looking ahead, Johanna is already working on the next evolution: full professor calibration, where the professor provides their course material and the AI uses their definitions, their notation, and their progression. The AI does not follow generic rules. It follows this professor's pedagogy, for this class, at this point in the course.
This is the level of trust that would make even the most skeptical professor consider using the tool. We also plan to adapt T-Twice for students with learning disabilities such as dyslexia, dyscalculia, and dysorthographia, drawing on Candis's training in teaching these profiles. Mathematical reasoning is no less important for these students; it is harder to express, and the tool should help, not hinder.

We are clear-eyed about what remains to be solved. AI models can make occasional errors, but as one professor pointed out to us, even these errors can become learning moments: a student who catches the AI making a mistake is developing exactly the critical thinking we want to build. This is why human oversight must always be part of the design. Our error classification needs validation from mathematics education researchers to move from promising to proven. And no tool, however well designed, can substitute for a teacher who inspires, or address structural inequalities between institutions. These are not reasons to stop. They are reasons to test rigorously, iterate openly, and never deploy without human oversight.

How can AI help students reason without creating dependency, guide without homogenizing, and personalize without surveilling?

We believe this question deserves collective deliberation, and we look forward to engaging with other perspectives throughout this challenge.