Abstract
Modern large language models struggle with complex mathematical reasoning tasks despite their impressive performance across various natural language processing applications. The primary challenge lies in their inability to detect and correct errors during multi-step problem-solving, where early mistakes cascade through subsequent reasoning steps.
This paper presents a practical framework enabling language models to systematically verify and refine their solutions through structured self-correction. Our approach organizes the problem-solving process into three distinct stages: generating initial solutions with explicit identification of critical reasoning steps, analyzing these solutions for potential errors, and producing corrected final answers.
Comprehensive evaluation demonstrates substantial improvements: 49.9% accuracy on GSM8K (60% relative improvement over 31.2% baseline), 21.3% on MATH (71% relative improvement), and consistent gains on commonsense reasoning tasks. Statistical significance testing confirms these improvements (p < 0.001).
Video Overview
Research Walkthrough
Watch a detailed explanation of the S2C framework, methodology, and key results
Key Results
Access Full Paper
Step 1: Follow the Public Channels
Open the official profile and video links first, then confirm the two checks below.
Step 2: Request the Password
Send a short request email so the author can verify the purpose of access before sharing the archive password.
Step 3: Enter the Password
The downloaded archives are also password-protected. Once you have received the password from the author, enter it here to reveal the download links.
Files Ready
All archives are password-protected. Use the same password to open the downloaded files.
Key Contributions
- Three-Stage Self-Correction Framework (Generator, Critic, Synthesizer) that decomposes problem-solving into distinct computational personas, each optimized for specific cognitive functions.
- Novel Three-Phase Training Methodology combining supervised fine-tuning on high-quality reasoning traces with reinforcement learning using a Hierarchical Process-Based Reward (HPBR) system.
- Comprehensive Experimental Validation on GSM8K mathematical reasoning benchmark achieving 60% relative improvement in accuracy (from 31.2% to 49.9%).
- Detailed Error Analysis revealing 78% correction success rate on computational errors, demonstrating the framework's effectiveness across different error types.
- Superior Computational Efficiency compared to ensemble methods, using 74% fewer resources while achieving 29% higher accuracy.
- Metacognitive Capabilities teaching LLMs to develop intrinsic self-correction abilities without requiring external verification systems.
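The three computational personas above can be illustrated as a simple pipeline. The sketch below is purely illustrative and assumes hypothetical `generate`, `critique`, and `synthesize` helpers standing in for the three LLM stages; it is not the paper's implementation.

```python
# Hypothetical sketch of the three-stage self-correction pipeline
# (Generator -> Critic -> Synthesizer). All function names and the
# toy problem are illustrative, not from the paper.

def generate(problem):
    # Stage 1 (Generator): produce an initial step-by-step solution
    # and explicitly flag the critical reasoning steps.
    return {"steps": ["17 * 3 = 51", "51 + 9 = 60"], "critical": [0, 1]}

def critique(problem, solution):
    # Stage 2 (Critic): review each flagged step for potential errors.
    # Here every step passes; a real critic would return issue descriptions.
    return [{"step": i, "issue": None} for i in solution["critical"]]

def synthesize(problem, solution, feedback):
    # Stage 3 (Synthesizer): merge the solution and the critique
    # into a corrected final answer.
    if all(f["issue"] is None for f in feedback):
        return solution["steps"][-1].split("= ")[-1]
    return "revised answer"

def s2c(problem):
    solution = generate(problem)
    feedback = critique(problem, solution)
    return synthesize(problem, solution, feedback)

print(s2c("A shop sells 17 pens at $3 each plus a $9 fee. Total?"))  # prints 60
```

In this decomposition each stage receives the full output of the previous one, so the Critic can target the flagged critical steps rather than re-deriving the whole solution.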
Citation
@techreport{chowdhury2026teaching,
  title={Teaching Large Language Models to Think Twice: A Three-Stage Framework for Self-Correcting Mathematical Reasoning},
  author={Chowdhury, Md Anisur Rahman and Patel, Pratham and Jawar, Shahajada and Wang, Kefei},
  year={2026},
  institution={Gannon University}
}
Contact
Gannon University

- Md Anisur Rahman Chowdhury: engr.aanis@gmail.com
- Pratham Patel: patel292@gannon.edu
- Shahajada Jawar: shahajadajawar@gmail.com
- Kefei Wang: wang039@gannon.edu