Abstract
Modern large language models struggle with complex mathematical reasoning tasks despite their impressive performance across various natural language processing applications. The primary challenge lies in their inability to detect and correct errors during multi-step problem-solving, where early mistakes cascade through subsequent reasoning steps.
This paper presents a practical framework enabling language models to systematically verify and refine their solutions through structured self-correction. Our approach organizes the problem-solving process into three distinct stages: generating initial solutions with explicit identification of critical reasoning steps, analyzing these solutions for potential errors, and producing corrected final answers.
Comprehensive evaluation demonstrates substantial improvements: 49.9% accuracy on GSM8K (60% relative improvement over 31.2% baseline), 21.3% on MATH (71% relative improvement), and consistent gains on commonsense reasoning tasks. Statistical significance testing confirms these improvements (p < 0.001).
Video Overview
Research Walkthrough
Watch a detailed explanation of the S2C framework, methodology, and key results
Key Results
Access Full Paper
Step 1: Follow the Public Channels
Open the official profile and video links first, then confirm the two checks below.
Step 2: Request the Password
Send a short request email so the author can verify the purpose of access before sharing the archive password.
Step 3: Enter the Password
The downloaded archives are also password-protected. Once you have received the password from the author, enter it here to reveal the download links.
Files Ready
All archives are password-protected. Use the same password to open the downloaded files.
Key Contributions
- Three-Stage Self-Correction Framework (Generator, Critic, Synthesizer) that decomposes problem-solving into distinct computational personas, each optimized for specific cognitive functions.
- Novel Three-Phase Training Methodology combining supervised fine-tuning on high-quality reasoning traces with reinforcement learning using a Hierarchical Process-Based Reward (HPBR) system.
- Comprehensive Experimental Validation on GSM8K mathematical reasoning benchmark achieving 60% relative improvement in accuracy (from 31.2% to 49.9%).
- Detailed Error Analysis revealing 78% correction success rate on computational errors, demonstrating the framework's effectiveness across different error types.
- Superior Computational Efficiency compared to ensemble methods, using 74% fewer resources while achieving 29% higher accuracy.
- Metacognitive Capabilities teaching LLMs to develop intrinsic self-correction abilities without requiring external verification systems.
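The three computational personas above can be illustrated as a simple pipeline. The sketch below is purely illustrative and assumes hypothetical `generate`, `critique`, and `synthesize` helpers standing in for the three LLM stages; it is not the paper's implementation.

```python
# Hypothetical sketch of the three-stage self-correction pipeline
# (Generator -> Critic -> Synthesizer). All function names and the
# toy problem are illustrative, not from the paper.

def generate(problem):
    # Stage 1 (Generator): produce an initial step-by-step solution
    # and explicitly flag the critical reasoning steps.
    return {"steps": ["17 * 3 = 51", "51 + 9 = 60"], "critical": [0, 1]}

def critique(problem, solution):
    # Stage 2 (Critic): review each flagged step for potential errors.
    # Here every step passes; a real critic would return issue descriptions.
    return [{"step": i, "issue": None} for i in solution["critical"]]

def synthesize(problem, solution, feedback):
    # Stage 3 (Synthesizer): merge the solution and the critique
    # into a corrected final answer.
    if all(f["issue"] is None for f in feedback):
        return solution["steps"][-1].split("= ")[-1]
    return "revised answer"

def s2c(problem):
    solution = generate(problem)
    feedback = critique(problem, solution)
    return synthesize(problem, solution, feedback)

print(s2c("A shop sells 17 pens at $3 each plus a $9 fee. Total?"))  # prints 60
```

In this decomposition each stage receives the full output of the previous one, so the Critic can target the flagged critical steps rather than re-deriving the whole solution.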
Citation
@techreport{chowdhury2026teaching,
  title={Teaching Large Language Models to Think Twice: A Three-Stage Framework for Self-Correcting Mathematical Reasoning},
  author={Chowdhury, Md Anisur Rahman and Patel, Pratham and Jawar, Shahajada and Wang, Kefei},
  year={2026},
  institution={Gannon University}
}
Contact
Gannon University

- Md Anisur Rahman Chowdhury: engr.aanis@gmail.com
- Pratham Patel: patel292@gannon.edu
- Shahajada Jawar: shahajadajawar@gmail.com
- Kefei Wang: wang039@gannon.edu