Teaching Large Language Models to Think Twice

A Three-Stage Framework for Self-Correcting Mathematical Reasoning

Md Anisur Rahman Chowdhury*, Pratham Patel*, Shahajada Jawar, Kefei Wang

Department of Computer and Information Science
Gannon University, USA

Abstract

Modern large language models struggle with complex mathematical reasoning tasks despite their impressive performance across various natural language processing applications. The primary challenge lies in their inability to detect and correct errors during multi-step problem-solving, where early mistakes cascade through subsequent reasoning steps.

This paper presents a practical framework that enables language models to systematically verify and refine their solutions through structured self-correction. Our approach organizes problem solving into three distinct stages: a Generator produces an initial solution and explicitly identifies its critical reasoning steps, a Critic analyzes that solution for potential errors, and a Synthesizer produces the corrected final answer.
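The three stages described above can be pictured as a chain of three model calls. The Python sketch below illustrates that control flow under stated assumptions and is not the authors' implementation. The `Draft` and `Critique` types, the `self_correct` function, and the `toy_*` stand-ins are all hypothetical. In practice, each callable would wrap a language-model prompt. The toy critic re-checks only the steps the generator marked as critical, which mirrors the abstract's emphasis on explicitly identifying critical reasoning steps.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Draft:
    """Stage 1 output (hypothetical): reasoning steps, critical step indices, answer."""
    steps: List[str]
    critical: List[int]
    answer: str


@dataclass
class Critique:
    """Stage 2 output (hypothetical): problems found, keyed by step index."""
    issues: Dict[int, str] = field(default_factory=dict)


def self_correct(
    problem: str,
    generate: Callable[[str], Draft],
    critique: Callable[[str, Draft], Critique],
    synthesize: Callable[[str, Draft, Critique], str],
) -> str:
    draft = generate(problem)                  # Stage 1: Generator
    review = critique(problem, draft)          # Stage 2: Critic
    return synthesize(problem, draft, review)  # Stage 3: Synthesizer


# --- Hypothetical toy stand-ins for the three model calls ---
STEP = re.compile(r"(\d+) \* (\d+) = (\d+)")


def toy_generate(problem: str) -> Draft:
    # Deliberately contains an arithmetic slip for the critic to catch.
    return Draft(steps=["4 * 3 = 11"], critical=[0], answer="11")


def toy_critique(problem: str, draft: Draft) -> Critique:
    issues = {}
    for i in draft.critical:  # only re-check the steps the generator flagged
        m = STEP.fullmatch(draft.steps[i])
        if m and int(m[1]) * int(m[2]) != int(m[3]):
            issues[i] = f"expected {int(m[1]) * int(m[2])}"
    return Critique(issues)


def toy_synthesize(problem: str, draft: Draft, review: Critique) -> str:
    if not review.issues:
        return draft.answer  # nothing flagged: keep the original answer
    # Toy rule: recompute the last flagged step and report its value.
    m = STEP.fullmatch(draft.steps[max(review.issues)])
    return str(int(m[1]) * int(m[2]))


print(self_correct("A pen costs $3. How much do 4 pens cost?",
                   toy_generate, toy_critique, toy_synthesize))  # prints 12
```

Passing the stages in as callables keeps `self_correct` independent of any particular model or prompt format. Replacing the toy functions with real model calls would leave the three-stage flow unchanged.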

Video Overview

A research walkthrough video explains the S2C framework, its methodology, and its key results.

Key Results

  • 49.9% accuracy on GSM8K
  • 60% relative improvement in accuracy (from 31.2% to 49.9%)
  • 78% correction success rate on computational errors

Read the Full Paper

The full paper (7 pages) is password-protected. To request access, send an email that includes your name and institution. The password you receive also opens the downloadable PDF.

Key Contributions

  • Three-stage self-correction framework (Generator, Critic, Synthesizer)
  • Novel three-phase training methodology combining supervised learning with reinforcement learning techniques
  • Comprehensive experimental validation on the GSM8K mathematical reasoning benchmark
  • 60% relative improvement in accuracy (from 31.2% to 49.9%)
  • Detailed error analysis revealing a 78% correction success rate on computational errors
  • Superior computational efficiency compared to ensemble methods
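The accuracy figures above imply an absolute gain of 18.7 percentage points. The 60% figure is that gain expressed relative to the 31.2% baseline, then rounded. A quick Python check follows. The helper name `relative_improvement` is illustrative only.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Gain of `improved` over `baseline`, as a fraction of `baseline`."""
    return (improved - baseline) / baseline


gain = relative_improvement(31.2, 49.9)
print(f"{49.9 - 31.2:.1f} points absolute, {gain:.1%} relative")
# prints: 18.7 points absolute, 59.9% relative
```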

Contact

Corresponding Authors:
Md Anisur Rahman Chowdhury - engr.aanis@gmail.com
Pratham Patel - patel292@gannon.edu

Co-Authors:
Shahajada Jawar - shahajadajawar@gmail.com
Kefei Wang - wang039@gannon.edu