Teaching Large Language Models to Think Twice

Abstract

Modern large language models struggle with complex mathematical reasoning tasks despite their impressive performance across various natural language processing applications. The primary challenge lies in their inability to detect and correct errors during multi-step problem-solving, where early mistakes cascade through subsequent reasoning steps.

This paper presents a practical framework enabling language models to systematically verify and refine their solutions through structured self-correction. Our approach organizes the problem-solving process into three distinct stages: generating initial solutions with explicit identification of critical reasoning steps, analyzing these solutions for potential errors, and producing corrected final answers.
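The three stages described above can be sketched as a simple pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the `LLM` callable, the `Trace` container, the prompt wording, and the `toy_llm` stand-in are all hypothetical and exist only to show how the draft, the critique, and the corrected answer feed into one another.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


@dataclass
class Trace:
    draft: str      # stage 1: initial solution with critical steps marked
    critique: str   # stage 2: analysis of the draft for potential errors
    final: str      # stage 3: corrected final answer


def self_correct(problem: str, llm: LLM) -> Trace:
    """Run the three stages in order, passing each output to the next stage."""
    draft = llm(f"Solve step by step and mark the critical steps.\nProblem: {problem}")
    critique = llm(
        f"Check each critical step for errors.\nProblem: {problem}\nSolution: {draft}"
    )
    final = llm(
        "Write a corrected final answer.\n"
        f"Problem: {problem}\nSolution: {draft}\nCritique: {critique}"
    )
    return Trace(draft, critique, final)


def toy_llm(prompt: str) -> str:
    """Stand-in for a real model (assumption): returns canned text per stage."""
    if prompt.startswith("Solve"):
        return "[critical] 12 * 3 = 38"
    if prompt.startswith("Check"):
        return "12 * 3 is 36, not 38"
    return "36"


trace = self_correct("What is 12 * 3?", toy_llm)
print(trace.final)
```

In practice the three calls could go to one model with different prompts or to separately trained components; the sketch takes no position on that.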

Key Results

Accuracy on GSM8K: 49.9%
Relative Improvement: 60%
Error Correction Rate: 78%

Key Contributions

Three-stage self-correction framework (Generator, Critic, Synthesizer)
Novel three-phase training methodology combining supervised learning with reinforcement techniques
Comprehensive experimental validation on the GSM8K mathematical reasoning benchmark
60% relative improvement in accuracy (from 31.2% to 49.9%)
Detailed error analysis revealing 78% correction success rate on computational errors
Superior computational efficiency compared to ensemble methods
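
The 60% figure is a relative improvement, not a gain in percentage points. The short check below recomputes it from the two accuracies quoted above (31.2% and 49.9%); the function and variable names are illustrative only.

```python
def relative_improvement(before: float, after: float) -> float:
    """Gain expressed as a fraction of the starting value."""
    return (after - before) / before


before_acc = 31.2  # accuracy before, in percent (from the list above)
after_acc = 49.9   # accuracy after, in percent (from the list above)

absolute_gain = after_acc - before_acc  # gain in percentage points
relative_gain = relative_improvement(before_acc, after_acc)

print(f"{absolute_gain:.1f} points, {relative_gain:.1%} relative")
# prints "18.7 points, 59.9% relative"
```

The absolute gain is 18.7 percentage points; dividing by the starting accuracy of 31.2% gives roughly 60%.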

Contact

Corresponding Authors:
Md Anisur Rahman Chowdhury - engr.aanis@gmail.com
Pratham Patel - patel292@gannon.edu

Co-Authors:
Shahajada Jawar - shahajadajawar@gmail.com
Kefei Wang - wang039@gannon.edu