IEEE-Style Research Project

A Local Distributed Multi-Agent LLM Ensemble System

Orchestrated local inference across multiple agent models with modular aggregation, reproducible benchmarks, statistical significance testing, and publication-ready outputs.

Md Anisur Rahman Chowdhury1*, Kefei Wang1

1Dept. of Computer and Information Science, Gannon University, USA

engr.aanis@gmail.com | wang039@gannon.edu

Start Here

Use this public portal in three simple steps.

  1. Read the Architecture: open the interactive architecture map to understand query flow from orchestrator to agents.
  2. Try a Live Query: go to Playground, set your public orchestrator URL, and run one strategy with a fixed seed.
  3. Check Results and Paper: review benchmark charts, CSV/JSON exports, and the full IEEE manuscript from this same site.

Abstract

This project presents a complete local distributed AI network in which one FastAPI orchestrator routes prompts to four local Ollama-based LLM agents, aggregates answers with multiple ensemble strategies, and logs benchmark-ready metrics. The system implements majority voting, dynamic weighted voting, inverse surprising popularity (ISP), topic-based routing, and two-round debate. Evaluation is automated over MMLU, GSM8K, and TruthfulQA with deterministic controls, repeated trials, confidence intervals, paired t-tests, and Wilcoxon signed-rank tests. All outputs are exported as CSV, JSON, PNG figures, and IEEE LaTeX tables, with an accompanying manuscript package for conference submission.
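As a concrete illustration of the significance testing named above, the sketch below runs a paired t-test and a Wilcoxon signed-rank test over hypothetical per-trial accuracies for two strategies. The numbers are invented for illustration only and are not results from this project.

```python
# Illustrative significance testing (hypothetical per-trial accuracies,
# NOT results from this project). Requires scipy.
from scipy import stats

# Accuracy of two strategies over the same five repeated benchmark runs,
# so the comparisons are paired.
majority = [0.61, 0.63, 0.60, 0.64, 0.62]
debate = [0.66, 0.65, 0.67, 0.65, 0.68]

t_stat, t_p = stats.ttest_rel(debate, majority)  # paired t-test
w_stat, w_p = stats.wilcoxon(debate, majority)   # Wilcoxon signed-rank test
print(f"paired t-test p={t_p:.4f}, Wilcoxon p={w_p:.4f}")
```

Pairing both tests on the same repeated trials is what lets the protocol attribute accuracy differences to the strategy rather than to run-to-run variance.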

Research Video Walkthrough

Watch the full visual explanation of this distributed AI ensemble project, including architecture flow, live multi-agent querying, optimization, and benchmark outcomes.

Why watch this video?

  • Quickly understand the full end-to-end system flow.
  • See how all agent models collaborate in real time.
  • Review benchmark outputs and statistical reporting.
Open Video on YouTube

System Architecture

Node Layout

  • Orchestrator: 172.16.185.223
  • Agent 1: 172.16.185.209 (llama3.2:3b)
  • Agent 2: 172.16.185.218 (qwen2.5:3b)
  • Agent 3: 172.16.185.220 (phi3:mini)
  • Agent 4: 172.16.185.222 (gemma2:2b)
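The node layout above can be captured in a small registry. The dictionary below is a hypothetical sketch (the project's actual configuration format is not shown on this page), assuming each agent exposes Ollama on its default port 11434.

```python
# Hypothetical agent registry mirroring the node layout above.
# Field names and the helper function are assumptions for illustration.
AGENTS = {
    "agent1": {"host": "172.16.185.209", "model": "llama3.2:3b"},
    "agent2": {"host": "172.16.185.218", "model": "qwen2.5:3b"},
    "agent3": {"host": "172.16.185.220", "model": "phi3:mini"},
    "agent4": {"host": "172.16.185.222", "model": "gemma2:2b"},
}

def agent_url(name: str) -> str:
    """Build the Ollama generate endpoint for a named agent (default port 11434)."""
    return f"http://{AGENTS[name]['host']}:11434/api/generate"
```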

Aggregation Methods

  • Majority Voting
  • Dynamic Weighted Voting
  • Inverse Surprising Popularity (ISP)
  • Topic-Based Routing
  • Two-Round Debate
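To make the first two strategies concrete, here is a minimal sketch of majority voting and dynamic weighted voting. The answer normalization and tie-breaking are simplifications for illustration, not the project's exact rules.

```python
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    """Most common answer wins; ties resolve to the first-seen answer
    (a simplification -- the project's tie-breaking rule is not stated here)."""
    normalized = [a.strip().lower() for a in answers]
    return Counter(normalized).most_common(1)[0][0]

def weighted_vote(answers: list[str], weights: list[float]) -> str:
    """Dynamic weighted voting sketch: each agent's vote counts with a weight
    (e.g. its rolling accuracy); the highest-scoring answer wins."""
    scores: dict[str, float] = {}
    for answer, weight in zip(answers, weights):
        key = answer.strip().lower()
        scores[key] = scores.get(key, 0.0) + weight
    return max(scores, key=scores.get)
```

With weights, a single high-accuracy agent can outvote two weaker agents that agree with each other, which is the behavior the dynamic scheme is meant to capture.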
Architecture map: User Query → Orchestrator (FastAPI) → Agent 1 (llama3.2:3b), Agent 2 (qwen2.5:3b), Agent 3 (phi3:mini), Agent 4 (gemma2:2b) → Aggregation + Metrics + Statistical Analysis.

Click any node in the architecture map to view role details.
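The orchestrator-to-agents flow in the architecture map can be sketched as a concurrent fan-out/fan-in. `query_agent` below is a stand-in stub; the real system performs an HTTP round-trip to each node's Ollama API.

```python
import asyncio

# Fan-out/fan-in sketch of the flow in the architecture map.
# query_agent is a stub; the real orchestrator makes an HTTP call per agent.
async def query_agent(name: str, prompt: str) -> str:
    await asyncio.sleep(0)  # placeholder for network latency
    return f"{name}: answer to {prompt!r}"

async def fan_out(prompt: str, agents: list[str]) -> list[str]:
    # Query every agent concurrently; results keep the agent order.
    return await asyncio.gather(*(query_agent(a, prompt) for a in agents))

answers = asyncio.run(fan_out("2+2?", ["agent1", "agent2", "agent3", "agent4"]))
```

Querying agents concurrently means total latency tracks the slowest agent rather than the sum of all four, which matters on small local nodes.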

Live AI Playground

This playground calls your orchestrator /query API directly. For public use, set a public HTTPS endpoint (for example, your tunnel URL) with CORS enabled on the orchestrator.

Easy Steps

  1. For this deployment, use https://ai.marcbd.site as the Orchestrator API URL.
  2. Click Check Connection to test /health.
  3. Select one or more models in Select Models (you can compare models individually).
  4. Recommended stable test settings: Strategy majority, Seed 42, Temperature 0.0, Deterministic ON, Max tokens 24-64.
  5. Click Run Query and review the final answer, JSON output, and model charts below.
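The recommended settings in step 4 translate into a request body like the following. The field names are assumptions for illustration, since the orchestrator's /query schema is not published on this page; check the actual API before relying on them.

```python
import json

# Hypothetical /query request body using the recommended stable settings.
# Field names are assumed, not taken from the orchestrator's published schema.
payload = {
    "prompt": "What is the capital of France?",
    "strategy": "majority",
    "seed": 42,
    "temperature": 0.0,
    "deterministic": True,
    "max_tokens": 64,
}

# To send it (requires the `requests` package and a reachable orchestrator):
#   import requests
#   r = requests.post("https://ai.marcbd.site/query", json=payload, timeout=120)
#   print(r.json())
print(json.dumps(payload, indent=2))
```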

Support Endpoint

Recommended public endpoint for this project:

https://ai.marcbd.site

Open /health

Example endpoint: https://your-subdomain.example.com (without trailing /query).

Click Check Connection to load models from /agents.
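A small helper can normalize the base URL (note the "without trailing /query" caveat above) before hitting /health and /agents. This is an illustrative sketch, not the portal's actual connection-check code.

```python
from urllib.parse import urljoin

def endpoint(base_url: str, path: str) -> str:
    """Join an orchestrator base URL (no trailing /query) with an API path,
    tolerating stray slashes on either side."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))

health_url = endpoint("https://ai.marcbd.site", "/health")
agents_url = endpoint("https://ai.marcbd.site/", "agents")
# Fetch with any HTTP client, e.g.:
#   from urllib.request import urlopen
#   status = urlopen(health_url, timeout=10).status  # 200 means healthy
```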

Response

The orchestrator's response appears here after a query runs.

Live Query Insights

After a query runs, this panel shows:

  • Final Answer
  • Agreement
  • Total Latency
  • Active Models

The agreement distribution chart appears once results load.

Per-Model Latency (ms)

Per-Model Token Count

Answer Distribution by Model

Per-Model Result Table

Columns: Agent, Model, Answer, Latency (ms), Tokens, Status. The table populates after a query runs.

Paper Access

Only the public abstract is shown by default. Full manuscript reading and downloads are gated: visitors must follow, subscribe, send a permission request, and enter the access password.

Public Abstract

This project presents a complete local distributed AI network in which one FastAPI orchestrator routes prompts to four local Ollama-based LLM agents, aggregates answers with multiple ensemble strategies, and logs benchmark-ready metrics. The system implements majority voting, dynamic weighted voting, inverse surprising popularity, topic-based routing, and two-round debate. Evaluation is automated over MMLU, GSM8K, and TruthfulQA with deterministic controls, repeated trials, confidence intervals, paired t-tests, and Wilcoxon signed-rank tests. All outputs are exported as CSV, JSON, PNG figures, and IEEE LaTeX tables, with an accompanying manuscript package for conference submission.

Restricted Access Policy

Visitors must follow my GitHub, subscribe to my YouTube channel, send a permission request, and then enter the approved access password to download the protected paper package.

Unlock Protected Package

Complete all three steps, then enter the approved password to unlock the protected paper package.

Access is stored only for this browser session.

Paper Preview: First Page Only

This public preview shows only the first page of the manuscript. Full access remains protected.

First page preview of the research paper

Result Highlights

Average Accuracy by Strategy

  • Accuracy by strategy and benchmark.
  • Latency profile across strategies.
  • Average accuracy share (pie chart).
  • Cumulative GSM8K progress curve.
  • Runtime optimization comparison.