Scout Ranking Methodology

Introduction

Optimistic Rollups currently face a centralization problem: most sequencers are operated by a single party, and although decentralization is planned, it is not yet mature enough for production use. Optimistic Machine Learning (opML) faces a similar problem, with most inference tasks executed in a centralized fashion. Even with decentralized sequencers, the setup and entry costs would remain high.

We encountered the same challenge in our implementation of Optimistic Machine Learning. Our solution is a non-token rating system based on Glicko-2: a reputation mechanism that complements token rewards and mirrors real-world reputation systems. The approach is inspired by competitive games that use rankings to foster competitiveness.

Chasm LLM Pool Phase

A hybrid of random distribution and a Multi-Armed Bandit (MAB) algorithm is used when launching a new LLM pool.

Onboarding Phase

When the Chasm Orchestrator creates a new pool of LLMs, the pool enters a 14-day Initial Ranking Period. During this phase, scouts can pledge to join the network. At least 20 scouts are required for the network to start.

Initial Ranking Period

During the Initial Ranking Period, each scout starts with an initial reputation score (e.g., 1,000 points), and the orchestrator selects scouts uniformly at random to perform inference work.
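As a minimal sketch (assuming a hypothetical list of pledged scout IDs and a batch size, neither of which is a protocol constant), uniform random selection during this period could look like:

import random

MIN_SCOUTS = 20  # the pool cannot start with fewer pledged scouts

def pick_scouts_randomly(pledged_scouts: list[str], batch_size: int = 1) -> list[str]:
    # Uniform random assignment used during the Initial Ranking Period.
    if len(pledged_scouts) < MIN_SCOUTS:
        raise ValueError("at least 20 pledged scouts are required to start the pool")
    return random.sample(pledged_scouts, k=batch_size)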

Adaptive Ranking Period

Once the leaderboard stabilizes, the network switches to a Multi-Armed Bandit (MAB) algorithm. The MAB balances exploration and exploitation, assigning inference tasks to top-performing scouts while still giving newcomers a chance to receive work.
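One common way to realize this explore/exploit trade-off is an epsilon-greedy policy. The sketch below is illustrative only; the ratings dictionary and the epsilon exploration rate are assumptions, not protocol-defined values.

import random

def assign_task_epsilon_greedy(ratings: dict[str, float], epsilon: float = 0.1) -> str:
    # With probability epsilon, explore: pick any scout at random,
    # which gives newcomers a chance to receive work and build a rating.
    if random.random() < epsilon:
        return random.choice(list(ratings))
    # Otherwise exploit: route the task to the current top-rated scout.
    return max(ratings, key=ratings.get)

Other bandit policies (e.g., UCB or Thompson sampling) would fit the same interface.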

Drawing inspiration from League of Legends (LoL), we incorporate a "placement" mechanism for new players. For instance, a new player who enters at rank 1000 cannot be demoted below rank 1100 until they complete their initial placement matches.
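A minimal sketch of such a placement floor, assuming rank here means leaderboard position (a larger number is a lower placement) and assuming a hypothetical number of placement matches:

PLACEMENT_MATCHES = 10   # assumed number of placement tasks for a new scout
PLACEMENT_FLOOR = 1100   # from the example above: a new scout cannot drop below this rank

def apply_placement_floor(proposed_rank: int, matches_played: int) -> int:
    # Clamp a new scout's leaderboard position until placements are complete.
    if matches_played < PLACEMENT_MATCHES:
        return min(proposed_rank, PLACEMENT_FLOOR)
    return proposed_rank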

Season Reset

Following the convention of most competitive games, we apply a seasonal soft reset every month to promote competition. Top rankings from the previous season could provide a slight rank boost at the start of the new season, keeping things competitive, and good actors should quickly reclaim top positions without being penalized.
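One possible soft-reset rule (a sketch only; the blend factor and boost value are assumptions, not protocol constants) pulls each rating partway back toward the base rating and grants a small head start to last season's top scouts:

BASE_RATING = 1000.0     # baseline rating, see "Key Tunable Parameters" below
RESET_BLEND = 0.5        # assumed: fraction of last season's lead that carries over
TOP_SEASON_BOOST = 25.0  # assumed: slight boost for last season's top rankings

def seasonal_soft_reset(rating: float, was_top_last_season: bool = False) -> float:
    # Pull the rating partway back toward the baseline, then apply the boost.
    new_rating = BASE_RATING + RESET_BLEND * (rating - BASE_RATING)
    if was_top_last_season:
        new_rating += TOP_SEASON_BOOST
    return new_rating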

Rank Change

How ranking changes work: rating changes are computed with the Glicko-2 algorithm rather than Elo. Glicko-2 accounts for rating uncertainty (deviation) and volatility, and we apply a few adjustments of our own on top of the standard formulation.

Key Tunable Parameters

Initial Rating (BASE_RATING)

  • Sets the baseline for scouts' initial ratings.

  • Affects how quickly scouts' ratings evolve from the starting point.

Initial Deviation (BASE_DEVIATION)

  • Controls the initial uncertainty in ratings.

  • Higher deviation means more significant adjustments after each update.

Initial Volatility (BASE_VOLATILITY)

  • Reflects how consistent a scout's performance is expected to be over time.

  • Higher volatility suggests that a scout's performance is more variable, leading to larger rating changes. Lower volatility indicates more consistent performance, resulting in smaller changes.

initial_rating: 1000,
initial_deviation: 200,
initial_volatility: 0.06
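The sketch below shows how these defaults could seed a scout's rating state and how a single task outcome might update it. It is a simplified Glicko-style update, not the full Glicko-2 procedure: the volatility is held constant (the volatility update requires an iterative solver), and the "opponent" is a virtual baseline rating introduced here purely for illustration.

import math
from dataclasses import dataclass

BASE_RATING = 1000.0
BASE_DEVIATION = 200.0
BASE_VOLATILITY = 0.06

Q = math.log(10) / 400  # Glicko scaling constant

@dataclass
class ScoutRating:
    rating: float = BASE_RATING
    deviation: float = BASE_DEVIATION
    volatility: float = BASE_VOLATILITY  # held constant in this simplified sketch

def g(deviation: float) -> float:
    # Attenuation factor: results against uncertain ratings count for less.
    return 1.0 / math.sqrt(1 + 3 * (Q ** 2) * (deviation ** 2) / math.pi ** 2)

def update_rating(scout: ScoutRating, outcome: float,
                  opponent_rating: float = BASE_RATING,
                  opponent_deviation: float = BASE_DEVIATION) -> ScoutRating:
    # `outcome` is the task score in {0, 0.5, 1}; the opponent is a virtual
    # baseline standing in for "the network", an assumption for illustration.
    g_j = g(opponent_deviation)
    expected = 1.0 / (1 + 10 ** (-g_j * (scout.rating - opponent_rating) / 400))
    d2 = 1.0 / ((Q ** 2) * (g_j ** 2) * expected * (1 - expected))
    denom = 1.0 / scout.deviation ** 2 + 1.0 / d2
    new_rating = scout.rating + (Q / denom) * g_j * (outcome - expected)
    new_deviation = math.sqrt(1.0 / denom)
    return ScoutRating(new_rating, new_deviation, scout.volatility)

For example, a scout at the defaults who scores 1 on a task moves to roughly 1079 with a deviation of about 180; a higher initial deviation would make that first step larger.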

How It Works

A traditional chess game is a one-on-one match. In our case there is no opponent, so we use several tasks to determine the outcomes instead. Traditional chess outcomes are also discrete (1 for a win, 0.5 for a draw, and 0 for a loss), whereas we need penalties with a more significant negative impact for certain situations, such as disputes.

The algorithm uses the following tasks to generate outcome scores.

Task 1: Inference Response

If T(response) < T(average): score = 1
If T(average) < T(response) < T(threshold): score = 0.5
If T(response) > T(threshold), or the response failed: score = 0
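A direct translation of this rule (names are illustrative; the T values are response latencies and `failed` marks an unusable response):

def inference_response_score(t_response: float, t_average: float,
                             t_threshold: float, failed: bool = False) -> float:
    # Task 1 outcome score based on response latency.
    if failed or t_response > t_threshold:
        return 0.0
    if t_response < t_average:
        return 1.0
    return 0.5  # between the network average and the threshold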

Task 2: Health Check QC

Health Check jobs and actual jobs are mixed together and are indistinguishable from the scouts' perspective. Health Checks affect the ranking and also verify that scouts remain online and operational.

If Q(response) >= Q(benchmark): score = 1
Otherwise: score = 0
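The corresponding check in code (again illustrative; the Q values are response-quality scores measured against a benchmark answer):

def health_check_score(q_response: float, q_benchmark: float) -> float:
    # Task 2 outcome score: pass/fail quality check against the benchmark.
    return 1.0 if q_response >= q_benchmark else 0.0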
