Direct Preference Optimization Algorithm

RFPO: Enhancing Mathematical Reasoning of Large Language Models with Fine-Grained Preference Optimization

Abstract: The Direct Preference Optimization (DPO) method and its various variants have recently been shown to perform well on general instruction tuning tasks. These methods focus on optimizing ...

IEEE

Collaborative Optimization of Air Compressor Units Based on MSI-BWO Algorithm

Abstract: To improve the operational efficiency of compressor units and reduce waste, a Multi-Strategy Improved Beluga Whale Optimization (MSI-BWO) algorithm is proposed for the collaborative ...

GitHub

TournO: Tournament Optimization for Non-Verifiable Reinforcement Learning

TournO (Tournament Optimization) combines pointwise and pairwise LLM judges to produce reward signals in RL for LLMs, using tournament-style comparisons (round-robin, Elo) to derive scalar rewards ...
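The snippet says round-robin comparisons and Elo ratings turn pairwise judgments into scalar rewards. Below is a minimal sketch of that general idea. It is not TournO's implementation, and it omits the pointwise judge. It assumes a pairwise judge that returns 1.0 (first wins), 0.0 (second wins), or 0.5 (tie), and it uses the standard Elo update with an illustrative K=32 and starting rating of 1000. It then min-max normalizes the ratings to [0, 1]. The function names, `toy_judge`, and the normalization step are all hypothetical.

```python
import itertools
from typing import Callable, Sequence


def elo_expected(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def round_robin_elo(
    candidates: Sequence[str],
    judge: Callable[[str, str], float],
    k: float = 32.0,      # illustrative K-factor (assumption)
    init: float = 1000.0,  # illustrative starting rating (assumption)
    rounds: int = 1,
) -> list[float]:
    """Play every pair once per round and apply zero-sum Elo updates."""
    ratings = [init] * len(candidates)
    for _ in range(rounds):
        for i, j in itertools.combinations(range(len(candidates)), 2):
            # Hypothetical pairwise judge: 1.0 = i wins, 0.0 = j wins, 0.5 = tie.
            score_i = judge(candidates[i], candidates[j])
            delta = k * (score_i - elo_expected(ratings[i], ratings[j]))
            ratings[i] += delta
            ratings[j] -= delta  # j's update is the exact negative of i's
    return ratings


def ratings_to_rewards(ratings: Sequence[float]) -> list[float]:
    """Min-max normalize ratings to [0, 1] (an assumed choice, not TournO's)."""
    lo, hi = min(ratings), max(ratings)
    if hi == lo:
        return [0.5] * len(ratings)
    return [(r - lo) / (hi - lo) for r in ratings]


def toy_judge(a: str, b: str) -> float:
    """Hypothetical stand-in for a pairwise LLM judge: prefers the longer answer."""
    if len(a) == len(b):
        return 0.5
    return 1.0 if len(a) > len(b) else 0.0


cands = ["short", "a bit longer", "the longest answer of all"]
ratings = round_robin_elo(cands, toy_judge)
rewards = ratings_to_rewards(ratings)
print(rewards)
```

Because each Elo update is zero-sum, the ratings always total `init * len(candidates)`. Only their relative order and spacing carry signal, which is why some normalization step is needed before they are used as RL rewards.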
