
Compute-Aware Hybrid Attention Architecture Search

 — #llm #architecture-search #attention


Abstract

This project is a deep dive into compute-aware hybrid attention architecture search for large language models. Using Llama 3.1 8B and Qwen 2.5 7B as the base models, we search for the hybrid attention architecture that offers the best trade-off between model quality and compute cost.
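To make the idea concrete, here is a minimal sketch of what a compute-aware search over hybrid attention patterns could look like. Everything in it is an assumption for illustration: the per-layer cost model, the quality proxy, the layer types (`full` vs. `linear`), and the exhaustive enumeration are stand-ins, not the actual search procedure or measured numbers for Llama 3.1 8B or Qwen 2.5 7B.

```python
from itertools import product

SEQ_LEN = 4096   # assumed context length for the cost model
HEAD_DIM = 128   # illustrative head dimension

# Illustrative per-layer attention cost (arbitrary units): full softmax
# attention scales quadratically in sequence length, linear attention
# scales linearly.
COST = {"full": SEQ_LEN ** 2, "linear": SEQ_LEN * HEAD_DIM}

def quality(pattern):
    """Hypothetical quality proxy: each additional full-attention layer
    helps, with diminishing returns. A real search would evaluate
    candidates on held-out data instead."""
    n_full = pattern.count("full")
    return sum(1.0 / (i + 1) for i in range(n_full))

def search(n_layers, budget):
    """Enumerate all hybrid layer patterns and return the highest-quality
    one whose total attention cost fits within the compute budget."""
    best = None
    for pattern in product(("full", "linear"), repeat=n_layers):
        cost = sum(COST[kind] for kind in pattern)
        if cost <= budget and (best is None or quality(pattern) > quality(best)):
            best = pattern
    return best

if __name__ == "__main__":
    # Budget sized to afford roughly two full-attention layers out of four.
    budget = 2 * SEQ_LEN ** 2 + 2 * SEQ_LEN * HEAD_DIM
    print(search(n_layers=4, budget=budget))
```

Exhaustive enumeration is only viable for toy layer counts; at the depth of an 8B model one would switch to an evolutionary or gradient-based search, but the compute-aware objective (quality subject to a cost budget) stays the same.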