
Traditionally, machine learning models focused on specialized tasks, which made evaluation straightforward. The evolution of Large Language Models, however, has increased the complexity of performance measurement. Evaluating and benchmarking large language models is a significant challenge due to their versatility and their improved capability to perform a wide range of tasks. This manuscript examines the existing literature on benchmarks and provides a comprehensive overview of emerging trends in benchmarking methodology.