LAMBDA: A Large Model Based Data Agent

Maojun Sun, Ruijian Han, Binyan Jiang, Houduo Qi, Defeng Sun, Yancheng Yuan, and Jian Huang

The Hong Kong Polytechnic University

Published in Journal of the American Statistical Association

Preprint at https://arxiv.org/pdf/2407.17535

Code: https://github.com/AMA-CMFAI/LAMBDA

Abstract

We introduce LArge Model Based Data Agent (LAMBDA), a novel open-source, code-free multi-agent data analysis system that leverages the power of large language models. LAMBDA is designed to address data analysis challenges in data-driven applications through data agents that users operate in natural language. At the core of LAMBDA are two key agent roles, the programmer and the inspector, which are engineered to work together. Specifically, the programmer generates code based on the user’s instructions and domain-specific knowledge, while the inspector debugs the code when necessary. To ensure robustness and handle adverse scenarios, LAMBDA features a user interface that allows direct user intervention. Moreover, LAMBDA can flexibly integrate external models and algorithms through our proposed Knowledge Integration Mechanism, catering to the needs of customized data analysis. LAMBDA has demonstrated strong performance on various data analysis tasks, as shown using real-world data examples. It has the potential to enhance data analysis paradigms by integrating human and artificial intelligence, making data analysis more accessible, effective, and efficient for users from diverse backgrounds. The code for LAMBDA is available at https://github.com/AMA-CMFAI/LAMBDA and videos of three case studies can be viewed at https://www.polyu.edu.hk/ama/cmfai/lambda.html.
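
The programmer and inspector roles described in the abstract can be pictured as a small feedback loop. The sketch below is only an illustration under stated assumptions, not LAMBDA's actual implementation: the names `collaborate`, `run_code`, `toy_programmer`, and `toy_inspector` are hypothetical, the agents are plain Python functions standing in for language-model calls, and the retry limit with a hand-off to the user is an assumed policy.

```python
import traceback
from typing import Callable, Optional, Tuple

# Hypothetical signatures; in a real system these would wrap LLM calls.
Programmer = Callable[[str, Optional[str]], str]  # (instruction, feedback) -> code
Inspector = Callable[[str, str], str]             # (code, error text) -> suggestion


def run_code(code: str) -> Tuple[bool, str]:
    """Run code in a fresh namespace and report (succeeded, error text).

    exec() is used only to keep the sketch short; it is not a sandbox and
    should not be used on untrusted code.
    """
    try:
        exec(code, {})
        return True, ""
    except Exception:
        return False, traceback.format_exc()


def collaborate(instruction: str, programmer: Programmer,
                inspector: Inspector, max_attempts: int = 3) -> dict:
    """Alternate between programmer and inspector until the code runs.

    If every attempt fails, the result is flagged for human intervention.
    """
    feedback: Optional[str] = None
    code = ""
    for attempt in range(1, max_attempts + 1):
        code = programmer(instruction, feedback)
        ok, error = run_code(code)
        if ok:
            return {"status": "success", "code": code, "attempts": attempt}
        # The inspector turns the error into a suggestion for the next draft.
        feedback = inspector(code, error)
    return {"status": "needs_human", "code": code, "attempts": max_attempts}


def toy_programmer(instruction: str, feedback: Optional[str]) -> str:
    # The first draft uses an undefined name; the revised draft defines it.
    if feedback is None:
        return "total = sum(values)"
    return "values = [1, 2, 3]\ntotal = sum(values)"


def toy_inspector(code: str, error: str) -> str:
    if "NameError" in error:
        return "Define the missing variable before using it."
    return "Read the traceback and revise the failing line."


result = collaborate("Sum the values.", toy_programmer, toy_inspector)
print(result["status"], result["attempts"])
```

In this toy run the first draft fails, the inspector's suggestion is passed back to the programmer, and the second draft runs. When no draft succeeds within `max_attempts`, the `needs_human` status marks the point where a user interface could let the user step in.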

Demo Videos of LAMBDA

If you find our work useful in your research, please consider citing:

@article{Maojun02062025,
        author = {Maojun Sun and Ruijian Han and Binyan Jiang and Houduo Qi and Defeng Sun and Yancheng Yuan and Jian Huang},
        title = {LAMBDA: A Large Model Based Data Agent},
        journal = {Journal of the American Statistical Association},
        volume = {0},
        number = {ja},
        pages = {1--20},
        year = {2025},
        publisher = {ASA Website},
        doi = {10.1080/01621459.2025.2510000},
        URL = {https://doi.org/10.1080/01621459.2025.2510000},
        eprint = {https://doi.org/10.1080/01621459.2025.2510000}

}
@misc{sun2024survey,
      title={A Survey on Large Language Model-based Agents for Statistics and Data Science},
      author={Maojun Sun and Ruijian Han and Binyan Jiang and Houduo Qi and Defeng Sun and Yancheng Yuan and Jian Huang},
      year={2024},
      eprint={2412.14222},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2412.14222},
}