【Press Release】AI Startup Recursive Launches Open-Source Benchmarking Tools for RAG Technology Evaluation
AI · Press release · 2024-08-06
- Flow Benchmark Tools set standardized criteria for RAG evaluation and contribute to the effective deployment of RAG across various industries -
- Recursive has developed a standardized, comprehensive tool suite for benchmarking different RAG pipelines, with a focus on Japanese-language performance
- Results from a test benchmark conducted with the tools show that Recursive’s originally developed RAG technology, FindFlow, outperforms major RAG pipelines on the market in both Question Answering using RAG and Whole Document Understanding
Tokyo, Japan - August 6, 2024 - Recursive Inc., a developer of AI solutions that facilitate sustainable business transformation, is pleased to announce that it has today launched its open-source benchmarking tool suite for retrieval-augmented generation (RAG; *1) systems, “Flow Benchmark Tools.” Aiming to serve as an industry standard for evaluating RAG systems, the Flow Benchmark Tools are designed to evaluate and optimize RAG systems by focusing on document-specific information retrieval and end-to-end document processing, closely mirroring real-world scenarios. Recursive’s Flow Benchmark Tools are now publicly available to all engineers globally as open-source code on GitHub.
Background and current challenges of benchmarking RAG systems
While organizations across various industries increasingly view RAG as a transformative solution for their businesses, there is no standardized, comprehensive method for benchmarking the different RAG pipelines (*2) on the market. Without a reliable benchmark, organizations face a critical challenge in identifying and selecting the most effective RAG tool for their specific use cases; this not only delays RAG implementation but also increases the risk of deploying ineffective solutions and wasting valuable resources.
Moreover, there is a lack of benchmarking methods for non-English RAG systems, particularly for Japanese language performance. This makes it even more challenging for Japanese businesses to effectively evaluate and implement RAG tools suited to their specific linguistic and cultural needs.
Key features of the Flow Benchmark Tools
By addressing the complexities inherent in RAG pipelines, such as semantic retrieval, query generation, and large language model (LLM)-based answer generation, the Flow Benchmark Tools provide a nuanced, comprehensive evaluation framework.
The key features of the tools include:
- Multilingual capabilities, including Japanese: With a focus on Japanese-language performance in addition to English, the Flow Benchmark Tools can accurately measure RAG system performance in Japanese.
- Comprehensive evaluation of the entire RAG pipeline: By evaluating the entire pipeline from raw document processing to response generation, the Flow Benchmark Tools provide a more holistic and practical assessment of RAG system performance.
- Automated multi-LLM evaluation: To ensure the most objective and robust evaluation possible, mitigate potential biases, and provide a comprehensive assessment of performance, the tools employ an automated evaluation system that leverages multiple state-of-the-art LLMs, including GPT-4, Claude 3, and Gemini. The system outputs a mean opinion rating ranging from 0 (worst) to 10 (perfect).
- Open-source methodology: The Flow Benchmark Tools are developed with an open-source approach, fostering transparency and collaboration within the AI community and enabling others to leverage and build upon Recursive’s work, customizing it to their organizations’ needs.
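As a rough illustration of how a multi-judge mean opinion rating like the one described above could be aggregated, here is a minimal Python sketch. The judge function is a fixed-score stand-in, not the tools’ actual API, and the model names and scores are purely illustrative:

```python
from statistics import mean

def judge_with_llm(model_name, question, reference, candidate):
    """Hypothetical stand-in for a real LLM judge call.

    In a real system, each judge model would be prompted to rate the
    candidate answer against the reference on a 0 (worst) to 10
    (perfect) scale. Fixed scores here illustrate only the aggregation.
    """
    illustrative_scores = {"gpt-4": 9, "claude-3": 8, "gemini": 8}
    return illustrative_scores[model_name]

def mean_opinion_rating(question, reference, candidate,
                        judges=("gpt-4", "claude-3", "gemini")):
    """Average the per-judge ratings into a single 0-10 score."""
    scores = [judge_with_llm(j, question, reference, candidate)
              for j in judges]
    return mean(scores)

rating = mean_opinion_rating(
    "What year was Recursive founded?",
    "Recursive was founded in August 2020.",
    "It was founded in 2020.",
)
print(round(rating, 2))  # → 8.33
```

Averaging over several independent judge models, as sketched here, is one common way to smooth out the idiosyncratic biases any single LLM judge would introduce.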
Findings from the test benchmark
Recursive conducted a test benchmark using the Flow Benchmark Tools, comparing its original, fully customizable LLM-based generative AI assistant, FindFlow, with three state-of-the-art generative AI solutions on the market. The test was designed to be thorough, fair, and reflective of real-world scenarios. Conducted from June 7 to July 19, 2024, the benchmark evaluated each pipeline’s performance in two critical areas: Question Answering using RAG and Whole Document Understanding.
The benchmarking process utilized a dataset of Japanese government documents with challenging questions.
Recursive's FindFlow RAG applications have demonstrated superior performance in initial benchmarks:
- FindFlow SearchAI (a FindFlow feature optimized for discovering factual information in a large database of documents) achieved a mean rating of 8.42 in RAG question answering, outperforming leading competitors by 0.61 to 1.41 points.
- FindFlow AnalysisAI (a FindFlow feature optimized for in-depth analysis of documents and communications) scored 8.90 in whole document analysis, surpassing other prominent AI solutions by 1.68 to 2.61 points.
These results underscore Recursive's commitment to developing RAG technologies that excel in complex, document-rich scenarios.
"Our Flow Benchmark Tools represent a major leap forward in RAG system evaluation," said Tiago Ramalho, Co-founder and CEO of Recursive. "By open-sourcing these tools, we're inviting the global AI community to join us in pushing the boundaries of what's possible in AI-driven document analysis and retrieval."
Recursive plans to continually improve and expand these tools, incorporating additional languages and refining methodologies based on community feedback.
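As a minimal illustration of the retrieval-augmented generation (RAG; *1) flow that these tools evaluate, the sketch below retrieves a grounding document and hands it to an answer generator. The naive keyword-overlap retriever, the sample knowledge base, and the placeholder `generate_answer` function are all illustrative assumptions, not part of the released tools:

```python
# Toy external knowledge base the answer will be grounded in.
knowledge_base = [
    "Recursive Inc. was founded in Tokyo in August 2020.",
    "Flow Benchmark Tools evaluate RAG pipelines end to end.",
    "FindFlow is a customizable LLM-based generative AI assistant.",
]

def retrieve(query, docs, top_k=1):
    """Rank documents by simple word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def generate_answer(query, context):
    """Placeholder for an LLM call that answers using the context."""
    return f"Based on: {' '.join(context)}"

context = retrieve("What do Flow Benchmark Tools evaluate?", knowledge_base)
print(generate_answer("What do Flow Benchmark Tools evaluate?", context))
```

Real pipelines replace the overlap scorer with semantic (embedding-based) retrieval and the placeholder with an actual LLM call, but the retrieve-then-generate shape is the same.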
In addition, Recursive aims to continue pursuing its commitment not only to driving sustainable business innovation, but also to advancing RAG technology and empowering the global AI community.
*1 Retrieval-augmented generation (RAG): An AI framework for retrieving facts from an external knowledge base to ground large language models (LLMs) in the most accurate, up-to-date information and to give users insight into LLMs’ generative process.
*2 RAG pipeline: A piece of code that performs RAG.
Reference:
- Information about the Flow Benchmark Tools and to access the open-source release: https://recursiveai.co.jp/en/blog/introducing-flow-benchmark-tools/
- Information on FindFlow: https://www.findflow.ai/en/
- Flow Benchmark Tools on GitHub: https://github.com/recursiveai/flow_benchmark_tools
[About Recursive Inc.]
Recursive is a service provider that offers AI solutions for building a sustainable future. By combining expertise in diverse industries such as environment, energy, healthcare, pharmaceuticals, food, and retail with advanced technological capabilities and specialized knowledge in sustainability, Recursive provides AI consulting services and technical development. In order to leave a better global environment and society behind for future generations, Recursive’s unparalleled professionals are leading the creation of a new society with world-class, cutting-edge technology.
Company name: Recursive Inc.
Headquarters: Shibuya S-6 Building 6F, 1-7-1 Shibuya, Shibuya-ku, Tokyo
Founded: August 2020
Co-Founders: Tiago Ramalho, Katsutoshi Yamada
Business: Research and development of AI and provision of sustainability-related solutions
Number of employees: 59 (including full-time employees, outsourced workers, and interns; as of July 2024)
Website: https://recursiveai.co.jp/en/
[Media inquiries]
Recursive Inc. Kishimoto (PR & Branding)
Email: info@recursiveai.co.jp
Phone: +81-90-9847-7832 (Direct number to Kishimoto)