Top 30 Popular LLM Projects on GitHub (as of October 26, 2023)
Top RAG Builder Projects on GitHub (as of October 26, 2023)
GitHub does not group projects under a “RAG builder” category, but the following projects are widely used to build Retrieval-Augmented Generation (RAG) systems, either as end-to-end frameworks or as individual retrieval components:
Project | Description | GitHub stars | URL |
---|---|---|---|
LangChain | Building applications with LLMs through composability. Provides tools for document loading, retrieval, and question answering. | 19.6k | https://github.com/langchain-ai/langchain |
Haystack | End-to-end framework for building NLP search systems that can be extended with LLMs. | 7.6k | https://github.com/deepset-ai/haystack |
LlamaIndex (GPT Index) | A data framework for your LLM applications. Connects LLMs to external data. | 6.2k | https://github.com/run-llama/llama_index |
DeepLake | A data lake for deep learning. Build, manage, query, version, and collaborate on your unstructured data for LLMs. | 3.6k | https://github.com/activeloopai/deeplake |
Faiss | A library for efficient similarity search and clustering of dense vectors. Useful for building the retrieval component of RAG. | 20.2k | https://github.com/facebookresearch/faiss |
Sentence Transformers | Multilingual Sentence & Image Embeddings with BERT. Useful for generating embeddings for documents used in retrieval. | 12.7k | https://github.com/UKPLab/sentence-transformers |
Chroma | The AI-native open-source embedding database. Offers a vector database for semantic search and retrieval. | 6.4k | https://github.com/chroma-core/chroma |
Weaviate | Open-source vector search engine that stores data objects alongside their vector embeddings and supports semantic (natural-language) queries over them. | 3.8k | https://github.com/weaviate/weaviate |
Qdrant | Vector similarity search engine and vector database; can serve as the retrieval store of a RAG system. | 3.6k | https://github.com/qdrant/qdrant |
Milvus | An open-source vector database built for scalable similarity search and AI applications. | 15.4k | https://github.com/milvus-io/milvus |
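Several entries in the table (Faiss and the vector databases) are built around one core operation: given a query embedding, find the stored embeddings most similar to it. The following is a minimal pure-Python sketch of the exact, brute-force version of that search using cosine similarity. The function names `cosine` and `top_k_cosine` and the three-dimensional toy vectors are hypothetical, chosen for illustration; real embeddings (for example from Sentence Transformers) have hundreds of dimensions.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_k_cosine(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int = 2
) -> List[Tuple[int, float]]:
    """Exact (brute-force) search: score every stored vector, keep the k best.

    Returns (index, score) pairs sorted by descending similarity.
    """
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real ones would come from an embedding model.
stored = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.05, 0.0]

for idx, score in top_k_cosine(query, stored, k=2):
    print(idx, round(score, 3))
```

Scoring every stored vector costs time proportional to the number of vectors times their dimension for each query. Faiss's flat indexes perform the same kind of exhaustive search with optimized kernels, while approximate nearest-neighbour indexes such as HNSW, offered by several of the projects above, trade a small amount of recall for much faster queries on large collections.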
Explanation:
- LangChain and Haystack are comprehensive frameworks for building RAG pipelines, with components for document loading, splitting, embedding, retrieval, and LLM interaction.
- LlamaIndex simplifies connecting LLMs to a wide range of data sources and provides tools for building RAG applications on top of them.
- DeepLake is a data lake designed for deep learning and LLM applications, handling the storage, versioning, and retrieval of unstructured data.
- Sentence Transformers generates the embeddings used for retrieval; Faiss is a library for fast similarity search over such embeddings; Chroma, Weaviate, Qdrant, and Milvus are vector databases that store embeddings and serve similarity queries. Together these cover the retrieval component of a RAG system.
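The stages that frameworks such as LangChain and Haystack wire together (splitting documents, retrieving relevant chunks, and passing them to an LLM) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not how any of these libraries implements it: the helper names `split_into_chunks`, `tokens`, `retrieve`, and `build_prompt` are hypothetical, keyword overlap stands in for embedding similarity to keep the example dependency-free, the chunk size and prompt template are arbitrary, and the actual LLM call is omitted.

```python
import re


def split_into_chunks(text, chunk_size=8, overlap=2):
    """Split text into windows of `chunk_size` words that share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def tokens(text):
    """Lowercased alphanumeric words of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, chunks, k=1):
    """Rank chunks by how many distinct words they share with the question.

    Keyword overlap is a hypothetical stand-in for embedding similarity.
    """
    q = tokens(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokens(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Assemble an augmented prompt; sending it to an LLM is out of scope."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


document = (
    "Faiss is a library for similarity search over dense vectors. "
    "Chroma is an embedding database with a simple Python client. "
    "Milvus is a vector database designed for scalable similarity search."
)
question = "Which embedding database has a Python client?"

chunks = split_into_chunks(document)
top = retrieve(question, chunks, k=1)
print(build_prompt(question, top))
```

Replacing `retrieve` with an embedding model plus one of the vector stores listed above, and sending the prompt to an LLM, turns this skeleton into a basic RAG pipeline. The overlap between chunks reduces the chance that the passage answering a question is cut in half at a chunk boundary.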
This list focuses on projects that either target RAG directly or provide essential components for building RAG systems. The best choice for you depends on the complexity of your project and its requirements.
This information is accurate as of October 26, 2023. The popularity and features of these projects may change over time. Always refer to the official documentation and GitHub repositories for the most up-to-date information.