Introduction
The Problem: Dumb Bots and Information Overload
Traditional search methods or basic chatbots often fall short when dealing with specific document sets:
- Information Overload: Manually searching large documents is time-consuming and inefficient.
- Generic LLM Limitations: Large Language Models (LLMs) are powerful, but they lack specific, up-to-date knowledge about your documents unless explicitly trained on them (which is often impractical).
- Hallucination Risk: When asked about information outside their training data, LLMs might confidently invent answers that sound plausible but are incorrect. This is unacceptable for reliable FAQ systems.
- Inconsistent Outputs: Getting answers in a usable, predictable format can be challenging with free-form text generation.
We need a system that answers questions accurately based only on a given set of documents and provides answers in a consistent, structured way.
The Solution: RAG + Gemini API

1. Indexing: Convert the source documents (Google Car manuals) into numerical representations (embeddings) using the Gemini text-embedding-004 model and store them in a vector database (ChromaDB). This allows for efficient similarity searches; this one-time setup is what enables fast retrieval later.

2. Retrieval: When a user asks a question, embed the query the same way and find the most semantically similar document chunks in ChromaDB.
3. Augmented Generation: Combine the question and the retrieved context into a prompt for a Gemini model (gemini-2.0-flash). Instruct the model to answer the question based only on the provided context.
Alongside the RAG structure, we leverage specific Gemini API features:
- High-Quality Embeddings: text-embedding-004 provides embeddings suitable for finding semantically similar text.
- Powerful Generation: gemini-2.0-flash can synthesize answers based on the retrieved context.
- Structured Output (JSON Mode): We instruct Gemini to return the answer and a confidence score in a predictable JSON format, making it easy for applications to use the output.
- Optional Grounding: We can even add Google Search as a tool if the local documents don't suffice (though our primary goal here is document-based Q&A).
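All of the code below assumes an initialized google-genai client. A minimal setup sketch (assuming the API key is available in a GOOGLE_API_KEY environment variable; on Kaggle you would read it from Secrets, as noted near the end):

import os
from google import genai

# Assumed setup: the 'client' object referenced in all later snippets
client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])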
Implementation Highlights
1. Custom Embedding Function for ChromaDB:
We need to tell ChromaDB how to generate embeddings using the Gemini API.
from chromadb import Documents, EmbeddingFunction, Embeddings
from google.api_core import retry
from google import genai
from google.genai import types

# Retry on rate-limit (429) and service-unavailable (503) errors
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

class GeminiEmbeddingFunction(EmbeddingFunction):
    document_mode = True  # Toggle between indexing docs and embedding queries

    @retry.Retry(predicate=is_retriable)
    def __call__(self, input_texts: Documents) -> Embeddings:
        task = "retrieval_document" if self.document_mode else "retrieval_query"
        print(f"Embedding {'documents' if self.document_mode else 'query'} ({len(input_texts)})...")
        try:
            # Assuming 'client' is an initialized Google GenAI client
            response = client.models.embed_content(
                model="models/text-embedding-004",
                contents=input_texts,
                config=types.EmbedContentConfig(task_type=task),  # Specify task type
            )
            return [e.values for e in response.embeddings]
        except Exception as e:
            print(f"Error during embedding: {e}")
            return [[] for _ in input_texts]
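As a quick illustration (hypothetical strings, assuming the client above), the same function serves both indexing and querying by flipping document_mode:

embed_fn = GeminiEmbeddingFunction()
embed_fn.document_mode = True   # "retrieval_document" task for indexing
doc_vectors = embed_fn(["The touchscreen controls navigation and climate."])
embed_fn.document_mode = False  # "retrieval_query" task for searches
query_vector = embed_fn(["How do I adjust the temperature?"])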
2. Setting up ChromaDB and Indexing:
We create a ChromaDB collection and add our documents. get_or_create_collection makes this idempotent.
# --- 5. Setup ChromaDB Vector Store ---
import chromadb
import time
print("Setting up ChromaDB...")
DB_NAME = "googlecar_faq_db"
embed_fn = GeminiEmbeddingFunction()
chroma_client = chromadb.Client() # In-memory client
try:
    db = chroma_client.get_or_create_collection(name=DB_NAME, embedding_function=embed_fn)
    print(f"Collection '{DB_NAME}' ready. Current count: {db.count()}")
    # Assuming 'documents' and 'doc_ids' are defined earlier
    if db.count() < len(documents):
        print(f"Adding/Updating documents in '{DB_NAME}'...")
        embed_fn.document_mode = True  # Set mode for indexing
        db.upsert(documents=documents, ids=doc_ids)  # Use upsert for safety
        time.sleep(2)  # Allow indexing to settle
        print(f"Documents added/updated. New count: {db.count()}")
    else:
        print("Documents already seem to be indexed.")
except Exception as e:
    print(f"Error setting up ChromaDB collection: {e}")
    raise SystemExit("ChromaDB setup failed. Exiting.")
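As an optional sanity check (peek is a standard ChromaDB collection method), you can inspect one stored record to confirm indexing worked:

# Inspect one stored record (ids plus the first 80 chars of its text)
sample = db.peek(limit=1)
print(sample["ids"], sample["documents"][0][:80])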
3. Retrieving Relevant Documents:
This function takes the user query, embeds it (using document_mode=False), and searches ChromaDB.
# --- 6. Define Retrieval Function ---
def retrieve_documents(query: str, n_results: int = 1) -> list[str]:
    print(f"\nRetrieving documents for query: '{query}'")
    embed_fn.document_mode = False  # Switch to query mode
    try:
        results = db.query(query_texts=[query], n_results=n_results)
        if results and results.get("documents"):
            retrieved_docs = results["documents"][0]
            print(f"Retrieved {len(retrieved_docs)} documents.")
            return retrieved_docs
        else:
            print("No documents retrieved.")
            return []
    except Exception as e:
        print(f"Error querying ChromaDB: {e}")
        return []
4. Generating the Structured Answer:
Here's the core logic combining the query, retrieved context, and instructions for the LLM, specifying JSON output with a confidence score.
# --- 7. Define Structured Output Schema ---
from typing_extensions import Literal
from pydantic import BaseModel
class AnswerWithConfidence(BaseModel):
    answer: str
    confidence: Literal["High", "Medium", "Low"]
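For instance (hypothetical values), the schema accepts exactly this shape and would reject any other confidence label:

# A valid instance serializes cleanly; confidence="Certain" would raise a ValidationError
example = AnswerWithConfidence(answer="The touchscreen controls climate.", confidence="High")
print(example.model_dump_json())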
# --- 8. Define Augmented Generation Function ---
def generate_structured_answer(query: str, context_docs: list[str]) -> dict | None:
    if not context_docs:
        print("No context provided, cannot generate answer.")
        return {
            "answer": "I couldn't find relevant information in the provided documents to answer this question.",
            "confidence": "Low",
        }
    context = "\n---\n".join(context_docs)
    # Literal braces in the JSON example below are doubled to escape them inside the f-string
    prompt = f"""You are an AI assistant answering questions about a Google car based ONLY on the provided documents.
Context Documents:
---
{context}
---
Question: {query}
Based *only* on the information in the context documents above, answer the question.
Also, assess your confidence in the answer based *only* on the provided text:
- "High" if the answer is directly and clearly stated in the documents.
- "Medium" if the answer can be inferred but isn't explicitly stated.
- "Low" if the documents don't seem to contain the answer or are ambiguous.
Return your response ONLY as a JSON object with the keys "answer" and "confidence". Example format:
{{
    "answer": "Your answer here.",
    "confidence": "High/Medium/Low"
}}
"""
    try:
        config = types.GenerateContentConfig(
            temperature=0.2,
            response_mime_type="application/json",  # Request JSON
            response_schema=AnswerWithConfidence,  # Provide the schema
        )
        # Assuming 'client' is an initialized Google GenAI client;
        # the google-genai SDK takes the config via the 'config' argument
        response = client.models.generate_content(
            model="gemini-2.0-flash",
            contents=prompt,
            config=config,
        )
        # With a response_schema, the SDK exposes the parsed Pydantic
        # instance on response.parsed
        parsed_output = getattr(response, "parsed", None)
        if isinstance(parsed_output, AnswerWithConfidence):
            result = parsed_output.model_dump()
            print("Generated Answer:", result)
            return result
        # Fallback: attempt to parse the raw text part if it looks like JSON
        print("Warning: Could not extract parsed output from response.")
        try:
            import json
            text_part = response.text
            if text_part and text_part.strip().startswith("{") and text_part.strip().endswith("}"):
                parsed_json = json.loads(text_part)
                if isinstance(parsed_json, dict) and "answer" in parsed_json and "confidence" in parsed_json:
                    print("Recovered JSON from text part:", parsed_json)
                    return parsed_json
        except Exception as json_e:
            print(f"Could not parse text part as JSON: {json_e}")
        print("Error: Could not generate/parse structured response correctly.")
        return {"answer": "Error: Could not generate or parse the structured response from the AI.", "confidence": "Low"}
    except Exception as e:
        print(f"Error during content generation call: {e}")
        return {"answer": f"Error during generation API call: {e}", "confidence": "Low"}
Tip: Ensure your API key is correctly set up in Kaggle Secrets (GOOGLE_API_KEY). Also, ChromaDB setup might require specific permissions or configuration depending on the environment (here we use an in-memory client for simplicity).
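On Kaggle, reading the key typically looks like this (assuming a secret named GOOGLE_API_KEY is attached to the notebook):

from kaggle_secrets import UserSecretsClient

# Fetch the API key from Kaggle Secrets and create the client
GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")
client = genai.Client(api_key=GOOGLE_API_KEY)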
Limitations and Future Work
This implementation is a great starting point, but it has limitations:
- Document Quality: The RAG system's effectiveness heavily depends on the quality, relevance, and comprehensiveness of the indexed documents. Garbage in, garbage out.
- Retrieval Accuracy: Simple similarity search might not always retrieve the perfect chunk of text, especially for complex queries. More advanced retrieval strategies (like hybrid search or re-ranking) could improve this.
- Structured Output Failures: While JSON mode is robust, the LLM might occasionally fail to generate perfectly valid JSON matching the schema. More robust error handling and retries could be added; a minimal retry sketch follows this list.
- Limited Context Handling (within LLM): While RAG provides context, the LLM itself still has limits on how much context it can process effectively in a single generation step. Very long retrieved passages might need summarization or chunking before being sent to the LLM.
- Static Knowledge: The bot only knows what's in the ChromaDB index. It doesn't learn automatically. Updates require re-indexing.
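A minimal retry sketch (an illustrative addition, relying on generate_structured_answer returning an error dict on failure, as defined above):

def answer_with_retries(query: str, context_docs: list[str], max_attempts: int = 3) -> dict:
    # Retry generation when the structured output could not be produced
    for attempt in range(1, max_attempts + 1):
        result = generate_structured_answer(query, context_docs)
        if result and not result["answer"].startswith("Error"):
            return result
        print(f"Attempt {attempt} failed, retrying...")
    return {"answer": "Error: structured generation failed after retries.", "confidence": "Low"}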
Future Enhancements:
- Implement Google Search grounding as a fallback when confidence is low or documents are missing (a sketch follows this list).
- Add conversation memory for multi-turn interactions.
- Explore more sophisticated retrieval techniques.
- Build a simple UI (e.g., using Gradio or Streamlit).
- Fine-tune an embedding model specifically for the car manual domain (though text-embedding-004 is quite capable).
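A hedged sketch of the grounding fallback (assuming the google-genai Google Search tool; grounded calls return free text, so the JSON schema is skipped and the confidence label here is a design choice, not model output):

def answer_with_grounding_fallback(query: str, context_docs: list[str]) -> dict:
    # Try the document-grounded path first
    result = generate_structured_answer(query, context_docs)
    if result["confidence"] != "Low":
        return result
    # Fall back to Google Search grounding when the documents don't suffice
    grounded = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=query,
        config=types.GenerateContentConfig(
            tools=[types.Tool(google_search=types.GoogleSearch())],
        ),
    )
    return {"answer": grounded.text, "confidence": "Medium"}  # assumed labeling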
Conclusion
Key Takeaways:
- RAG grounds LLM answers in your specific data.
- Gemini Embeddings + ChromaDB enable efficient document retrieval.
- Structured Output (JSON Mode) enhances reliability and integrability.
- Confidence Scores add a layer of trustworthiness.
I hope this walkthrough provides a clear picture of how this smarter FAQ bot works! Feel free to ask questions or leave a comment with your thoughts or your own implementations!