Index and search files
Creating an indexed file
If you need to index a file for semantic search just pass the index: true parameter when creating the file.
import { Client } from '@botpress/client'
const client = new Client({
token: process.env.BOTPRESS_PAT,
botId: process.env.BOTPRESS_BOT_ID,
})
const file = await client.uploadFile({
key: 'test.txt',
content: 'This is a test file',
index: true,
})
Supported file formats
The following file formats are supported for indexing:
| Format | File Extension | MIME Type |
|---|---|---|
.pdf | application/pdf | |
| HTML | .html | text/html |
| Text | .txt | text/plain |
| Markdown | .md | text/markdown |
Notes on indexing
- The file it will initially have a status of
indexing_pendingand will be indexed asynchronously. The time it takes to index the file will depend on the file size and the current load on the system. - You can check the status of the file by calling the Get File endpoint and checking that the
file.statusproperty has changed toindexing_completed. - If the indexing failed the status will be set to
indexing_failedand the reason of the failure will be available in thefailedStatusReasonproperty of the file.
Searching files
To run a semantic search on your bot’s files you can use the Search Files API endpoint. This is particularly useful for RAG (Retrieval Augmented Generation) implementations.
const res = await client.searchFiles({
query: 'what are the most popular products of your company?', // This is the natural language query (for example: a user's question) to be used for running a semantic search on the files of the bot.
contextDepth: 1, // Optional (default: 0), allows you prepend and append the surrounding context to each matching passage in the search results
tags: {
// Optional, allows you to only search files that match the specified tags. Useful if you only want to search a subset of the bot's files.
category: 'Support',
subcategory: 'Technical',
},
limit: 30, // Default is 20, maximum is 50 results
})
const passages = res.passages
for (const passage of passages) {
// You can use passage.content here to retrieve the content (including surrounding context, if any) of each matching passage
}import qs from 'qs' // You'll need to install this package first using your favorite package manager
const params = qs.stringify({
query: 'what are the most popular products of your company?',
tags: {
// Optional: You can delimit the search to only files that match the specified tags
category: 'Support',
subcategory: 'Technical',
},
})
const result = await fetch('https://api.botpress.cloud/v1/files/search?' + params, {
method: 'GET',
headers: {
'x-bot-id': 'YOUR_BOT_ID_GOES_HERE',
Authorization: `Bearer ${process.env.BOTPRESS_PAT}`,
},
})
const response = await result.json()
const passages = response.passages You can check the API reference for more details on the properties available for each passage object returned in the response.
Using the search results for RAG
You can use the search results to provide relevant information to a bot for generating a response to a user’s question.
Here’s an example of a simple RAG (Retrieval Augmented Generation) implementation that shows how you can use the ChatGPT large language model (through the OpenAI API) and the search results provided by our API to answer a user’s question:
import OpenAI from 'openai' // You'll need to install this package first using your favorite package manager
const openai = new OpenAI({ apiKey: 'YOUR_OPENAI_API_KEY' }) // Change this for your own OpenAI API key
const userQuestion = 'Which are the most popular products of your company?' // This will come from the user's input
const res = await client.searchFiles({
query: userQuestion,
contextDepth: 1, // This number can be increased to pass more context to the model for each matching passage
})
const relevantInformation = res.passages.map((passage) => passage.content).join('\n\n----------\n\n')
const answer = await openai.chat.completions.create({
model: 'gpt-4o-mini', // This can be changed for any other OpenAI model available here: https://platform.openai.com/docs/models
messages: [
{
role: 'system',
// You can change the text below to include any additional instructions to the model
content:
"You are a helpful assistant that answers the user's question based only on the relevant information you are provided along with the question.",
},
{
role: 'user',
content:
`## RELEVANT INFORMATION: \n\n ${relevantInformation}\n\n` +
`## ANSWER THE FOLLOWING USER'S QUESTION: \n\n ${userQuestion}`,
},
],
})