Index and search files

Creating an indexed file

If you need to index a file for semantic search just pass the index: true parameter when creating the file.

import { Client } from '@botpress/client'

const client = new Client({
  token: process.env.BOTPRESS_PAT,
  botId: process.env.BOTPRESS_BOT_ID,
})

const file = await client.uploadFile({
  key: 'test.txt',
  content: 'This is a test file',
  index: true,
})

Supported file formats

The following file formats are supported for indexing:

Format	File Extension	MIME Type
PDF	`.pdf`	`application/pdf`
HTML	`.html`	`text/html`
Text	`.txt`	`text/plain`
Markdown	`.md`	`text/markdown`

Notes on indexing

The file it will initially have a status of indexing_pending and will be indexed asynchronously. The time it takes to index the file will depend on the file size and the current load on the system.
You can check the status of the file by calling the Get File endpoint and checking that the file.status property has changed to indexing_completed.
If the indexing failed the status will be set to indexing_failed and the reason of the failure will be available in the failedStatusReason property of the file.

Searching files

To run a semantic search on your bot’s files you can use the Search Files API endpoint. This is particularly useful for RAG (Retrieval Augmented Generation) implementations.

const res = await client.searchFiles({
  query: 'what are the most popular products of your company?', // This is the natural language query (for example: a user's question) to be used for running a semantic search on the files of the bot.
  contextDepth: 1, // Optional (default: 0), allows you prepend and append the surrounding context to each matching passage in the search results
  tags: {
    // Optional, allows you to only search files that match the specified tags. Useful if you only want to search a subset of the bot's files.
    category: 'Support',
    subcategory: 'Technical',
  },
  limit: 30, // Default is 20, maximum is 50 results
})

const passages = res.passages

for (const passage of passages) {
  // You can use passage.content here to retrieve the content (including surrounding context, if any) of each matching passage
}

import qs from 'qs' // You'll need to install this package first using your favorite package manager

const params = qs.stringify({
  query: 'what are the most popular products of your company?',
  tags: {
    // Optional: You can delimit the search to only files that match the specified tags
    category: 'Support',
    subcategory: 'Technical',
  },
})

const result = await fetch('https://api.botpress.cloud/v1/files/search?' + params, {
  method: 'GET',
  headers: {
    'x-bot-id': 'YOUR_BOT_ID_GOES_HERE',
    Authorization: `Bearer ${process.env.BOTPRESS_PAT}`,
  },
})

const response = await result.json()
const passages = response.passages

You can check the API reference for more details on the properties available for each passage object returned in the response.

Using the search results for RAG

You can use the search results to provide relevant information to a bot for generating a response to a user’s question.

Here’s an example of a simple RAG (Retrieval Augmented Generation) implementation that shows how you can use the ChatGPT large language model (through the OpenAI API) and the search results provided by our API to answer a user’s question:

import OpenAI from 'openai' // You'll need to install this package first using your favorite package manager

const openai = new OpenAI({ apiKey: 'YOUR_OPENAI_API_KEY' }) // Change this for your own OpenAI API key

const userQuestion = 'Which are the most popular products of your company?' // This will come from the user's input

const res = await client.searchFiles({
  query: userQuestion,
  contextDepth: 1, // This number can be increased to pass more context to the model for each matching passage
})

const relevantInformation = res.passages.map((passage) => passage.content).join('\n\n----------\n\n')

const answer = await openai.chat.completions.create({
  model: 'gpt-4o-mini', //  This can be changed for any other OpenAI model available here: https://platform.openai.com/docs/models
  messages: [
    {
      role: 'system',
      // You can change the text below to include any additional instructions to the model
      content:
        "You are a helpful assistant that answers the user's question based only on the relevant information you are provided along with the question.",
    },
    {
      role: 'user',
      content:
        `## RELEVANT INFORMATION: \n\n ${relevantInformation}\n\n` +
        `## ANSWER THE FOLLOWING USER'S QUESTION: \n\n ${userQuestion}`,
    },
  ],
})