Metadata updates
Hello Vectara community! Today, we’re very happy to announce another new feature in Vectara: the ability to update the metadata of documents after they’ve been indexed. This has been one of our users’ most requested features. In this post, we’ll walk you through how it works, why it’s been so requested, and what you can now do with Vectara.
Metadata in Vectara
Long-term users of Vectara know that we’ve been building robust hybrid search and metadata capabilities into the platform. We were one of the first GenAI systems to implement hybrid search (combining semantic search with BM25 keyword matching and automatically blending the results), all the way back in April 2023, and one of the first to support attaching metadata to documents and filtering on it, over a year ago. You might not think of these “old tricks”, keyword matching and tagging or filtering documents with structured data, as tools for the GenAI age, but many of the users we’ve worked with have found that they help reduce hallucinations (by getting the LLM to “focus”) and costs (by reducing the need for expensive fine-tuning and by sending only relevant information to the LLM).
We’ve continued to invest in and improve this infrastructure, including letting users influence the order of results (and thus feed the generative model results that are better for their business) via user-defined function reranking, and letting users define the citation format of generated responses using metadata. As we’ve done so, our users have asked for even more from Vectara metadata.
Updating Metadata
While it’s been possible to update document metadata through a delete-then-reindex workflow (viable for many use cases, because Vectara generally deletes or indexes a document within a few seconds), the developer experience isn’t great. Our users have raised many good use cases for updating metadata in place. For example:
- They use price or units-in-stock information to influence the generative system, and frequent price and inventory updates need to be reflected accurately
- They store access control lists (ACLs) in metadata for security, and the permissions on the associated documents change periodically
- They flag products or documents as “promoted” in metadata and want to enable or disable that flag dynamically to bias the results
You can now do this with a single call to Vectara. The new APIs let you either add to or replace a document’s metadata, using standard PATCH and PUT REST semantics respectively.
To add new metadata to an existing document, send a PATCH request. The metadata in the request is merged into the document’s existing metadata: fields that didn’t exist before are added, and fields that did exist are replaced with the new values.
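To make the merge behavior concrete, here is a minimal Python sketch. It models PATCH semantics on a plain dictionary and builds, without sending, the corresponding HTTP request. The base URL `https://api.vectara.io/v2`, the `/corpora/{corpus_key}/documents/{document_id}` path, the `{"metadata": ...}` body, the `x-api-key` header, and the helper names `patch_metadata` and `build_patch_request` are assumptions for illustration; check the Vectara API reference for the exact endpoint before relying on them.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: base URL, path layout, body shape, and auth header are modeled on
# Vectara's v2 REST API; confirm them in the API reference before use.
API_BASE = "https://api.vectara.io/v2"


def patch_metadata(existing: dict, update: dict) -> dict:
    """Local model of PATCH semantics: fields in `update` are added if absent
    and replaced if present; fields not mentioned in `update` are kept."""
    merged = dict(existing)  # copy so the caller's dict is not mutated
    merged.update(update)
    return merged


def build_patch_request(corpus_key: str, document_id: str,
                        metadata: dict, api_key: str) -> urllib.request.Request:
    """Build, but do not send, a PATCH request carrying the metadata to merge.
    Hypothetical helper; the endpoint shape is an assumption (see above)."""
    url = (f"{API_BASE}/corpora/{urllib.parse.quote(corpus_key, safe='')}"
           f"/documents/{urllib.parse.quote(document_id, safe='')}")
    body = json.dumps({"metadata": metadata}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )


if __name__ == "__main__":
    stored = {"price": 19.99, "promoted": False}
    print(patch_metadata(stored, {"price": 17.49, "in_stock": 42}))
    # {'price': 17.49, 'promoted': False, 'in_stock': 42}

    req = build_patch_request("products", "sku-123", {"promoted": True}, "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    # PATCH https://api.vectara.io/v2/corpora/products/documents/sku-123
    # urllib.request.urlopen(req) would actually send it; not done here.
```

With these semantics, toggling a “promoted” flag is a single PATCH carrying just that field, and the document’s price, stock, and ACL fields stay untouched.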
To replace a document’s metadata, send a PUT request. Any existing metadata is removed first and replaced with the metadata in the request, so the request is intended to be idempotent: sending the same PUT twice leaves the document in the same state. This also gives you a way to delete metadata without deleting the document itself.
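A minimal Python sketch of the replace behavior: it models PUT semantics on a plain dictionary, shows why repeating the request is idempotent, and shows how leaving a field out of the metadata you send deletes that field while keeping the document. The base URL `https://api.vectara.io/v2`, the `/corpora/{corpus_key}/documents/{document_id}/metadata` path, the `{"metadata": ...}` body, and the `x-api-key` header are assumptions modeled on Vectara’s v2 API, and `put_metadata`, `without_keys`, and `build_put_request` are hypothetical helpers; confirm the real endpoint in the API reference.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: base URL, the ".../documents/{document_id}/metadata" path, body
# shape, and auth header are modeled on Vectara's v2 REST API; confirm them in
# the API reference before use.
API_BASE = "https://api.vectara.io/v2"


def put_metadata(existing: dict, replacement: dict) -> dict:
    """Local model of PUT semantics: the document's existing metadata is
    discarded and `replacement` becomes its full metadata. The result never
    depends on `existing`, which is why repeating the same PUT is idempotent."""
    return dict(replacement)


def without_keys(existing: dict, *keys: str) -> dict:
    """Hypothetical helper: the metadata to PUT when you want to delete some
    fields but keep the document itself."""
    return {k: v for k, v in existing.items() if k not in keys}


def build_put_request(corpus_key: str, document_id: str,
                      metadata: dict, api_key: str) -> urllib.request.Request:
    """Build, but do not send, a PUT request carrying the replacement metadata."""
    url = (f"{API_BASE}/corpora/{urllib.parse.quote(corpus_key, safe='')}"
           f"/documents/{urllib.parse.quote(document_id, safe='')}/metadata")
    body = json.dumps({"metadata": metadata}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )


if __name__ == "__main__":
    stored = {"acl": ["eng", "sales"], "promoted": True}
    new_meta = without_keys(stored, "promoted")  # drop the "promoted" flag
    print(put_metadata(stored, new_meta))
    # {'acl': ['eng', 'sales']}

    req = build_put_request("products", "sku-123", new_meta, "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    # PUT https://api.vectara.io/v2/corpora/products/documents/sku-123/metadata
```

The sketch assumes you already have the document’s current metadata at hand; how you fetch it is outside its scope. Note that a read-then-PUT sequence is not atomic: if another client changes the same document in between, that change is overwritten. When you only need to add or change fields, PATCH avoids this, because it leaves fields it doesn’t mention alone.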
These are just a couple of examples, and we know our user community has many more ideas for what to do with metadata updates! For now, this capability applies only to document-level metadata, not to “part-level” (also known as “section-level”) metadata, though we plan to extend it in the future.
Conclusion
Updatable metadata is the next step in our journey of making GenAI applications easier to build, and we’re happy to deliver this highly requested feature. Soon, we plan to expose additional metadata features in our console and to allow document text to be updated as well, so stay tuned!
As always, we’d love to hear your feedback! Connect with us on our forums, on our Discord, or in our community. If you’d like to see what Vectara can offer for retrieval-augmented generation in your application or website, sign up for an account!