This post has some more technical info: https://www.pinecone.io/blog/integrated-inference/
Makes a lot of sense to me to combine embedding, retrieval, and reranking. I can imagine this being a way for them to differentiate themselves from the popular databases that have added vector search support.
Can someone please explain how this works?
I assumed that a specific flavour of LLM was needed, an “embedding model”, to generate the vectors. Is this announcement saying that Pinecone is adding their own?
For example, is it better or worse than the models listed here? https://ollama.com/search?c=embedding
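For context, my current mental model of the embedding step is roughly this (a sketch only; all-MiniLM-L6-v2 is just an example open-source model, not whatever Pinecone hosts):

```python
# Sketch of what an "embedding model" does: text in, fixed-size vector out.
# all-MiniLM-L6-v2 is just an example open-source model, not Pinecone's hosted one.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

vec = model.encode("Pinecone announces integrated inference")
print(vec.shape)  # (384,): a fixed-size vector the database then indexes for similarity search
```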
Nothing new; Marqo has been doing this for a while now with their all-in-one platform to train, embed, retrieve, and evaluate.
I've played around with Weaviate & Astra DB, but Marqo is the best and easiest solution imo.
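Roughly what that looks like with the Marqo Python client, from memory (the index and model names are just examples; check their docs for the current signatures):

```python
# Marqo handles embedding at add/search time, so you only ever pass raw text.
# (Written from memory of the Marqo Python client; names here are illustrative.)
import marqo

mq = marqo.Client(url="http://localhost:8882")

mq.create_index("articles", model="hf/e5-base-v2")

mq.index("articles").add_documents(
    [{"Title": "Pinecone announces integrated inference",
      "Description": "Embedding, retrieval and reranking behind one API."}],
    tensor_fields=["Description"],  # the fields Marqo vectorizes
)

results = mq.index("articles").search("vector database with built-in embedding")
```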
txtai (https://github.com/neuml/txtai) has had inline vectorization since 2020. It supports Transformers, llama.cpp and LLM API services. It also has inline LLM integration and a built-in RAG pipeline.
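For example (the model name is just an illustration; any Transformers, llama.cpp or API-hosted embedding model can be plugged in):

```python
# txtai vectorizes inline: indexing and search take raw text, no separate embed step.
from txtai.embeddings import Embeddings

# "path" picks the embedding model; "content" stores the original text alongside vectors.
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2", "content": True})

data = [
    "Pinecone announces integrated inference",
    "Weaviate launches hosted embeddings",
]

# (id, text, tags) tuples; txtai embeds while indexing.
embeddings.index([(i, text, None) for i, text in enumerate(data)])

# Search is vectorized inline too; returns id, text and score.
print(embeddings.search("vector database with built-in embedding", 1))
```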
This title was a little misleading IMO because (maybe a skill issue on my part) I associated "inferencing" with "generation".
After reading the article, it seems Pinecone now supports in-DB vectorization (sketched after the list below), a feature already shared by:
- DataStax Astra DB: https://www.datastax.com/blog/simplifying-vector-embedding-g... (since May 2024)
- Weaviate: https://weaviate.io/blog/introducing-weaviate-embeddings (as of yesterday)
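To make the distinction concrete: "in-DB vectorization" just means the store owns the embedding step, so clients write and query with raw text. A toy, purely illustrative sketch (the class and model names below are made up, not any vendor's actual API):

```python
# Toy illustration of in-DB vectorization: the index embeds on upsert and query,
# so callers never compute vectors themselves. Hypothetical class; real products
# run this server-side with their own hosted models.
import numpy as np
from sentence_transformers import SentenceTransformer

class VectorizingIndex:
    def __init__(self, model_name="all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)  # embedding model lives with the store
        self.ids, self.vectors = [], []

    def upsert(self, doc_id, text):
        # Raw text in; the index vectorizes it internally.
        self.ids.append(doc_id)
        self.vectors.append(self.model.encode(text, normalize_embeddings=True))

    def query(self, text, top_k=3):
        q = self.model.encode(text, normalize_embeddings=True)
        scores = np.stack(self.vectors) @ q
        order = np.argsort(-scores)[:top_k]
        return [(self.ids[i], float(scores[i])) for i in order]

index = VectorizingIndex()
index.upsert("a", "Pinecone announces integrated inference")
index.upsert("b", "How to cook pasta")
print(index.query("vector database with built-in embedding"))
```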