Initial Checks
- I have read and followed the docs and still think this is a bug
Description
I have a document structure, defined following the docs, shown below. Before version 0.36, the nested `paths` field was stored in a separate Qdrant collection: with a parent collection named "channel_category", the paths went into "channel_category__paths". With 0.36, the nested `paths` records and their vectors are instead stored in the parent collection "channel_category" as separate records. Is this an intended change or a bug in how nested data is stored?
```python
from typing import Optional

from docarray import BaseDoc, DocList
from docarray.typing import AnyTensor
from pydantic import Field

# similarity_space and dim_size are defined elsewhere in my project.


class MetaPathDoc(BaseDoc):
    path_id: str
    level: int
    text: str
    embedding: Optional[AnyTensor] = Field(
        space=similarity_space, dim=dim_size)


class MetaCategoryDoc(BaseDoc):
    node_id: Optional[str]
    node_name: Optional[str]
    name: Optional[str]
    product_type_definitions: Optional[str]
    leaf: bool
    paths: Optional[DocList[MetaPathDoc]]
    embedding: Optional[AnyTensor] = Field(
        space=similarity_space, dim=dim_size)
    channel: str
    lang: str
```

Example Code
I'm loading documents into Qdrant via a Jina executor like this:
```python
import os
from typing import Optional

import more_itertools
from docarray import DocList
from docarray.index import QdrantDocumentIndex
from jina import Executor, requests
from jina.logging.logger import JinaLogger
from qdrant_client.http import models

from utils.docs import MetaCategoryDoc

QDRANT_LOCATION = os.getenv('QDRANT_LOCATION', "http://localhost:6333")
QDRANT_API_KEY = os.getenv('QDRANT_API_KEY', None)


class MetaChannelCategoryIndexingExec(Executor):
    def __init__(self,
                 collection_name: str = "channel_category",
                 batch_size: int = 64,
                 qdrant_location: str = QDRANT_LOCATION,
                 qdrant_api_key: Optional[str] = QDRANT_API_KEY,
                 *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.logger = JinaLogger('meta_channel_category_indexing')
        db_config = QdrantDocumentIndex.DBConfig(
            location=qdrant_location,
            api_key=qdrant_api_key,
            collection_name=collection_name,
            # INT8 scalar quantization, quantized vectors kept on disk
            quantization_config=models.ScalarQuantization(
                scalar=models.ScalarQuantizationConfig(
                    type=models.ScalarType.INT8,
                    quantile=0.99,
                    always_ram=False,
                )
            ),
            optimizers_config=models.OptimizersConfigDiff(
                memmap_threshold=20000, indexing_threshold=20000),
            on_disk_payload=True,
            hnsw_config=models.HnswConfigDiff(m=16, ef_construct=100, on_disk=True),
            wal_config=models.WalConfigDiff(
                wal_capacity_mb=64, wal_segments_ahead=1),
            prefer_grpc=False)
        self.doc_index = QdrantDocumentIndex[MetaCategoryDoc](db_config)
        self.batch_size = batch_size

    @requests(
        request_schema=DocList[MetaCategoryDoc],
        response_schema=DocList[MetaCategoryDoc]
    )
    def index_metadata(self, docs, **kwargs):
        """Save category documents to the vector DB in batches."""
        for doc_batch in more_itertools.chunked(docs, self.batch_size):
            # Index one batch of MetaCategoryDoc (nested paths included)
            self.doc_index.index(doc_batch)
```

Python, Pydantic & OS Version
0.36.0
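To tell which layout a given run produced, one could compare the collections that actually exist against the pre-0.36 naming scheme described above ("<parent>__<field>"), and compare the parent collection's point count with the number of documents indexed. This is a minimal pure-Python sketch; the helper names are hypothetical, and it assumes that under the 0.36 layout each nested `MetaPathDoc` becomes exactly one extra point in the parent collection, which matches the observation in this issue but is not confirmed by the docarray docs.

```python
from typing import Iterable, List, Sequence


def expected_subindex_names(parent: str, nested_fields: Iterable[str]) -> List[str]:
    """Collection names under the pre-0.36 layout described in this issue:
    one extra collection per nested DocList field, named "<parent>__<field>"."""
    return [f"{parent}__{field}" for field in nested_fields]


def missing_subindexes(
    parent: str, nested_fields: Iterable[str], existing: Iterable[str]
) -> List[str]:
    """Expected subindex collections that are absent from `existing`."""
    present = set(existing)
    return [
        name
        for name in expected_subindex_names(parent, nested_fields)
        if name not in present
    ]


def expected_parent_points(paths_per_doc: Sequence[int], flattened: bool) -> int:
    """Point count expected in the parent collection.

    paths_per_doc holds len(doc.paths) for each indexed MetaCategoryDoc.
    Separate layout: one point per MetaCategoryDoc.
    Flattened layout (assumption, as observed with 0.36): the parent points
    plus one point per nested MetaPathDoc.
    """
    parents = len(paths_per_doc)
    return parents + sum(paths_per_doc) if flattened else parents
```

The `existing` names can be fetched with qdrant-client via `[c.name for c in client.get_collections().collections]`, and the actual parent count via `client.count(collection_name="channel_category", exact=True).count`, where `client` is a `QdrantClient` pointed at the same server.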