💫 Release v0.30.0 #1410
samsja announced in Announcements
💫 Release v0.30.0 (a.k.a DocArray v2)
Changelog
If you are using DocArray v<0.30.0, you will be familiar with its dataclass API.
DocArray v2 is that idea, taken seriously. Every document is created through a dataclass-like interface, courtesy of Pydantic.
This gives the following advantages:
You may also be familiar with our old Document Stores for vector database integration. They are now called Document Indexes and offer the following improvements:
For now, Document Indexes support Weaviate, Qdrant, ElasticSearch, and HNSWLib, with more to come.
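The two ideas above can be modeled in a few lines of plain Python: documents defined by a typed schema, and Document Indexes that retrieve them by vector similarity. This is a stdlib-only sketch and not DocArray's API. ToyDoc is a hypothetical stand-in for a Pydantic-based document class. ToyIndex, index() and find() are hypothetical names for a brute-force in-memory index. Real backends such as HNSWLib use approximate nearest-neighbour search instead of scanning every document.

```python
# Toy model only: NOT DocArray's API. All names here are hypothetical.
from dataclasses import dataclass
from math import sqrt
from typing import Generic, List, Tuple, TypeVar


@dataclass
class ToyDoc:
    """Stand-in for a document schema (DocArray v2 uses Pydantic-based classes)."""
    text: str
    embedding: List[float]


D = TypeVar("D", bound=ToyDoc)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyIndex(Generic[D]):
    """Hypothetical brute-force index standing in for a vector database backend."""

    def __init__(self) -> None:
        self._docs: List[D] = []

    def index(self, docs: List[D]) -> None:
        # Store the documents; a real backend would also build an ANN structure.
        self._docs.extend(docs)

    def find(self, query: List[float], limit: int = 1) -> List[Tuple[D, float]]:
        # Score every stored document against the query, best match first.
        scored = [(doc, cosine(query, doc.embedding)) for doc in self._docs]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:limit]


if __name__ == "__main__":
    idx: ToyIndex[ToyDoc] = ToyIndex()
    idx.index([ToyDoc("cat", [1.0, 0.0]), ToyDoc("dog", [0.0, 1.0])])
    best, score = idx.find([0.9, 0.1])[0]
    print(best.text, round(score, 3))  # prints: cat 0.994
```

The schema lives on the document class, and the index only relies on the embedding field. This split is the point of the sketch: the same documents could be handed to a different index implementation without changing their definition.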
Changes to Document
Document has been renamed to BaseDoc.
BaseDoc cannot be used directly; instead, it has to be extended. Therefore, each document class is created through a dataclass-like interface.
BaseDoc allows for a flexible schema. By contrast, the Document class in v1 only allowed for a fixed schema: one of tensor, text and blob, plus additional chunks and matches.
Methods of the v1 Document (such as .load_uri_to_image_tensor()) are not supported in v2. Instead, we provide some of those methods on the typing level.
There is also a LegacyDocument class, which extends BaseDoc while following the same schema as v1's Document. LegacyDocument can be useful to start migrating your codebase from v1 to v2. Nevertheless, its API is not fully compatible with the DocArray v1 Document: none of the methods associated with Document are present, and only the schema of the data is similar.
Changes to DocumentArray
DocList
The DocumentArray class from v1 has been renamed to DocList. The new name better describes its actual functionality, since it is a list of BaseDocs.
DocVec
DocVec is a column-based representation of BaseDocs. Both DocVec and DocList extend AnyDocArray.
DocVec is a container of documents suited to computations that require batches of data, such as matrix multiplication, distance calculation, or a deep learning forward pass.
DocVec has an interface similar to DocList's, but its underlying implementation is column-based instead of row-based. Each field of the DocVec's schema (its .doc_type, which is a BaseDoc) is stored in a column. If the field is a tensor, the data from all documents is stored as a single doc_vec (Torch/TensorFlow/NumPy) tensor. If the tensor field is AnyTensor or a Union of tensor types, the .tensor_type is used to determine the type of the doc_vec column.
Parameterized DocList
A DocList does not necessarily have to be homogeneous.
If you want a homogeneous DocList, you can parameterize it at initialization time:
Some methods, such as .from_csv() or .pull(), only work with parameterized DocLists.
Access attributes of your DocumentArray
An AnyDocArray exposes the same attributes as the BaseDocs it contains. Accessing such an attribute returns a list of type(attribute). However, this only works if all the BaseDocs in the AnyDocArray have the same schema. Therefore, only this works:
Changes to Document Store
In v2, the Document Store has been renamed to DocIndex and can be used for fast retrieval using vector similarity. DocArray v2 DocIndex supports:
Instead of creating a DocumentArray instance and setting the storage parameter to a vector database of your choice, in v2 you initialize a DocIndex object of your choice, such as:
In contrast, DocStore in v2 can be used for simple long-term storage, such as with AWS S3 buckets or Jina AI Cloud.
Thank you to all of the contributors to this release:
This discussion was created from the release 💫 Release v0.30.0.