Friday, August 29, 2025

Hierarchical (and Other) Indexes using LlamaIndex for RAG Content Enrichment

At our weekly This Week in Machine Learning (TWIML) meetings, our leader and facilitator Darin Plutchok pointed out a LinkedIn blog post on Semantic Chunking that has recently been implemented in the LangChain framework. Unlike more traditional chunking approaches that use the number of tokens or separator tokens as a guide, this one chunks groups of sentences into semantic units by breaking them when the (semantic) similarity between consecutive sentences (or sentence-grams) falls below some predefined threshold. I had tried it earlier (pre-LangChain), and while the results were reasonable, it needed a lot of processing, so I went back to what I was using before.

I was also recently exploring LlamaIndex as part of an effort to familiarize myself with the GenAI ecosystem. LlamaIndex supports hierarchical indexes natively, meaning it provides the data structures that make building them easier and more natural. Unlike the typical RAG index, which is just a sequence of chunks (and their vectors), a hierarchical index clusters chunks into parent chunks, parent chunks into grandparent chunks, and so on. A parent chunk would generally inherit or merge most of the metadata from its children, and its text would be a summary of its children's text contents. To illustrate my point about LlamaIndex data structures having natural support for this kind of setup, here are the definitions of the LlamaIndex TextNode (the LlamaIndex Document object is just a child of TextNode with an additional doc_id: str field) and the LangChain Document. Of particular interest is the relationships field, which allows pointers to other chunks using named relationships such as PARENT, CHILD, NEXT, PREVIOUS, SOURCE, etc. Arguably, the LlamaIndex TextNode could be represented more generally and succinctly by the LangChain Document, but the hooks do help to support hierarchical indexing more naturally.

# this is a LlamaIndex TextNode
class TextNode:
  id_: str = None
  embedding: Optional[List[float]] = None
  extra_info: Dict[str, Any]
  excluded_embed_metadata_keys: List[str] = None
  excluded_llm_metadata_keys: List[str] = None
  relationships: Dict[NodeRelationship, Union[RelatedNodeInfo, List[RelatedNodeInfo]]] = None
  text: str
  start_char_idx: Optional[int] = None
  end_char_idx: Optional[int] = None
  text_template: str = "{metadata_str}\n\n{content}"
  metadata_template: str = "{key}: {value}"
  metadata_separator: str = "\n"

# and this is a LangChain Document
class Document:
  page_content: str
  metadata: Dict[str, Any]

In any case, having discovered the hammer that is LlamaIndex, I began to see a number of potential hierarchical index nails. One such nail that occurred to me was to use Semantic Chunking to cluster consecutive chunks rather than sentences (or sentence-grams), and then create parent nodes from these chunk clusters. Instead of computing cosine similarity between consecutive sentence vectors to build up chunks, we compute cosine similarity across consecutive chunk vectors and split them up into clusters based on some similarity threshold, i.e. if the similarity drops below the threshold, we terminate the cluster and start a new one.
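As a sketch of this chunk-clustering idea (plain NumPy; the function name and toy vectors are mine, for illustration, not from either framework):

```python
import numpy as np

def cluster_consecutive_chunks(vectors, threshold):
    """Group consecutive chunk vectors into clusters, starting a new
    cluster whenever the cosine similarity between a chunk and its
    predecessor drops below the threshold. Returns lists of indices."""
    if not vectors:
        return []
    clusters, current = [], [0]
    for i in range(1, len(vectors)):
        a, b = vectors[i - 1], vectors[i]
        sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        if sim < threshold:
            clusters.append(current)
            current = []
        current.append(i)
    clusters.append(current)
    return clusters

# toy example: two nearly parallel vectors, then an orthogonal one
vecs = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([0.0, 1.0])]
print(cluster_consecutive_chunks(vecs, 0.8))  # [[0, 1], [2]]
```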

Both LangChain and LlamaIndex provide implementations of Semantic Chunking (for sentence clustering into chunks, not chunk clustering into parent chunks). LangChain's Semantic Chunking lets you set the threshold using percentiles, standard deviation, and inter-quartile range, while the LlamaIndex implementation supports only the percentile threshold. Intuitively, here is how you could get an idea of the percentile threshold to use (thresholds for the other methods can be computed similarly). Assume your content has N chunks and K clusters (based on your understanding of the data or from other estimates); then, assuming a uniform distribution, there would be N/K chunks in each cluster. If N/K is roughly 20% of N, then your percentile threshold would be roughly 80.
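The back-of-envelope percentile calculation above can be sketched as follows (the function name and the toy similarity values are mine, for illustration only):

```python
import numpy as np

def percentile_threshold(num_clusters):
    """With K expected clusters, roughly 1/K of the consecutive-pair
    similarities are cluster breakpoints, so we break below the
    (100/K)th percentile, i.e. a percentile threshold of 100*(1 - 1/K)."""
    return 100.0 * (1.0 - 1.0 / num_clusters)

# converting the percentile into an actual cosine-similarity cutoff,
# given the observed consecutive-pair similarities (toy values here)
sims = [0.95, 0.93, 0.20, 0.91, 0.94, 0.25, 0.90, 0.92, 0.15, 0.96]
pct = percentile_threshold(5)              # K = 5 -> 80.0
cutoff = np.percentile(sims, 100.0 - pct)  # break below this similarity
print(pct, round(cutoff, 2))  # 80.0 0.24
```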

LlamaIndex provides an IngestionPipeline, which takes a list of TransformComponent objects. My pipeline looks something like the one below. The last component is a custom subclass of TransformComponent; all you need to do is override its __call__ method, which takes a List[TextNode] and returns a List[TextNode].

transformations = [
    text_splitter,          # a SentenceSplitter
    embedding_generator,    # a HuggingFaceEmbedding
    summary_node_builder,   # a SemanticChunkingSummaryNodeBuilder (custom)
]
ingestion_pipeline = IngestionPipeline(transformations=transformations)
docs = SimpleDirectoryReader("/path/to/input/docs").load_data()
nodes = ingestion_pipeline.run(documents=docs)

My custom component takes the desired cluster size K at construction time. It uses the vectors computed by the (LlamaIndex-provided) HuggingFaceEmbedding component to compute similarities between consecutive vectors, and uses K to compute a threshold. It then uses the threshold to cluster the chunks, resulting in a list of lists of chunks (List[List[TextNode]]). For each cluster, we create a summary TextNode, set its CHILD relationships to the cluster nodes, and set the PARENT relationship of each child in the cluster to this new summary node. The text of the child nodes is first condensed using extractive summarization, then these condensed summaries are further summarized into one final summary using abstractive summarization. I used bert-extractive-summarizer with bert-base-uncased for the first and a HuggingFace summarization pipeline with facebook/bart-large-cnn for the second. I suppose I could have used an LLM for the second step, but it would have taken more time to build the index, and I have been experimenting with ideas described in the DeepLearning.AI course Open Source Models with HuggingFace.
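A rough sketch of the cluster-to-parent wiring described above, using a minimal stand-in Node class rather than the real LlamaIndex TextNode, and a trivial text join in place of the extractive/abstractive summarizers (all names here are hypothetical):

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Node:
    """Minimal stand-in for a LlamaIndex TextNode: id, text, relationships."""
    id_: str
    text: str
    relationships: Dict[str, List[str]] = field(default_factory=dict)

def build_summary_nodes(clusters: List[List[Node]], summarize) -> List[Node]:
    """For each cluster of child nodes, create a parent summary node,
    wire up CHILD/PARENT relationships both ways, and return the parents."""
    parents = []
    for i, cluster in enumerate(clusters):
        parent = Node(
            id_=f"summary-{i}",
            text=summarize([node.text for node in cluster]),
            relationships={"CHILD": [node.id_ for node in cluster]},
        )
        for node in cluster:
            node.relationships["PARENT"] = [parent.id_]
        parents.append(parent)
    return parents

# toy summarizer: join texts (the real pipeline chains extractive
# and abstractive summarization models instead)
clusters = [[Node("a", "foo"), Node("b", "bar")], [Node("c", "baz")]]
parents = build_summary_nodes(clusters, summarize=lambda ts: " ".join(ts))
print(parents[0].text)                         # foo bar
print(clusters[0][0].relationships["PARENT"])  # ['summary-0']
```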

Finally, I recompute the embeddings for the summary nodes. I ran the summary node texts through the HuggingFaceEmbedding component, but I suppose I could have done some aggregation (mean-pool / max-pool) over the child vectors as well.
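The mean-pooling alternative might look like this (my own sketch, not what the index actually uses):

```python
import numpy as np

def mean_pool(child_vectors):
    """Aggregate child chunk embeddings into a parent embedding by
    mean-pooling, then L2-normalize so cosine similarities against
    the other (unit-normalized) chunk vectors stay comparable."""
    pooled = np.mean(child_vectors, axis=0)
    return pooled / np.linalg.norm(pooled)

children = np.array([[1.0, 0.0], [0.0, 1.0]])
print(mean_pool(children))  # [0.70710678 0.70710678]
```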

Darin also pointed out another instance of a hierarchical index, proposed in the paper RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval and described in detail by the authors in this LlamaIndex webinar. This is a little more radical than my idea of using semantic chunking to cluster consecutive chunks, in that it allows clustering of chunks across the entire corpus. One other important difference is that it allows soft clustering, meaning a chunk can be a member of more than one cluster. The authors first reduce the dimensionality of the vector space using UMAP (Uniform Manifold Approximation and Projection) and then apply a Gaussian Mixture Model (GMM) to do the soft clustering. To find the optimal number of clusters K for the GMM, one can use a combination of AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion).
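The BIC-based model selection can be sketched with scikit-learn's GaussianMixture (this toy skips the UMAP reduction step and uses synthetic 2-D "embeddings"; the function name and data are mine):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def pick_num_clusters(vectors, k_max):
    """Fit GMMs for k = 1..k_max and return the k with the lowest BIC
    (one can inspect AIC the same way via gmm.aic)."""
    bics = []
    for k in range(1, k_max + 1):
        gmm = GaussianMixture(n_components=k, random_state=42).fit(vectors)
        bics.append(gmm.bic(vectors))
    return int(np.argmin(bics)) + 1

# three well-separated blobs of fake "chunk embeddings"
rng = np.random.default_rng(42)
pts = np.vstack([rng.normal(loc=c, scale=0.05, size=(50, 2))
                 for c in ([0, 0], [5, 5], [10, 0])])
print(pick_num_clusters(pts, k_max=6))  # 3
```

Since the GMM is a soft-clustering model, gmm.predict_proba then gives the per-cluster membership probabilities that RAPTOR exploits.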

In my case, when training the GMM, the AIC kept decreasing as the number of clusters increased, while the BIC had its minimum value at K=10, which corresponds roughly to the 12 chapters in my Snowflake book (my test corpus). But there was a lot of overlap, which would force me to implement some kind of logic to take advantage of the soft clustering, which I didn't want to do, since I wanted to reuse code from my earlier Semantic Chunking node builder component. Ultimately, I settled on 90 clusters by using my original intuition to compute K, and the resulting clusters seem fairly well separated, as seen below.

Using the results of the clustering, I built this as another custom LlamaIndex TransformComponent for hierarchical indexing. This implementation differs from the previous one only in how it assigns nodes to clusters; all other details with respect to text summarization and metadata merging are identical.

For both these indexes, we have a choice: maintain the index as hierarchical and decide which layer(s) to query based on the question, or add the summary nodes into the same level as the other chunks and let vector similarity surface them when queries deal with cross-cutting issues that may be found together in these nodes. The RAPTOR paper reports that they do not see a significant gain from using the first approach over the second. Because my query functionality is LangChain-based, my approach has been to generate the nodes, reformat them into LangChain Document objects, and use LCEL to query the index and generate answers, so I haven't looked into querying from a hierarchical index at all.

Looking back on this work, I am reminded of similar choices when designing traditional search pipelines. Often there is a choice between building functionality into the index to support a cheaper query implementation, or building the logic into the query pipeline, which may be more expensive but also more flexible. I think LlamaIndex started with the first approach (as evidenced by their blog posts Chunking Strategies for Large Language Models Part I and Evaluating Ideal Chunk Sizes for RAG Systems using LlamaIndex), while LangChain started with the second, even though nowadays there is a lot of convergence between the two frameworks.
