Layered Caching

The LayeredCaching class is a component in the llamarch library that implements a layered caching pattern for LLM serving: responses from a large LLM are cached, and the cached results are then used to fine-tune a smaller, specialized LLM that can answer future queries at lower cost.
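
A typical interaction with the class looks like the sketch below. The EchoLLM stub and its generate method are illustrative assumptions standing in for real model wrappers; only LayeredCaching and its methods come from the library, and whether the package's FineTuner accepts arbitrary model objects is likewise an assumption here.

from llamarch.patterns.layered_caching import LayeredCaching

# Hypothetical stand-in for a real model wrapper; any object exposing
# generate(query) -> str matches how LayeredCaching calls its models.
class EchoLLM:
    def __init__(self, name):
        self.name = name

    def generate(self, query):
        return f"[{self.name}] answer to: {query}"

system = LayeredCaching(large_llm=EchoLLM("large"), small_llm=EchoLLM("small"))

# First call: cache miss, answered by the large LLM, cached, and used
# to fine-tune the small LLM.
print(system.handle_query("What is layered caching?"))

# Same call again: served straight from the cache.
print(system.handle_query("What is layered caching?"))

# Later traffic: served by the specialized small LLM.
print(system.handle_future_query("How does the cache warm up?"))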

LayeredCaching

Source code in llamarch/patterns/layered_caching/__init__.py
class LayeredCaching:
    def __init__(self, large_llm, small_llm):
        """
        Initialize the LayeredCaching system with a large LLM, a small LLM, and a caching mechanism.

        Parameters
        ----------
        large_llm : LLM
            The large language model used to generate responses for queries that are not cached.
        small_llm : LLM
            The smaller language model used for fine-tuning based on cached results.

        Attributes
        ----------
        large_llm : LLM
            The large language model used to handle uncached queries.
        fine_tuner : FineTuner
            Fine-tuner instance responsible for fine-tuning the small LLM based on cached data.
        cache : Cache
            Cache instance used to store query results for later retrieval.
        specialized_llm : LLM, optional
            A fine-tuned version of the small LLM, set after fine-tuning based on cached data.
        """
        self.large_llm = large_llm
        self.fine_tuner = FineTuner(small_llm)
        self.cache = Cache()
        self.specialized_llm = None  # This will be set after fine-tuning

    def handle_query(self, query):
        """
        Handle a query by checking the cache, generating a response using the large LLM if not cached,
        and fine-tuning the smaller LLM based on cached results.

        Parameters
        ----------
        query : str
            The query for which a response is needed.

        Returns
        -------
        str
            The response to the query, either from the cache or generated by the large LLM.

        Notes
        -----
        This method fine-tunes the small LLM using the cached data once the cache is populated.
        If the smaller LLM is not yet specialized, it will be fine-tuned and loaded.
        """
        # Step 1: Check if the query is cached
        cached_result = self.cache.get(query)
        if cached_result:
            return cached_result

        # Step 2: If not cached, use the large LLM to answer the query
        result = self.large_llm.generate(query)

        # Cache the result for future use
        self.cache.set(query, result)

        # Step 3: Fine-tune the smaller model based on cached results
        if not self.specialized_llm:
            data = self.cache.get_all_values()  # Use all cached results
            self.specialized_llm = self.fine_tuner.fine_tune(data)
            print('Fine-tuned model loaded')
        return result

    def handle_future_query(self, query):
        """
        Handle future queries using the specialized fine-tuned smaller LLM.

        Parameters
        ----------
        query : str
            The query for which a response is needed.

        Returns
        -------
        str
            The response to the query, generated by the fine-tuned smaller LLM.

        Notes
        -----
        This method assumes that the small LLM has already been fine-tuned using cached data.
        """
        return self.specialized_llm.generate(query)
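
The class relies on Cache and FineTuner from the same package, whose source is not shown on this page. From the calls above, their minimal expected interfaces look roughly like the sketch below; the actual llamarch implementations may be richer, so treat this only as a reading aid.

# Interface sketch inferred from how LayeredCaching uses these helpers.
class Cache:
    def __init__(self):
        self._store = {}

    def get(self, query):
        # Return the cached response, or None on a miss.
        return self._store.get(query)

    def set(self, query, result):
        self._store[query] = result

    def get_all_values(self):
        # All cached responses, handed to the fine-tuner as training data.
        return list(self._store.values())

class FineTuner:
    def __init__(self, small_llm):
        self.small_llm = small_llm

    def fine_tune(self, data):
        # Fine-tune the small model on the cached responses and return
        # the specialized model; the training step itself is library-specific.
        return self.small_llm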

__init__(large_llm, small_llm)

Initialize the LayeredCaching system with a large LLM, a small LLM, and a caching mechanism.

Parameters:
  • large_llm (LLM) –

    The large language model used to generate responses for queries that are not cached.

  • small_llm (LLM) –

    The smaller language model used for fine-tuning based on cached results.

Attributes:
  • large_llm (LLM) –

    The large language model used to handle uncached queries.

  • fine_tuner (FineTuner) –

    Fine-tuner instance responsible for fine-tuning the small LLM based on cached data.

  • cache (Cache) –

    Cache instance used to store query results for later retrieval.

  • specialized_llm (LLM, optional) –

    A fine-tuned version of the small LLM, set after fine-tuning based on cached data.

Source code in llamarch/patterns/layered_caching/__init__.py
def __init__(self, large_llm, small_llm):
    """
    Initialize the LayeredCaching system with a large LLM, a small LLM, and a caching mechanism.

    Parameters
    ----------
    large_llm : LLM
        The large language model used to generate responses for queries that are not cached.
    small_llm : LLM
        The smaller language model used for fine-tuning based on cached results.

    Attributes
    ----------
    large_llm : LLM
        The large language model used to handle uncached queries.
    fine_tuner : FineTuner
        Fine-tuner instance responsible for fine-tuning the small LLM based on cached data.
    cache : Cache
        Cache instance used to store query results for later retrieval.
    specialized_llm : LLM, optional
        A fine-tuned version of the small LLM, set after fine-tuning based on cached data.
    """
    self.large_llm = large_llm
    self.fine_tuner = FineTuner(small_llm)
    self.cache = Cache()
    self.specialized_llm = None  # This will be set after fine-tuning
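
Right after construction the instance holds no specialized model and an empty cache; a quick check, reusing the hypothetical EchoLLM stub from the sketch at the top of this page:

system = LayeredCaching(large_llm=EchoLLM("large"), small_llm=EchoLLM("small"))
assert system.specialized_llm is None        # set only after fine-tuning
assert system.cache.get("anything") is None  # assuming the cache signals a miss with None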

handle_query(query)

Handle a query by checking the cache, generating a response using the large LLM if not cached, and fine-tuning the smaller LLM based on cached results.

Parameters:
  • query (str) –

    The query for which a response is needed.

Returns:
  • str

    The response to the query, either from the cache or generated by the large LLM.

Notes

This method fine-tunes the small LLM using the cached data once the cache is populated. If the smaller LLM is not yet specialized, it will be fine-tuned and loaded.

Source code in llamarch/patterns/layered_caching/__init__.py
def handle_query(self, query):
    """
    Handle a query by checking the cache, generating a response using the large LLM if not cached,
    and fine-tuning the smaller LLM based on cached results.

    Parameters
    ----------
    query : str
        The query for which a response is needed.

    Returns
    -------
    str
        The response to the query, either from the cache or generated by the large LLM.

    Notes
    -----
    This method fine-tunes the small LLM using the cached data once the cache is populated.
    If the smaller LLM is not yet specialized, it will be fine-tuned and loaded.
    """
    # Step 1: Check if the query is cached
    cached_result = self.cache.get(query)
    if cached_result:
        return cached_result

    # Step 2: If not cached, use the large LLM to answer the query
    result = self.large_llm.generate(query)

    # Cache the result for future use
    self.cache.set(query, result)

    # Step 3: Fine-tune the smaller model based on cached results
    if not self.specialized_llm:
        data = self.cache.get_all_values()  # Use all cached results
        self.specialized_llm = self.fine_tuner.fine_tune(data)
        print('Fine-tuned model loaded')
    return result
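
In practice the miss/hit behavior looks like this; the query string is an arbitrary example and system is the instance built earlier:

# Miss: generated by the large LLM, cached, and (on the first miss only)
# used to trigger fine-tuning of the small LLM.
answer = system.handle_query("Explain KV caching")

# Hit: the same query now returns the cached string without calling
# large_llm.generate again.
assert system.handle_query("Explain KV caching") == answer

Note that the "if cached_result" check treats falsy cached values, such as an empty string, as misses, so such queries are re-generated on every call.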

handle_future_query(query)

Handle future queries using the specialized fine-tuned smaller LLM.

Parameters:
  • query (str) –

    The query for which a response is needed.

Returns:
  • str

    The response to the query, generated by the fine-tuned smaller LLM.

Notes

This method assumes that the small LLM has already been fine-tuned using cached data.

Source code in llamarch/patterns/layered_caching/__init__.py
def handle_future_query(self, query):
    """
    Handle future queries using the specialized fine-tuned smaller LLM.

    Parameters
    ----------
    query : str
        The query for which a response is needed.

    Returns
    -------
    str
        The response to the query, generated by the fine-tuned smaller LLM.

    Notes
    -----
    This method assumes that the small LLM has already been fine-tuned using cached data.
    """
    return self.specialized_llm.generate(query)
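
handle_future_query assumes the fine-tuning step has already run; calling it while specialized_llm is still None raises an AttributeError. A defensive call pattern, as a sketch (the answer helper is hypothetical, not part of the library):

def answer(system, query):
    # Fall back to the full large-LLM pipeline until the specialized
    # model has been produced by a first uncached query.
    if system.specialized_llm is not None:
        return system.handle_future_query(query)
    return system.handle_query(query)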