Layered Caching

The LayeredCaching class is a component in the llamarch library that implements a layered caching pattern for LLM serving: responses from a large LLM are cached, and the cached results are then used to fine-tune a smaller, specialized LLM that can answer future queries at lower cost.
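
A typical interaction with the class looks like the sketch below. The EchoLLM stub and its generate method are illustrative assumptions standing in for real model wrappers; only LayeredCaching and its methods come from the library, and whether the package's FineTuner accepts arbitrary model objects is likewise an assumption here.

from llamarch.patterns.layered_caching import LayeredCaching

# Hypothetical stand-in for a real model wrapper; any object exposing
# generate(query) -> str matches how LayeredCaching calls its models.
class EchoLLM:
    def __init__(self, name):
        self.name = name

    def generate(self, query):
        return f"[{self.name}] answer to: {query}"

system = LayeredCaching(large_llm=EchoLLM("large"), small_llm=EchoLLM("small"))

# First call: cache miss, answered by the large LLM, cached, and used
# to fine-tune the small LLM.
print(system.handle_query("What is layered caching?"))

# Same call again: served straight from the cache.
print(system.handle_query("What is layered caching?"))

# Later traffic: served by the specialized small LLM.
print(system.handle_future_query("How does the cache warm up?"))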

LayeredCaching

Source code in llamarch/patterns/layered_caching/__init__.py
class LayeredCaching:
    def __init__(self, large_llm, small_llm):
        """
        Initialize the LayeredCaching system with a large LLM, a small LLM, and a caching mechanism.

        Parameters
        ----------
        large_llm : LLM
            The large language model used to generate responses for queries that are not cached.
        small_llm : LLM
            The smaller language model used for fine-tuning based on cached results.

        Attributes
        ----------
        large_llm : LLM
            The large language model used to handle uncached queries.
        fine_tuner : FineTuner
            Fine-tuner instance responsible for fine-tuning the small LLM based on cached data.
        cache : Cache
            Cache instance used to store query results for later retrieval.
        specialized_llm : LLM, optional
            A fine-tuned version of the small LLM, set after fine-tuning based on cached data.
        """
        self.large_llm = large_llm
        self.fine_tuner = FineTuner(small_llm)
        self.cache = Cache()
        self.specialized_llm = None  # This will be set after fine-tuning

    def handle_query(self, query):
        """
        Handle a query by checking the cache, generating a response using the large LLM if not cached,
        and fine-tuning the smaller LLM based on cached results.

        Parameters
        ----------
        query : str
            The query for which a response is needed.

        Returns
        -------
        str
            The response to the query, either from the cache or generated by the large LLM.

        Notes
        -----
        This method fine-tunes the small LLM using the cached data once the cache is populated.
        If the smaller LLM is not yet specialized, it will be fine-tuned and loaded.
        """
        # Step 1: Check if the query is cached
        cached_result = self.cache.get(query)
        if cached_result:
            return cached_result

        # Step 2: If not cached, use the large LLM to answer the query
        result = self.large_llm.generate(query)

        # Cache the result for future use
        self.cache.set(query, result)

        # Step 3: Fine-tune the smaller model based on cached results
        if not self.specialized_llm:
            data = self.cache.get_all_values()  # Use all cached results
            self.specialized_llm = self.fine_tuner.fine_tune(data)
            print('Fine-tuned model loaded')
        return result

    def handle_future_query(self, query):
        """
        Handle future queries using the specialized fine-tuned smaller LLM.

        Parameters
        ----------
        query : str
            The query for which a response is needed.

        Returns
        -------
        str
            The response to the query, generated by the fine-tuned smaller LLM.

        Notes
        -----
        This method assumes that the small LLM has already been fine-tuned using cached data.
        """
        return self.specialized_llm.generate(query)
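
The class relies on Cache and FineTuner from the same package, whose source is not shown on this page. From the calls above, their minimal expected interfaces look roughly like the sketch below; the actual llamarch implementations may be richer, so treat this only as a reading aid.

# Interface sketch inferred from how LayeredCaching uses these helpers.
class Cache:
    def __init__(self):
        self._store = {}

    def get(self, query):
        # Return the cached response, or None on a miss.
        return self._store.get(query)

    def set(self, query, result):
        self._store[query] = result

    def get_all_values(self):
        # All cached responses, handed to the fine-tuner as training data.
        return list(self._store.values())

class FineTuner:
    def __init__(self, small_llm):
        self.small_llm = small_llm

    def fine_tune(self, data):
        # Fine-tune the small model on the cached responses and return
        # the specialized model; the training step itself is library-specific.
        return self.small_llm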

__init__(large_llm, small_llm)

Initialize the LayeredCaching system with a large LLM, a small LLM, and a caching mechanism.

Parameters:
  • large_llm (LLM) –

    The large language model used to generate responses for queries that are not cached.

  • small_llm (LLM) –

    The smaller language model used for fine-tuning based on cached results.

Attributes:
  • large_llm (LLM) –

    The large language model used to handle uncached queries.

  • fine_tuner (FineTuner) –

    Fine-tuner instance responsible for fine-tuning the small LLM based on cached data.

  • cache (Cache) –

    Cache instance used to store query results for later retrieval.

  • specialized_llm (LLM, optional) –

    A fine-tuned version of the small LLM, set after fine-tuning based on cached data.

Source code in llamarch/patterns/layered_caching/__init__.py
def __init__(self, large_llm, small_llm):
    """
    Initialize the LayeredCaching system with a large LLM, a small LLM, and a caching mechanism.

    Parameters
    ----------
    large_llm : LLM
        The large language model used to generate responses for queries that are not cached.
    small_llm : LLM
        The smaller language model used for fine-tuning based on cached results.

    Attributes
    ----------
    large_llm : LLM
        The large language model used to handle uncached queries.
    fine_tuner : FineTuner
        Fine-tuner instance responsible for fine-tuning the small LLM based on cached data.
    cache : Cache
        Cache instance used to store query results for later retrieval.
    specialized_llm : LLM, optional
        A fine-tuned version of the small LLM, set after fine-tuning based on cached data.
    """
    self.large_llm = large_llm
    self.fine_tuner = FineTuner(small_llm)
    self.cache = Cache()
    self.specialized_llm = None  # This will be set after fine-tuning
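
Right after construction the instance holds no specialized model and an empty cache; a quick check, reusing the hypothetical EchoLLM stub from the sketch at the top of this page:

system = LayeredCaching(large_llm=EchoLLM("large"), small_llm=EchoLLM("small"))
assert system.specialized_llm is None        # set only after fine-tuning
assert system.cache.get("anything") is None  # assuming the cache signals a miss with None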

handle_query(query)

Handle a query by checking the cache, generating a response using the large LLM if not cached, and fine-tuning the smaller LLM based on cached results.

Parameters:
  • query (str) –

    The query for which a response is needed.

Returns:
  • str

    The response to the query, either from the cache or generated by the large LLM.

Notes

This method fine-tunes the small LLM using the cached data once the cache is populated. If the smaller LLM is not yet specialized, it will be fine-tuned and loaded.

Source code in llamarch/patterns/layered_caching/__init__.py
def handle_query(self, query):
    """
    Handle a query by checking the cache, generating a response using the large LLM if not cached,
    and fine-tuning the smaller LLM based on cached results.

    Parameters
    ----------
    query : str
        The query for which a response is needed.

    Returns
    -------
    str
        The response to the query, either from the cache or generated by the large LLM.

    Notes
    -----
    This method fine-tunes the small LLM using the cached data once the cache is populated.
    If the smaller LLM is not yet specialized, it will be fine-tuned and loaded.
    """
    # Step 1: Check if the query is cached
    cached_result = self.cache.get(query)
    if cached_result:
        return cached_result

    # Step 2: If not cached, use the large LLM to answer the query
    result = self.large_llm.generate(query)

    # Cache the result for future use
    self.cache.set(query, result)

    # Step 3: Fine-tune the smaller model based on cached results
    if not self.specialized_llm:
        data = self.cache.get_all_values()  # Use all cached results
        self.specialized_llm = self.fine_tuner.fine_tune(data)
        print('Fine-tuned model loaded')
    return result
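
In practice the miss/hit behavior looks like this; the query string is an arbitrary example and system is the instance built earlier:

# Miss: generated by the large LLM, cached, and (on the first miss only)
# used to trigger fine-tuning of the small LLM.
answer = system.handle_query("Explain KV caching")

# Hit: the same query now returns the cached string without calling
# large_llm.generate again.
assert system.handle_query("Explain KV caching") == answer

Note that the "if cached_result" check treats falsy cached values, such as an empty string, as misses, so such queries are re-generated on every call.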

handle_future_query(query)

Handle future queries using the specialized fine-tuned smaller LLM.

Parameters:
  • query (str) –

    The query for which a response is needed.

Returns:
  • str

    The response to the query, generated by the fine-tuned smaller LLM.

Notes

This method assumes that the small LLM has already been fine-tuned using cached data.

Source code in llamarch/patterns/layered_caching/__init__.py
def handle_future_query(self, query):
    """
    Handle future queries using the specialized fine-tuned smaller LLM.

    Parameters
    ----------
    query : str
        The query for which a response is needed.

    Returns
    -------
    str
        The response to the query, generated by the fine-tuned smaller LLM.

    Notes
    -----
    This method assumes that the small LLM has already been fine-tuned using cached data.
    """
    return self.specialized_llm.generate(query)
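
handle_future_query assumes the fine-tuning step has already run; calling it while specialized_llm is still None raises an AttributeError. A defensive call pattern, as a sketch (the answer helper is hypothetical, not part of the library):

def answer(system, query):
    # Fall back to the full large-LLM pipeline until the specialized
    # model has been produced by a first uncached query.
    if system.specialized_llm is not None:
        return system.handle_future_query(query)
    return system.handle_query(query)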