Handle a query by checking the cache, generating a response using the large LLM if not cached,
and fine-tuning the smaller LLM based on cached results.
Parameters:

- query (str) – The query for which a response is needed.

Returns:

- str – The response to the query, either from the cache or generated by the large LLM.
Notes
This method fine-tunes the small LLM using the cached data once the cache is populated.
If the smaller LLM is not yet specialized, it will be fine-tuned and loaded.
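
For reference, a minimal sketch of the cache interface this method relies on follows; the method names (get, set, get_all_values) match the calls in the source below, but the dict-backed implementation is an assumption for illustration, not the library's actual cache class.

class SimpleQueryCache:
    """Illustrative dict-backed cache; an assumption, not the library's class."""

    def __init__(self):
        self._store = {}  # query -> cached response

    def get(self, query):
        # Return the cached response, or None on a miss
        return self._store.get(query)

    def set(self, query, response):
        self._store[query] = response

    def get_all_values(self):
        # All cached responses, used as fine-tuning data once the cache is populated
        return list(self._store.values())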
Source code in llamarch/patterns/layered_caching/__init__.py
def handle_query(self, query):
    """
    Handle a query by checking the cache, generating a response using the large LLM if not cached,
    and fine-tuning the smaller LLM based on cached results.

    Parameters
    ----------
    query : str
        The query for which a response is needed.

    Returns
    -------
    str
        The response to the query, either from the cache or generated by the large LLM.

    Notes
    -----
    This method fine-tunes the small LLM using the cached data once the cache is populated.
    If the smaller LLM is not yet specialized, it will be fine-tuned and loaded.
    """
    # Step 1: Check whether the query is already cached
    cached_result = self.cache.get(query)
    if cached_result:
        return cached_result

    # Step 2: On a cache miss, answer the query with the large LLM
    result = self.large_llm.generate(query)

    # Cache the result for future use
    self.cache.set(query, result)

    # Step 3: Fine-tune the smaller model on the cached results
    if not self.specialized_llm:
        data = self.cache.get_all_values()  # Use all cached results
        self.specialized_llm = self.fine_tuner.fine_tune(data)
        print('Fine-tuned model loaded')

    return result
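
A typical call sequence might look like the following; the class name and constructor arguments are hypothetical (only the method is shown above), but the handle_query behavior follows the source.

# Hypothetical setup: LayeredCaching, my_large_llm, and my_fine_tuner
# are illustrative names, not taken from the library.
pattern = LayeredCaching(large_llm=my_large_llm,
                         cache=SimpleQueryCache(),
                         fine_tuner=my_fine_tuner)

answer = pattern.handle_query("What is layered caching?")  # miss: large LLM generates, result cached
repeat = pattern.handle_query("What is layered caching?")  # hit: served from cache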