Problem
A batch-processing script using an on-device LLM hung after 5-6 items: no timeout, no error, just stuck. The script created a fresh session for every item:
for item in items:
    session = LanguageModelSession(...)  # NEW SESSION EVERY TIME
    result = session.respond(prompt, generating=Schema)

Key Insight
Each session initialization carries overhead (~20ms) and allocates its own context window. Multiple sessions serialize on the hardware: they queue rather than run in parallel. By item 5-6, the system hits internal limits and blocks indefinitely.
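To make the initialization cost concrete, a rough back-of-envelope calculation (the ~20ms figure is the measurement above; the 15-item batch size is illustrative, and this counts only init overhead, not the queueing that actually causes the hang):

```python
INIT_OVERHEAD_S = 0.020  # ~20 ms per session initialization (from the measurement above)
NUM_ITEMS = 15           # illustrative batch size

# Per-item sessions pay the init cost N times; a reused session pays it once.
per_item_cost = INIT_OVERHEAD_S * NUM_ITEMS
reused_cost = INIT_OVERHEAD_S

print(f"init overhead, per-item sessions: {per_item_cost * 1000:.0f} ms")
print(f"init overhead, reused session:    {reused_cost * 1000:.0f} ms")
```

The raw init time is small; the real cost is that each extra session also claims its own context window and a slot in the hardware queue.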
Fix
Reuse a single session:
session = LanguageModelSession(...)  # ONE session
for item in items:
    result = session.respond(prompt, generating=Schema)  # reuse

Result
40% faster, zero hangs, all 15+ items complete reliably.
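A minimal, runnable sketch of the reuse pattern. `FakeSession` is a hypothetical stand-in for the real on-device session (the actual `LanguageModelSession` API is not importable here); the point is the shape of the loop: one construction, many `respond` calls.

```python
class FakeSession:
    """Hypothetical stand-in for an on-device LLM session."""
    instances = 0  # track how many sessions get created

    def __init__(self):
        FakeSession.instances += 1

    def respond(self, prompt, generating=None):
        # A real call would run inference; here we echo for illustration.
        return f"response to: {prompt}"

items = [f"item-{i}" for i in range(15)]

session = FakeSession()  # ONE session for the whole batch
results = [session.respond(item) for item in items]

assert FakeSession.instances == 1  # no per-item construction
assert len(results) == 15
```

The same structure applies regardless of the concrete session API: hoist construction out of the loop, keep only the per-item call inside it.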
For large-file processing (chunked scanning), use a session-refresh pattern: reuse one session for 5 chunks, then create a fresh one to clear the context window:
if chunks_since_refresh >= 5 or session is None:
    session = LanguageModelSession(...)
    chunks_since_refresh = 0

Takeaway
Never create a new LLM session per item in a loop. Reuse sessions for the same task. Refresh periodically to prevent context overflow. The hardware serializes inference regardless of how many sessions you create.
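As a closing illustration, the refresh pattern from the chunked-scanning section, sketched with the same kind of hypothetical stand-in session (the chunk count of 12 is illustrative; the refresh interval of 5 matches the text above):

```python
class FakeSession:
    """Hypothetical stand-in for an on-device LLM session."""
    instances = 0

    def __init__(self):
        FakeSession.instances += 1

    def respond(self, prompt, generating=None):
        return f"scanned: {prompt}"

REFRESH_EVERY = 5
chunks = [f"chunk-{i}" for i in range(12)]  # illustrative chunk count

session = None
chunks_since_refresh = 0
for chunk in chunks:
    if session is None or chunks_since_refresh >= REFRESH_EVERY:
        session = FakeSession()  # fresh session clears the context window
        chunks_since_refresh = 0
    session.respond(chunk)
    chunks_since_refresh += 1

# 12 chunks at a refresh interval of 5 -> new sessions at chunks 0, 5, 10
assert FakeSession.instances == 3
```

This keeps the per-session context window bounded while still amortizing initialization over several chunks.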