Problem
A batch-processing script using an on-device LLM hung after 5-6 items: no timeout, no error, just stuck. The script created a fresh session for every item:
for item in items:
    session = LanguageModelSession(...)  # NEW SESSION EVERY TIME
    result = session.respond(prompt, generating=Schema)

Key Insight
Each session initialization carries overhead (~20ms) and allocates its own context window. Multiple sessions serialize on the hardware: they queue rather than run in parallel. By item 5-6, the system hits internal limits and blocks indefinitely.
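To make the initialization cost concrete, a rough back-of-envelope calculation (the ~20ms figure is the measurement above; the 15-item batch size is illustrative, and this counts only init overhead, not the queueing that actually causes the hang):

```python
INIT_OVERHEAD_S = 0.020  # ~20 ms per session initialization (from the measurement above)
NUM_ITEMS = 15           # illustrative batch size

# Per-item sessions pay the init cost N times; a reused session pays it once.
per_item_cost = INIT_OVERHEAD_S * NUM_ITEMS
reused_cost = INIT_OVERHEAD_S

print(f"init overhead, per-item sessions: {per_item_cost * 1000:.0f} ms")
print(f"init overhead, reused session:    {reused_cost * 1000:.0f} ms")
```

The raw init time is small; the real cost is that each extra session also claims its own context window and a slot in the hardware queue.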
Fix
Reuse a single session:
session = LanguageModelSession(...)  # ONE session
for item in items:
    result = session.respond(prompt, generating=Schema)  # reuse

Result
40% faster, zero hangs, all 15+ items complete reliably.
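A minimal, runnable sketch of the reuse pattern. `FakeSession` is a hypothetical stand-in for the real on-device session (the actual `LanguageModelSession` API is not importable here); the point is the shape of the loop: one construction, many `respond` calls.

```python
class FakeSession:
    """Hypothetical stand-in for an on-device LLM session."""
    instances = 0  # track how many sessions get created

    def __init__(self):
        FakeSession.instances += 1

    def respond(self, prompt, generating=None):
        # A real call would run inference; here we echo for illustration.
        return f"response to: {prompt}"

items = [f"item-{i}" for i in range(15)]

session = FakeSession()  # ONE session for the whole batch
results = [session.respond(item) for item in items]

assert FakeSession.instances == 1  # no per-item construction
assert len(results) == 15
```

The same structure applies regardless of the concrete session API: hoist construction out of the loop, keep only the per-item call inside it.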
For large-file processing (chunked scanning), use a session-refresh pattern: reuse one session for 5 chunks, then create a fresh one to clear the context window:
if chunks_since_refresh >= 5 or session is None:
    session = LanguageModelSession(...)
    chunks_since_refresh = 0

Takeaway
Never create a new LLM session per item in a loop. Reuse sessions for the same task. Refresh periodically to prevent context overflow. The hardware serializes inference regardless of how many sessions you create.
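As a closing illustration, the refresh pattern from the chunked-scanning section, sketched with the same kind of hypothetical stand-in session (the chunk count of 12 is illustrative; the refresh interval of 5 matches the text above):

```python
class FakeSession:
    """Hypothetical stand-in for an on-device LLM session."""
    instances = 0

    def __init__(self):
        FakeSession.instances += 1

    def respond(self, prompt, generating=None):
        return f"scanned: {prompt}"

REFRESH_EVERY = 5
chunks = [f"chunk-{i}" for i in range(12)]  # illustrative chunk count

session = None
chunks_since_refresh = 0
for chunk in chunks:
    if session is None or chunks_since_refresh >= REFRESH_EVERY:
        session = FakeSession()  # fresh session clears the context window
        chunks_since_refresh = 0
    session.respond(chunk)
    chunks_since_refresh += 1

# 12 chunks at a refresh interval of 5 -> new sessions at chunks 0, 5, 10
assert FakeSession.instances == 3
```

This keeps the per-session context window bounded while still amortizing initialization over several chunks.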