J.S.Cock · AI Inquiry Demo
Hosted vs local

Extraction comparison

Same prompt, same emails, two backends. Hosted: Claude Haiku via the Anthropic CLI subscription (the demo’s default). Local: qwen3:8b running via Ollama on a single RTX 3080 with 10 GB VRAM. Field-level scoring treats the Claude extract as the reference baseline, so the accuracy figure measures agreement rather than correctness against ground truth: it reads as “does the local model agree with the hosted one?”
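A minimal sketch of how field-level agreement of this kind could be scored. The page does not show the demo's actual scorer, so everything here is an assumption: the helper names (`flatten`, `normalize`, `score_fields`), the 1-based `item[1]` path convention, the whitespace and case normalization, and the choice to count any field present on either side. The sample extracts are invented for illustration only.

```python
from typing import Any

_MISSING = object()  # sentinel for "field absent on this side"


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts/lists into dotted paths such as item[1].quantity."""
    flat: dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(value, path))
    elif isinstance(obj, list):
        # Assumption: list indices are 1-based, matching paths like item[1].
        for index, value in enumerate(obj, start=1):
            flat.update(flatten(value, f"{prefix}[{index}]"))
    else:
        flat[prefix] = obj
    return flat


def normalize(value: Any) -> Any:
    """Assumed comparison rule: ignore case and surrounding whitespace in strings."""
    if isinstance(value, str):
        return value.strip().casefold()
    return value


def score_fields(reference: dict, candidate: dict) -> tuple[int, int, list[str]]:
    """Return (agreed, total, differing paths), with `reference` as the baseline."""
    ref, cand = flatten(reference), flatten(candidate)
    paths = sorted(set(ref) | set(cand))
    diffs = [
        p for p in paths
        if normalize(ref.get(p, _MISSING)) != normalize(cand.get(p, _MISSING))
    ]
    return len(paths) - len(diffs), len(paths), diffs


# Invented sample extracts (not taken from the archive).
reference = {
    "intent": "quote_request",
    "customer": {"company": "Acme AS", "phone": "+47 123"},
    "item": [{"quantity": 2, "product_reference": "DN50"}],
}
candidate = {
    "intent": "quote_request",
    "customer": {"company": "acme as ", "phone": None},
    "item": [{"quantity": 2, "product_reference": "DN 50"}],
}

agreed, total, diffs = score_fields(reference, candidate)
print(f"{agreed}/{total} fields agreed; diffs: {diffs}")
# 3/5 fields agreed; diffs: ['customer.phone', 'item[1].product_reference']
```

Because the hosted extract is the baseline, a "diff" here only means the two models disagree. It does not say which of them is right.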

Field-level accuracy: 78.3% (184/235 fields agreed)
Completeness verdicts agreed: 7/10 (the decision that gates the rest of the workflow)
Avg latency / inquiry: 17.4s (57 tok/s output, on local GPU)
Per-inquiry cost: €0 (local GPU; no API tokens spent on this run)
Headline finding

Local Qwen 8B agrees with hosted Claude on 78.3% of fields.

That number comes from an 8-billion-parameter model running on a single consumer GPU at zero marginal cost. Where it disagrees, the gap is mostly specialty Norwegian valve vocabulary that the model has likely seen too little of in training (e.g. skyvespjeld, syrefast, ensilasje). A larger Qwen (14B / 32B) or a fine-tune on J.S.Cock terminology would likely close most of that gap. For now the demo runs hosted by default; self-hosted is a real fallback if cost or data residency forces it.
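For orientation, a local extraction call against Ollama might look roughly like the sketch below. It uses Ollama's `/api/generate` endpoint on the default local port and the `eval_count` / `eval_duration` fields that Ollama returns with a non-streamed reply. Whether the demo uses this endpoint, JSON mode, or these helper names (`build_payload`, `extract_locally`, `output_tokens_per_second`) is an assumption, not something the page states.

```python
import json
import urllib.request

# Ollama's default local endpoint; the demo's real host and port are not shown.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "qwen3:8b") -> dict:
    """Request body for a single, non-streamed completion constrained to JSON."""
    return {"model": model, "prompt": prompt, "stream": False, "format": "json"}


def extract_locally(prompt: str) -> dict:
    """POST the prompt to a running Ollama server and return its JSON reply."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())


def output_tokens_per_second(reply: dict) -> float:
    """eval_count is the number of generated tokens; eval_duration is in nanoseconds."""
    return reply["eval_count"] / reply["eval_duration"] * 1e9
```

With a server running, `extract_locally(prompt)["response"]` would hold the model's JSON text. Nothing leaves the machine, which is the data-residency argument for keeping this path available.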

What we ran: 10 inquiries from your archive, totalling 9,869 output tokens generated locally in 174 seconds.
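The throughput and latency figures follow from these run totals. A quick check, assuming both are simple ratios over the whole run (the page may instead derive them from per-request timings):

```python
output_tokens = 9_869   # total output tokens generated locally
wall_seconds = 174      # total generation time for the run
inquiries = 10

throughput = output_tokens / wall_seconds   # tokens per second
avg_latency = wall_seconds / inquiries      # seconds per inquiry

summary = f"{throughput:.0f} tok/s, {avg_latency:.1f} s per inquiry"
print(summary)
# 57 tok/s, 17.4 s per inquiry
```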

Per-inquiry breakdown

Where Qwen agreed and where it slipped

| JT | Status | Accuracy | Verdict (rule) | Latency | Sample diffs |
|---|---|---|---|---|---|
| JT118202 | OK | 77.8% (14/18) | partial / partial | 19.4s | intent, customer.contact_email, +2 more |
| JT119284 | OK | 86.2% (25/29) | partial / partial | 18.8s | customer.company, customer.phone, +2 more |
| JT119626 | schema-drift | 83.3% (15/18) | partial / complete | 7.5s | intent, customer.phone, +1 more |
| JT121885 | OK | 83.3% (15/18) | partial / partial | 8.6s | customer.phone, item[1].product_reference, +1 more |
| JT122083 | OK | 68.6% (35/51) | partial / complete | 22.2s | item[1].quantity, item[1].product_reference, +14 more |
| JT122170 | OK | 88.9% (16/18) | partial / partial | 9.1s | completeness, item[1].product_reference |
| JT122579 | schema-drift | 88.9% (16/18) | partial / partial | 19.8s | customer.phone, item[1].product_reference |
| JT122623 | schema-drift | 88.9% (16/18) | unusable / partial | 19.5s | customer.company, item[1].product_reference |
| JT122961 | OK | 65.5% (19/29) | unusable / unusable | 28.3s | intent, completeness, +8 more |
| JT124009 | schema-drift | 72.2% (13/18) | complete / complete | 20.5s | completeness, item[1].product_reference, +3 more |
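The headline numbers can be recovered from the per-inquiry rows. The sketch below pools the field counts across inquiries (rather than averaging the per-row percentages), counts rows whose two verdicts match, and averages latency. The row values are copied from the breakdown above; the tuple layout and the names `left` and `right` for the two sides of the verdict pair are illustrative, since the page does not say which side is hosted and which is local.

```python
# (JT, agreed fields, total fields, left verdict, right verdict, latency in s)
rows = [
    ("JT118202", 14, 18, "partial", "partial", 19.4),
    ("JT119284", 25, 29, "partial", "partial", 18.8),
    ("JT119626", 15, 18, "partial", "complete", 7.5),
    ("JT121885", 15, 18, "partial", "partial", 8.6),
    ("JT122083", 35, 51, "partial", "complete", 22.2),
    ("JT122170", 16, 18, "partial", "partial", 9.1),
    ("JT122579", 16, 18, "partial", "partial", 19.8),
    ("JT122623", 16, 18, "unusable", "partial", 19.5),
    ("JT122961", 19, 29, "unusable", "unusable", 28.3),
    ("JT124009", 13, 18, "complete", "complete", 20.5),
]

agreed = sum(row[1] for row in rows)
total = sum(row[2] for row in rows)
verdict_matches = sum(1 for _, _, _, left, right, _ in rows if left == right)
mean_latency = sum(row[5] for row in rows) / len(rows)

print(
    f"{agreed}/{total} fields = {agreed / total:.1%}; "
    f"verdicts agreed {verdict_matches}/{len(rows)}; "
    f"mean latency {mean_latency:.1f}s"
)
# 184/235 fields = 78.3%; verdicts agreed 7/10; mean latency 17.4s
```

Pooling means inquiries with more fields, such as JT122083 with 51, weigh more heavily in the headline figure than the 18-field ones.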
Talking point for the demo call

What this means for J.S.Cock