Hacker News
xienze
38 days ago | on: A few words on DS4
I don't want to be a jerk, but 31 t/s prefill is basically unusable in an agentic situation. With a mere 10k tokens of context, you're sitting there for 5+ minutes before the first token is generated.
fgfarben
38 days ago
That prefill number isn't right. M4 Max hits 200-300 t/s:
https://github.com/antirez/ds4/blob/main/speed-bench/m4_max_...
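For anyone who wants to check the arithmetic being argued over here, a rough sketch follows. It assumes time to first token is simply prompt tokens divided by prefill rate and ignores model load, scheduling, and any other overhead. The function name is made up for illustration; the 10k, 31 t/s, and 200-300 t/s figures are the ones quoted in this thread.

```python
def time_to_first_token(prompt_tokens: int, prefill_tps: float) -> float:
    """Seconds spent on prefill before the first output token.

    Simplifying assumption: prefill runs at a constant rate and
    nothing else contributes to the wait.
    """
    if prefill_tps <= 0:
        raise ValueError("prefill_tps must be positive")
    return prompt_tokens / prefill_tps


CONTEXT_TOKENS = 10_000  # the "10k in context" figure from the thread

# Rates quoted in the thread: 31 t/s vs. the 200-300 t/s range.
for tps in (31, 200, 300):
    seconds = time_to_first_token(CONTEXT_TOKENS, tps)
    print(f"{tps:>3} t/s -> {seconds:6.1f} s ({seconds / 60:.1f} min)")
```

Under that assumption, 10k tokens takes a bit over 5 minutes at 31 t/s and somewhere between about 33 and 50 seconds at 200-300 t/s, so which number is right matters a lot.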
hadlock
38 days ago
M5 studio is gonna sell like hot cakes
throwdbaaway
38 days ago
Hah, that's because the prompt itself was only about 30 tokens. We need a much bigger prompt to properly test prompt processing (PP).
aiscoming
38 days ago
If it's just the coding agent system prompt and tools, you can cache that.
xienze
38 days ago
Yeah, the problem is that's just the start of the context. There's, you know, all the tool call results and file reads and stuff.
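To put rough numbers on the caching point, here is a small sketch. It assumes a cached prefix costs nothing to reuse and that only the uncached remainder is prefilled at a constant rate. The 3,000-token size for the system prompt plus tools is a hypothetical figure chosen for illustration, not something measured in this thread; the 10k context and 31 t/s rate are the figures quoted above.

```python
def prefill_seconds(total_tokens: int, cached_prefix_tokens: int, prefill_tps: float) -> float:
    """Prefill time when the first `cached_prefix_tokens` are already cached.

    Simplifying assumption: cache hits are free, and everything after
    the cached prefix is processed at a constant prefill rate.
    """
    if prefill_tps <= 0:
        raise ValueError("prefill_tps must be positive")
    uncached = max(total_tokens - cached_prefix_tokens, 0)
    return uncached / prefill_tps


TOTAL_CONTEXT = 10_000        # from the thread
SYSTEM_AND_TOOLS = 3_000      # hypothetical size of the cacheable prefix
PREFILL_TPS = 31              # from the thread

no_cache = prefill_seconds(TOTAL_CONTEXT, 0, PREFILL_TPS)
with_cache = prefill_seconds(TOTAL_CONTEXT, SYSTEM_AND_TOOLS, PREFILL_TPS)
print(f"no cache:      {no_cache:.0f} s")
print(f"prefix cached: {with_cache:.0f} s")
```

With those assumed sizes the cache helps, but the tool call results and file reads that make up the rest of the context still have to be prefilled, so the wait stays in the minutes at 31 t/s.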