doctorpangloss 2 days ago

Yesterday my company changed a single line to switch our application’s LLM backend API from Claude to ChatGPT, because Claude started adding stuff to its answers in Q&A-style prompts that it wasn’t adding before, at least since August 1st.

I wish I could pay for a guarantee of performance, really a guarantee about quantization, which seems so simple. But because quantization can cut their costs by 2-4x, LLM API providers keep quantizing and distilling without telling anyone. Being an enterprise API is a longer journey, and by the way, a terrible business to be in.

rerdavies 4 hours ago

Kind of fun. I recently used Claude to generate scripts for gnuplot, with only slightly less convenience than this. It's kind of spooky what you can ask Claude to do, e.g. "Rotate the x-axis labels by 90 degrees; use 'Arial Black' for the title, and 'Roboto' for the rest of the fonts." Etc.

mellosouls 2 days ago

This is interesting and appreciated but I'm not sure it's a Show HN unless the OP is representing Anthropic?

  • rubslopes 2 days ago

    All 3 of the OP's submissions start with "Show HN". I think they are not aware of HN conventions.

zurfer 2 days ago

Well done. We are building something similar [1] and found that generating UIs (mostly charts) on the fly works surprisingly well in most cases, but can be a bit frustrating if you know exactly what you want and just can't prompt it (as a user) to do that because of some edge case.

While this is a cool demo that shows what LLMs can do, I am a bit surprised by how polished and advanced it looks (even PDF upload) for a quickstart. Anyway, I love that it's open source so we can learn from it.

[1] https://getdot.ai

dash2 2 days ago

This seems like Anthropic showing people how to build a thin layer around Claude. Can building a thin layer around Claude be a valuable business model? If there are good profitable UIs for Claude, wouldn't Anthropic implement them itself?

  • anonzzzies 2 days ago

    I would indeed expect them (as Altman promised for OpenAI: 'we're gonna steamroll you') to implement all these cases (and many more) on their side, so their 'chat' becomes a full toolkit for building, visualising, prompting, etc., and lets people plug in their own data/processes (maybe with a few partners for that part, which they can easily replace or have several of).

    Currently, the "Added to project" button that stays visible for n seconds, and that you have to wait out before you can add another file (sometimes Claude generates 4-5 files per chat), is such an annoyance that I guess they should stick to training and nothing else.

  • fragmede 2 days ago

    Opportunity cost. Anthropic's business is training Claude and whatever they choose to call their next model, not whatever weird little niche you're going after. I wouldn't go after programming, but, say, a D&D character backstory generator would be a wrapper that's probably not interesting enough for them to build themselves to compete with yours. Or maybe it is, but your D&D character backstory generator also doesn't have to use Anthropic as the backend; there are others for you to choose from, so it's a bit of a standoff.

    • Larrikin 2 days ago

      But is there any reason to use them beyond a demo to investors while you actually build the business on Llama? Why build a business around a permanent subscription that forms its entire core?

      • fragmede 2 days ago

        Because for this hypothetical niche, Claude is better than Llama. Now, whether or not that's actually true, I don't know, but while it would be nice to sell shovels in a gold rush, not everyone has the privilege of being able to do that. In this metaphor, some people only know how to mine for gold, and pivoting to selling shovels is an entirely different skill set that the miner doesn't possess.

  • peer2pay 2 days ago

    I think the idea here is to build a thin layer but BYOD (bring your own data).

    I currently work for a company where most of our value add lies in data collection, cleaning, and running proprietary algorithms. A UI like this would be a game changer for us, and something that Anthropic couldn’t easily replicate due to all the IP in our data pipeline.

  • billsunshine 2 days ago

    This... so this. There is no value capture in building a shell around Claude.

    • Viliam1234 2 days ago

      Perhaps there is a lot of money you could get in the short term. Enough to pay the costs and generate some profit.

      Also, most people are not computer experts; if you show them that something can be done using your website, they will continue to do it using your website long after others have added the same functionality.

bl4ckneon 2 days ago

(I didn't look at the code yet, but) wouldn't a challenge of building an app like this, one that heavily depends on an LLM, be getting a deterministic response back? I guess you could code it to check whether it gave you a certain format of data or whether it was what you expected, but if I upload something that Claude doesn't understand and it gives back something that breaks the data analysis, that seems like a tricky case to handle.

Please correct me if I am wrong. Thanks!

  • SparkyMcUnicorn 2 days ago

    Anthropic and OpenAI let you define a JSON schema to adhere to for tool calling.

    Here's the part you're looking for: https://github.com/anthropics/anthropic-quickstarts/blob/mai...
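
    Roughly, defining a tool with a JSON schema looks like this with Anthropic's TypeScript SDK (the tool name and schema below are illustrative, not the quickstart's actual definitions):

      import Anthropic from "@anthropic-ai/sdk";

      const client = new Anthropic();

      // Declare a tool whose input must conform to this JSON schema.
      const response = await client.messages.create({
        model: "claude-3-5-sonnet-20240620",
        max_tokens: 1024,
        tools: [
          {
            name: "render_chart", // hypothetical tool for this sketch
            description: "Return structured data for a chart the UI can render",
            input_schema: {
              type: "object",
              properties: {
                chartType: { type: "string", enum: ["line", "bar", "pie"] },
                series: {
                  type: "array",
                  items: {
                    type: "object",
                    properties: {
                      label: { type: "string" },
                      value: { type: "number" },
                    },
                    required: ["label", "value"],
                  },
                },
              },
              required: ["chartType", "series"],
            },
          },
        ],
        messages: [{ role: "user", content: "Chart revenue by quarter" }],
      });

      // The resulting tool_use block's input matches the schema above,
      // so the frontend can parse it instead of guessing at free-form text.

    The schema constrains the shape of the output, though it can't guarantee the values inside it make sense.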

    • cj 2 days ago

      To me, a guarantee about the format of the response doesn't seem sufficient to prevent backwards-incompatible changes that may happen to the models.

      Yes, the response might be in a standard format. But a well-formed response can still be bad/broken.

      Another way to think about it: it can "pass QA" one day and "fail QA" the next, even if the API response is identically formatted/structured.

      • SparkyMcUnicorn a day ago

        This is why OpenAI and Anthropic provide date-versioned models.

        gpt-4o can change, but gpt-4o-2024-05-13 will always use the 2024-05-13 snapshot.
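
        So if you want stable behavior you pin the snapshot explicitly, e.g. with OpenAI's Node SDK (messages trimmed for the sketch):

          import OpenAI from "openai";

          const openai = new OpenAI();

          // "gpt-4o" is an alias that can move to newer snapshots;
          // the dated name keeps you on the 2024-05-13 weights.
          const completion = await openai.chat.completions.create({
            model: "gpt-4o-2024-05-13",
            messages: [{ role: "user", content: "..." }],
          });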

        • cj a day ago

          I have a feeling those dates are an illusion of sorts.

          I get the feeling they frequently deploy hot patches for edge cases. I hate to call them edge cases because they are actually “real cases” - things like adjusting system prompts so that one day it might happily answer “Fill in the blank: F _ _ _ you”.

          To truly freeze a model, you would need to freeze its weights, freeze its system prompts (no one sees those), and avoid any and all action that might impact its output. Perhaps you would even need the default temperature to be 0 so it’s a truly deterministic API, with the option to add some temperature back into the responses.

          Until then, I treat those “versions” as referring only to the model weights, not the abstractions around the model.
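
          The sampling side is at least controllable per request, e.g. temperature 0 with Anthropic's SDK (still not a hard determinism guarantee, just less variance):

            import Anthropic from "@anthropic-ai/sdk";

            const client = new Anthropic();

            const msg = await client.messages.create({
              model: "claude-3-5-sonnet-20240620", // dated snapshot
              max_tokens: 512,
              temperature: 0, // greedy-ish sampling, reduces run-to-run variance
              messages: [{ role: "user", content: "..." }],
            });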

        • felixvolny a day ago

          Tangent, but it seems like such a tough engineering challenge to keep all these models around and instantly available.

weinzierl 2 days ago

I wish they'd focus more on getting the basics solid. Currently, Claude can't even render anything beyond the most basic form of a table.

For example, try to get it to turn multiple items in a table cell into a bulleted list: it just outputs a mess of literal HTML tags.

ideashower 2 days ago

Can you take the resulting interactives and export them for publishing?

troupo 2 days ago

Do they have any plans on opening up APIs to private individuals?

albert_e 2 days ago

Looks very interesting.

I am more familiar with React and am looking for a React example that achieves a similar UI. Any working examples I can take inspiration from?

  • SparkyMcUnicorn 2 days ago

    This is React.

    • albert_e 2 days ago

      My bad. I had a brainfade - registered something else on my first skim. Thanks.