doctorpangloss 8 months ago

Yesterday my company changed a single line to switch our application’s LLM backend API from Claude to ChatGPT, because Claude started adding stuff to its answers in QA-style prompts that it wasn’t adding before, at least since August 1st.

I wish I could pay for a guarantee of performance, really a guarantee around quantization. It seems so simple, but because it can cut their costs 2-4x, LLM API providers keep quantizing and distilling without telling anyone. It’s a longer journey to become an enterprise API. Which, by the way, is a terrible business to be in.

mellosouls 8 months ago

This is interesting and appreciated but I'm not sure it's a Show HN unless the OP is representing Anthropic?

  • rubslopes 8 months ago

    All 3 submissions of OP start with "Show HN". I think they are not aware of HN protocols.

zurfer 8 months ago

Well done. We are building something similar [1] and found that generating UIs (mostly charts) on the fly works surprisingly well in most cases, but can be a bit frustrating if you know exactly what you want and just can't prompt it (as a user) to do that because of some edge case.

While this is a cool demo that shows what LLMs can do, I am a bit surprised by how polished and advanced it looks (even PDF upload) for a quickstart. Anyway, I love that it's open source so we can learn from it.

[1] https://getdot.ai

dash2 8 months ago

This seems like Anthropic showing people how to build a thin layer around Claude. Can building a thin layer around Claude be a valuable business model? If there are good profitable UIs for Claude, wouldn't Anthropic implement them itself?

  • anonzzzies 8 months ago

    I would indeed (like Altman promised for OpenAI: 'we're gonna steamroll you') implement all these cases (and many more) on their side, so their 'chat' becomes a full toolkit for building, visualising, prompting etc., and allow people to plug in their data/processes (maybe with a few partners for that part, which they can easily replace or have multiple of).

    Currently, the "Added to project" button that remains for n seconds and you have to wait for to add another file (sometimes Claude generates 4-5 files per chat) is such an annoyance that I guess they should stick to training and nothing else.

  • fragmede 8 months ago

    Opportunity cost. Anthropic's deal is in training Claude and whatever they choose to call their next model, not whatever weird little niche you're going after. I might not go after programming, but, say, a D&D character backstory generator would be a wrapper that's probably not interesting enough for them to build themselves to compete with yours. Or maybe it is, but your D&D character backstory generator also doesn't have to use Anthropic as the backend; there are others for you to choose from, so it's a bit of a standoff.

    • Larrikin 8 months ago

      But is there any reason to use them beyond a demo to investors while you actually build the business on Llama? Why build a business around a permanent subscription that's the entire core of your business?

      • fragmede 8 months ago

        Because for this hypothetical niche, Claude is better than Llama. Now, whether or not that's actually true, I don't know, but while it would be nice to sell shovels in a gold rush, not everyone has the privilege of being able to do that. In this metaphor, some people only know how to mine for gold, and pivoting to selling shovels is an entirely different skillset that the miner doesn't possess.

  • peer2pay 8 months ago

    I think the idea here is to build a thin layer but BYOD (bring your own data).

    I currently work for a company where most of our value add lies in data collection, cleaning, and running proprietary algorithms. A UI like this would be a game changer for us, and something that Anthropic couldn’t easily replicate due to all the IP in our data pipeline.

  • billsunshine 8 months ago

    This...so this. There is no value capture in building a shell around Claude.

    • Viliam1234 8 months ago

      Perhaps there is a lot of money you could get in the short term. Enough to pay the costs and generate some profit.

      Also, most people are not computer experts; if you show them that something can be done using your website, they will continue to do it using your website long after others have added the same functionality.

bl4ckneon 8 months ago

(I didn't look at the code yet, but) would a challenge of building an app like this, one that heavily depends on an LLM, be getting a deterministic response back? I guess you could code it to check whether it gave you a certain format of data, or whether it was what you expected, but if I upload something that Claude doesn't understand and it gives back something that breaks the data analysis, that seems like a tricky case to handle.

Please correct me if I am wrong. Thanks!

  • SparkyMcUnicorn 8 months ago

    Anthropic and OpenAI let you define a JSON schema to adhere to for tool calling.

    Here's the part you're looking for: https://github.com/anthropics/anthropic-quickstarts/blob/mai...
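
    Roughly what that looks like with the Python SDK (a minimal sketch, not taken from the quickstart itself; the tool name, schema, and prompt below are made-up placeholders):

      import anthropic

      client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

      # Hypothetical tool whose input_schema constrains the shape of the model's output.
      chart_tool = {
          "name": "generate_chart_data",
          "description": "Return chart-ready data extracted from the user's message.",
          "input_schema": {
              "type": "object",
              "properties": {
                  "chart_type": {"type": "string", "enum": ["line", "bar", "pie"]},
                  "labels": {"type": "array", "items": {"type": "string"}},
                  "values": {"type": "array", "items": {"type": "number"}},
              },
              "required": ["chart_type", "labels", "values"],
          },
      }

      response = client.messages.create(
          model="claude-3-5-sonnet-20240620",
          max_tokens=1024,
          tools=[chart_tool],
          tool_choice={"type": "tool", "name": "generate_chart_data"},  # force a structured tool call
          messages=[{"role": "user", "content": "Plot monthly revenue: Jan 10, Feb 12, Mar 9"}],
      )

      # The tool_use block's input should follow the schema; that's what downstream code parses.
      tool_use = next(block for block in response.content if block.type == "tool_use")
      print(tool_use.input)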

    • cj 8 months ago

      For some reason, the guarantee about the format of the response doesn't seem sufficient to prevent backwards-incompatible changes that may happen to models.

      Yes, the response might be in a standard format. But a well-formed response can still be bad/broken.

      Another way to think about it: it can "pass QA" one day and "fail QA" the next, even if the API response is identically formatted/structured.

      • SparkyMcUnicorn 8 months ago

        This is why OpenAI and Anthropic provide date-versioned models.

        gpt-4o can change, but gpt-4o-2024-05-13 will always use the 2024-05-13 snapshot.
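
        A minimal sketch of pinning (the prompt is just a placeholder; only the model IDs are real):

          from openai import OpenAI

          client = OpenAI()  # reads OPENAI_API_KEY from the environment

          # "gpt-4o" is a floating alias that can be repointed at newer snapshots;
          # the dated ID keeps every request on the same snapshot.
          resp = client.chat.completions.create(
              model="gpt-4o-2024-05-13",  # pinned snapshot instead of "gpt-4o"
              messages=[{"role": "user", "content": "Summarize Q2 revenue in one sentence."}],
          )
          print(resp.choices[0].message.content)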

        • cj 8 months ago

          I have a feeling those dates are an illusion of sorts.

          I get the feeling they frequently deploy hot patches for edge cases. I hate to call them edge cases because they are actually “real cases”: things like adjusting system prompts so that one day it might happily answer “Fill in the blank: F _ _ _ you”.

          To truly freeze a model, you would need to freeze its weights, freeze its system prompts (which no one sees), and avoid any and all actions that might impact its output. Perhaps you would even need the default temperature to be 0 so it’s a truly deterministic API, with the option to add in some temperature to the responses.

          Until then, I consider those “versions” to reference only the model weights and not the abstractions around the model.
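
          Something like this is the closest you can get today (a rough sketch; even with a dated model and temperature 0, providers don't promise bit-exact determinism, and the prompt here is just a placeholder):

            import anthropic

            client = anthropic.Anthropic()

            resp = client.messages.create(
                model="claude-3-5-sonnet-20240620",  # dated snapshot, not a floating alias
                max_tokens=256,
                temperature=0,  # greedy-ish decoding: lower run-to-run variance, still not a hard guarantee
                messages=[{"role": "user", "content": "Classify this ticket: 'App crashes on login'"}],
            )
            print(resp.content[0].text)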

        • felixvolny 8 months ago

          Tangent, but it seems like such a tough engineering challenge to keep all these models around and available in an instant.

rerdavies 8 months ago

Kind of fun. I recently used Claude to generate scripts for Gnuplot, with only slightly less convenience than this. It's kind of spooky what you can ask Claude to do, e.g. "Rotate the x-axis labels by 90 degrees; use 'Arial Black' for the title, and 'Roboto' for the rest of the fonts." Etc.

weinzierl 8 months ago

I wish they'd focus more on getting the basics solid. Currently Claude can't even render anything beyond the most basic form of a table.

For example, try to get it to turn multiple items in a table cell into a bulleted list. It just outputs a mess of literal HTML tags.

ideashower 8 months ago

Can you take these resulting interactives and export them for publishing?

troupo 8 months ago

Do they have any plans on opening up APIs to private individuals?

albert_e 8 months ago

Looks very interesting.

I am more familiar with React and am looking for a React example that achieves a similar UI. Any working examples I can take inspiration from?

  • SparkyMcUnicorn 8 months ago

    This is React.

    • albert_e 8 months ago

      My bad. I had a brainfade - registered something else on my first skim. Thanks.