emreckartal 5 days ago

Emre here from Homebrew Research. It's great to see Ichigo on HN!

A quick intro: We're a local AI company, building local AI tools and training open-source models.

Ichigo is our training method that enables LLMs to understand human speech and talk back with low latency, thanks to FishSpeech integration. It's open data and open weights, weight-initialized from Llama 3.1, extending its reasoning ability to speech.

Plus, we are the creators and lead maintainers of:

- https://jan.ai/ - a local AI assistant and an alternative to ChatGPT

- https://cortex.so/ - a local AI toolkit (soft launch coming soon)

Everything we build and train is done out in the open - we share our progress on:

- https://x.com/homebrewltd

- https://discord.gg/hTmEwgyrEg

You can check out all our products on our simple website: https://homebrew.ltd/

  • gnuly 5 days ago

    any plans to share progress on open channels like matrix.org or even irc?

    • silverliver a day ago

      I second this request. Any publicly indexable channel would be fine.

      I think Matrix is not publicly indexable unless the channel is unencrypted and set to public.

cassepipe 8 days ago

Finally I can use one of the random facts that have entered my brain for decades now even though I can't remember where my keys are.

If I remember correctly, "ichigo" means strawberry in Japanese. You are welcome.

  • SapporoChris 5 days ago

    Sorry, you're wrong. It means 1 5. Just kidding - it is strawberry, but it can also be read as one and five. However, it is not fifteen.

    • TheCraiggers 5 days ago

      > it can also be read as one and five. However, it is not fifteen.

      Can you help me wrap my brain around this? Does it mean six? I'm struggling to understand how a word can mean two numbers and how this would actually be used in a conversation.

      Thanks. I'm curious, but trying to search for this just returns anime.

      • BugsJustFindMe 5 days ago

        > I'm struggling to understand how a word can mean two numbers

        Ichi is the word for 1. Go is the word for 5.

        • TheCraiggers 5 days ago

          /smacks forehead.

          Can't believe I fell for that.

          • gardenmud 4 days ago

            I mean, it wasn't really a trick.

            It's exactly the same as telling a non-English speaker "onefive can be read as (one five), but it's not (fifteen)" - I don't read 'prank' in that statement.

  • d3w3y 5 days ago

    There are strawberries all over the readme so I reckon you're right.

    • mmastrac 5 days ago

      Is this a continuation of the meme that GPT can't identify the number of "R"s in "strawberry"?

      • TheDong 5 days ago

        > How many 'r's are in the word 'ichigo'?

        GPT 4o: The word "ichigo," which is the Romanized spelling (romaji) of いちご, contains one "r." It appears in the letter "r" in "chi," as the "ch" sound in romaji represents a combination of the "r" sound from "r" and "t" sound from "i."

        Thank you chatgpt. I'm glad we've burned down a bunch of forests for this.

        You can consistently get the right answer, though, with a prompt of:

        > Write python code, and run it, to count the number of 'r' characters in いちご.

        For numeric stuff, telling the thing to just write python code makes it significantly better at getting right answers.
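
        For the curious, the code the model ends up writing is trivial - something like this sketch (the right answer is zero, since neither the kana nor the romaji contain an 'r'):

          # Sketch of the kind of code the model writes and runs for that prompt.
          word = "いちご"
          print(word.count("r"))  # -> 0: the string contains no 'r' at all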

        • BugsJustFindMe 5 days ago

          Without any special prompt change, I get

          There are no “r”s in the word “ichigo.”

          Maybe your instructions are bad.

      • dev-jayson 5 days ago

        I think you might be on to something

  • AtlasBarfed 5 days ago

    Getsuga tenshou!!

    • dumb1224 5 days ago

      haha was looking for that!

      Ban-kai 卍解

  • adammarples 5 days ago

    From the book Tomorrow, and Tomorrow, and Tomorrow?

  • zarmin 5 days ago

    Your keys are in the fridge with the remote control.

  • greydius 5 days ago

    I think it's a bit of word play. 苺 (strawberry) and 一語 (one word) are both read "Ichigo".

thruflo 5 days ago

Great stuff. Voice AI is great to run locally, not just for privacy and access to personal data, but also because of the low-latency requirement. If there's a delay in conversation caused by a network call, it just feels weird, like an old satellite phone call.

tmshapland 5 days ago

This is a really cool project! What have people built with it? I'd love to learn about what local apps people are building on this.

  • emreckartal 5 days ago

    Thanks! We've received feedback on use cases like live translation, safe and untrackable educational tools for kids, and language-learning apps. There are so many possibilities, and we hope to see people building amazing products on top of Ichigo.

    • itake 5 days ago

      I just tried to use the demo website for live translation. The AI always responded in English, either ignoring my request to only respond in French or Lao, or prefacing the translation with English ("I can translate that to French. The translation is: ...").

      I'm trying to use ChatGPT for AI translation, but the other big problem I run into is TTS and STT for non-top-40 languages (e.g. Lao). Facebook has a TTS library, but unfortunately it isn't open for commercial use.

      • emreckartal 5 days ago

        Oh, I see. We've limited the demo to English for simplicity. More languages are planned for future releases.

        • itake 4 days ago

          What's the limiting factor for supporting all of Llama's languages for STT or TTS?

famahar 5 days ago

Looks impressive. I'm guessing the demo isn't representative of the full possibilities of this? I tried to have a basic conversation in Japanese and it kept sticking to English. When it did eventually speak Japanese, the pronunciation was completely off. I'm really excited about the possibility of local language learning with near-realtime conversation practice. Will keep an eye on this.

mentalgear 5 days ago

Kudos to the team, this is truly impressive work! It's exciting to see how AI connects with the local-first movement, which is also really exploding in popularity. (The idea of local-first, where data processing and functionality are prioritized on users' own devices, aligns perfectly with emerging privacy concerns and the push for decentralization.)

Bringing AI into this space enhances user experience while respecting their autonomy over data. It feels like a promising step toward a future where we can leverage the power of AI without compromising on privacy or control. Really looking forward to seeing how this evolves!

cchance 5 days ago

It's amazing to see cool projects like this really, REALLY based in open source and open training. Wow.

  • emreckartal 5 days ago

    Thanks! It's all open research, source code, data, and weights.

frankensteins 5 days ago

Great initiative! Before adding more comments, I'm trying to set it up on my local Mac M3 machine. I'm having a hard time installing the dependencies. Anyone else having the same issue?

  • emreckartal 5 days ago

    Thanks! You can't run Ichigo on a Mac M3 just yet. It'll be possible to run it locally on a Mac once we integrate it with Jan.ai.

lostmsu 5 days ago

Very cool, but a bit less practical than some alternatives because it doesn't seem to transcribe the user's request.

  • emreckartal 5 days ago

    Actually, it does. You can turn on the transcription feature from the bottom right corner and even type to Ichigo if you want. We didn’t show it in the launch video since we were focusing on the verbal interaction side of things.

    • emreckartal 5 days ago

      Ah, I see now.

      To clarify, while you can enable transcription to see what Ichigo says, Ichigo's design skips directly from audio to speech representations without creating a text transcription of the user’s input. This makes interactions faster but does mean that the user's spoken input isn't transcribed to text.

      The flow we use is Speech → Encoder → Speech Representations → LLM → Text → TTS. By skipping the transcription step on the input side, we're able to speed things up and focus on the verbal experience.
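
      If it helps to picture it, here's a rough runnable sketch in Python - the stub functions are hypothetical, just to illustrate where text does and doesn't appear:

        # Hypothetical sketch; stubs stand in for Ichigo's real components.
        def encoder(audio):      # audio -> discrete speech representations
            return ["<s_12>", "<s_87>"]

        def llm(speech_tokens):  # the LLM consumes speech tokens directly
            return "Sure - the nearest station is about 300m away."

        def tts(text):           # only the *reply* passes through text
            return b"synthesized-wav-bytes"

        user_audio = b"raw-mic-bytes"
        audio_reply = tts(llm(encoder(user_audio)))
        # user_audio is never transcribed on this path, so no input transcript exists.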

      Hope this clears things up!

      • lostmsu 4 days ago

        I understand that. The problem is that in many scenarios users would want to see transcripts of what they said alongside the model output. Like if I have a chat with a model about choosing a place to move to, I'd probably want to review it later. And when I review it, all I'll see is: me: /audio record/ AI: 200-300m. No easy way to see at a glance what the AI answer was about.

        • readyplayeremma 4 days ago

          You can just run Whisper on the conversations as a background job, populating the text versions of all the user inputs, so it doesn't interfere with the real-time latency.
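
          Something like this minimal sketch using the openai-whisper package (the file name and the log store are made up for illustration):

            import threading

            import whisper

            model = whisper.load_model("base")  # a small model is fine for logs
            chat_log = {}                       # turn_id -> transcript (hypothetical store)

            def transcribe_later(audio_path, turn_id):
                # Runs off the hot path, so real-time latency is untouched.
                chat_log[turn_id] = model.transcribe(audio_path)["text"]

            threading.Thread(target=transcribe_later,
                             args=("user_turn_042.wav", 42), daemon=True).start()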

          • lostmsu 4 days ago

            It's not going to match what the model hears.

p0larboy 5 days ago

Tried the demo, but all I got was "I'm sorry, I can't quite catch that".