G's blog

September 8, 2025 · 8 min read

On AI 'Music', open models, and the future of creative tools

[Image: a crudely-drawn middle finger to Suno, Udio and "AI Music"]

As someone who actually works with music and sound engineering, I have to say that all these generative models and platforms - Suno, Udio, Sonauto, etc. - are still very much in the toy category. They're entertaining, of course. It's objectively funny to have a computer make songs about idiotic things. But we've reached this weird saturation point where every commercial music generator basically does the same mediocre thing: spit out complete tracks that are, frankly, pretty low quality across the board - to anyone with a trained ear, or anyone who actually listens to music.

"The future isn't in raw generation, but collaboration" - that's easy to agree on, right? And I suspect the companies know this too - Suno just acquired some generative DAW software, and you can bet they're going full steam ahead on trying to rival actual music production tools like FL Studio. Except it is way more nuanced. Suno's toy will probably be yet another grift under the pretence of being a "creative tool". I don't think it'll be very good.

The transparent paradox

Me being me, I always advocate for open-source software, free (as in beer) things, and owning my damn computer, making it do whatever I damn please. Remember when LLaMA got leaked and suddenly everyone was incredibly excited about running ChatGPT-level models locally? That was huge for the AI community. "We need something similar for music generation", right? That completely ignores the massive negative impacts it would have on actual artists and musicians, of course. But that's a separate discussion, so let's indulge - why not, it's my blog.

Right now, if you can make something locally available that used to exist only commercially, you'll get massive attention. Partner with the ComfyUI people, promote on the right forums, include calls to action like "Try now at our-website.com/new-model" in your releases, and you will get a flood of users and investor attention. It doesn't even have to be mutually exclusive with making a profit - you can open source the weights and still run an API or a user-facing business. Look at Qwen, DeepSeek, Stability AI, and so on. In a unique twist, young people are way more interested in academic research (or rather, the runnable demos and pre-trained weights that come out of it) - something I wouldn't expect from the brainrot-filled 2020s.

But here's the paradox - if you make something open source, can you really monetize it well? Sure, you might get an initial kickstart, but what then? Make it available and people will distill from your model, make their own versions, run it locally, or spin up competing services. So why would you do it? Well, you could do it for the sake of research and pushing the envelope. You could do it for the attention and prestige. Or you could do it to build a community around your tech. All valid, perhaps utopian, reasons. If money were no object...

Let's be realistic - we live in a capitalist shithole we all hate, unless you're a fucking grifter. Most companies will prioritize profit over openness. So the paradox remains - open models and research get attention and prestige, but they also get copied and diluted, making it hard to sustain a business. Nerds and enthusiasts want open models, users don't care, and companies want to make money and grow infinitely with as little effort as possible.

The legality problem and why the results suck anyway

The thing about all of these generative models is that they cannot exist without massive amounts of copyrighted data. There's no synthetic music dataset large enough, and no realistic legal way to acquire millions of songs without consent and compensation becoming factors. This is why most AI research groups will never release their datasets or reproduction instructions - too much legal trouble. Even distillations from models already trained on copyrighted data give me, at the very least, the heebie-jeebies in terms of legality and ethics.

Companies like Google, Spotify, Apple - they're going to have enormous advantages here. Google is already working on this, with the entire YouTube Music corpus probably available to them. Meanwhile, smaller companies are doing things like training on Suno outputs. Those outputs cannot be copyrighted, so it's a loophole - but surely a temporary one, right?

The whole legal situation is grayer than AI apps' UI colors. Udio got fucked by lawsuits, and Suno got sued too, because their models were (allegedly) trained on copyrighted material. It's almost inevitable that, out of billions of possible seeds, some will just spit out training data verbatim.
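
Since these systems are closed, outsiders can only probe outputs for memorization. As a toy illustration - my own sketch, not anything the labels or the labs actually use - here's how you might flag a generated clip that tracks a known song suspiciously closely, using chroma similarity with librosa:

```python
# Toy regurgitation check: slide a generated clip's chroma features across a
# reference track and take the best cosine similarity. A crude stand-in for
# real audio fingerprinting (Shazam-style constellation hashes).
import numpy as np
import librosa

def chroma(path, sr=22050):
    y, _ = librosa.load(path, sr=sr, mono=True)
    return librosa.feature.chroma_cqt(y=y, sr=sr)  # (12, n_frames) pitch-class profile

def max_similarity(a_path, b_path):
    a, b = chroma(a_path), chroma(b_path)
    short, long_ = (a, b) if a.shape[1] <= b.shape[1] else (b, a)
    n = short.shape[1]
    best = 0.0
    # Slide the shorter clip across the longer one, a few frames at a time.
    for start in range(0, long_.shape[1] - n + 1, 4):
        window = long_[:, start:start + n]
        num = float(np.sum(short * window))
        den = float(np.linalg.norm(short) * np.linalg.norm(window)) + 1e-9
        best = max(best, num / den)
    return best  # close to 1.0 = melody/harmony track each other closely

# if max_similarity("seed_1337.wav", "some_chart_hit.wav") > 0.95: uh oh.
```

Real forensics would use proper fingerprinting, but even something this crude would catch verbatim regurgitation.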

And you would think that with all this data, the results would be amazingly diverse, right? Wrong. The results are still pretty mediocre. The models are still limited by their architectures, the training data is biased toward popular music, and the conditioning methods are rudimentary at best.

As a music producer, it's always painfully obvious what latent pool any given genre is being generated from. The voice tones, production styles, melodies - it's almost always pulling from the same top-500 Spotify chart songs for any genre. The models are biased toward popular music by definition, then reinforcement-learned on user feedback to sound even more generic, eventually converging to a bland average or completely unlistenable garbage.

This is how you get those stupid fast square-lead melodies over sped-up 2021 deep house tracks whenever the model gets conditioned even slightly outside its most represented training data. It's not really quantifiable because "vibes" aren't measurable, but that's my take on the whole music generation industry right now.

Conditioning also seems to impact subjective quality enormously. Whenever I test a generative model, I now try it with empty prompts and no lyrics - it helps gauge the model's biases, and frankly it often produces way more interesting results than prompting it with meaningless words like "generate an enticing, engaging soundscape for my amazing high-quality production" and the most generic pop lyrics an LLM can imagine.
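
If you want to run the same probe at home, here's a minimal sketch of the routine, assuming Meta's open audiocraft/MusicGen stack as a stand-in (the closed platforms obviously don't expose this, and the post isn't endorsing any particular model):

```python
# Probe a local music model's biases: compare unconditional output against a
# deliberately generic prompt. Assumes Meta's audiocraft (pip install audiocraft);
# facebook/musicgen-small here is just a stand-in.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=15)

# 1) No conditioning at all: whatever comes out is the model's "default taste".
uncond = model.generate_unconditional(num_samples=4)

# 2) The kind of meaningless prompt the marketing pages suggest.
generic = model.generate(
    ["an enticing, engaging soundscape, amazing high-quality production"] * 4
)

for i, wav in enumerate(uncond):
    audio_write(f"uncond_{i}", wav.cpu(), model.sample_rate, strategy="loudness")
for i, wav in enumerate(generic):
    audio_write(f"generic_{i}", wav.cpu(), model.sample_rate, strategy="loudness")
# Listen back to back: if both batches fall into the same pool of chart-pop
# cliches, the prompt was adding nothing.
```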

There's another thing - text can't really represent music that well anyway. How do you describe a feeling you have no words for? Music is inherently abstract, and trying to pin it down with words is like trying to describe a taste: you can only get so far.

The wishful thinking I have for the future

I picture the future being collaborative and controllable. In a DAW or a recording studio, as a musician or producer, I have absolute control over every part of the sound. I imagine AI as a tool to assist me, not replace me with feces. In a proposed AI DAW, I could compose a melody with MIDI and get realtime output, but maybe I won't like it initially. I should be able to prompt individual aspects - make this synth more "sparkly and colorful," adjust just that one track, pull in my own recordings, transform sounds individually. It wouldn't be bound by text prompts, but by sound - symbolic and raw alike.
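
To make "control" concrete, here's a hypothetical sketch of what a track-level API for such a DAW could look like. Every class and name is invented for illustration - this is the shape of the interaction I want, not any real product:

```python
# Hypothetical control surface for the AI-DAW described above. Everything here
# is invented for illustration, not a real product's API.
from dataclasses import dataclass, field

@dataclass
class Clip:
    source: str                          # "midi", "recording", or "generated"
    data: bytes = b""                    # MIDI bytes or raw audio, depending on source

@dataclass
class Track:
    name: str
    clips: list[Clip] = field(default_factory=list)
    # Conditioning is per-track and optional: text is one knob among many,
    # not the whole interface.
    style_prompt: str | None = None      # e.g. "more sparkly and colorful"
    reference_audio: str | None = None   # path to one of my own recordings to imitate

@dataclass
class Session:
    tracks: list[Track] = field(default_factory=list)

    def regenerate(self, track: Track) -> None:
        """Re-render just this track; every other track stays untouched."""
        # A real implementation would call a model here, conditioned only on
        # what this track exposes: its clips, its prompt, its reference audio.
        ...

# Usage: tweak one synth without the model touching my drums or vocals.
session = Session(tracks=[
    Track("drums", clips=[Clip("recording")]),
    Track("lead synth", clips=[Clip("midi")], style_prompt="more sparkly and colorful"),
])
session.regenerate(session.tracks[1])    # touches only the lead synth
```

The key design choice: conditioning lives on individual tracks, and regeneration only ever touches what I point it at.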

Maybe I could even generate an entire boilerplate track to work on - too shoddy to release, but a starting point, much like the random melody generators people use for inspiration, or pulling up a random YouTube video with no views to sample. The point is, I get to decide, not the AI. The AI is a tool, like a synth plugin or an effect, that I can tweak and refine to my liking.

It doesn't even have to be a single model - it could be a language model trained on creating synth presets, as long as it lets me get results. The creative process would be me chipping away at it, refining it - not the AI spitting out a finished product that I have no say in. Product - I guess that is the best way to describe AI music right now: a product for lazy consumption and grifting, extorting you for your attention and, indirectly, for profit via ad revenue and stolen data.

That's AI as a tool. This is what I thought AI would be for before ChatGPT took off - before I got disillusioned by how, once again, capitalism turns even the most magical technology to shit.

Because of this, I predict that what Suno unveils as their "AI DAW" will be a crappy, sloppy, investor-aimed, shoddy pretence of this vision. Overtrained on pop music, limited to four stems, the most control you get being an EQ and a volume knob, some generative MIDI conditioning, and the cheapest OpenAI model generating lyrics that are always about "weaving tapestries of neon lights" and "the unity of people". So incredibly limited it serves no use to professionals, so utterly dumbed down for the most average consoomer imaginable that it actively subtracts from your intelligence every time you use it, and so, so fucking investment- and profit-driven.

And knowing Suno's userbase, both investors and users will eat it up, then go on social media pretending they're suddenly not just musicians, but sound engineers and producers and music industry revolutionaries, because they clicked generate five times.

The unicorn company problem

Now that I'm slightly older, I realize that companies that could execute on the collaborative vision I described are unicorns. You'd basically need a collection of people like early OpenAI, dead set on executing a vague vision that somehow works out. Top-tier researchers, engineers, musicians and artists (real ones, not grifters), and a long-term vision that isn't just "make money off this one cool thing we made and then stagnate."

I consider current companies like Suno and Udio to be akin to ChatGPT wrappers - 500 lines of Python behind pretty (Claude-generated) UIs. The researchers did one admittedly insanely cool thing, then immediately monetized and stagnated - which is the sensible move in a capitalist world, but so incredibly frustrating from a technology perspective, to the point that I simply lost all my love for technology in general.

The solution is to keep innovating constantly to stay ahead - it worked for Apple in the 2000s and 2010s, and probably for some others. Be the company that genuinely makes great products, let people clone your stuff knowing they'll never keep up, and keep releasing new things everyone wants, even if they're not perfect - being better than everyone else is good enough.

But not everyone is Apple. OpenAI started small and non-profit, but they too eventually monetized their one unicorn product and fell into iterative improvement stagnation. The "starter advantage" eventually ends - look how Google, Anthropic and even open models are eating OpenAI's lunch now, probably at least partially by distilling from GPT-4 to improve their own models.

Pop goes the bubble

I think the AI investment bubble is partly to blame here. I've said this in the past in private chats: I believe there is too much money in the world, or at least too much valuation. If there weren't so much money involved, maybe small companies would be more likely to innovate and create new tech, while bigger companies would copy and improve in a self-sustaining cycle. Instead, all AI firms are settling in and enshittifying, which is exactly what causes a bubble to appear in the first place - remember the dotcom one? Lisp machines? Web3? Crypto? NFTs? All the same pattern - massive hype, massive investment, massive overvaluation, stagnation, then a crash. AI differs slightly in that it is genuinely useful sometimes, and accessible to anyone even for free, but the same pattern is emerging.

I believe OpenAI raised expectations so high they effectively prohibited themselves from experimenting. They launch tech demos like Sora as full products with borderline false marketing, and then, when people realize they're genuinely useless, they either forget about them (hello, DALL-E!) or iterate slightly and re-release them as "improved."

The playbook for AI products now is: "Generate anything! Expand your creativity! Simplify your life!" followed by the fine print: you get almost no control, it outputs garbage most of the time, it's useless to most people, they cherry-picked the 10 examples they show you - have fun, subscribe for $199.99/month, no refunds.

Literal research projects with unmaintainable code are more fun to play with and have a higher chance of being useful than these commercial "solutions" to problems that don't exist.

Why I'm full of shit

The math changes fast in this field. A year ago nobody knew you could extract parts of a production model's weights from ChatGPT by abusing its API parameters. Who knows what eldritch research technique will appear tomorrow to flip everything on its head? Maybe visualizing seed correlations to find ones that produce copyrighted material, or something we can't even imagine yet.

At the current rate of progress, we really can't predict anything. We don't even completely understand the smallest language models - they work, but sometimes in completely unexpected ways because of how immensely complex the math becomes. They're essentially magic black boxes we can use to perform magic, but can't really comprehend.

The rabbit hole is so deep that it would take a decade to cover what happened in AI audio alone in just the last year, and ten decades to understand it all.

Maybe a benevolent AGI will crack all of this, if the hype is to be believed. Or maybe that's all wishful thinking. What I can say with certainty is that the current approach of typing in meaningless prompt words won't get us very far culturally or technologically. It practically encourages generic results, complacency, and stagnation.

But then again, some people might be into those kinds of results. Humans are weird, and I fucking hate AI bros.


This post is based on a conversation I had on Discord, edited for clarity and flow. I used an LLM to help with grammar and structure, because I suck, but all opinions are my own.