tl;dr Yes. Overall quality is meh. I re-used pre-existing code.
Warning: I use "AI" in its trendy meaning, that is, whatever ML magic box we know nothing about.
That’s the question I asked myself when I stumbled upon yet another AI product, this one digitizing voices. AI is the big topic right now (and to be frank, I am bandwagoning), giving birth to a lot of assumptions, dreams, and claims. One of them is that AI will help the workflow of many small teams and solo game devs. Games are gluttons for assets, which they need in order to be appealing, unique, and memorable.
I then challenged myself to make a small visual novel in an afternoon, with script, music, art, and voices generated with AI. I might be cheating by choosing a kinetic visual novel as a “game” and reusing my previous work, but let’s say I am just picking my battles.
Quite interestingly, I was using all those tools for the first time (a consequence of being a bit of a Luddite).
I used ChatGPT for this, and it needs no introduction anymore. I used 3 prompts: one asking it for a short story, one to change the ending, and one to rewrite the story as if it were theater blocking (my engine uses this kind of writing). It managed quite well, but a lot of information was lost in the process, leaving the result very, very dry.
In only 3 prompts, ChatGPT gave me something I could use.
I used Stable Diffusion for the first time. Setting it up and getting it to run on my computer was quite easy. I drew a simple sketch and... horror! I could not get a single acceptable image. With a bit of tweaking and a lot of time spent (including downloading a new model that makes more acceptable faces), I managed to get something I deem acceptable for the target.
After some Eldritch abominations, I finally got something usable.
For the princess, I had fewer difficulties. It is probably a more popular subject, and I was not constraining the output with my drawing.
Generating a background was much easier, but it stumbled again when I requested a “Mordor-like” desert with cracked earth and a gloomy atmosphere. I can only guess why I could not obtain the output I wanted: this kind of idea must be less represented in the corpus.
I tried a couple of AI products. Some had very restrictive licenses, which could have prevented me from showing the results, and some could not be tried without reaching for your wallet. The notion that you can license pseudo-randomly generated music scared me a bit. I settled on Aiva, which provides 3 downloads per day in its free tier (more than enough for my needs). I could generate music based on a particular mood and edit it. The music generated was by no means good, but it’s a base. The product comes with tools to edit the individual tracks, hinting that you are not supposed to leave the music as is. I have absolutely zero skill in music creation, so I left it as it was.
I used ElevenLabs (https://beta.elevenlabs.io/) for the voices, with two “stock” voices. The process was really fast and easy. However, ElevenLabs voices work better with long text, and all the ChatGPT-generated sentences were short. The result is robotic voices in a dead tone. I could probably have tricked the tool by making it say the sentence I wanted inside a longer text written in the mood I wanted. But the clock was ticking and I wanted to move forward.
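For what it’s worth, the trick mentioned above could look something like the sketch below. It is untested against the actual service and does not call ElevenLabs at all: it only builds the longer text you would paste into the tool. The function name, the lead-in sentence, and the 120-character threshold are all made up for illustration, and the lead-in would still have to be trimmed out of the generated audio afterwards.

```python
def wrap_line_with_mood(line: str, mood_lead_in: str, min_chars: int = 120) -> str:
    """Embed a short line of dialogue in mood-setting narration.

    Hypothetical helper: the idea is to give the voice tool more context
    than a bare short sentence. min_chars is an arbitrary threshold,
    not a documented limit of any tool.
    """
    line = line.strip()
    # Lines that are already long are returned unchanged.
    if len(line) >= min_chars:
        return line
    # Otherwise, put the narration first and quote the line after it.
    return f'{mood_lead_in.strip()} "{line}"'


# Example: a short, flat line gets an angry lead-in.
padded = wrap_line_with_mood(
    "Go away.",
    "She clenched her fists and shouted furiously:",
)
print(padded)
```

Whether the generated voice would really pick up the mood from the surrounding text is something I can only guess at.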
The result is below; the text has been slightly altered to work with the engine.
Overall, I am not happy with the results. Everything looks slightly great but slightly off-tone. It was very hard to obtain what my mind envisioned, and I had to settle for suboptimal each time. You can download the game for Windows over here.
The whole process was not fruitless, because it made me reflect on and learn what AI could do for me. As it is, it can only help you if you have some skills related to what you want to generate. I can edit the text, or even steer the drawings a bit with my own sketches, but I can’t turn music like this into something I want to share with the world. AI products seem to be aware of this, though, and already offer tools and workflows to go with it.
It’s also limited by what it was trained on, meaning that if you seek something unconventional, you can forget it (unless you’re training your own model on your niche kink use, but then it might be overfitted to that). What is popular seems to skew models a lot, which will also drag down your results (as an example, anime models generate faces inspired by the popular mobile games of the moment).
Another downside is that AI products seem to require either a subscription to a service or a beefy computer. This is not affordable for everyone, and if energy costs continue to be an issue (and they will), more and more people will get priced out of AI.
AI is not going to steal your job, whether you’re a programmer, an artist, or even a lawyer. It’s not able to. You were not a good programmer because you could copy-paste from Stack Overflow really fast, and streamlining this with AI won’t make you one either. Unless you’re interviewing, nobody asks you to blurt out CRUD apps in a few hours.
Sadly, the actual ability of AI does not matter. The discourse that AI will steal jobs is already here. This alone will lead to job destruction and to people selling their work for less.
Robots won’t destroy the world today, but quite frankly, I am not so sure about the companies and individuals pushing them.