
Creative AI and Baldur's Gate Modding

Table of Contents

Hey all! It's been a while since I last blogged anything (these tend to be a one-or-two-a-year thing…). My posts also swap between "technical how-to", "opinionated programming conventions", and "stream of thought"/"general rambling".

No doubt, this one will be some blend of all of the above, as I want to focus on the "week in review" of a new fan mod I'm making for Baldur's Gate Enhanced Edition (well, to those in-the-know - more targeted towards Enhanced Edition Trilogy).

The general idea section and what led up to this point can be read by anyone. If I get into "weidu" or "modding" terms, that'll likely be meaningless to readers outside of that community (but it never hurts to learn a thing or two - come join https://www.gibberlings3.net/ and see what it's all about).

I'll probably have a few posts worth of information on here - if you don't care about my thoughts and musings, use the TOC to skip to the technical section that interests you.

Start of the week (technically end of last week - Saturday Dec 3)

I was happily working through Advent of Code 2022 (https://adventofcode.com/2022), trying out a language I wanted to learn, Nim (https://nim-lang.org/). AOC is a yearly event, similar to an advent calendar, where participants can flex some mental muscle to solve problems.

On day 3, there was an all-time record-breaking submission - the first problem had been solved in 10 seconds - and come to find out, it was by GPT-3 (https://chat.openai.com/ if you wanna try it out). I think this may have been that Thursday or Friday before this story takes place.

This was a bit dismaying to me, and the FUD began to creep in (would the world need programmers if an AI can do it? Are we venturing towards Star Trek's "holodeck programming" now?). It was a bit ironic however, as just a few weeks before, I was trying out DALL-E for some automatic image generation (portraits for Baldur's Gate - which I'll refer to as "BG" henceforth). Sidenote - they weren't too bad quality wise and I was able to make over 100 of them before running out of DALL-E credits.

Anyways, by Saturday morning, I changed my view point on this, and put on a "What can AI do for me (in a creative medium)" hat.

Flashback and history

Another quick flashback - I've played BG for about as long as I can remember (it came out late 90s), with large periods of inactivity, until I shelled out some cash on Steam for the GNU/Linux native (Enhanced Edition) versions in 2021 after seeing a news post about them still making updates to the game (the EE had been out for some time already at this point, nonetheless I was quite impressed the game, a previous favorite, was still being worked on "officially"). Ever since BG2 days (early 2000s) there have been strong modding communities around the game, including a full language/patch application system being built and accepted/extended by the community (https://weidu.org/~thebigg/README-WeiDU.html).

Chances are, if you've played any BG mods in the last 2 decades, they've leveraged this framework. So, as I was saying… I played through the BG:EE as a replay through BG1, the new Siege of Dragonspear (SOD) expansion, and BG2.

After the "vanilla" play through, I went to seek out mods. I had tried them in the past on the original game under Windows, but had mixed results with GNU/Linux + wine (some mods would really lag out/break the game).

Then, towards end of 2021, I began to dabble with my own modding. The reverse engineering of the game is, frankly, amazing - with everything mapped out down to the byte (https://gibberlings3.github.io/iesdp/).

I had a problem - my "bag of holding" was full of hundreds of items, and the game didn't offer any type of sorting mechanism. I wrote a javascript program to read through the save game file bytes, organize by item type, then alphabetize - it didn't use any of the WEIDU system, but it solved my problem.
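The sorting step itself is tiny once the items have been decoded. Here is a minimal sketch in Python (not the original JavaScript program). The `Item` fields and the sample data are hypothetical, and reading the save game bytes is left out entirely:

```python
from typing import NamedTuple


class Item(NamedTuple):
    # Hypothetical fields; real item data would be decoded from the save file.
    item_type: str
    name: str


def sort_bag(items: list[Item]) -> list[Item]:
    """Group items by type, then alphabetize by name within each type."""
    return sorted(items, key=lambda it: (it.item_type, it.name.lower()))


bag = [
    Item("potion", "Potion of Healing"),
    Item("scroll", "Scroll of Identify"),
    Item("potion", "Antidote"),
]
sorted_bag = sort_bag(bag)
```

Sorting on a `(type, name)` tuple gives both orderings in one pass, so there is no need for a separate grouping step.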

A few months later, after more lurking of the Gibberlings 3 forums (to be referred to as "g3") I was on a "mod craze", to try to find and include as many mods as I could in a playthrough. Now, some of these are quite old (20+ years) and you can tell they were developed in a different time, both from the way they approach things, and perhaps the code quality and game balance as well. I would say I installed 200 distinct fan made mods easily. This really expands the game world (although from a game balance POV - things aren't quite balanced, but they're still quite fun).

In using some of these, I found various snags (broken interactions between mods, outdated code calls etc.) and began to poke around the "weidu" files (there are execution scripts, .tp2/.tpa files, and dialogue files, known as .d files) to apply fixes/workarounds, so that I could continue playing (shout out to the g3 and Artisan's Kitpack discord server participants - great and helpful communities).

This led me into writing some of my own weidu tweaks, and using the proper weidu conventions. Doing so ensures a few things:

Back to the future, and leveraging AI

Text AI (GPT-3) and the Plot/Ideas

So, now that you know all about my modding history and relationship with BG, we can hop back to this last week.

Motivated by my now seedling idea of "What if an AI designed a mod", I began by having some brainstorming chats (triggered by my idea of "what can AI do for me", as well as just wanting to check it out) with text-davinci-003 under the OpenAI playground (I noticed the chat.openai.com was too restricting/self-censoring after gaining some popularity). This helped me to build up 5 characters for the mod idea, a central story line, and even some statistics for in game items (AD&D 2nd edition was the intent - I wasn't specific enough and they came out a little more like 5th edition, which I'm unfamiliar with).

You can see the thread where I shared this intent with the g3 community here: https://www.gibberlings3.net/forums/topic/36224-the-first-ai-designed-mod-the-heart-of-baldurs-gate/ and I'll share a brief excerpt of the conversation inline (the g3 link has a link to the sprunge containing the full data dump).

Human: I want you to design a new mod for gibberlings3, can you come up with an outline for a new Baldur's Gate mod?

AI: Absolutely! Here's a rough outline for a new Baldur's Gate mod:
1. Establish a detailed story with interesting characters and
   interesting locations.
2. Develop a complete set of new quests and objectives specifically designed for the mod.
3. Create a believable environment, with randomly generated weather and realistic NPCs.
4. Introduce new items, armors, weapons, and spells to make the game more balanced and realistic.
5. Design an original user interface and menu system to help make the game easier to use.
6. Implement a balanced and enjoyable difficulty level that is suitable for all types of players.
7. Test the mod thoroughly and provide comprehensive bug fixes before its release.


Human: Can you give me the storyline of the quest?

AI: Sure! The
storyline of the quest would revolve around a powerful magical
artifact, hidden away in the deep dungeons of Baldur's Gate. The
player's task is to locate and recover the artifact, which will
require them to overcome various obstacles and enemies along the
way. On the journey, they may be joined by allies who can help them
battle particularly difficult foes, or provide crucial knowledge and
wisdom to help them make informed decisions. Ultimately, the player
will be responsible for protecting the artifact from those who seek to
misuse its power and bring peace to Baldur's Gate.


Human: What should I call the artifact?

AI: The artifact could be
called "The Heart of Baldur's Gate".

Human: Tell me about one of the friends, what would their name be, and
what is their biography?

AI: Your friend's name could be Athar, a
curious and helpful warrior of Baldur's Gate who was born into a noble
family. He is knowledgeable and wise, with a great affinity for the
ancient ways of magic and combat. He joined your quest to obtain the
Heart of Baldur's Gate, driven by a sense of adventure and a desire to
protect his home from those who seek to use its power for
ill. Together, you will overcome the dangers of your journey and
ensure the safety of Baldur's Gate.


Human: What class and alignment would Athar be?

AI: Athar would be a
Paladin of Lawful Good alignment. He would be skilled in the ways of
melee combat and have knowledge of ancient magical arts.


Human: Describe the other friend that could come along, what is their
name and biography?

AI: The other friend could be Anari, a wild elf
from the woods of Baldur's Gate. She is a master archer and swordsman,
as well as a powerful spellcaster. Anari is brave and fiercely loyal
to her friends and allies, and will do whatever it takes to protect
them from harm. She too has joined your quest to discover the Heart of
Baldur's Gate, and together you will face the many obstacles and
challenges that await.

Human: Are there any other heroes who wish to join the protagonist on
this adventure in addition to Anari and Athar?

AI: Yes, the
protagonist could be joined by a few additional allies on their
journey. These may include a dwarf berserker, a tiefling rogue, and a
mysterious cleric. Each of these characters would bring their own
unique skills to the team, helping them to overcome challenges and
ultimately succeed in their quest.


Human: Tell me more about the dwarf berserker, what is their history
and name?

AI: The dwarf berserker is named Olrun. He is a fierce
warrior, deeply devoted to the gods of his people, and driven by a
need to prove his strength in battle. Olrun has a penchant for
charging straight into danger, often without much thought or regard
for his own safety. However, his courage and bravery make him an
invaluable ally in difficult situations.


Human: Tell me about the tiefling rogue, what is their name and
history?

AI: The tiefling rogue is named Zariel. She is a cunning and
skilled thief, who grew up on the streets of Baldur's Gate. She is
incredibly agile and stealthy, able to slip through almost any
situation undetected. Zariel is driven by a need to prove her worth,
and is willing to do whatever it takes to get the job done.


Human: What about the mysterious cleric? What is their name?  Why are
they mysterious?

AI: The mysterious cleric is known only as The
Voice. Little is known about them, except that they have a great and
powerful connection to the gods. They are enigmatic and wise, offering
guidance and advice to those who seek it. It is said that The Voice
can be found in the darkest corners of Baldur's Gate, where they
protect the city from its enemies.


Human: Give me some sample quotes of what Anari might say to the protagonist
AI:

"We mustn't give up now—we're so close to achieving our goal!"
"Let's work together, and I'm sure we can make it through this challenge!"
"No matter what obstacles stand in our way, we will prevail!"
"We must stay alert and remain vigilant against those who would seek to use the Heart for ill."
"Let us bravely march forward and protect Baldur's Gate!"

This really got my creative juices going. All of a sudden, I began to understand the attachment some mod authors have to their characters (previously I couldn't empathize, thinking of their creations as nothing more than my standard FOSS utility functions/libraries I would write) despite the initial prompts and concepts being produced by the AI (note: I used this as a kick off point - I am taking them in a guided direction from this, but with much of my own interpretation now).

Now I needed to visualize these friends.

Dall-E / MidJourney (AI generative art)

So, I have these characters that are now becoming near and dear to me, yet I need to actually see them. For this, I tried my refreshed 15 dall-e credits - it wasn't terrible, but it didn't get me very far. They ended up looking quite similar to the generations I did a month ago (https://ahungry.com/img/ahungry-portraits/) - an Icewind Dale 2 flavor, but not really what I was looking for - I needed something better, and I was out of Dall-E credits.

I had heard of MidJourney in passing, and decided to give it a go. Turned out signing up was quite easy, and the main interface is via a Discord bot, where you type "/imagine <whatever prompt info here>".

After a couple hours, I had the initial designs, but they were a bit too realistic (BG art is not this realistic):

It was soon after this that I had exhausted my free MidJourney credits (had I been focusing on one character, I could have got it done in the free credits, but I was focusing on 5 new characters and 3 villains).

I took a look at their plans and saw for $30/month, I could get 15 hours of image generation (the free minutes were like half an hour or so). Since the tech is exciting, and this is still much cheaper than commissioned artwork (and less time intensive than drawing it on my own) I went ahead and subscribed, and began to resume my efforts.

With some more prompt tuning, I had what I felt was an acceptable result to work with:

The end result of all the characters:

Anari Athar Olrun The Voice Zariel

Having the character's personalities, as well as their visual portrayal worked out, my excitement to glue it all together as a mod was blooming.

However, one thing that often sets a mod apart (or any media, game, video etc.) is having a high quality (or at least, acceptable quality) audio score.

Replica Studios TTS (Text to Speech) - AI based voice actor work

How would I fulfill my needs for audio? Each character needed around 50 voiced lines for the basic in game interaction (clicking on them, commanding them to act etc.), and my own skill at voice acting (even with ffmpeg type filters/modification) is far from adequate - especially considering the roster has such a range of voices required, and I didn't want it to verbally assault the mod user's ears.

This was one where I hadn't heard much in the online space - I knew of AWS/Google/IBM offerings for TTS systems, but they are more integration oriented, for real time support/TTS (calling a customer service line, getting a robot that may fool some less than astute callers into believing it's a real person etc.).

I couldn't even find a basic trial or pricing structure to just run some test lines.

Searching a bit, I found one under a moniker that rhymed with "turf ai", and I'm still not sure if they're legitimate or not, but before trusting something, it's always good to see if "real" people are discussing it, by running a search similar to:

site:reddit.com your phrase here

to get results (via duckduckgo or google) that are limited to the domain following "site:" (in this case reddit).

Unfortunately for "turf"'s potential business, all the results were SEO spam under less than reputable looking subreddits, so I went looking further and eventually came across a reddit post mentioning "Replica Studios".

I found their website (to my chagrin, they have a notice that their web UI is being deprecated in favor of Windows/Mac binaries - nothing mentioning future GNU/Linux support) and began testing some voices. It turns out they were much better than I had hoped, and I was able to produce all the soundsets (300+ audio clips of ~1 to 10 seconds in length) in the free 30m trial (I'm still trying to get a response from their Discord, if they plan to support GNU/Linux, or leave the web UI up, as I don't want to pay them $20/month and end up being stuck with an unusable tool, if they were to decomm the web UI).

In fact, we can compare my original voice over with one produced by the tool, and it'll be easy to tell this was a worthwhile endeavor.

My voice over (pitch shifted deeper with ffmpeg) for Olrun:

The AI voice of Olrun:

My voice over (pitch shifted higher with ffmpeg) for Anari:

The AI voice of Anari:

and a bonus to show the range (and that she's one of my favorite characters being developed)

The AI voice for Zariel:

So, the voice acting work is much simpler (copy/paste lines of dialogue, vs me trying to record it and stress my throat attempting to do a range, and then apply tweaks via audio software).

It's also much cheaper - from talking with the community, some other voice acted mods spent around $100 to get one character voiced - to shell out close to a thousand of my own dollars for a non-commercial community mod that is given out freely is just not an option - but paying $20/month for AI VA is certainly in the realm of "fun cash"/"expendable money" (I ended up using 18 of 30 free minutes - Replica Studios offers a monthly plan for 2 hours of credits for $20).

Background Music (BGM) and ambient sounds

So, I had almost all the resources necessary to really start gluing things together, yet the best mods I've seen also include a "theme song" of sorts for dialogue that is unvoiced, while the player reads the text.

Initially, I pulled out some old audio files I created using Rosegarden (software that allows music generation similar to the midi files of old, by placing notes and assigning instruments to "play" those notes). I had some unique tunes, however I've never had a musical background, and can't quite read sheet music (hmm, perhaps a new goal to focus on?), so I am certainly no composer.

However, trying to find an AI medium to create dynamic music (I know some exist that can produce pretty trippy AI generated videos, with scenes that transform before the eyes) was beyond my interest - I just wanted some CC-BY-0 / Royalty Free songs, and luckily I came across some on SoundCloud. This built out all the necessary resources I required (I did have to tweak a few with ffmpeg filters, as a lot are very recognizable as commonly used YouTube BGM).

Time to start hacking (a mini-tutorial about a Weidu Tutorial)

Getting started with the modding

After all the components were ready, I got to work putting them together, yet I had a few research tasks to perform - I had no idea how to build a weidu dialogue file from scratch (let alone 5 of them).

I began digging around, and found a nice beginner guide on the SHS Forums (some BG forums that were most popular circa 2010). You can find that here (http://www.shsforums.net/topic/7219-weidu-how-to-create-a-simple-npc/).

This got me to the point of basic understanding, however I hit some snags in following the instructions, as I was trying to develop the mod against (here come the parts only the BG modding community will understand) an existing EET install. The "script" referred to as the "character DeathVar" kept being overwritten by MULTIG, a generic multi-player dialogue file, instead of my custom written one.

Some g3 discord members tried to help out with advice (which led me to a wonderful interactive mod, with the intent of teaching others how to build a mod - the Branwen mod: https://github.com/Pocket-Plane-Group/Branwen_for_BGII), but the problem persisted.

The first issue, EET with pdialog.2da and interdia.2da

It turns out that the base game looks into override/pdialog.2da (sort of like a tab separated values file, but any whitespace will work) as a lookup table - it'll find the character entry, in this case "Zariel", with the game asset ID of "uxzar", and from that row, read the dialog file value. BUT the big gotcha here is, in an EET install, there are a multitude of other files used in this process (all in override: bddialog.2da, 25dialog.2da, and bgdialog.2da). The EET tool has a sub-install called "EET_End", that's supposed to run at the very end of your mod setup - it'll propagate the old style entries in pdialog.2da into the new tables (this problem also occurs with in game banters run off the banter engine - in addition to interdia.2da, you need to populate bgbanter.2da, 25banter.2da, and bdbanter.2da).
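To make the "lookup table" idea concrete, here is a rough Python sketch of reading a 2DA-style table and pulling one character's row. It assumes the usual 2DA layout (a signature line, a default-value line, a column header line, then whitespace-separated rows). The column names and the `UXZARP`/`UXZARJ`/`UXOLRP` file names in the sample are made up for illustration, not copied from a real pdialog.2da:

```python
def parse_2da(text: str) -> dict[str, dict[str, str]]:
    """Parse 2DA-style text into {row_label: {column: value}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Line 0 is the signature, line 1 the default value, line 2 the column names.
    default = lines[1].split()[0]
    columns = lines[2].split()
    table = {}
    for line in lines[3:]:
        label, *values = line.split()  # any run of whitespace separates cells
        # Pad short rows with the table's default value.
        values += [default] * (len(columns) - len(values))
        table[label.upper()] = dict(zip(columns, values))
    return table


# Illustrative sample only; the column and file names are assumptions.
SAMPLE = """2DA V1.0
*
          POST_DIALOG_FILE  JOIN_DIALOG_FILE
ZARIEL    UXZARP            UXZARJ
OLRUN     UXOLRP
"""

table = parse_2da(SAMPLE)
zariel_join_dialog = table["ZARIEL"]["JOIN_DIALOG_FILE"]
```

In an EET install the same kind of lookup happens against the extra tables too, which is why a row that only exists in pdialog.2da is not enough there.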

Now, the astute reader may just ask, "Ok, so why don't you run EET_End after installing your mod?" - to which I would retort, "My mod installs in about 10 seconds, EET_End takes 5 to 10 minutes, and when I'm learning a new programming language or technology, the time between cause (changing the code) and effect (seeing the code in action) needs to be as short as possible, and 10 seconds is much preferable to 10 minutes".

To automate the solution to this problem, I took my first major divergence from how the Branwen tutorial suggests handling code - I broke off some common segments into a "util.tpa" file, so I could write my own custom functions, including one that populates these tables conditionally based on the environment, and does so for any given character/resource based on parameters (vs duplicating the snippet 5x - not that the Branwen mod did that, it was only for one character, and it also has conditional application, but only between BG2 and BG2+TOB expansion, with a final block for handling some EET things).
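The shape of that helper, sketched in Python rather than WeiDU (so this is not the actual util.tpa code): take a character label and a dialog file, and add a row to whichever of the dialog tables exist in the current install. Tables are modeled as in-memory lists of row strings, and the `register_npc` name and the row format are hypothetical:

```python
# Table names taken from the post; which ones exist depends on the install.
DIALOG_TABLES = ("pdialog.2da", "bgdialog.2da", "bddialog.2da", "25dialog.2da")


def register_npc(tables: dict[str, list[str]], label: str, dialog: str) -> list[str]:
    """Append a row for the NPC to every dialog table present; return the names touched."""
    touched = []
    for name in DIALOG_TABLES:
        rows = tables.get(name)
        if rows is None:
            continue  # table absent in this environment (e.g. not an EET install)
        # Skip the append if the NPC already has a row, so reinstalls stay clean.
        if not any(r.split()[:1] == [label] for r in rows):
            rows.append(f"{label} {dialog}")
        touched.append(name)
    return touched


# A non-EET install modeled with only pdialog.2da, and an EET one with all four.
plain = {"pdialog.2da": []}
eet = {name: [] for name in DIALOG_TABLES}

for npc, dlg in [("ZARIEL", "UXZARP"), ("OLRUN", "UXOLRP")]:
    register_npc(plain, npc, dlg)
    register_npc(eet, npc, dlg)
```

Parameterizing on the character means the five NPCs share one code path instead of five copies of the same snippet.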

It's Cobbling Time

Next up was hammering out "brute force" style the following core concepts I wanted in my mod:

  • Player Initiated Dialogues (pids)
  • NPC to NPC Banters
  • NPC to Player Banters
  • Custom items
  • Custom characters (binding the audio, portraits, game stats, class etc.)
  • Some rudimentary custom scripting (Zariel falls asleep, the player can wake her up - I haven't really noticed an event like this in the core game and it reminded me of waking up the warcraft peon)
  • Using "best practices" and trying to "traify" (writing weidu dialogue files with string references removed in favor of general ids, so translators can produce alternate "tras" (translations) in the future)

I noticed quite soon in the week that for each minor thing I was doing from above, I was having to propagate it via copy/paste 4 additional times (one for each npc). There are not a lot of meta-programming facilities, and while weidu is very featureful, some things are often done via copy/paste programming, which I'm not a fan of.

Having those things in place, I took a little break to try to make a video of the progress to share with the world mid-week (note - the audio quality is very poor, that was due to my recording setup - I had pulseaudio off, so no sound multiplexing, and the video recorded over ALSA using my laptop's default mic - causing an echo and some audible key presses).

Still, the reception was pretty good, and the product was looking quite nice!

Now, I needed to take a step back, and analyze the groundwork - if I plan to extend this to have hundreds of lines of dialogue and a compelling story/character building, I can't spend most of my time deciphering obscure code and confusing file layouts.

Coming full circle

So, here I am, a fun video, some community excitement, and what I would consider a messy/boilerplatey code base.

BG modders and general programmers alike may have an opinion on this - this is what the code was looking like:

In file dialogues/ux_greet_olrun.d

BEGIN uxolr

// BEGIN Prelude segment
IF ~!Global("ux_prelude_done", "GLOBAL", 1)~ THEN BEGIN prelude
  SAY @110001
  ++ @980002 + prelude2
  ++ @980003 + prelude3
END

IF ~~ THEN BEGIN prelude2
  SAY @110004
  IF ~~ THEN GOTO prelude4
END

IF ~~ THEN BEGIN prelude3
  SAY @110005
  IF ~~ THEN GOTO prelude4
END

// Stub for INTERJECT_COPY_TRANS to be inserted
IF ~~ THEN BEGIN prelude4
  SAY @110060
  IF ~~ THEN EXIT
END
// END Prelude segment

In file lang/english/game.tra:

// Prelude segment
// Olrun
@110001 = ~Noooo, we were too late!~ [uxolr52]
  @980002 = ~Did...did you see that?~
    @110004 = "Only the aftermath - we just arrived in the wee hours of the morn." [uxolr53]
  @980003 = ~...~
    @110005 = "Cat got yer tongue?  Well, I don't blame ya. We just arrived in the wee hours of the morn, just in time to see the aftermath." [uxolr54]
// Olrun
@110060 = ~Going down in the glory of battle!  I'd have it no other way.~ [uxolr16]

Note:

The "++" is a shorthand for "REPLY", and the "+" is a shorthand for "GOTO".

While these facilities can reduce verbosity a bit, being able to read the dialogue tree as a cohesive script is very challenging! (note: some community members recommended just not traifying until the end, or not at all, for dialogue lines). That was one option, but an observation I had was that dialogue is like a choose your own adventure book, which is actually quite like a binary tree.

Being a big fan of lisps, I immediately began to envision a DSL (domain specific language) code generator that could hold the "source" of this dialogue in a tree structure, and produce the output, without me having to manually map "@0001234" type tra references by hand.

One more night of hacking, and I had the implementation working, so that I can now write my dialogue tree like this:

(var
 prelude-tree
 (say
  {:cond [(ng "ux_prelude_done")]}
  "Noooo, we were too late! [uxolr52]"
  (rep "Did...did you see that?"
       (say "Only the aftermath - we just arrived in the wee hours of the morn. [uxolr53]"
            (olrun-glory)))
  (rep "..."
       (say "Cat got yer tongue?  Well, I don't blame ya.  We just arrived in the wee hours of the morn, just in time to see the aftermath. [uxolr54]"
            (olrun-glory)))))

and it will produce the traified output shown in the previous snippets (including automatic numbering of strings to tra entries, and placing them in code).
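For readers who don't speak lisp, the core of such a generator is a recursive walk over the tree that hands out tra numbers and state labels as it goes. Below is a rough Python sketch of that idea; it is not the Janet implementation. The `Say`/`Rep` classes, the `compile_tree` function, the sequential numbering scheme, and the shortened sample lines are all assumptions made for illustration:

```python
import re
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Say:
    text: str
    replies: list = field(default_factory=list)  # list of Rep
    then: "Say | None" = None                    # unconditional follow-up line
    cond: str = ""                               # trigger for this state


@dataclass
class Rep:
    text: str
    then: Say


# Splits "Some line [sound01]" into the line and its optional trailing sound tag.
SOUND = re.compile(r"^(.*?)\s*(\[\w+\])?$", re.S)


def compile_tree(root, prefix, first_ref=1):
    """Return (d_text, tra_text) for the dialogue tree rooted at `root`."""
    refs, labels = count(first_ref), count(1)
    states, tra = [], []

    def ref_for(text):
        body, sound = SOUND.match(text).groups()
        ref = next(refs)
        tra.append(f"@{ref} = ~{body}~" + (f" {sound}" if sound else ""))
        return f"@{ref}"

    def emit(node):
        label = f"{prefix}{next(labels)}"
        lines = [f"IF ~{node.cond}~ THEN BEGIN {label}", f"  SAY {ref_for(node.text)}"]
        states.append(lines)  # parents are listed before their children
        for rep in node.replies:
            reply_ref = ref_for(rep.text)
            lines.append(f"  ++ {reply_ref} + {emit(rep.then)}")
        if not node.replies:
            target = f"GOTO {emit(node.then)}" if node.then is not None else "EXIT"
            lines.append(f"  IF ~~ THEN {target}")
        lines.append("END")
        return label

    emit(root)
    return "\n\n".join("\n".join(s) for s in states), "\n".join(tra)


def glory():
    # A fresh node per call: shared lines are duplicated, never jumped back to.
    return Say("Going down in the glory of battle! [uxolr16]")


tree = Say(
    "Noooo, we were too late! [uxolr52]",
    cond='!Global("ux_prelude_done","GLOBAL",1)',
    replies=[
        Rep("Did...did you see that?", Say("Only the aftermath. [uxolr53]", then=glory())),
        Rep("...", Say("Cat got yer tongue? [uxolr54]", then=glory())),
    ],
)
d_text, tra_text = compile_tree(tree, "prelude", first_ref=110001)
```

The sketch numbers strings sequentially from `first_ref`, so the references won't match the hand-written files shown earlier, and it leaves out the `BEGIN uxolr` header. It also assumes no line contains a `~`.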

Overall, much less code, more readable, and it should completely remove one type of logic error that crops up in these mods sometimes (or any code using GOTOs) - the potential for "dialogue loops" (where the chain of responses ends up going back on itself).

Using a tree approach, represented in lisp code, ensures that each ending node (a terminal) never points elsewhere, and all dialogue paths are linear/unique (even identical text just gets repeated in my DSL, and created as a new node). Knowledgeable modders likely realize this could present a challenge if another mod tried to inject changes into the dialogue paths: one common convention is to patch/inject via a dialogue state ID, and if that ID is unknown/dynamic, as is often the case with non-core game dialogue, the duplicated text could make the state impossible to find with the STATE_WHICH_SAYS weidu function that aims to ease this scenario.
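For hand-written GOTO-style dialogue, the loop problem can at least be checked mechanically. A small Python sketch follows, assuming the states and their GOTO targets have already been extracted into a dict (that extraction step, and the `find_loop` name, are hypothetical):

```python
from typing import Optional


def find_loop(transitions: dict[str, list[str]], start: str) -> Optional[list[str]]:
    """Return one path that revisits a state, or None if the dialogue is loop-free."""
    path: list[str] = []      # states on the current walk, in order
    on_path: set[str] = set()
    done: set[str] = set()    # states already proven loop-free

    def visit(state: str) -> Optional[list[str]]:
        if state in on_path:
            return path[path.index(state):] + [state]
        if state in done:
            return None
        on_path.add(state)
        path.append(state)
        for target in transitions.get(state, []):
            loop = visit(target)
            if loop:
                return loop
        path.pop()
        on_path.discard(state)
        done.add(state)
        return None

    return visit(start)


# The prelude from the .d file above: two branches that rejoin, but no loop.
prelude = {
    "prelude": ["prelude2", "prelude3"],
    "prelude2": ["prelude4"],
    "prelude3": ["prelude4"],
    "prelude4": [],
}
# The same dialogue with a stray GOTO back to the first state.
looped = {**prelude, "prelude4": ["prelude"]}

no_loop = find_loop(prelude, "prelude")
loop = find_loop(looped, "prelude")
```

Note that two branches rejoining at `prelude4` is not a loop. A generated tree never needs this check, since every node is created fresh and nothing can point back up.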

Bonus/closing ramblings

For the modders/potential modders

One neat thing I found is that the .acm files in the music/ directory do not need to be converted into true InterplayACM files. There are almost no converters for that format online (ffmpeg/mpv support decoding, but not encoding) other than a very old SND2ACM.exe file from some Russian Fallout forums/message boards. That software didn't quite work - it changes the sample rate, so it requires a lot of finagling, and most audio produced by it would crash with decoding errors when I tried to replay it via ffmpeg after conversion to ACM. On a hunch (having noticed/been told that .ogg files can work if renamed to .wav files, for other BG:EE audio) I thought, "Why not see if this works for acm as well?" - well, it did!
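In practice that makes "conversion" a plain copy with a different extension. A tiny Python sketch of the rename, where the function name and directory arguments are placeholders, and anything else the game needs to register a song is not covered:

```python
import shutil
from pathlib import Path


def install_ogg_as_acm(src_dir: Path, music_dir: Path) -> list[Path]:
    """Copy every .ogg in src_dir into music_dir under an .acm extension."""
    music_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for ogg in sorted(src_dir.glob("*.ogg")):
        target = music_dir / (ogg.stem + ".acm")
        shutil.copyfile(ogg, target)  # bytes are untouched; only the name changes
        copied.append(target)
    return copied
```

Calling `install_ogg_as_acm(Path("audio"), Path("music"))` would leave the originals in place and return the list of new .acm paths.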

So, now it's quite easy to add new background songs - I have about 5 to 6 new ones being added with the mod, which I think is pretty unique (this is the audio you can call with PlaySong(123), and will run in tandem with any audio that is triggered via bracket calls in the tra files). I'm not sure if I've seen any mods in the past using it at all, likely due to the proprietary acm format, and the lack of mods that would need it.

For the general programming language fans

If you notice the snippet above, I chose Janet, a small/minimal Clojure-like lisp - I wanted to flex my Nim chops, but trying to learn a new language, and get a task done, while a good learning opportunity, can be a bit "too much" all at once. I did get a quick and dirty prototype working with it, but lisp lends itself SO well to structuring code as data, while I was constantly refactoring my Nim code, and updating many different type definitions.

I really like Nim and would recommend it for apps that you want to have a small footprint, in particular ones that you plan to collaborate on with others, but to just whip out a working prototype, it's hard to compete with lisps and their REPL features.

The Nim DSL looked like this:

next2(
  Say(
    cond: @[global("ux_prelude_done", 1), not_global("ux_another_check", 3)],
    text: "Hello",
    replies: @[
      Reply(
        text: "Can you help me?",
        next: Say(
          text: "Thank you!",
          replies: @[
            Reply(text: "I am glad to help!", next: Say(text: "Goodbye!"))
          ]
      )),
      Reply(
        text: "I don't need your help",
        next: Say(text: "Well, thanks anyways...")),
    ]
  )
)

Which is a bit wordier than the aforementioned lisp.