Finishing out the semester

It’s the end of the semester, so it felt appropriate to get one final blog post in. To follow up on the last post, I began reading Artificial Intelligence: A Modern Approach, specifically the material on uncertainty. As I read, I realized that my initial goals have nothing to do with uncertainty. What I am trying to do is compare a recorded sound from the user against a predefined sound or set of sounds. There’s nothing uncertain about the comparison, since I know what I’m trying to match the user against. Uncertainty will come into play when I start trying to figure out what beat patterns the user is making. For now, I have a slightly simpler problem.

Matching a user’s input to a sample involves a bit of high-level math and a deep knowledge of audio digitization. From speaking with some mathematics experts and computer science professors, I’ve learned that I’m definitely going to need to apply a Fourier transform to the audio signal. In practice, this is a bit difficult to understand.

From what I remember from a sound design class way back at a previous school, when you create a digital audio recording you’re actually capturing the sound as tiny chunks of digital values known as samples. Generally, a sample rate of 44.1 kHz is used, meaning 44,100 samples are captured each second. Each sample is stored as a certain number of bits, with 16 bits being the most common for recording. This bit depth determines how wide a range of values each sample can represent. For example, with 4-bit samples, every 1/44,100th of a second would be stored as 1 of only 16 different values (2^4 = 16). That’s not a lot, which is why 16 bits is generally used, since that gives 65,536 values. On top of this, a lot of digital audio, especially music, has multiple channels recorded into a single stream (think stereo, with its left and right channels). Basically, under the hood of an audio recording we have a very large stream of these little numerical packets we call samples.
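To make the numbers above concrete, here is a small Java sketch of the arithmetic, assuming uncompressed PCM audio. The class and method names are purely illustrative.

```java
/** Back-of-the-envelope arithmetic for uncompressed PCM audio (illustrative names). */
final class PcmMath {
    private PcmMath() {}

    /** Number of distinct values one sample of the given bit depth can hold: 2^bitDepth. */
    static long valuesPerSample(int bitDepth) {
        return 1L << bitDepth;
    }

    /** Raw data rate in bytes per second: samples/s * bits per sample * channels / 8. */
    static long bytesPerSecond(int sampleRate, int bitDepth, int channels) {
        return (long) sampleRate * bitDepth * channels / 8;
    }

    public static void main(String[] args) {
        System.out.println("4-bit values:  " + valuesPerSample(4));   // 16
        System.out.println("16-bit values: " + valuesPerSample(16));  // 65536
        // 16-bit stereo at 44.1 kHz: 44,100 * 16 * 2 / 8 bytes every second
        System.out.println("bytes/second:  " + bytesPerSecond(44_100, 16, 2)); // 176400
    }
}
```

So even a short recording is a lot of numbers: one second of 16-bit stereo at 44.1 kHz is 176,400 bytes.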

The big issue I’ve faced is how exactly to deal with the audio signals and how to process the samples. I was told to use a Fourier transform, but I haven’t quite figured out how to do that yet. Also, since many of the input streams I’m seeing in my code show each sample as 4 bytes, I’m not really sure what I’m looking at. Luckily, I recently found a library called OpenIMAJ that I’m hoping will do some of the heavy lifting for me. Specifically, I’m going to use it to perform the Fourier transform so that I can look at the sound in the form I need for comparison.
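One possible explanation for the 4-byte samples (an assumption I haven’t confirmed for my own streams): Java Sound describes audio in frames, and one frame of 16-bit stereo PCM is 2 bytes per channel times 2 channels, or 4 bytes. The real layout can be checked with `AudioFormat.getFrameSize()`, `getChannels()`, and `isBigEndian()`. If the data turns out to be signed, little-endian, and interleaved left-then-right, a sketch like this would split each frame back into two 16-bit samples. The `StereoDecoder` class is hypothetical.

```java
/**
 * Hypothetical helper: splits interleaved 16-bit signed little-endian stereo PCM
 * into separate left/right arrays. Assumes each 4-byte frame is laid out as
 * [L low byte, L high byte, R low byte, R high byte].
 */
final class StereoDecoder {
    private StereoDecoder() {}

    static short[][] decode(byte[] pcm) {
        int frames = pcm.length / 4;              // 4 bytes per stereo frame
        short[] left = new short[frames];
        short[] right = new short[frames];
        for (int i = 0; i < frames; i++) {
            int base = i * 4;
            // Low byte is masked to stay unsigned; the high byte carries the sign
            left[i] = (short) ((pcm[base] & 0xFF) | (pcm[base + 1] << 8));
            right[i] = (short) ((pcm[base + 2] & 0xFF) | (pcm[base + 3] << 8));
        }
        return new short[][] {left, right};
    }

    public static void main(String[] args) {
        // One frame: left = 1000 (0x03E8), right = -2 (0xFFFE), little-endian
        byte[] frame = {(byte) 0xE8, 0x03, (byte) 0xFE, (byte) 0xFF};
        short[][] channels = decode(frame);
        System.out.println(channels[0][0] + ", " + channels[1][0]); // 1000, -2
    }
}
```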

The form of the audio I’m interested in is known as the frequency domain. You see, the Fourier transform takes the recorded audio, which comes into my program in time-domain form, and converts it to frequency-domain form. The time-domain form basically just tells me the amplitude of the signal at each instant. The frequency domain, on the other hand, shows me which frequencies are present in a short stretch of the sound and how strong each one is. In case it’s not obvious, to compare two sounds you’re going to want to know the pitch (frequency) of those sounds, so this is pretty important.
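OpenIMAJ will handle the transform for me, but to see what it actually computes, here is a deliberately naive discrete Fourier transform in plain Java. This is not OpenIMAJ’s API; it is the textbook O(n²) DFT for illustration, whereas real libraries use a fast Fourier transform (FFT) that produces the same result much faster. The 1 kHz test tone stands in for a recorded sound.

```java
/** Naive O(n^2) discrete Fourier transform for real-valued audio samples. */
final class NaiveDft {
    private NaiveDft() {}

    /**
     * Returns |X[k]| for k = 0..n/2, the non-negative frequency bins of a real signal.
     * Bin k corresponds to a frequency of k * sampleRate / n Hz.
     */
    static double[] magnitudes(double[] x) {
        int n = x.length;
        double[] mags = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                // Correlate the signal with a cosine and a sine at bin k's frequency
                double angle = 2 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im -= x[t] * Math.sin(angle);
            }
            mags[k] = Math.hypot(re, im);
        }
        return mags;
    }

    /** Index of the strongest frequency bin. */
    static int peakBin(double[] mags) {
        int best = 0;
        for (int k = 1; k < mags.length; k++) {
            if (mags[k] > mags[best]) best = k;
        }
        return best;
    }

    /** A pure sine tone, standing in for a recorded sound. */
    static double[] sineWave(double freqHz, int sampleRate, int n) {
        double[] s = new double[n];
        for (int t = 0; t < n; t++) s[t] = Math.sin(2 * Math.PI * freqHz * t / sampleRate);
        return s;
    }

    public static void main(String[] args) {
        int sampleRate = 8000, n = 64;
        int peak = peakBin(magnitudes(sineWave(1000, sampleRate, n)));
        // 1000 Hz lands in bin 1000 * 64 / 8000 = 8
        System.out.println("peak bin " + peak + " = " + (peak * sampleRate / n) + " Hz");
    }
}
```

The time-domain input is just 64 wiggling numbers; the frequency-domain output has a single large spike at bin 8, which is exactly the "what pitch is this?" information I need for comparison.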

OpenIMAJ also offers some really nifty visualizations that help make sense of the data by creating a visual representation of the sound. So far I’ve been able to get my program to record audio from the user and display a spectrogram representing that audio. My hunch is that this spectrogram is somehow going to come in handy when comparing sounds. Basically, a spectrogram shows frequency, amplitude, and time on a single 2-dimensional graph: time runs along one axis, frequency along the other, and a varying color represents the amplitude.
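A spectrogram is typically built with a short-time Fourier transform: chop the recording into short, overlapping windows, transform each window, and line the resulting magnitude columns up in time order. Below is a minimal sketch of that idea using a naive DFT per window. The window size, hop size, and Hann window are illustrative choices, and OpenIMAJ’s own spectrogram code may well do this differently.

```java
/** Minimal short-time Fourier transform: rows are time frames, columns are frequency bins. */
final class SimpleSpectrogram {
    private SimpleSpectrogram() {}

    static double[][] compute(double[] signal, int windowSize, int hopSize) {
        int frames = signal.length < windowSize ? 0 : 1 + (signal.length - windowSize) / hopSize;
        double[][] spec = new double[frames][windowSize / 2 + 1];
        for (int f = 0; f < frames; f++) {
            int start = f * hopSize;
            for (int k = 0; k <= windowSize / 2; k++) {
                double re = 0, im = 0;
                for (int t = 0; t < windowSize; t++) {
                    // Hann window tapers each frame's edges to reduce spectral leakage
                    double w = 0.5 - 0.5 * Math.cos(2 * Math.PI * t / (windowSize - 1));
                    double sample = signal[start + t] * w;
                    double angle = 2 * Math.PI * k * t / windowSize;
                    re += sample * Math.cos(angle);
                    im -= sample * Math.sin(angle);
                }
                spec[f][k] = Math.hypot(re, im); // in a plot, this becomes one pixel's color
            }
        }
        return spec;
    }

    /** Index of the strongest frequency bin in one time frame. */
    static int peakBin(double[] frame) {
        int best = 0;
        for (int k = 1; k < frame.length; k++) if (frame[k] > frame[best]) best = k;
        return best;
    }

    /** Test signal: first half is a tone at f1 Hz, second half a tone at f2 Hz. */
    static double[] twoTones(double f1, double f2, int sampleRate, int n) {
        double[] s = new double[n];
        for (int t = 0; t < n; t++) {
            double f = t < n / 2 ? f1 : f2;
            s[t] = Math.sin(2 * Math.PI * f * t / sampleRate);
        }
        return s;
    }

    public static void main(String[] args) {
        int sampleRate = 8000, window = 64;
        double[][] spec = compute(twoTones(500, 2000, sampleRate, 512), window, 32);
        for (int f = 0; f < spec.length; f++) {
            int hz = peakBin(spec[f]) * sampleRate / window;
            System.out.println("frame " + f + ": strongest at " + hz + " Hz");
        }
    }
}
```

Reading the frames in order shows the pitch jumping from 500 Hz to 2000 Hz halfway through, which is the same change a spectrogram image shows as a bright band moving up the frequency axis.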

I’m looking forward to learning more about interpreting digital audio data. I’m hoping I get a chance to look into it over the summer. But first, I think I am going to take a well deserved break.

(You can check out the very simple Java application I have made that records audio and displays a spectrogram of it on this GitHub page. The other spectrogram it shows is just a sample sound wave for visual comparison.)

What have I gotten myself into?

As I mentioned in the last post, I decided to start by looking into speech recognition algorithms. After an initial search for sources, I’ve realized that this topic is pretty darn advanced. I’m not going to let that stop me. Here are the sources I am considering.

For speech recognition I found a book, a scholarly article, and two sets of course notes.

I was also able to find a couple of scholarly articles related to something that might be closer to my topic than speech.

Skimming over some of these, I found that one of the most common techniques for speech recognition is the Hidden Markov Model (HMM). Since I have no knowledge of HMMs, I asked some of the professors in the CS department at UNCA, and Dr. Kenneth Bogert recommended a good source.

These all seem like a good place to start. Specifically, I think I will begin by looking into the HMM algorithms so that I can at least grasp what the other sources are talking about. Soon I will begin creating annotated bibliographies for each of these sources and will likely have a new page on this site dedicated to that.
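Going in, my rough understanding is that the most basic HMM computation is the forward algorithm: given initial state probabilities, transition probabilities, and emission probabilities, it computes how likely an observation sequence is by summing over every possible sequence of hidden states. Here is a minimal Java sketch of that textbook algorithm; the two-state model and all its numbers are made up purely for illustration, and this is not taken from any of the sources above.

```java
/** Textbook forward algorithm for a discrete hidden Markov model. */
final class HmmForward {
    private HmmForward() {}

    /**
     * @param init  init[i]     = P(first hidden state is i)
     * @param trans trans[i][j] = P(next state is j | current state is i)
     * @param emit  emit[i][o]  = P(observing symbol o | state i)
     * @param obs   observed symbol indices, in time order
     * @return P(obs | model), summed over all hidden state paths
     */
    static double probability(double[] init, double[][] trans, double[][] emit, int[] obs) {
        int n = init.length;
        double[] alpha = new double[n]; // alpha[i] = P(observations so far, current state i)
        for (int i = 0; i < n; i++) alpha[i] = init[i] * emit[i][obs[0]];
        for (int t = 1; t < obs.length; t++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                double sum = 0;
                for (int i = 0; i < n; i++) sum += alpha[i] * trans[i][j];
                next[j] = sum * emit[j][obs[t]];
            }
            alpha = next;
        }
        double total = 0;
        for (double a : alpha) total += a;
        return total;
    }

    public static void main(String[] args) {
        // Made-up 2-state, 2-symbol model
        double[] init = {0.6, 0.4};
        double[][] trans = {{0.7, 0.3}, {0.4, 0.6}};
        double[][] emit = {{0.9, 0.1}, {0.2, 0.8}};
        System.out.println(probability(init, trans, emit, new int[] {0, 1})); // ~0.209
    }
}
```

For speech, the usual idea is that the hidden states are sounds being produced and the observations are features extracted from the audio, though I still need to read up on how that mapping is done in practice.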

I’ve got a lot of reading ahead of me.

A Change of Topic

Over my spring vacation I did a lot of reflecting. I came to believe that the topic of my research, creating a multiplayer game network protocol over UDP, was not all that research-worthy. It is something I certainly want to create, largely for the challenge it presents, but there’s just nothing novel about it. It is a type of game networking that has been done many times over. After all, my project was essentially just going to be an implementation of the Source networking protocol; I wasn’t going to be discovering anything new on the subject. Worse still, there is a significant shortage of scholarly articles on the subject. I suspect this is partly because these custom protocols are trade secrets. This poses a bit of a problem, considering I need scholarly articles for at least some of my sources.

The solution is a change of topic. With a little inspiration from a friend, I was able to land on something that is much more worthy of research: music training software. More specifically, it would teach someone to beatbox, since that is something I am passionate about. Essentially, the software would play you examples of a certain sound or rhythm and listen to you reproduce it while judging how accurate you are. As far as I know, the closest thing to this is the game Rocksmith, which listens to you playing a real guitar in order to train you. Since Rocksmith’s technology is likely proprietary, I’ll have to rely on other sources. The next best thing I know of would be speech recognition. That is where I intend to begin this research.

Check out the official working abstract here.

Look for more updates to come soon.

Getting all set up

After an arduous journey, I have finally set up my project using the newest LibGDX version. One significant difference that complicated the normally simple LibGDX setup was their switch from Maven to Gradle at some point since I last used it. Gradle is essentially Maven’s shiny new child that seems to be all the rage these days. It is something I’ve been wanting to learn for a while, and this was the perfect opportunity. I was forced to really dig into the details of Gradle because of one rather bizarre choice made by the LibGDX authors: they decided to use a non-default directory structure. This confused me, as it didn’t leave a good place to put unit tests. Switching to the default directory structure was no small matter, as I quickly learned; there were many moving parts to adjust.

My bare-bones project was finally building, and I quickly set to work. My first order of business this time around was to implement game states, some way to go from a main menu to the game to an in-game pause menu. When I previously worked on this project, I was pretty confused about how to implement game state in an ECS, so the game was just something you ran, and you were put immediately into the action. Additionally, if you wanted to connect to a server other than localhost, you had to do so from the command line when launching the game. This time I wanted a main menu with some option to connect to a server from within the game. Something else I wanted out of this game state system was a layered pause menu. That is, when you open the pause menu, I want the player to be able to see what was happening in the background. My solution ended up being pretty simple, but I’ve yet to see how well it works in practice. Basically, each state will have a set of entity systems that it allows to run. For instance, to pause the game, I can simply enable a UI rendering system and disable the entity movement system, effectively freezing movement in the game and rendering the pause menu UI on top of it.
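Here is a minimal sketch of the "each state whitelists its systems" idea in plain Java. It is not RECS’s API: the `SystemKind` and `GameState` enums are hypothetical stand-ins, and a real ECS would enable or disable actual system objects rather than enum values.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

/** Sketch: each game state whitelists which (hypothetical) entity systems may run. */
final class StateSystemsDemo {
    /** Hypothetical systems, declared in the order they should update each frame. */
    enum SystemKind { MOVEMENT, WORLD_RENDER, MENU_UI, PAUSE_UI }

    /** Hypothetical game states, each with the set of systems allowed to run. */
    enum GameState {
        MAIN_MENU(EnumSet.of(SystemKind.MENU_UI)),
        PLAYING(EnumSet.of(SystemKind.MOVEMENT, SystemKind.WORLD_RENDER)),
        // Paused: keep drawing the world underneath, stop movement, draw pause UI on top
        PAUSED(EnumSet.of(SystemKind.WORLD_RENDER, SystemKind.PAUSE_UI));

        final EnumSet<SystemKind> enabled;

        GameState(EnumSet<SystemKind> enabled) {
            this.enabled = enabled;
        }
    }

    /** Systems that would run this frame; EnumSet iterates in declaration order. */
    static List<SystemKind> systemsToRun(GameState state) {
        return new ArrayList<>(state.enabled);
    }

    public static void main(String[] args) {
        for (GameState s : GameState.values()) {
            System.out.println(s + " -> " + systemsToRun(s));
        }
    }
}
```

Declaring the systems in update order means the world is always drawn before any UI, so the pause menu naturally layers on top of the frozen game.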

I’ve ended up salvaging a little bit from the first iteration of this project: so far, just a position component and the texture rendering system. These worked well enough and weren’t fiercely entangled in the old netcode. I expect I will pull as much as I can from the old project, as long as it is straightforward and is, or can be, decoupled from the old netcode.

My immediate goal now is to get a working main menu with the ability to transition to the main game state. After that comes simple ship movement, so that I can test how well this layered game state works with a pause menu. The pause menu, of course, only makes sense in a non-multiplayer game, so going forward I’m going to be working under the assumption that I will have an offline mode available.

First post for a new beginning

Hi there,

My webpage has been empty for quite a long while and it’s time to change that. You see, last summer I began pursuing a B.S. in Computer Science at UNCA. This semester, I added on 1 credit hour for undergraduate research. My professor is rather hands off for this research project and overall it feels pretty free form. I figured I should probably have some kind of record of what I’ve been doing so this blog just seemed like the obvious place for that.

For a long while now I have been wanting to design a multiplayer game using an entity component system (ECS) and a client/server setup over UDP. Why UDP? Mostly because it is way harder to design for UDP than for TCP, but also because the game I want to create is action-oriented. In a fast-paced game the newest state matters more than every old update, and TCP’s guaranteed in-order delivery means one lost packet can hold up everything sent after it. I thought that this concept would be great to explore for my undergrad research and an excellent way to really push my limits.
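One thing UDP leaves entirely to the game is ordering: packets can arrive late or out of order, and for fast-moving state it is usually better to drop a stale update than to wait for it. A common trick, covered in Glenn Fiedler’s Gaffer on Games networking articles, is to stamp each packet with a 16-bit sequence number and compare numbers with wraparound in mind. The sketch below is my own Java take on that idea; the class and method names are made up.

```java
/** 16-bit sequence numbers with wraparound, used to discard stale UDP state updates. */
final class SequenceNumbers {
    private SequenceNumbers() {}

    /** True if s1 is "newer" than s2, treating the 0..65535 range as a circle. */
    static boolean isNewer(int s1, int s2) {
        int a = s1 & 0xFFFF, b = s2 & 0xFFFF;
        return (a > b && a - b <= 32768) || (a < b && b - a > 32768);
    }

    /** Remembers the newest update applied and rejects anything older. */
    static final class Receiver {
        private int latest = -1; // -1 means nothing received yet

        /** Returns true if the packet should be applied, false if it is stale. */
        boolean accept(int seq) {
            if (latest == -1 || isNewer(seq, latest)) {
                latest = seq & 0xFFFF;
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        Receiver receiver = new Receiver();
        int[] arrivals = {65534, 65535, 0, 65533, 1}; // 65533 shows up late
        for (int seq : arrivals) {
            System.out.println(seq + (receiver.accept(seq) ? " applied" : " dropped (stale)"));
        }
    }
}
```

The wraparound check is why 0 counts as newer than 65535 here, while the late 65533 is correctly thrown away.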

For some background, I had already started on this project a long while back. I began making the game in Java using the LibGDX game library. The game concept is a 2D, top-down space shooter/sim inspired by the likes of Escape Velocity. I got as far as having a single ship on a starry background and a very basic client/server setup allowing multiple players to connect to each other and see each other flying around – at least until you fly away from the starting point and are never able to find each other again due to there being no level boundary. I started looking at how to implement something like Source Multiplayer Networking and felt very overwhelmed by it all. Other projects came around and it wasn’t long before I moved on to other things.

So far my undergrad research this semester has consisted of studying these Gaffer on Games articles about multiplayer networking while looking over my old source code and hopelessly wondering how in the world to continue it. The old code was a basic UDP client/server implementation, but I never knew how big the packets I was sending were, and that bothers me a lot now. Additionally, I recall that the multiplayer was more or less working for at least 3 players, but now it does not work very well at all. In my last test run, we got 2 players connected to the server and had their movement synced for all of about 1 second. I’ve come to the conclusion today that since I’m not very far along, I might as well just remake it from scratch.
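Since not knowing my packet sizes bothers me, here is a minimal sketch of one way to make them explicit: serialize each message by hand into a `java.nio.ByteBuffer`, so the byte count is fixed and known up front. The ship-state fields and the 20 updates per second are hypothetical, and this is plain Java rather than anything from KryoNet.

```java
import java.nio.ByteBuffer;

/** Hand-serialized ship state, so the exact packet size is always known. */
final class ShipStatePacket {
    // Hypothetical layout: 2-byte sequence, 4-byte entity id, 4 floats (x, y, rotation, speed)
    static final int SIZE_BYTES = Short.BYTES + Integer.BYTES + 4 * Float.BYTES; // 22

    static byte[] encode(int sequence, int entityId, float x, float y, float rotation, float speed) {
        ByteBuffer buf = ByteBuffer.allocate(SIZE_BYTES);
        buf.putShort((short) sequence)
           .putInt(entityId)
           .putFloat(x).putFloat(y)
           .putFloat(rotation).putFloat(speed);
        return buf.array();
    }

    /** Reads back just the position, skipping the sequence number and entity id. */
    static float[] decodePosition(byte[] packet) {
        ByteBuffer buf = ByteBuffer.wrap(packet);
        buf.position(Short.BYTES + Integer.BYTES);
        return new float[] {buf.getFloat(), buf.getFloat()};
    }

    public static void main(String[] args) {
        byte[] packet = encode(7, 42, 100.5f, -20f, 1.57f, 3f);
        System.out.println("packet size: " + packet.length + " bytes"); // 22
        // At a hypothetical 20 updates/s, one ship costs 440 bytes/s before UDP/IP header overhead
        System.out.println("per second at 20 Hz: " + packet.length * 20 + " bytes");
    }
}
```

A fixed layout like this trades some flexibility for the ability to reason about bandwidth exactly, which is the thing the old code was missing.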

KryoNet will likely still work as my networking library, but I will need to make sure I can keep track of how much data I am sending and receiving with it. My choice of ECS will likely still be this reflection-based ECS (RECS) that I have contributed to in the past, though I plan to look into what new options are out there. The author of RECS has used a lot of very clever tricks to obtain a highly performant system; it will be interesting to see if other ECS libraries can compare.

I briefly considered using C++ for my rewrite, but I have decided against it for the following reasons. I am not very familiar with C++. While many large game developers tend to use C++, a huge number of indie developers don’t and are still able to produce excellent games. Getting better at game design, even if only in one language, will likely serve me better than using subpar techniques while stumbling through another language.

Here’s a link to the GitHub repo for the project if you’d like to follow along.