Wednesday 17 June 2015

P0 Snake 64Kb, grab your copy today!

Just a very brief post to say that P0 Snake is finally out for sale, so grab your copy NOW:

http://rgcd.bigcartel.com/product/p0-snake-commodore-64




This is the real deal, guys: You get a nice big box with stickers, a poster, a booklet, a password table and, of course, your P0 Snake cartridge. Just hook your Commodore 64 to your telly, plug the cartridge in and play this game the way it's meant to be played! 


Enjoy

Friday 10 April 2015

The Making of P0 Snake - Part 3: Audio Compression For Poor CPUs

Introduction

Compression of digital sound has been a subject of study for some 50 years now, during which researchers have produced a large number of audio codecs, that is, systems to compress (or encode) sound and to decompress (or decode) it into something that can be played. We already mentioned mp3, which is based on psychoacoustic modelling and was designed for music, and GSM, which is a very simple algorithm to encode speech, but there are many more. What pretty much all of these systems have in common is that the decoder does not return the original waveform, but an approximation of it; they are therefore referred to as lossy audio codecs. We have already seen that even heavy modifications to the original digitized waveform, like dropping the sample rate to 8000Hz or moving from 16-bit to 4-bit quantization, do affect the quality of the sound, but, especially for speech, it retains its "understandability". There are algorithms like CELP that can encode speech at rates as low as 800 bytes per second. Unfortunately, we can only dream of using this kind of technology in this context, for two simple reasons. First, any audio decoder would need space just for its code: what good would it be to compress our 9 seconds of audio to 4KB if we need, say, a 20KB decoder to play it? Second, and most importantly, audio decoders are too heavy on the CPU. In fact, for a computer like the Commodore 64, whose CPU runs at 1MHz and is only capable of 8-bit integer math, even just playing the uncompressed samples is a demanding task!
How demanding? Let’s find out. (Warning: what follows might be a bit boring and very C64-centric, so you might want to take my word for it that playing digitized audio is very demanding and just skip to the next section.)

We mentioned already that the Commodore 64 was not designed to play digitized sound. The way programmers managed to do it was by exploiting a bug in the sound chip, the SID. Basically, this chip allows the programmer to set the volume to one of 16 different levels via a 4-bit register. Incidentally, the act of setting the volume creates a side effect, an audible "click". The amplitude of this click is proportional to the volume; therefore, changing the volume repeatedly at a steady frequency approximates the playing of any waveform. That's very fortunate! So, to play the 7884Hz, 4-bit waveform we have talked about so far, one just has to alter the volume register 7884 times per second, feeding it the right sample value each time. Keeping that frequency steady is the real problem here. Luckily, the C64 comes with a programmable interrupt system that can be driven by a timer. Without going too much into detail, the Commodore 64 can execute a piece of code, which we'll call the interrupt handler, every time the timer counts down to zero, restarting the timer for you automatically. Whatever the CPU was doing when the interrupt handler is invoked is, in fact, interrupted, and the handler routine takes control. This is beautiful, as in theory we can just program this routine to read the next sampled value from memory, put it in the volume register and exit, thus returning control to whatever the CPU was doing when it was interrupted. And that's exactly what P0 Snake, or any other talking game, does.
The problem is that whatever the CPU was doing, it was doing it by using its registers, and our sample-playing routine will use the same registers to do its thing. This means that, upon returning control, our routine must make sure the registers contain exactly the same values they contained when the CPU was interrupted. This is achieved by saving all the registers somewhere as the very first thing in the interrupt routine, and restoring all of them as the last thing before returning control. This "somewhere" is usually the system stack. Unfortunately, only one of the registers, the Accumulator (A), can be pushed onto the stack. The other registers, X and Y, can't. But they can be copied to the Accumulator!
We now have the default skeleton code for a timer-driven interrupt service routine, which will look like this:

  pha // push the Accumulator onto the system stack [3]
  txa // transfer X to the Accumulator [2]
  pha // push the Accumulator again, thus saving X [3]
  tya // transfer Y to the Accumulator [2]
  pha // push the Accumulator once again, thus saving Y [3]

  // our sample-playing routine here,
  // which can now freely use the 3 registers

  [...]

  pla // pull the Accumulator from the stack [4]
  tay // transfer the Accumulator to Y [2]
  pla // pull the Accumulator [4]
  tax // transfer the Accumulator to X [2]
  pla // pull the Accumulator [4]
  rti // return to whatever the CPU was doing,
      // all the registers correctly restored [6]


The numbers in the square brackets are the number of cycles each instruction takes. Remember that the Commodore 64 is clocked at 1MHz, that is, 1,000,000 cycles per second, so let's see what all this saving and restoring of registers costs us:

3+2+3+2+3+4+2+4+2+4+6 = 35

So for each call to the interrupt routine, 35 cycles go to waste without achieving anything. Zero, zip, nada! We said that the highest sampling frequency at which we end up calling this routine is 7884 times per second, which means that every second 7884 × 35 cycles are wasted. That's 275,940 cycles per second, roughly 28% of the CPU horsepower.
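
If you want to play with these numbers yourself, here is the same back-of-the-envelope calculation as a few lines of Python (the 1MHz clock is a round figure; the exact PAL/NTSC clock differs slightly):

# Cycle cost of the register save/restore wrapper shown above
save_cycles    = 3 + 2 + 3 + 2 + 3          # pha, txa, pha, tya, pha
restore_cycles = 4 + 2 + 4 + 2 + 4 + 6      # pla, tay, pla, tax, pla, rti
overhead = save_cycles + restore_cycles     # 35 cycles per interrupt

sample_rate = 7884                          # interrupts per second
cpu_clock   = 1_000_000                     # ~1MHz, one million cycles per second

wasted = overhead * sample_rate             # 275940 cycles every second
print(wasted, round(100 * wasted / cpu_clock, 1), "%")   # -> 275940 27.6 %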

To cut a long story short, 28% of the CPU goes to waste just on the overhead of saving and restoring the registers, and we haven't actually played anything yet! We can't really think of adding anything very complex on top of that if we ALSO want to run the game at the same time.

Although there are tricks involving self-modifying code (yuck!) that can bring that number down, and P0 Snake uses all of them, it still gives us an idea of how demanding this task is for the CPU. On-the-fly audio decompression is therefore out of the question here.

Compression

Let’s summarize our findings so far:

  • We squeezed our original source audio into 21KB.
  • We want to land somewhere in the neighborhood of 4KB.
  • We accept a short wait before the game runs, during which the 4KB of compressed audio is decompressed into the 21KB we will use from that moment on.
  • The decoder should be a fairly short program, whose memory footprint should be negligible compared to the 4KB of the compressed audio.
  • It should be really fast! No fancy floating point, no weird math, nothing like that. Possibly LUTs or dictionaries, but little more than that.

Having ruled out lossy audio compression schemes, classic lossless compression approaches spring to mind. LZ77 is at the base of most of them: RAR, ZIP, 7Z, you name it. It is based on the idea that multiple occurrences of the same subsequence in a sequence can be encoded by keeping a single copy of that subsequence and replacing the other occurrences with references to it.
A string like 

ABCDABCDABCDE 

is encoded to something like

ABCD(-4,4)(-8,4)E

The pairs (i,j) between parentheses mean “go back i symbols and copy j symbols from there”.

While finding such repeated copies is a computationally intensive task, such a task is up to the encoder. The decoder, which is what we are most interested in, has a fairly straightforward job to do. Straightforward enough to be implemented on the C64 without too much pain. We are almost there! 
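
To get a feel for how simple the decoding side is, here is a minimal sketch of an LZ77-style decoder in Python. The function name and the token representation are mine: each token is either a literal symbol or a (distance, length) pair, with the distance stored as a positive number rather than the negative index used in the example above. A real C64 depacker works on a packed binary stream, but the logic is the same.

def lz77_decode(tokens):
    # Each token is either a literal symbol or a (distance, length) pair
    # meaning "go back `distance` symbols and copy `length` symbols".
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            distance, length = token
            start = len(out) - distance
            for k in range(length):
                out.append(out[start + k])  # copies may overlap the output
        else:
            out.append(token)
    return out

# The example from the text: ABCD(-4,4)(-8,4)E
print("".join(lz77_decode(["A", "B", "C", "D", (4, 4), (8, 4), "E"])))  # ABCDABCDABCDE
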
The only problem left is that LZ77, being a lossless compression approach, is not expected to compress much: we'll be lucky to see a 30-40% reduction (you can try this experiment yourself: record a wav file and zip it). That would take our memory footprint down to 12-15KB. Much better, but not quite there yet.

But what if we managed to make the LZ77 encoder's life easier? What if we could somehow maximize the number and the size of identical subsequences of samples?

Let's bring up once again that piece of the "Oh No" waveform we mentioned in part 2.



We said that it almost looks as if it were made up of the same segment repeated over and over. Let's highlight these segments.



That's very close to what LZ77 needs to do its dirty job at its best, but not quite there yet. There are, in fact, minimal differences among the blocks. If this waveform really were made of exactly the same segment repeated over and over, LZ77 would only need one of those segments, plus a few indices, to encode the entire waveform in the picture.
Which brings us to the following intuition: how bad would it sound if we just picked one of these blocks and repeated it over and over to form the actual waveform? In this case the differences are so small that we probably wouldn't be able to notice them, and LZ77 would encode this piece to roughly one fifth of its original size! So, all we need to do is find all the similar blocks, make them identical, and run a normal LZ77 compression pass to obtain something that will unpack easily and will sound reasonably close to the unaltered waveform. Given how little data we are dealing with, we don't need fancy approximate pattern matching algorithms here; we can just go the brute-force route. The algorithm looks something like this:

encode(waveform, threshold)
{
  i = 0;
  while (i < length(waveform))
  {
    found = false;
    for (blocksize = 64; blocksize >= 4; blocksize--)
    {
      // search the already processed part of the waveform
      // for the block most similar to the one starting at i
      j = find_most_similar_block in range (0,i)
          of size blocksize;
      if (similarity((j,j+blocksize),
                     (i,i+blocksize)) < threshold(blocksize))
      {
         // replace the current block with a copy
         // of the similar one
         copy ((j,j+blocksize), (i,i+blocksize));
         i += blocksize;
         found = true;
         break;
      }
    }
    if (!found) // no similar block of any size was found
    {
       i++;  // keep the current sample value and go ahead.
    }
  }
  return LZ77encode(waveform);
}

Dealing with blocks smaller than 4 samples doesn't really make much sense, because the overhead of the indices that LZ77 needs to replicate the blocks would exceed the length of the block itself.
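
For the curious, here is a rough, unoptimized Python rendering of that brute-force pass. The names are mine, the distance used here is a plain Euclidean one (the perceptually weighted version is discussed next), and `threshold` is taken to be a function of the block size, as in the pseudocode above. Treat it as a sketch of the idea, not as P0 Snake's actual encoder.

def preprocess(waveform, threshold, max_block=64, min_block=4):
    # Replace blocks of samples with earlier, similar blocks so that a plain
    # LZ77 pass afterwards finds many exact repetitions. `waveform` is a list
    # of samples; `threshold(blocksize)` returns the maximum distance at which
    # two blocks still count as "similar".
    def distance(a, b):
        # Plain Euclidean distance; the perceptually weighted version is
        # discussed right after this snippet.
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    i, n = 0, len(waveform)
    while i < n:
        replaced = False
        for blocksize in range(max_block, min_block - 1, -1):
            if i + blocksize > n:
                continue
            current = waveform[i:i + blocksize]
            # Brute-force search of the already processed part of the waveform
            best_j, best_d = None, None
            for j in range(0, i - blocksize + 1):
                d = distance(waveform[j:j + blocksize], current)
                if best_d is None or d < best_d:
                    best_j, best_d = j, d
            if best_j is not None and best_d < threshold(blocksize):
                # Make the current block an exact copy of the most similar one
                waveform[i:i + blocksize] = waveform[best_j:best_j + blocksize]
                i += blocksize
                replaced = True
                break
        if not replaced:
            i += 1  # keep the current sample value and move on
    return waveform  # feed this to a normal LZ77 packer
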
The most important part of this algorithm is the similarity function, which needs to compare two blocks of a given size and return a distance between them. The temptation to go straight for the Euclidean distance is strong, and in fact we'll give in to it, but there's one last thing about speech encoding that must be considered: the perceived loudness of a sample does not scale linearly with its amplitude (a fact that some basic audio compression schemes, like A-law, leverage). For our purposes this means that an alteration to samples that are "close" to zero introduces a heavier distortion than the same absolute alteration to samples at the boundary of the sample range. To account for this, assuming such a range to be [-1,1], rather than using the raw sample value we'll go with the logarithm of the amplitude.
Ultimately, given two vectors of samples P and Q, whose values range in [-1,1], the distance between the two vectors is



It's important to note that this logarithmic weighting is only applied when computing the distance between two blocks; the waveforms we end up storing are the original, linearly sampled versions.
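
In code, a distance along these lines could look like the following sketch. Be warned that the exact companding curve and the mu constant are my own choices (a mu-law-style curve), not necessarily what P0 Snake's encoder uses:

import math

def log_amplitude(x, mu=255.0):
    # Map a sample in [-1, 1] to a perceptually weighted value: small
    # amplitudes are expanded, so errors near zero weigh more in the distance.
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def block_distance(P, Q):
    # Euclidean distance between two equally sized blocks of samples,
    # computed on the log-compressed amplitudes rather than the raw values.
    return math.sqrt(sum((log_amplitude(p) - log_amplitude(q)) ** 2
                         for p, q in zip(P, Q)))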

Final tricks and conclusion

Although there are many more tricks and bits of hand tuning that I had to resort to in order to improve compression and/or reduce the compression artifacts, the encoding algorithm we discussed so far does the largest part of the job, and puts us in a position where the Commodore 64 only has to LZ77-decode 4KB of data into 21KB before the game starts. The only thing left is to feed the encoder the right threshold, which it adjusts dynamically according to the size of the vectors it compares (this part would probably deserve a post of its own).



We can basically try different thresholds until we are satisfied with the total size of the encoded waveform, 4KB in our case, and go with it. Of course, a higher threshold will cause more blocks to be marked as similar and lead to better compression, at the cost of lower sound quality.



In fact, I used different thresholds for different sentences in the game, which leads me to the very last trick that I want to tell you about. Listen again to the speech:


You might have noticed that the worst-sounding sentence of the bunch is the first one, "Welcome to P0 Snake". It will surprise you to know that this is actually the sentence that was compressed the least: it alone takes up half of the total 4KB, so it should sound better than the rest. Believe it or not, it actually does. An example may help explain this: in the silence of the night you can probably hear both your watch ticking and your neighbour talking in another flat. Now, suppose you take your neighbour out for dinner in a crowded restaurant. You can still hear what she is saying, but even if you focus on your watch, you won't be able to hear it ticking.
This phenomenon is called Auditory Masking, and it’s just one of the many concepts you'll get familiar with, should you decide to dive into the fascinating world of Psychoacoustics. Glitches and artifacts introduced by compression are like the ticking of your watch as your neighbour talks during the night. You can hear them in a quiet environment, but if you bring in more sound to “confuse” the ear, they'll become unnoticeable. That’s what I did with the remaining set of sentences in the game: whenever the game is talking, there’s always music playing along with the speech. It’s a bit like hiding the dust under the carpet, but, well… it works!

This concludes the third and last part about speech encoding. Next time we'll probably be looking at something a bit geekier, like self-modifying code or graphics compression.
Stay tuned!

Monday 16 March 2015

The Making of P0 Snake - Part 2: IT TALKS!

Introduction



One of the features that stood out about P0 Snake, according to the people who played it, is its digitized speech.






Now, mind you, talking games have been around for quite some time, and they were not exactly uncommon in the 80s either. The problem with speech, though, is that it takes "a lot" of memory. Therefore, while modern productions can get very chatty (even too much), old games could usually only afford a few seconds of speech before they filled up the host machine's memory. But it's probably this scarcity that made the few words that came out of talking video games so memorable. Elvin Atombender, the mad scientist in Impossible Mission, would welcome you to his lab with his signature "Another visitor… stay a while… stay forever!" that would send shivers down your spine. Also, with speech being such an "expensive" commodity, programmers resorted to it only where it really added something to the gameplay.
Mega Apocalypse, another all-time favourite of mine, was so hectic that sometimes players didn't know what they were doing, such was the focus on trying to stay alive. So the game just told them what was going on. “Extra life!”, and the player knew he had hit something good, not a cursed asteroid.
With P0 Snake, though, things were a bit trickier. The RGCD 16KB development competition, as the name suggests, only allows 16KB games. So, how much speech can we fit into 16KB? Or, even better, how much speech can we squeeze into 16KB and still have enough space left to code a game? Let's find out!


Enter “The Dictionary”



“When words are scarce they are seldom spent in vain.”
[William Shakespeare]


P0 Snake's dictionary is made up of 7 sentences:


“Welcome to P0 Snake”
“Get Ready”
“One Up!”
“Teleport!”
“Oh no!”
“Well Done!”
"Game Over"


In total, they account for roughly 9 seconds of speech.
It doesn't sound like a big deal, but, trust me, it is. Let’s see why.


A sound is basically a waveform or, for our purposes, the digital representation of its analog form. The analog-to-digital conversion has two steps: sampling and quantization. A signal is sampled by measuring its amplitude at a particular time; the sampling rate is the number of samples taken per second. In order to accurately measure a wave, it is necessary to have at least two samples in each cycle: one measuring the positive part of the wave and one measuring the negative part.
Of course, having more than 2 samples per cycle will increase the accuracy and quality of the waveform representation, but what's really important is that you don't have LESS than 2 samples per cycle, as this would cause the frequency of the wave to be completely missed. If you want to know more about this stuff, just look up the Nyquist-Shannon sampling theorem and start from there; I'm trying to keep this simple. What all of this tells us, anyway, is that in order to digitize sound properly, we must use a sample rate that is at least twice the highest frequency of the signal we are sampling. The 44.1kHz rate at which your CD tracks and mp3s are sampled can render a signal of up to 20kHz, which is enough to capture all the frequencies that the human ear can hear. If mp3 had been made for dogs, it would need a sample rate higher than 120kHz, as dogs can hear frequencies of up to 60kHz. If it had been made for cats, it would need more than 160kHz! It's good that we evolved from apes and not from cats, otherwise our iPods could only hold a fourth of the music they hold now!
And there's more good news: the human voice spans a narrower range of frequencies, going up to approximately 3500Hz. This means that if you want to sample speech, a sample rate of at least 7000Hz is enough to keep it intelligible. Indeed, most speech encoding systems, including GSM, whose acronym probably rings a bell (ahem…) for most of you, only use 8000Hz. That's the kind of frequency we are really dealing with, and this closes the math for sampling, the first part of the analog-to-digital conversion. What about quantization? In general, the more bits the better, and your typical CD track, to go back to the original example, uses 16 bits, i.e. 2 bytes, which means that each sample is a number in the range (-32768, 32767). Let's assume we are using the same resolution. We have all the numbers now, so let's see how much space those 9 seconds of speech will eat up.


9 seconds × 8000 samples per second × 2 bytes per sample = 144,000 bytes ≈ 140KB


Given that we have to fit both the speech AND a game into 16KB, that's a bit too much. We should target 4 or 5KB at most. It's a long way uphill from here...


The first little help comes, again, from a limitation of the machine: 16-bit samples simply can't be played on the Commodore 64. In fact, the SID, the Commodore 64 sound chip, was not designed to play sampled sound at all; the way programmers did it was by exploiting a bug in the chip. I won't go into the details of how it works (try this if you dare), but, simply put, although a better resolution can be achieved with weird tricks, the most common way of playing sampled sound on a Commodore 64 only allows for 4-bit resolution. This means that the range we can address is not (-32768, 32767) but (-8, 7). Since 4 bits are one fourth of 16, our original space consumption goes down from 140KB to 35KB. That's a lot better, but still more than twice the space we have for the entire game.
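
Spelled out as code, the arithmetic so far looks like this:

seconds     = 9
sample_rate = 8000                       # Hz, enough for intelligible speech
bytes_16bit = seconds * sample_rate * 2  # 2 bytes per sample -> 144000 bytes (~140KB)
bytes_4bit  = bytes_16bit // 4           # 4 bits per sample  ->  36000 bytes (~35KB)
print(bytes_16bit, bytes_4bit)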


Before we move on, let's take a look at what these waveforms we have been talking about actually look like. For example, the piece of speech "Oh No!".





From top to bottom: the original recording, the same one sampled at 8000Hz with 16-bit quantization, and finally the same one at 8000Hz with 4-bit quantization.


As you can see, the fact that only 16 different amplitude values are possible with 4-bit quantization shows up in the waveform looking a bit "blocky" at a glance. This really is the BEST waveform we can hope to play on the Commodore 64 and, despite appearances, it doesn't sound that bad. In any case, this is what we have to start from.


Before we even start to think about compression, let's see if we can bring memory occupation down from the current 35KB.


Let's zoom in a bit and explore the "oh" part of "oh no!", going back to the original 16-bit quantization and 44.1kHz sampling, that is, CD quality. This is what 20 milliseconds of "oh" look like:











We immediately notice something: the waveform looks very "regular", as if it were made of the same piece repeated many times, with just minimal differences. We'll take advantage of this when we deal with compression, but for now another interesting piece of evidence is that there aren't really many "parts" in this waveform, that is, it doesn't change very rapidly. We mentioned that the frequency of the human voice tops out at 3500Hz, and that we need double that frequency, 7000Hz, which we rounded up to 8000Hz, to sample it. But I bet we need much less than that for this specific segment.
Let's see what it looks like at 8000Hz:











It looks very similar, but we knew that already, as we know that 8kHz will suffice for the human voice in general.


Let's see what happens at 5000Hz:











It's starting to deteriorate a bit; still, all the transitions are there, which means that the "oh", although "noisy", will still sound like an "oh". Let's try 1000Hz:





Bummer! Most of the transitions have disappeared, and this waveform doesn't sound like an "oh" anymore. But what we take from this exercise is that we don't always need 8000Hz: some pieces of our speech are happy with a lower sample rate.
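
If you want to repeat this experiment on your own recordings, a band-limited resample is all it takes. Here is a sketch using SciPy; `oh_fragment` is a hypothetical mono waveform you would load yourself, not something shipped with P0 Snake:

from scipy.signal import resample

def downsample(samples, source_rate, target_rate):
    # Band-limited resampling: frequencies above target_rate/2 are lost,
    # which is exactly the effect shown in the figures above.
    n_out = int(round(len(samples) * target_rate / source_rate))
    return resample(samples, n_out)

# e.g. listen to the "oh" fragment at the rates tried in the text:
# for rate in (8000, 5000, 1000):
#     lower = downsample(oh_fragment, 44100, rate)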


Speech fragments can be divided into two categories: voiced and unvoiced. Voiced sounds are generated by the vibration of the vocal cords. Among their characteristics is the fact that they are periodic, and their period is called the pitch. This is the case for vowels, and specifically for the fragment we have analyzed so far. Unvoiced sounds, on the other hand, are not generated by the vibration of the vocal cords and don't have a specific pitch. Furthermore, and most importantly, they contain higher frequencies. This is the case for fricative sounds like "F" or "S".




Now this is interesting, and you can see already where this is going: let's use a higher sample rate for unvoiced segments and a lower sample rate for voiced segments.
We can validate this theory by looking at another segment of P0 Snake's speech: the "ZE" part in "Welcome to P ZEro Snake".

















You can already clearly see where the "Z" ends and where the "E" starts. Now let's sample this segment at 5000Hz, which worked quite well in the previous example:



















Look what just happened: the "Z" is almost completely gone, but the "E" is still there.


In conclusion, we really need all of the 8000Hz to sample fricative sounds and preserve their understandability.

Putting everything together


So far we have learned the following facts:


  1. 4-bit amplitude is all we can afford.
  2. 8000Hz is a sampling rate that allows any type of speech to be coded.
  3. Fricative sounds require this entire bandwidth.
  4. Voiced sounds will be happy with much less. How much less? It depends on the sound, the voice of the speaker and other factors.


So, in order to minimize the memory occupation of the speech in P0 Snake, we just need to use, for each speech segment, the minimum sampling frequency that retains the understandability of that segment. This frequency will be higher for unvoiced segments and lower for voiced segments.
Although we could choose ANY frequency for each of the segments, we really want to limit this search to a small set of possible frequencies, for various reasons; the most important one will become clear in the next part, but we can already say that we also want to be able to encode this frequency somehow in the bitstream. P0 Snake uses only 4 frequencies, so a 2-bit overhead per speech segment is enough to indicate its frequency:
S = {3500Hz, 3942Hz, 5256Hz, 7884Hz}


And the algorithm to preprocess the data is now quite easy:


S = [3500Hz, 3942Hz, 5256Hz, 7884Hz]
segments = split(source) // splits the source into segments
                         // of equal duration

foreach (segment in segments)
{
  f = Frequency_Analysis(segment)
  i = 0
  while (i < 3 && S[i] < f)
  {
    i = i + 1
  }
  sampled_segment = Sample(segment, S[i])
  yield return (i, sampled_segment)
}
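
For reference, here is a rough Python equivalent of that loop. The frequency analysis is the hand-wavy part: I approximate it with a simple spectral-energy cutoff and apply the factor of two from the Nyquist argument earlier, whereas the pseudocode above compares the analysed frequency directly against the sample rates. The names and thresholds are mine, so treat this as a sketch of the idea rather than the tool I actually used:

import numpy as np

RATES = [3500, 3942, 5256, 7884]   # Hz, the four sample rates used by P0 Snake

def estimate_bandwidth(segment, source_rate, energy_fraction=0.99):
    # Crude bandwidth estimate: the frequency below which `energy_fraction`
    # of the spectral energy of the segment lives.
    spectrum = np.abs(np.fft.rfft(segment)) ** 2
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / source_rate)
    cumulative = np.cumsum(spectrum) / np.sum(spectrum)
    return freqs[np.searchsorted(cumulative, energy_fraction)]

def choose_rate(segment, source_rate):
    # Pick the lowest rate that can still represent the segment (at least
    # twice its estimated bandwidth), defaulting to the highest one.
    bandwidth = estimate_bandwidth(segment, source_rate)
    for i, rate in enumerate(RATES):
        if rate >= 2 * bandwidth:
            return i, rate              # i is the 2-bit index stored alongside the segment
    return len(RATES) - 1, RATES[-1]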


The result of this preprocessor was only good as a starting point for a (painful) manual refinement of each segment: although the frequency analysis library I used worked very well for me, I found that for some segments I could still move to a lower frequency while keeping decent sound quality. This pickiness might sound like overkill, but don't forget that we are fighting over bytes here, so every little helps!


In the end, the speech in P0 Snake averages roughly 4900Hz. Let's see where this takes us:


35KB × 4900/8000 ≈ 21KB


21KB. Now... we are talking! It's still 4 times larger than what we are targeting, and (as we'll see in the next post) we can't use any known audio codec to bring this number down, because the Commodore 64 simply doesn't have enough horsepower to play a sampled sound AS it decompresses it AND also run a game at the same time. Therefore, we must play uncompressed data. But this number already looks very interesting for one simple reason: the rules of the competition say that the size of the game must not exceed 16KB, but of course we can use the entire memory space of the Commodore 64 at runtime, that is, 64KB. If we could come up with a way to compress our speech into 4KB such that it can be decompressed (in a reasonable time) to 21KB BEFORE the game runs, then that would be it.
We'll see how in the next post, if you wish.

Wednesday 11 March 2015

The Making of P0 Snake - Part 1

Preface


More than 20 years ago my teenage self had a dream. To be more exact, my teenage self shared the same boring dream with most of my friends: to create a video game for the Commodore 64. It sounds silly today, but that's what every teenager wanted to do back in the 80s. You wanted to be an astronaut, a football player or a game developer. Not surprisingly, I did not make a career in space travel or any kind of professional sport and, although I went on to publish several games for modern platforms many years later, I didn't make it my main occupation. Which is not something I regret now. What I really regretted was failing to write my own Commodore 64 game. I'll spare you the details of what made me try to rectify that situation, although I guess it could simply be summarised by a friend of mine telling me that it is never too late (yes, it's that simple). Anyway, fast forward a couple of decades and P0 Snake, my entry in the RGCD 16KB game development competition, came first and will be published later this year. Better late than never, you silly daydreaming teenager!




In this and the following posts I'll try to cover a few aspects of the development process that I think are worth sharing, especially those that may be of general programming interest. In fact, I'll try not to make these notes inaccessible to someone who has never seen a Commodore 64 in his life, although I might be writing a couple of lines of assembler here and there (which I'll try to explain in detail). Hopefully I'll give some answers to those of you who are wondering how someone can even think of implementing something for a 33-year-old machine. And who knows? Maybe I'll drag some of you into this madness. That would be great!

An appreciation of the limitations of the target platform

It's no mystery that the Commodore 64 comes with 64KB of RAM. This might sound like a silly amount of silicon to achieve anything sensible nowadays, but it was not back in the 80s. In 1982, when the machine came out, 64KB of RAM was "A LOT" of memory; in fact, it was the best selling point of the machine. This is a very important fact to consider, because we are targeting this specific machine, not a modern PC. An assembler instruction on the Commodore 64 takes 1 to 3 bytes, so 64KB could host a good 30,000 "lines" of code if we were to use all of that space for this purpose. Furthermore, old machines in general, and the Commodore 64 in particular, were designed to make video game development (well... certain aspects of it) "easy". Programming a ball that bounces around the screen, for example, is something that can be achieved very easily, and it really takes only a few bytes to implement. If you want to play a sound effect that resembles an explosion, that's easy too. This is because the Commodore 64's strongest asset was its custom chips, which allowed programmers to achieve certain effects without much effort. The most famous, and one of the most useful, aces up the C64's sleeve are its mighty hardware sprites. Let's go back to the bouncing ball example. This is what we want to achieve:


Now let's write pseudocode to achieve this effect on a machine of that era with no hardware sprites:


x = 0
y = 0
xdir = 1 // going right
ydir = 1 // going down

ball = {(x0,y0), (x1,y1), .. (xn,yn)} // a sequence of pixels making up the shape of the ball

while (true)
{
  foreach (pixel in ball) // erase the ball from its old position
  {
    screen[x + pixel.x, y + pixel.y] =
        backbuffer[x + pixel.x, y + pixel.y]
  }
  x = x + xdir
  y = y + ydir
  if (x == 0 or x == rightborder)
  {
    xdir = -xdir // bounce horizontally
  }
  if (y == 0 or y == bottomborder)
  {
    ydir = -ydir // bounce vertically
  }
  foreach (pixel in ball) // redraw the ball at its new position
  {
    screen[x + pixel.x, y + pixel.y] = ballcolour
  }
  somedelay()
}



The first few lines set the ball's position and direction; it is initially positioned in the top-left corner of the screen. Each iteration of the loop then updates the x and y position according to the direction. The problem here is what "update" means. There are three main steps:

  1. erase the pixels at position (x,y) + offset for each pixel offset in ball
  2. update the ball position (x,y)
  3. set all the pixels at position (x,y) + offset for each pixel offset in ball


Surprisingly, this is not very different from what a modern PC does. The fact that machines and graphics cards are so fast now makes this process totally invisible to the user, but those of you who witnessed the birth of the first GUIs will definitely remember that moving windows around the screen triggered that weird erase/redraw effect. That's exactly what went on in PCs in the early nineties: when the user moved a window, all the pixels making up the window at the old position were erased (the background was drawn back) and then all the pixels were redrawn at the new position. It's blazing fast nowadays, not so much 20 years ago. And indeed, things get much more complicated when you have multiple objects moving on the screen, which is always the case in games, unless you are dealing with something as simple as a bouncing ball.


How would you achieve the same effect on the Commodore 64?
The Commodore 64 comes with 8 hardware sprites. A sprite is a shape that can be defined by the user and positioned anywhere on the screen. What's great about sprites is that they exist on a different layer than the background, which means that when you move them you don't have to worry about the background at all. Let's see what the bouncing ball pseudocode looks like on the C64.


xdir = 1 // going right
ydir = 1 // going down

ball = {(x0,y0), (x1,y1), .. (xn,yn)} // a sequence of pixels making up the shape of the ball

sprite0.pointer = ball
sprite0.x = 0
sprite0.y = 0
sprite0.visible = true

while (true)
{
  sprite0.x = sprite0.x + xdir
  sprite0.y = sprite0.y + ydir
  if (sprite0.x == 0 or sprite0.x == rightborder)
  {
    xdir = -xdir // bounce horizontally
  }
  if (sprite0.y == 0 or sprite0.y == bottomborder)
  {
    ydir = -ydir // bounce vertically
  }
  somedelay()
}


And that's it. We basically set up sprite number 0 to represent the ball; then, in the main loop, we just update the position of sprite0 and take care of the bouncing, and the good old Commodore 64 takes care of everything else.
We spoke about efficiency, which is clearly the most obvious advantage of this approach, but there's so much more going on under the hood. The erase-and-redraw approach that other computers are forced to resort to implies that you have to store the background in memory twice: once for the clean, unaltered background that you copy the pixels FROM when erasing, and once for the actually displayed version, with the ball and all the other objects overlaid. The concept of double buffering is an integral part of game development today (and even the Commodore 64 provides an elegant way to implement it), but you really don't want to resort to it unless you need it, to limit the memory footprint of your graphics.
In our specific example, besides the memory that goes to waste for something that simple, a lot of CPU power is wasted erasing and redrawing the same thing over and over. There are a lot of tricks that programmers of other platforms came up with in the 80s to make this process as quick as possible, but the basic approach stays the same: you have to erase and redraw your objects.
This video shows the bouncing ball implemented on a C64. As you can see the background is completely unaffected by the ball movement.




This long preamble was needed to put things into perspective: the 16KB limit of the competition is indeed a tight constraint, but things are not so bad after all. There is good hardware support for some common tasks that lets you achieve interesting results with limited memory. The pure horsepower of the machine is ridiculous compared to your mobile (let alone your computer), but the sheer amount of "things" going on on the display is very limited, and even when things get very busy, the number of pixels the machine needs to shift at any given moment is an order of magnitude smaller than what your mobile has to move around, say, when you unlock the screen. Yes, you can't play the latest TV show on your stock Commodore 64 (well, some people will disagree with this, but let's not make these posts too hardcore, shall we?), but you can move objects on the display and play some sound without having to implement the basics yourself.
That's why I'm not going to cover how to squeeze a game into 16KB on the Commodore 64, because in general that's not really something to write home about: there are literally thousands of video games developed by pioneering coders in the 80s that are smaller than 16KB. Those unsung heroes deserve your respect, because THEIR 16KB creatures laid the foundation of the industry that brought you Halo, FIFA 2015 or whatever you guys are playing today. Nor am I giving you an introduction to C64 game coding, because there are dozens of tutorials on game programming for this machine (you'd be surprised!). Finally, I won't introduce you to Commodore 64 programming in general either, because this video does it better than I could dream of and is well worth your "I'm not slacking, my code is compiling" time today. Check it out!
What the next posts will cover are the aspects of the development of P0 Snake that challenged and intrigued me the most: the hurdles I had to come to terms with and the workarounds that allowed me to overcome them. It'll be about how we can combine modern tools and knowledge with vintage technology to craft something beautifully old. It's not going to be a tutorial (I would never be that arrogant), nor a lesson in data compression (again). It's just the way I did it, which is not necessarily the best way of doing it. But, heck, it worked!
Stay tuned.