Saturday, November 22, 2014
Why optimizations are bullshit.
I really hate the word "optimization". It gives people the impression that you have to do extra work to get something to run at a decent speed, but that is generally not true. Most SNES games are slow because their basic routines have far more instructions than necessary. Developers usually passed up an obvious, easy, efficient way in favor of a complicated, slow one. More on this in a bit. I'm sleepy.
Tuesday, February 26, 2013
Object Coordinates Trick
I recently discovered a trick around an old problem. I used to keep my object coordinates in 16.16 format (16-bit integer, 16-bit fraction). This made basic movement easy, such as walking left or right and jumping, but it made more advanced movement, such as rotation or chain physics, more complicated to program.
I found that if you use 16.8 format (16-bit integer, 8-bit fraction) you can calculate movement entirely in 8.8 format and extend it back to 16.8 format at the end, with these steps:
1) Copy the lower 2 bytes of the coordinate into another memory location.
2) Calculate all movement using the secondary memory location.
3) Extend it back to 3 bytes: subtract the low 2 bytes of the original 3-byte coordinate from the result, then add that difference to the 3-byte coordinate as a signed value.
This works because the 8.8 difference covers a signed range of about 128 pixels, and objects are usually within 128 pixels of where they were on the previous frame.
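Here is a rough C model of those three steps, just to check the arithmetic. It is not 65816 code, and the names `coord168`, `coord_low88` and `coord_extend` are made up for this sketch. The 16.8 coordinate is assumed to sit in the low 24 bits of a 32-bit integer.

```c
#include <stdint.h>

/* Hypothetical 16.8 coordinate held in the low 24 bits of a uint32_t. */
typedef uint32_t coord168;

/* Step 1: copy the low two bytes (an 8.8 value) out of the coordinate. */
static uint16_t coord_low88(coord168 c)
{
    return (uint16_t)(c & 0xFFFFu);
}

/* Step 3: fold the moved 8.8 value back into the 16.8 coordinate.
   The difference is taken modulo 65536 and then read as a signed
   16-bit number, so it is only right if the object moved less than
   128 pixels in either direction. */
static coord168 coord_extend(coord168 old, uint16_t moved88)
{
    uint16_t diff = (uint16_t)(moved88 - coord_low88(old));
    int32_t delta = (diff & 0x8000u) ? (int32_t)diff - 0x10000 : (int32_t)diff;
    return (coord168)((int32_t)old + delta) & 0xFFFFFFu;
}
```

Step 2 is whatever movement code you like, working on the 8.8 copy and letting it wrap around. The wrap is harmless because the subtraction in step 3 undoes it.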
Tuesday, July 10, 2012
How much memory do you need for object slots?
As I've said before, using the direct page as an object slot pointer is a good idea. If you want your game to be very flexible, it is best to have as many objects as possible and as much memory devoted to objects as possible.
A small quirk of the 65816 that I only recently discovered is that the direct page is always in bank $00. On the SNES only the first 8 kB of work RAM is visible in that bank, so that means only 8 kB of memory can be devoted to object slots. A full direct page is 256 bytes. If we were to use full 256-byte direct pages as slots, we'd only be able to have 32 object slots. If we have 128 object slots, then we're limited to 64-byte slots.
As a trade-off, I believe it is best to have 64 object slots of 128 bytes. Well, actually more like 60, so you have room for the stack and universal registers.
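The slot arithmetic is easy to check in a few lines of C. This is only a sketch: the constant names and `slot_dp_base` are made up, and putting the 4 reserved slots at the bottom of RAM is my assumption for the example, not a requirement.

```c
#include <assert.h>
#include <stdint.h>

enum {
    LOW_RAM_BYTES  = 0x2000,                      /* 8 kB the direct page can reach */
    SLOT_BYTES     = 128,                         /* bytes per object slot */
    SLOT_COUNT     = LOW_RAM_BYTES / SLOT_BYTES,  /* 64 slots in total */
    RESERVED_SLOTS = 4                            /* assumed: stack and shared variables */
};

/* Direct-page base address for usable slot 0..59.
   Assumes the reserved area sits at the bottom of RAM. */
static uint16_t slot_dp_base(unsigned slot)
{
    assert(slot < (unsigned)(SLOT_COUNT - RESERVED_SLOTS));
    return (uint16_t)((RESERVED_SLOTS + slot) * SLOT_BYTES);
}
```

With these numbers the usable slots run from $0200 up to $1F80, 128 bytes apart.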
Monday, February 6, 2012
Gradius 3's hi-oam management
From using a debugger on Gradius 3, the OAM buffer is stored at $7e3c00 (or was it $7f3c00? Got to double check), and it manages the top 32 bytes of the OAM like this:
First it sets the first word of the 32-byte hi-oam to #$0001. Then, for every sprite, it shifts the x and size bits into the word. When it shifts the "1" from the #$0001 into the carry bit, it knows it's time to set up the next word in the hi-oam.
This method is really inefficient. A much swifter way is to temporarily store the x and size bits separately from the OAM, and do all the hi-oam processing at the end of the main loop. It is a lot easier for the 65816 to do one job at a time than to go back and forth between many different jobs.
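As a rough C model of the "do it all at the end" approach: the sprite code drops each sprite's two bits into a scratch byte, and one pass at the end of the loop packs them into the 32-byte table. The scratch buffer `attr` and the function name are made up for this sketch. The packing follows the usual hi-oam layout of two bits per sprite, four sprites per byte, lowest-numbered sprite in the lowest bits.

```c
#include <stdint.h>
#include <string.h>

enum { SPRITES = 128, HI_OAM_BYTES = 32 };

/* attr[i] is a hypothetical scratch byte per sprite:
   bit 0 = bit 8 of the x coordinate, bit 1 = size. */
static void pack_hi_oam(const uint8_t attr[SPRITES], uint8_t hi_oam[HI_OAM_BYTES])
{
    memset(hi_oam, 0, HI_OAM_BYTES);
    for (int i = 0; i < SPRITES; i++) {
        /* Sprite i lands in byte i / 4, at bit position (i % 4) * 2. */
        hi_oam[i >> 2] |= (uint8_t)((attr[i] & 3u) << ((i & 3) * 2));
    }
}
```

The sprite routines never touch the packed table, so they don't need a sentinel bit or a check for "time to start the next word".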
Sunday, February 5, 2012
32-bit instructions are for losers.
Guess what, your college professors lied to you. 32-bit instructions are completely worthless. They are purely a marketing gimmick. I understand that programmers use 32-bit instructions all the time; my point is that programmers don't need to.
Think about this:
8-bit gives you numbers 0-255
16-bit gives you numbers 0-65535
32-bit gives you numbers 0-4294967295
8-bit is plenty for simple game logic, such as decrementing lives and going into game over mode when the number of lives is zero, or counting how many coins Mario has, since Mario never has more than 255 lives or 255 coins at once. Calculating Mario's actual gameplay physics, though, requires numbers larger than 255.
16-bit is enough for level coordinates because one screen is 256 pixels long, and most games have levels that take up 16-32 screen lengths. That is at most 8192 pixels, so 65536 values is more than enough.
NES programmers were smart. Instead of calculating game physics entirely using 16-bit values, they calculated the x and y velocity in 8-bit, and added the 8-bit velocity to 16-bit world coordinates. Like this:
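lda y_velocity
clc
adc gravity
sta y_velocity        ; sta leaves the flags alone, so N still reflects the new velocity
bpl positive_velocity
dec y_coordinate_hi   ; negative velocity: sign-extend into the high byte
positive_velocity:
clc
adc y_coordinate_lo
sta y_coordinate_lo
bcc no_carry
inc y_coordinate_hi   ; carry out of the low byte
no_carry: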
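If the branches in that listing look odd, here is the same idea as a small C model. It is only a sketch with a made-up function name, and it skips the gravity step. It adds a signed 8-bit velocity to a coordinate stored as two separate bytes, one byte at a time.

```c
#include <stdint.h>

/* Add a signed 8-bit velocity (passed as its raw byte) to a 16-bit
   coordinate stored as separate low and high bytes. */
static void add_velocity8(uint8_t *lo, uint8_t *hi, uint8_t velocity)
{
    if (velocity & 0x80u) {
        (*hi)--;                /* negative: the sign-extension byte is $FF */
    }
    unsigned sum = (unsigned)*lo + velocity;
    *lo = (uint8_t)sum;
    if (sum > 0xFFu) {
        (*hi)++;                /* carry out of the low byte */
    }
}
```

The decrement followed by a possible increment is the whole sign extension: a negative velocity that doesn't borrow leaves the high byte unchanged in the end.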
The SNES, with its 16-bit instruction set, doesn't have to do all this crap.
lda y_velocity
clc
adc gravity
sta y_velocity
clc
adc y_coordinate
sta y_coordinate
But here is the issue. The people who programmed the SNES, unlike the NES, were stupid. They did EVERY LITTLE THING WITH 32-bit values. So instead of the example above, SNES programmers did this:
lda y_velocity_lo     ; each _lo and _hi is a 16-bit word here
clc
adc gravity_lo
sta y_velocity_lo
lda y_velocity_hi
adc gravity_hi        ; carry from the low word
sta y_velocity_hi
lda y_velocity_lo
clc
adc y_coordinate_lo
sta y_coordinate_lo
lda y_velocity_hi
adc y_coordinate_hi   ; carry from the low word
sta y_coordinate_hi
Then there were programmers who were even stupider, who not only used 32-bit math, but left the 65816 in 8-bit mode!!!
Tuesday, December 27, 2011
How to NOT make an SNES game slowdown.
It has been a long time since I've written anything in this blog. In fact I wanted to post this back then, but since the world of video game forums was so uncivilized and barbaric, I knew that there would be lots of trolling in the comments section.
Thankfully, internet forums seem to have matured a lot since I first started going online, especially when it comes to the SNES homebrew/dev scene. So I feel confident this won't cause as big a fuss as it would've back then.
Okay, here is my argument. The SNES and the Genesis are equally fast. The 65816 can do twice as much work per cycle as the 68000. The reason SNES games tend to slow down more is that people are programming it wrong.
See, the 65816 has something called a Direct Page register. It is one of the most beneficial features of the 65816, but nobody seems to use it. Most CPUs have static local memory, where data is copied in and out to be used efficiently. The 65816 has movable local memory, where the location of the local memory can be changed using the DP register.
Say you have 10 objects on the screen with a specific amount of memory devoted to each object. The 68000 has to copy everything into its internal registers to use it, and store everything back into RAM when it is done processing the object. All the 65816 needs to do is change the DP location from one object to another.
For whatever reason, programmers just didn't do that. They left the DP at $0000 through the entire game and copied memory to and from it, instead of doing it the more effective way.
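If you think in C rather than assembly, moving the DP register is a bit like repointing a struct pointer instead of copying the struct in and out of a fixed work area. This is only an analogy, not 65816 code, and the `Object` fields and function names are made up for the sketch.

```c
#include <stdint.h>

/* Hypothetical object record; the fields are illustrative only. */
typedef struct {
    int16_t x, y;
    int16_t vx, vy;
} Object;

/* Works on whichever object 'dp' points at, the way direct-page
   addressing works on whichever slot the DP register selects. */
static void move_object(Object *dp)
{
    dp->x = (int16_t)(dp->x + dp->vx);
    dp->y = (int16_t)(dp->y + dp->vy);
}

static void update_all(Object *objects, int count)
{
    for (int i = 0; i < count; i++) {
        move_object(&objects[i]);   /* repoint, nothing copied in or out */
    }
}
```

The routine that handles one object never needs to know which slot it is working on. Only the loop that steps from slot to slot does.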
Monday, November 9, 2009
Sprite Field Coordinates
Since the Super Nintendo's OAM (object attribute memory) uses a 9-bit x and an 8-bit y screen coordinate, and the screen is 256x224 pixels, you may think 16-bit field coordinates are the best approach, since the top 8 bits give the screen and the bottom 8 bits give the pixel within that screen. But there is a catch: you'd only be able to move in whole numbers, like 1 or 2 pixels per frame. If you need something in between, like 1.5 pixels per frame, you need decimal (fractional) bits. If you use a 16-bit whole number and a 16-bit decimal, such as this:
ssssssssttttpppp dddddddddddddddd
s: screen
t: tile
p: pixel
d: decimal
it will require 32-bit math which the 65816 isn't very good at. Thankfully there is a better way of doing this:
ssssttttppppdddd
s: screen
t: tile
p: pixel
d: decimal
This isn't just better because it uses 16-bit math, which the 65816 IS good at. It also makes sprite-tile collision way easier. All you need to do is combine the top bytes of the x and y coordinates to get the number of the tile the sprite is overlapping.
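Here is a small C sketch of that 16-bit layout, to show how the pieces come apart. The names are made up, and the tile map layout (256 tiles per row, stored row by row) is my assumption for the example.

```c
#include <stdint.h>

/* 16-bit coordinate laid out as ssssttttppppdddd. */
#define FRAC_BITS 4
#define SPEED_1_5 0x0018u   /* 1.5 pixels per frame: 1 pixel (16) plus 8/16 */

/* Top byte, sssstttt: tile position across all 16 screens. */
static uint8_t tile_of(uint16_t coord)
{
    return (uint8_t)(coord >> 8);
}

/* pppp: pixel within the 16-pixel tile. */
static uint8_t pixel_of(uint16_t coord)
{
    return (uint8_t)((coord >> FRAC_BITS) & 0x0Fu);
}

/* Assumed map layout: 256 tiles per row, so index = row * 256 + column. */
static uint16_t tile_index(uint16_t x, uint16_t y)
{
    return (uint16_t)((tile_of(y) << 8) | tile_of(x));
}
```

Adding `SPEED_1_5` to a coordinate every frame moves it 3 pixels every 2 frames, and the tile number falls out of the top bytes with no division or multiplication.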