Sokoworm
Sokoworm is my first 'published' 'videogame' (it has an itch.io page).
It is a sokoban (blockpusher), where the pusher's body snakes around behind the player.
I made it for the 2025 4MB gamejam, and the executable is only 2488 Bytes, so it has some odd UX decisions.
On this page, I want to document what I did and how it works.
Table of contents
- Quick preamble
- Technical details
- Program header
- Runtime and standard library
- User input and windowing
- Graphics
- Level representation and coordinate systems
- Undo
- Movement logic and ordering
- Closing thoughts
Quick preamble
I had learned around march of 2025 that there would be a 4MB gamejam in may (it only happens every other year). Since for a while before then, I had been working on an assembly macro programming language, Triops, so the plan was to write a game using that. Work intermittently continued on Triops upto the middle of may (when the gamejam was already half underway), but eventually ground to a halt because of illness. Triops was close, but not quite in a state to take on a project like this, so I had given up on the idea to participate.
However, the moment I said I wasn't going to do it (3 days before the end of the deadline), I got the conviction to simply do the whole project in nasm assembly. I did not yet have a solid plan for what sort of game I wanted, but at least I started.
Technical details
Program header
The basis of the program is a custom ELF-header. At some point during my research into how to write programs in assembly, I stumbled on a great post about making tiny linux executables, so I used that same idea. That tutorial uses a 32-bit header, but for my purposes, 64 bit would be easier to work with, even if slightly larger. Luckily, the ELF wikipedia page has a very complete description of the format.
This is the header:
; This is a custom ELF header for smaller executables. It is less memorysafe.
BITS 64
org 0x08048000
elf_header:
; e_ident
db 0x7F, "ELF"
db 2 ; 64 bit
db 1 ; little endian
db 1 ; version
db 0 ; ABI: UNIX system V (3 is GNU/Linux, but I'm not using any GNU)
db 0 ; more ABI (also dynlinker ABI for glibc?)
db 0, 0, 0, 0, 0, 0, 0 ; 7 zeroes to pad
dw 2 ; e_type: executable
dw 0x3E ; e_machine: AMD X86-64
dd 1 ; e_version
dq _start ; e_entry
dq program_header - $$ ; e_phoff: program header offset (should be 0x40)
dq 0 ; e_shoff: section header offset (we don't have this rn)
dd 0 ; e_flags
dw 0x40 ; e_ehsize: elf header size (should be 0x40 for 64 bit)
dw 0x38 ; e_phentsize: program header entry size (should be 0x38 for 64 bit)
dw 1 ; e_phnum: number of program header entries
dw 0 ; e_shentsize: section header stuff. we don't have it, so it's all zero
dw 0 ; e_shnum
dw 0 ; shstrndx
program_header:
dd 1 ; p_type: loadable segment (PT_LOAD)
dd 0x7 ; p_flags: executable (1) + writeable (2) + readable (4)
dq 0 ; p_offset (I think this is offset from _start)
dq $$ ; p_vaddr: $$ is that org number at the top, which is where it'll get loaded in
dq $$ ; p_paddr (I'm pretty sure this is ignored)
dq filesize ; p_filesz
dq filesize ; p_memsz
dq 0x1000 ; p_align: align to linux page size (4096 bytes)
The reason that this header makes the executable small, is approximately twofold:
- There is only one section with read, write and execute permissions, instead of one executable section and one writable section (which has more padding).
- There is no linking or debugging information.
But more pressingly, since the executable section is also writeable, programs with this header are inherently more vulnerable to exploits of that nature: code insertion. So don't write a critical program like this! Just accept the minimum 4kb executable size for actual applications.
Runtime and standard library
Next, I did not want to deal at all with the C runtime, which is often considered a mandatory dependency because of how linux is constructed. This is true for very serious large programs, but for small programs, it's not really true. Linux quite seriously maintains a consistent set of syscalls, with which userspace programs can tell the kernel to do things for them. The C runtime still lives in userspace, so a committed programmer can make their program do anything glibc can.
So, I wrote the absolute bare minimum for a program that has to do some dynamic work. All these procedures are very short since they're super thin wrappers over syscalls.
read_input
: Reading input from stdin, into a (sized) buffer.print_string
: Writing a (sized) buffer to stdout.alloc
: Only really calls mmap (which maps a page in your program's virtual memory), and keeps track of the buffer's size.exit
: Is basically a requirement if you don't want to crash at the end of your program.write_image
: Writes a pre-formatted pixel data buffer to a PPM file (more on this later).
write_entire_file
: Ended up changing intowrite_image
, since I didn't need more file IO.free
: I didn't really see a reason to free any memory for this particular program, in service of smaller codesize.sleep_ms
: Would have been useful for animations (because image readers don't like it when a file changes very fast), but I never ended up having any.
User input and windowing
There is one large downside to not having a C runtime: dynamically linking to a library at runtime won't normally simply work. Most libraries expect a C runtime, which you can't always deliver.
An example of such a library is xlib, or anything opengl, or basically everything that helps you do windowing (to my knowledge).
Now, it's still possible to make windows and do all the other things, but it would (for example) involve setting up an X11 client,
and doing network calls (there are syscalls for this) to the computer's X11 server, and this would all be at least several handfulls of code.
I did not want to keep the game to just the terminal. Doing terminal rendering relies on somewhat arcane escape codes, and strange methods of operation in general.
So, I opted for the next best thing: rendering to an image. The user would simply open the rendered image in an image viewer, and I'd have to do no windowing.
Of course, the image viewer needs to be capable of automatically reloading when the image on disk changes, but a reasonable section of image viewers does have this property[citation needed].
The simplest (and oldest, for portability's sake) image format I was aware of, was PPM.
The PPM header for binary image data with 24 bits per pixel is ~15 Bytes. My other option was MS BMP.
Much more platforms/image viewers support BMP, but it has a larger header and, more annoyingly, image data needs to be aligned to 4 bytes per row of pixels.
At the time, I chose the easy way out. In hindsight, the portability of BMP might have been worth it.
This all being said, the user input is just kept to the terminal. The program reads two bytes at a time from stdin (the command letter and a newline), and selects what to do from there. A fun side-effect of this is that stdin is not fully cleared after those two bytes are read. This means the player can input multiple moves at the same time, as long as they are seperated by a space or other character.
Graphics
At a high level, the renderer draws blocks from back to front, in two passes. The first and second pass run the same code, and render the 'ground floor' and 'first floor' of the level respectively. Per pass, the renderer essentially divides the screen in strips that overlap eachother, from top to bottom. For every strip, it goes through all pixels in that strip, and checks what block coordinate that pixel belongs to. The 'y-coordinate' of the block is simply the strip number, and the 'x-coordinate' is calculated like so:
mov rax, r14 ; rax now contains the pixel column number (which is kept track of in r14), this will be the dividend.
mov rdx, 0 ; rdx is set to zero because x86 division is Very Strange.
mov rcx, BLOCK ; rcx now contains the pixel width of a block (which is a constant set elsewhere), this will be the divisor.
div ecx ; The actual division, which puts the quotient in rdx, and the remainder in rax.
mov r8, rax ; r8 now contains the remainder, which is the x-coordinate of the block that this pixel is in.
As quick aside, this is the only div
instruction in the entire program, it's really cumbersome to use.
Anyway, the remainder is remembered, and the program goes on to think about where the pixel is in the current block: on the top, or on the side?
To do this, it calculates the y-value (in pixels) of 3 triangle waves.
Before drawing anything, the program must find out if there is a block at all at this location, and if so, which one. To do that, it calls a procedure that I will describe later in more detail, but that returns at least:
- A number that represents the type of block at the given coordinates.
- A pointer to the colour palette of the current block (one colour for the top of a block, one for the sides).
On every odd strip, the drawing is offset by half a block's width (to the right), so that they nicely fit inbetween the even strips of blocks. This is a very messy way to make these things align, and made the block coordinate system immensely hard to work with. Speaking of:
Level representation and coordinate systems
Levels are stored in memory as follows:
- 1 byte that gives the amount of blocks on the 'ground floor'.
- A set of blocks, each of which is 3 bytes: an x-coordinate, a y-coordinate, and a blocktype.
- 1 byte that gives the amount of blocks on the 'first floor'.
- A block that gives the head of the worm.
- A set of blocks that gives the rest of the worm, ordered from head to tail.
- A set of pushable blocks.
- The next level.
The coordinates here are all in on-screen coordinates (the entire game deals with them as such), which themselves deserve a proper description.
There are two possible coordinate systems: the on-screen one, and the in-field one.
The in-field system is the one that the player instructs the worm to move in (so the one that corresponds to the keyboard controls).
It is aligned to the orthogonal grid, which is oriented at a quarter turn to the player, and overall the one that makes the most sense.
The other coordinate system is the on-screen one. Because of how the block strips are drawn, every odd row of blocks has the same column number as the even row above it.
- it is on an odd row, and going topleft or bottomleft.
- it is on an even row, and going topright or bottomright.
; r8 contains xpos, r9 ypos (both in on-screen coordinates).
; rcx contains xdir, and rdx ydir (both in in-field coordinates).
mov r10, r9 ; r10 now contains ypos.
cmp rcx, -1 ; If xdir == -1, invert the behaviour wrt if it wasn't.
jne skip_inversion_head_mvmt
not r10 ; This is that inversion.
skip_inversion_head_mvmt:
and r10, 1 ; This checks for evenness (or oddness, in the case of xdir == -1) of ypos.
imul r10, rcx ; The resulting boolean is multiplied with xdir.
add r8b, r10b ; xpos is updated with our updated xdir.
add r9b, dl ; ypos is updated, no changes need to be done to ydir.
I designed the levels on paper in a normal grid (so essentially in-field coordinates). Converting those levels to on-screen coordinates was mostly an effort of trial-and error, moving blocks according to the above rules by hand. However, at some point, I realized a formatting trick to make it easier. As an example, here is level 3:
level_3:
db 18
db 3, 12, 1
db 2, 13, 1, 3, 13, 1
db 2, 14, 1, 3, 14, 3, 4, 14, 1
db 1, 15, 1, 2, 15, 2, 3, 15, 1, 4, 15, 1
db 2, 16, 1, 3, 16, 1, 4, 16, 1
db 2, 17, 1, 3, 17, 1
db 3, 18, 1, 4, 18, 1
db 3, 19, 1
db 7
db 3, 11, 6
db 3, 10, 5
db 2, 11, 5
db 2, 12, 5
db 1, 13, 5
db 2, 14, 5
db 3, 15, 4

db 7
) gives the positions of the worm and movable blocks.
Since those 'first floor' coordinates are also on-screen, and because there is no z-coordinate, they are just placed at y+2, which is the graphical equivalent.
Undo
Before any moves are read and processed, the game saves the entire current level to a (newly allocated) buffer. The pointer to that buffer is then stored another (more static) buffer. There are no free or realloc procedures, so there is an undo limit of ~256, which I reckoned was good enough. Whenever there's an illegal move, or the user selects undo, the pointers to the buffers that are undone are overwritten, so that memory is very much leaked (rather regrettably).
Initially, I wanted to just save the moves themselves. I quickly realized that undoing a move based on nothing but the move itself and the current level state
is actually impossible because of larger-than-first-order effects of moving blocks via other blocks. I could have done several things instead, but just went for the next dumbest thing.
That being said, my implementation of the 'next dumbest thing' could have used some work.
A lot of code is currently spent of shifting buffers, and copying over double pointers. A realloc
procedure would have cleaned all of that up.
On top of that, the extra level of indirection was nearly impossible to debug.
Movement logic and ordering
Moves are performed in-place on the level, so ordering of what needs to happen before what is very important.
Earlier, I mentioned a procedure that would give information on a block at a given location.
That procedures naively loops through the current level, and returns on the first block it can find with matching coordinates.
This too contributes to the need to be very careful with changing coordinates, because you might not be able to retrieve other blocks that you might've 'double stacked'.
A more tangible example of wrong ordering is that the worm might not be able to follow its own tail in a tight loop, because it'd 'collide' with the previous position of its own tail.
When a move is read, the first thing that happens is that the worm body snakes forward. This is ok, because the head position (in memory) is always known, and there is no other block that the worm body could overlap, coming from a valid state.
Then, the block on the first floor in the would-be new location of the head is checked. If this is air, it continues to the ground floor check. If it's a part of the snake, it's an illegal move, the previous level buffer is loaded, and the gameloop is restarted.
If there's a movable block in front of the worm, that block's coordinates get retrieved, and its next potential position is calculated. That new position is also checked for any collisions. This time however, the ground floor is first checked. If there's a hole beneath that block, it needs to be filled first. It then continues to the first floor check. If there's another block, this whole check repeats on that block, until no more blocks are being pushed. With this ordering, blocks can push another block and fill a hole simultaneously.
After that, the ground floor check happens for the head of the worm. It continues if there's ground or a filled-in hole, and loads the previous level buffer and restarts the gameloop if there's air or an unfilled hole. When there's a wormhole, it checks if there are any unfilled holes left on the ground floor. If not, it continues to the next level. When the level is reset, the undo buffer is also 'cleared' (the top is set to zero, and more memory is leaked (regrettably)).
Finally, the head of the worm updates, the program jumps back up to the start of the gameloop.
Closing thoughts
Since development was very rushed, every day since the end of the jam, I've had realizations about how every part of this could have been done smaller, faster, more robustly, and more user-friendly.
I won't go into all details about that quite yet (maybe at a later date), because I'd honestly rather have spent more time designing puzzles. That is to say, I think I did good work for the time and effort I put in.
I'm very happy with the final result, even if it isn't the smallest it can be (or with as much content as it should have). I hope you give it a try!
I think this concept has a lot of potential for a starting game designer like myself, maybe I'll polish it up a ton and release a steam version with actual windowing and a whole heap more puzzles :P