Sokoworm

Sokoworm is my first 'published' 'videogame' (it has an itch.io page).
It is a sokoban (blockpusher), where the pusher's body snakes around behind the player. I made it for the 2025 4MB gamejam, and the executable is only 2488 Bytes, so it has some odd UX decisions.
On this page, I want to document what I did and how it works.

Table of contents

Quick preamble

I had learned around march of 2025 that there would be a 4MB gamejam in may (it only happens every other year). Since for a while before then, I had been working on an assembly macro programming language, Triops, so the plan was to write a game using that. Work intermittently continued on Triops upto the middle of may (when the gamejam was already half underway), but eventually ground to a halt because of illness. Triops was close, but not quite in a state to take on a project like this, so I had given up on the idea to participate.

However, the moment I said I wasn't going to do it (3 days before the end of the deadline), I got the conviction to simply do the whole project in nasm assembly. I did not yet have a solid plan for what sort of game I wanted, but at least I started.

Technical details

The basis of the program is a custom ELF-header. At some point during my research into how to write programs in assembly, I stumbled on a great post about making tiny linux executables, so I used that same idea. That tutorial uses a 32-bit header, but for my purposes, 64 bit would be easier to work with, even if slightly larger. Luckily, the ELF wikipedia page has a very complete description of the format.

This is the header:

; This is a custom ELF header for smaller executables. It is less memorysafe.
BITS 64
	org 0x08048000
elf_header:
	; e_ident
	db 0x7F, "ELF"
	db 2		; 64 bit
	db 1		; little endian
	db 1		; version
	db 0		; ABI: UNIX system V (3 is GNU/Linux, but I'm not using any GNU)
	db 0		; more ABI (also dynlinker ABI for glibc?)
	db 0, 0, 0, 0, 0, 0, 0 ; 7 zeroes to pad

	dw 2		; e_type: executable
	dw 0x3E		; e_machine: AMD X86-64
	dd 1		; e_version
	dq _start		; e_entry
	dq program_header - $$		; e_phoff: program header offset (should be 0x40)
	dq 0		; e_shoff: section header offset (we don't have this rn)
	dd 0		; e_flags
	dw 0x40		; e_ehsize: elf header size (should be 0x40 for 64 bit)
	dw 0x38		; e_phentsize: program header entry size (should be 0x38 for 64 bit)
	dw 1		; e_phnum: number of program header entries
	dw 0		; e_shentsize: section header stuff. we don't have it, so it's all zero
	dw 0		; e_shnum
	dw 0		; shstrndx

program_header:
	dd 1		; p_type: loadable segment (PT_LOAD)
	dd 0x7		; p_flags: executable (1) + writeable (2) + readable (4)
	dq 0		; p_offset (I think this is offset from _start)
	dq $$		; p_vaddr: $$ is that org number at the top, which is where it'll get loaded in
	dq $$		; p_paddr (I'm pretty sure this is ignored)
	dq filesize		; p_filesz
	dq filesize		; p_memsz
	dq 0x1000		; p_align: align to linux page size (4096 bytes)
The reason that this header makes the executable small, is approximately twofold:
  1. There is only one section with read, write and execute permissions, instead of one executable section and one writable section (which has more padding).
  2. There is no linking or debugging information.
Some antivirus scanners might not like this, since virtually every executable generated nowadays has at least a section header table, so that's somewhat of an assumption.
But more pressingly, since the executable section is also writeable, programs with this header are inherently more vulnerable to exploits of that nature: code insertion. So don't write a critical program like this! Just accept the minimum 4kb executable size for actual applications.

Runtime and standard library

Next, I did not want to deal at all with the C runtime, which is often considered a mandatory dependency because of how linux is constructed. This is true for very serious large programs, but for small programs, it's not really true. Linux quite seriously maintains a consistent set of syscalls, with which userspace programs can tell the kernel to do things for them. The C runtime still lives in userspace, so a committed programmer can make their program do anything glibc can.

So, I wrote the absolute bare minimum for a program that has to do some dynamic work. All these procedures are very short since they're super thin wrappers over syscalls.

I also wrote some basic utilities that I didn't end up using at all: Notable is that some of these procedures are way nicer than their equivalent libc functions, since there are no zero-terminated strings anywhere. Unlike C, my program has sized arrays as a basic datatype. The size of a buffer is stored directly in front of that buffer, one byte in front if it's a string, 8 bytes if it is dynamically allocated.

User input and windowing

There is one large downside to not having a C runtime: dynamically linking to a library at runtime won't normally simply work. Most libraries expect a C runtime, which you can't always deliver. An example of such a library is xlib, or anything opengl, or basically everything that helps you do windowing (to my knowledge).
Now, it's still possible to make windows and do all the other things, but it would (for example) involve setting up an X11 client, and doing network calls (there are syscalls for this) to the computer's X11 server, and this would all be at least several handfulls of code.

I did not want to keep the game to just the terminal. Doing terminal rendering relies on somewhat arcane escape codes, and strange methods of operation in general. So, I opted for the next best thing: rendering to an image. The user would simply open the rendered image in an image viewer, and I'd have to do no windowing. Of course, the image viewer needs to be capable of automatically reloading when the image on disk changes, but a reasonable section of image viewers does have this property[citation needed].
The simplest (and oldest, for portability's sake) image format I was aware of, was PPM. The PPM header for binary image data with 24 bits per pixel is ~15 Bytes. My other option was MS BMP. Much more platforms/image viewers support BMP, but it has a larger header and, more annoyingly, image data needs to be aligned to 4 bytes per row of pixels. At the time, I chose the easy way out. In hindsight, the portability of BMP might have been worth it.

This all being said, the user input is just kept to the terminal. The program reads two bytes at a time from stdin (the command letter and a newline), and selects what to do from there. A fun side-effect of this is that stdin is not fully cleared after those two bytes are read. This means the player can input multiple moves at the same time, as long as they are seperated by a space or other character.

Graphics

At a high level, the renderer draws blocks from back to front, in two passes. The first and second pass run the same code, and render the 'ground floor' and 'first floor' of the level respectively. Per pass, the renderer essentially divides the screen in strips that overlap eachother, from top to bottom. For every strip, it goes through all pixels in that strip, and checks what block coordinate that pixel belongs to. The 'y-coordinate' of the block is simply the strip number, and the 'x-coordinate' is calculated like so:

mov rax, r14    ; rax now contains the pixel column number (which is kept track of in r14), this will be the dividend.
mov rdx, 0      ; rdx is set to zero because x86 division is Very Strange.
mov rcx, BLOCK  ; rcx now contains the pixel width of a block (which is a constant set elsewhere), this will be the divisor.
div ecx	        ; The actual division, which puts the quotient in rdx, and the remainder in rax.
mov r8, rax     ; r8 now contains the remainder, which is the x-coordinate of the block that this pixel is in.
As quick aside, this is the only div instruction in the entire program, it's really cumbersome to use.

Anyway, the remainder is remembered, and the program goes on to think about where the pixel is in the current block: on the top, or on the side? To do this, it calculates the y-value (in pixels) of 3 triangle waves.

Graphic of three coloured triangle waves. From top to bottom, they are labeled 1, 2 and 3, and are green, magenta and orange respectively. All three have the same rate, though line 2 and 3 are inverted, and line 2 touches its peaks with line 1's troughs.
The three triangle waves within a strip, here drawing out the silhouettes of two blocks.
The first triangle wave represents the far-away edge of the block, the second the line between the top and the sides, and the third the close-by edge of the block. To keep the reader on their toes, it is now on them to think of what the set of conditions look like for it to draw the top, side, or nothing.

Before drawing anything, the program must find out if there is a block at all at this location, and if so, which one. To do that, it calls a procedure that I will describe later in more detail, but that returns at least:

If the block is air, no pixels are drawn, since air is totally transparent. In any other case, the relevant colour values are copied from the palette pointer to the image.

On every odd strip, the drawing is offset by half a block's width (to the right), so that they nicely fit inbetween the even strips of blocks. This is a very messy way to make these things align, and made the block coordinate system immensely hard to work with. Speaking of:

Level representation and coordinate systems

Levels are stored in memory as follows:

  1. 1 byte that gives the amount of blocks on the 'ground floor'.
  2. A set of blocks, each of which is 3 bytes: an x-coordinate, a y-coordinate, and a blocktype.
  3. 1 byte that gives the amount of blocks on the 'first floor'.
  4. A block that gives the head of the worm.
  5. A set of blocks that gives the rest of the worm, ordered from head to tail.
  6. A set of pushable blocks.
  7. The next level.
Because these levels are given in the executable, they cannot be resized, so the number of blocks on either floor has to stay constant.
The coordinates here are all in on-screen coordinates (the entire game deals with them as such), which themselves deserve a proper description.

There are two possible coordinate systems: the on-screen one, and the in-field one.
The in-field system is the one that the player instructs the worm to move in (so the one that corresponds to the keyboard controls). It is aligned to the orthogonal grid, which is oriented at a quarter turn to the player, and overall the one that makes the most sense.

The other coordinate system is the on-screen one. Because of how the block strips are drawn, every odd row of blocks has the same column number as the even row above it.

Graphic of a blue grid of rhombi. The rows are numbered from 0 to 6 with green numbers. The columns are numbered from 0 to 4 (also green), but instead of how the rows are numbered, every other column has an index. From row 2 down, the rhombi are shaded magenta. The magenta-shaded rhombi are always orthogonal to at least one other magenta-shaded rhombus.
The on-screen coordinate system. Highlighted in pink is column 2.
This means that, whenever the worm moves its head in in-field coordinates, its on-screen column coordinate does not change when: Luckily, the on-screen row coordinate reliably changes based on if it's going to the top of bottom, regardless of the coordinates or lateral movement. Any time the worm head has to move, or a block has to move, these rules are applied in approximately 8 instructions, which looks like this:
; r8 contains xpos, r9 ypos (both in on-screen coordinates).
; rcx contains xdir, and rdx ydir (both in in-field coordinates).
mov r10, r9      ; r10 now contains ypos.
cmp rcx, -1      ; If xdir == -1, invert the behaviour wrt if it wasn't.
jne skip_inversion_head_mvmt
	not r10      ; This is that inversion.
skip_inversion_head_mvmt:
and r10, 1       ; This checks for evenness (or oddness, in the case of xdir == -1) of ypos.
imul r10, rcx    ; The resulting boolean is multiplied with xdir.
add r8b, r10b    ; xpos is updated with our updated xdir.
add r9b, dl      ; ypos is updated, no changes need to be done to ydir.

I designed the levels on paper in a normal grid (so essentially in-field coordinates). Converting those levels to on-screen coordinates was mostly an effort of trial-and error, moving blocks according to the above rules by hand. However, at some point, I realized a formatting trick to make it easier. As an example, here is level 3:

level_3:
	db 18
				db 3, 12, 1
			db 2, 13, 1,	3, 13, 1
		db 2, 14, 1,	3, 14, 3,	4, 14, 1
	db 1, 15, 1,	2, 15, 2,	3, 15, 1,	4, 15, 1
		db 2, 16, 1,	3, 16, 1,	4, 16, 1
			db 2, 17, 1,	3, 17, 1
				db 3, 18, 1,	4, 18, 1
					db 3, 19, 1
	db 7
	db 3, 11, 6
	db 3, 10, 5
	db 2, 11, 5
	db 2, 12, 5
	db 1, 13, 5
	db 2, 14, 5
	db 3, 15, 4
A screenshot of level 3
If you squint your eyes, you can just about make out the layout of the level. The second part (after db 7) gives the positions of the worm and movable blocks. Since those 'first floor' coordinates are also on-screen, and because there is no z-coordinate, they are just placed at y+2, which is the graphical equivalent.

Undo

Before any moves are read and processed, the game saves the entire current level to a (newly allocated) buffer. The pointer to that buffer is then stored another (more static) buffer. There are no free or realloc procedures, so there is an undo limit of ~256, which I reckoned was good enough. Whenever there's an illegal move, or the user selects undo, the pointers to the buffers that are undone are overwritten, so that memory is very much leaked (rather regrettably).

Initially, I wanted to just save the moves themselves. I quickly realized that undoing a move based on nothing but the move itself and the current level state is actually impossible because of larger-than-first-order effects of moving blocks via other blocks. I could have done several things instead, but just went for the next dumbest thing. That being said, my implementation of the 'next dumbest thing' could have used some work. A lot of code is currently spent of shifting buffers, and copying over double pointers. A realloc procedure would have cleaned all of that up. On top of that, the extra level of indirection was nearly impossible to debug.

Movement logic and ordering

Moves are performed in-place on the level, so ordering of what needs to happen before what is very important.
Earlier, I mentioned a procedure that would give information on a block at a given location. That procedures naively loops through the current level, and returns on the first block it can find with matching coordinates. This too contributes to the need to be very careful with changing coordinates, because you might not be able to retrieve other blocks that you might've 'double stacked'. A more tangible example of wrong ordering is that the worm might not be able to follow its own tail in a tight loop, because it'd 'collide' with the previous position of its own tail.

When a move is read, the first thing that happens is that the worm body snakes forward. This is ok, because the head position (in memory) is always known, and there is no other block that the worm body could overlap, coming from a valid state.
Then, the block on the first floor in the would-be new location of the head is checked. If this is air, it continues to the ground floor check. If it's a part of the snake, it's an illegal move, the previous level buffer is loaded, and the gameloop is restarted.

If there's a movable block in front of the worm, that block's coordinates get retrieved, and its next potential position is calculated. That new position is also checked for any collisions. This time however, the ground floor is first checked. If there's a hole beneath that block, it needs to be filled first. It then continues to the first floor check. If there's another block, this whole check repeats on that block, until no more blocks are being pushed. With this ordering, blocks can push another block and fill a hole simultaneously.

After that, the ground floor check happens for the head of the worm. It continues if there's ground or a filled-in hole, and loads the previous level buffer and restarts the gameloop if there's air or an unfilled hole. When there's a wormhole, it checks if there are any unfilled holes left on the ground floor. If not, it continues to the next level. When the level is reset, the undo buffer is also 'cleared' (the top is set to zero, and more memory is leaked (regrettably)).

Finally, the head of the worm updates, the program jumps back up to the start of the gameloop.

Closing thoughts

Since development was very rushed, every day since the end of the jam, I've had realizations about how every part of this could have been done smaller, faster, more robustly, and more user-friendly. I won't go into all details about that quite yet (maybe at a later date), because I'd honestly rather have spent more time designing puzzles. That is to say, I think I did good work for the time and effort I put in. I'm very happy with the final result, even if it isn't the smallest it can be (or with as much content as it should have). I hope you give it a try!
I think this concept has a lot of potential for a starting game designer like myself, maybe I'll polish it up a ton and release a steam version with actual windowing and a whole heap more puzzles :P