As I mentioned in my previous post, I am using a Parallax Propeller in my COLE-2 SBC project. The propeller is a neat little chip. I won’t go into a whole lot of detail about it here, since that is well-covered elsewhere, but the basics are that it’s an 8-core (or “cog”) processor running at 80 MHz, and is very hobbyist-friendly. It’s a bit of an odd duck from a programming point of view, but once you get used to it you can do some amazing things with it. There are literally hundreds of open source modules available implementing all sorts of software-designed peripherals, so you can do a lot while writing little to no code of your own, if you so choose.
What initially drew me to the propeller is its ability to generate video signals, both composite (NTSC or PAL) and analog VGA. It’s able to do this thanks to some custom hardware included in each cog that facilitates the generation of the proper timing and the shifting out of pixel data based on that timing. You can do this with as little as one cog, although multiple cogs working together will allow you to get higher resolutions and/or better color depth. The only limitations are that it is limited to 6-bit RRGGBB color, and there is only 32 KB of shared RAM (called “hub RAM”) available for holding your frame buffer and other shared data.
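To get a feel for those two limits, here is a quick back-of-the-envelope model in Python (it is not Propeller code). The function names are made up for illustration, and the simple packed-bitmap layout is an assumption; real drivers often use text or tile schemes instead.

```python
HUB_RAM_BYTES = 32 * 1024  # total shared hub RAM on the Propeller


def pack_rrggbb(r, g, b):
    """Pack three 2-bit channels (0..3 each) into one 6-bit RRGGBB value."""
    for channel in (r, g, b):
        if not 0 <= channel <= 3:
            raise ValueError("each channel is 2 bits (0..3)")
    return (r << 4) | (g << 2) | b


def framebuffer_bytes(width, height, bits_per_pixel):
    """Bytes needed for a packed bitmap, rounded up to whole bytes."""
    return (width * height * bits_per_pixel + 7) // 8


if __name__ == "__main__":
    print(bin(pack_rrggbb(3, 0, 0)))  # pure red at full intensity
    for w, h, bpp in [(320, 240, 8), (320, 240, 2), (640, 480, 1)]:
        size = framebuffer_bytes(w, h, bpp)
        verdict = "fits" if size <= HUB_RAM_BYTES else "does not fit"
        print(f"{w}x{h} @ {bpp} bpp: {size} bytes, {verdict}")
```

A 320×240 bitmap at one byte per pixel would need 76,800 bytes, more than twice the hub RAM, which is why text and tile modes are so common on this chip.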
Once you have the propeller integrated into your design, however, you might as well put all eight cogs to use; the only cost incurred is adding the extra driver code and maybe some I/O pins (more on that later). With that in mind I decided to see how much I could pack into that single chip.
The Bus Interface
To start with I need a way to interface the propeller to the rest of the system. Many projects just do this over a serial link; the chip can bit-bang serial at modest bit rates, and I could easily have attached it to the second port of the UART. But serial can be a real bottleneck for graphics modes, so I wanted some sort of parallel interface. In an ideal world I would let the chip pretend to be a 32 KB RAM chip, but this would eat up every single available I/O pin and would probably be overkill for the modest graphics modes the chip is capable of producing. So, I decided to borrow from the 1970s-era TMS9918, which did all configuration and VRAM access through just two addressable registers. This means I only need 1 address input instead of 15.
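The two-register idea can be sketched as a small Python model. This is only a toy: the class name, the "low byte then high byte" pointer protocol, and the auto-increment behavior are assumptions loosely inspired by how the TMS9918 works, not a description of the final COLE-2 register set.

```python
class TwoPortVram:
    """Toy model of a 32 KB memory reached through two addressable ports."""

    DATA_PORT = 0     # address line low: VRAM data
    CONTROL_PORT = 1  # address line high: registers (here, just the pointer)

    def __init__(self, size=32 * 1024):
        self.vram = bytearray(size)
        self.pointer = 0
        self._low_byte = None  # first half of a pending pointer write

    def write(self, port, value):
        value &= 0xFF
        if port == self.CONTROL_PORT:
            if self._low_byte is None:
                self._low_byte = value  # first write: pointer low byte
            else:
                # second write: pointer high byte completes the address
                self.pointer = ((value << 8) | self._low_byte) % len(self.vram)
                self._low_byte = None
        else:
            self.vram[self.pointer] = value
            self._advance()

    def read(self, port):
        if port == self.CONTROL_PORT:
            return 0  # status register is not modeled
        value = self.vram[self.pointer]
        self._advance()
        return value

    def _advance(self):
        # Auto-increment so block transfers need no further pointer writes
        self.pointer = (self.pointer + 1) % len(self.vram)


if __name__ == "__main__":
    bus = TwoPortVram()
    bus.write(TwoPortVram.CONTROL_PORT, 0x00)
    bus.write(TwoPortVram.CONTROL_PORT, 0x10)  # pointer = 0x1000
    for ch in b"HELLO":
        bus.write(TwoPortVram.DATA_PORT, ch)
    print(bus.vram[0x1000:0x1005])
```

The appeal of this scheme is that a long run of bytes costs only two setup writes, after which the main CPU can hammer the data port in a tight loop.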
To implement this I connected the propeller to 14 bus signals: D0-D7, a single address line for register select, Φ2, /IOSEL2 (from my address decoder GAL), RWB, RDY, and /IRQ. The RDY line would be held low to halt the CPU while the propeller responds to a bus request, since it’s not fast enough to keep up with the CPU at full speed. All of these signals would be fed through a pair of 74LVC245 buffers, because the propeller is a 3.3V part but the rest of my system is 5V. So far so good, or so I thought…
RDY and Waiting
As it turns out using RDY with the 65816 is not quite as straightforward as I had hoped, due to the way it multiplexes the bank address onto the data bus. The bank address is emitted during the first half of the CPU cycle, when Φ2 is low. The data bus is connected to a 74ACT573 latch, which is kept open (transparent) during Φ2 low, but which closes and captures the bank address when Φ2 goes high.
Normally this setup works fine, and in fact it’s the exact design recommended by WDC. The problem comes in when you start trying to use RDY. When RDY is pulled low, the CPU halts as soon as Φ2 transitions from high to low. The actual Φ2 clock, however, does not stop. If RDY is kept low long enough for Φ2 to go high again, the bank address latch will capture some random data bus data as the bank address, and when the CPU finally resumes it will likely access the wrong memory address.
For my test implementation I solved this by using some extra lines on a GAL to construct a latch enable signal such that the latch remains closed as long as RDY is low. Unfortunately this seems to have made the system slightly unstable, even when the propeller is not being accessed (during which times it’s not even on the bus, as its data bus buffer is disabled until /IOSEL2 goes low while Φ2 is high). My GALs are fast (7 ns parts), but it’s possible the extra delay is causing the strange behavior.
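The latch problem and the GAL workaround can be illustrated with a purely logical Python model. Everything here is a sketch: the function name, the byte values, and the half-cycle sequence are invented for illustration, and the model has no notion of propagation delay, so it cannot reproduce the instability described above.

```python
def run_latch(half_cycles, qualify_with_rdy):
    """Return the bank latch output after each clock half-cycle.

    half_cycles: list of (phi2, rdy, data_bus) tuples, one per clock phase.
    qualify_with_rdy: False -> latch enable = NOT phi2 (plain design)
                      True  -> latch enable = NOT phi2 AND rdy (GAL workaround)
    """
    latched = None
    outputs = []
    for phi2, rdy, data_bus in half_cycles:
        enable = not phi2
        if qualify_with_rdy:
            enable = enable and bool(rdy)
        if enable:
            latched = data_bus   # transparent: output follows the bus
        outputs.append(latched)  # otherwise the previous value is held
    return outputs


BANK = 0x01   # bank byte emitted by the CPU (illustrative value)
DATA = 0x42   # data byte of the stalled access (illustrative value)
STRAY = 0xEA  # whatever is on the bus while the CPU is halted (illustrative)

STALLED_ACCESS = [
    (0, 1, BANK),   # phi2 low: CPU drives the bank address, latch transparent
    (1, 0, DATA),   # phi2 high: access starts, propeller pulls RDY low
    (0, 0, STRAY),  # phi2 low again while halted: bus no longer carries the bank
    (1, 1, DATA),   # phi2 high: RDY released, access completes
]

if __name__ == "__main__":
    print("plain latch:  ", run_latch(STALLED_ACCESS, qualify_with_rdy=False))
    print("RDY-qualified:", run_latch(STALLED_ACCESS, qualify_with_rdy=True))
```

With the plain Φ2-only enable the latch ends up holding the stray byte; with the RDY-qualified enable it still holds the original bank.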
For my next attempt I am going to try a different approach: halting the CPU’s Φ2 clock during the high phase using a circuit like this:
My current clock generator is the top half of that circuit, which means I already have the second half of the flip-flop available to use to add the bottom half; I just need to do the wiring. Once that’s done the propeller will pull /STP low instead of RDY, and my bank address latch will go back to being directly qualified by Φ2. I am hoping this will result in a stable system.
As a bonus, I am hoping this new setup will solve another issue I have so far ignored: when first powering on, the propeller takes a few seconds to boot, during which it is not properly asserting RDY. This causes boots to randomly fail. With the new setup /STP will be low during this time, so the CPU will not even try to boot until the propeller is up and running.
The Software
Writing the PASM code to implement the 65xx bus interface turned out to be much easier than I was expecting; I was able to hack out a working proof-of-concept implementation in a few hours. It isn’t even a lot of code; the basic code forming the main loop is just this:
mainloop        waitpeq Pin_PHI2, Pin_CS_PHI2   'Wait for /CS to go low with PHI2 high
                andn    outa, Pin_RDY           'Pull RDY low
                mov     _in, ina                'Capture the input port
                and     _in, Pin_RS WZ,NR       'Check RS bit (0 = vram, 1 = registers)
                and     _in, Pin_RWB            'Mask RWB bit for later
        if_e    jmp     #:vram
                tjz     _in, #write_register
                jmp     #read_register
:vram           tjz     _in, #write_vram
                jmp     #read_vram

'' Common code for all ops; unhalts the CPU, waits for /CS to go high and then loops
finish_request
                or      outa, Pin_RDY           'Unpause the CPU
                waitpeq Pin_CS, Pin_CS          'Wait for /CS to go high again
                andn    dira, Pins_Data         'Set data bus pins to high-Z (input state)
                jmp     #mainloop               'Rinse and repeat
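For readers who don't speak PASM, the wait condition and the four-way branch in that loop can be modeled in a few lines of Python. The pin bit positions below are hypothetical (the real pin assignments aren't given here); only the logic mirrors the listing.

```python
# Hypothetical pin masks, chosen only so the example runs.
PIN_PHI2 = 1 << 8
PIN_CS = 1 << 9
PIN_RWB = 1 << 10
PIN_RS = 1 << 11
PIN_CS_PHI2 = PIN_CS | PIN_PHI2


def waitpeq_matches(ina, state, mask):
    """The condition WAITPEQ blocks on: (INA & mask) == state."""
    return (ina & mask) == state


def dispatch(ina):
    """Mirror the main loop's branch on the RS and RWB bits."""
    is_register = bool(ina & PIN_RS)  # RS: 0 = VRAM, 1 = registers
    is_read = bool(ina & PIN_RWB)     # RWB: 0 = write, 1 = read
    target = "register" if is_register else "vram"
    action = "read" if is_read else "write"
    return f"{action}_{target}"


if __name__ == "__main__":
    sample = PIN_PHI2 | PIN_RS | PIN_RWB  # /CS low, PHI2 high, register read
    if waitpeq_matches(sample, PIN_PHI2, PIN_CS_PHI2):
        print(dispatch(sample))
```

Note that the first WAITPEQ only fires when /CS is low and PHI2 is high at the same time, which is why its state operand contains the PHI2 bit alone while its mask covers both pins.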
In the end I had a working setup in which reading the propeller on either I/O port would return a constantly incrementing byte, which I could also change by writing to either port. This allowed me to verify that the bus interface was working properly.
With the test code working I’ve started adding useful functionality. So far I’ve gotten the VRAM read and write working, and I’ve been able to successfully fill the screen with characters using assembly code running on the main CPU.
Video
At the moment the video output is being driven by the “80×25 C0DF” driver from the waitvid.2048 repository. It generates 80×25 text using a 9×16 font; each character has an attribute byte associated with it that points to a 256-entry color palette. Each color entry in turn consists of a foreground/background color pair and a blink bit. The driver also supports two independent hardware cursors that can be a block or an underline, with or without blinking. The video buffer, color palette, and the font are all in hub RAM, so in theory they could all be made changeable by the main CPU.
I would like to offer the ability to switch to an alternate video driver (either a limited resolution bitmap, or perhaps a tiled driver with sprites), but as of yet I have not worked out how to accomplish this.
Sound
Since I have video, the most natural choice for another thing to add is audio. As it turns out, someone has written a Propeller module called SIDcog that emulates the C64 SID chip. It takes only a single cog to run, uses very little hub RAM (just a couple dozen bytes for registers), and needs only two I/O pins.
The SIDcog module is very simple to use; you tell it what pins to use for left/right audio, and it returns a pointer to a block of emulated SID registers in hub memory. Reading or writing those locations will affect the emulated SID just like it would a real one. So, in theory, once I’ve finished implementing the write_register function in my bus interface I will be able to play sound.
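As a rough idea of what "playing sound" would involve, here is a Python sketch that builds the list of register writes for a single note. The offsets and control bits follow the original SID register map; the helper names, the envelope values, and the assumption that SIDcog behaves as if clocked at the PAL C64 rate are mine and should be checked against the SIDcog source.

```python
SID_CLOCK_HZ = 985_248  # PAL C64 clock; assumed, not confirmed for SIDcog

# Offsets from the start of the SID register block (voice 1 and master volume)
FREQ_LO, FREQ_HI = 0x00, 0x01
CONTROL = 0x04
ATTACK_DECAY = 0x05
SUSTAIN_RELEASE = 0x06
MODE_VOLUME = 0x18

GATE = 0x01      # control bit 0: start the envelope
TRIANGLE = 0x10  # control bit 4: triangle waveform


def sid_freq_word(hz, clock_hz=SID_CLOCK_HZ):
    """16-bit frequency register value: Fn = Fout * 2^24 / Fclk."""
    word = round(hz * (1 << 24) / clock_hz)
    if not 0 <= word <= 0xFFFF:
        raise ValueError("frequency out of range for a 16-bit register")
    return word


def note_on_writes(hz):
    """Return (offset, value) pairs that start a triangle-wave note on voice 1."""
    word = sid_freq_word(hz)
    return [
        (MODE_VOLUME, 0x0F),      # master volume to maximum
        (FREQ_LO, word & 0xFF),
        (FREQ_HI, word >> 8),
        (ATTACK_DECAY, 0x09),     # illustrative envelope: attack 0, decay 9
        (SUSTAIN_RELEASE, 0xF0),  # sustain 15, release 0
        (CONTROL, TRIANGLE | GATE),
    ]


if __name__ == "__main__":
    for offset, value in note_on_writes(440.0):
        print(f"register ${offset:02X} <- ${value:02X}")
```

Each pair would be written to the register block's base address plus the offset, which is exactly the job write_register would need to do on the propeller side.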
SPI
An SPI cog is the last piece I plan to add, since at that point I will be almost out of I/O pins. The propeller will handle the actual SPI transfers and signal the main CPU via interrupt once the transfer is complete. To allow reading or writing of entire SD card blocks I plan to implement a small 512-byte buffer. I am not yet sure how this will be implemented on the bus interface side, but I have some ideas.
Future Work
At the moment I’m hard at work getting the bus interface fully implemented. My focus is on getting the video registers implemented enough that I can try redirecting the console output to VGA. Expect another post (with pictures) once that VGA boot screen is working!