The SPARC port
##############

The SPARC port targets various machines from Sun's product lineup. The initial effort is on the
Ultra 60 and Ultra 5, with plans to later add the Sun T5120 and its newer CPU. This may change
depending on hardware donations and developer interest.

Support for 32-bit versions of SPARC is currently not planned.

SPARC ABI
=========

The SPARC architecture has 32 integer registers, divided as follows:

- global registers (g0-g7)
- input (i0-i7)
- local (l0-l7)
- output (o0-o7)

Parameter passing and return is done using the output registers, which are
generally considered scratch registers and can be corrupted by the callee. The
caller must take care of preserving them.

The input and local registers are callee-saved, but we have hardware assistance
in the form of a register window. There is an instruction to shift the registers
so that:

- o registers become i registers
- local and output registers are replaced with fresh sets, for use by the
  current function
- global registers are not affected

Note that as a side-effect, o6 is moved to i6. This is convenient because these
are the stack and frame pointers, respectively. So basically this sets
the frame pointer for free.

Simple enough functions may end up using just the o registers. In that case
nothing special is necessary, of course.

When shifting the register window, the extra registers come from the register
stack in the CPU. This is not infinite, however: most implementations of SPARC
will only have 8 windows available. When the internal stack is full, an overflow
trap is raised, and the handler must free up old windows by storing them on the
stack. Likewise, when the internal stack is empty, an underflow trap must fill
it back from the stack-saved data.

Misaligned memory access
========================

The SPARC CPU is not designed to gracefully handle misaligned accesses.
You can access a single byte at any address, but 16-bit accesses only at even
addresses, 32-bit accesses only at addresses that are multiples of 4, etc.

On x86, for example, such accesses are not a problem: they are allowed and
handled directly by the instructions doing the access, at little or no
performance cost.

On SPARC, however, such accesses will cause a SIGBUS. This means a trap handler
has to catch the misaligned access and do it in software, byte by byte, then
give back control to the application. This is, of course, very slow, so we
should avoid it when possible.

Fortunately, gcc knows about this, and will normally do the right thing:

- For usual variables and structures, it will make sure to lay them out so that
  they are aligned. It relies on stack alignment, as well as malloc returning
  sufficiently aligned memory (as required by the C standard).
- On packed structures, gcc knows the data is misaligned, and will automatically
  use the appropriate way to access it (most likely, byte-by-byte).

This leaves us with two undesirable cases:

- Pointer arithmetic and casting. When computing addresses manually, it's
  possible to generate a misaligned address and cast it to a type with a wider
  alignment requirement. In this case, gcc may access the pointer using a
  multi-byte instruction and cause a SIGBUS. Solution: make sure the struct
  is aligned, or declare it as packed so unaligned accesses are used instead.
- Access to hardware: it is a common pattern to declare a struct as packed,
  and map it to hardware registers. If the alignment isn't known, gcc will use
  byte-by-byte access. It seems volatile would cause gcc to use the proper way
  to access the struct, assuming that a volatile value is necessarily
  aligned as it should.

In the end, we just need to be careful about pointer math resulting in
unaligned access.
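
As an illustration of the pointer-casting case, here is a minimal sketch of one
way to read a 32-bit value from an address of unknown alignment. The helper name
``load_u32_unaligned`` is hypothetical and not taken from the Haiku sources. It
assumes that going through ``memcpy`` is acceptable at the call site, since
``memcpy`` has no alignment requirement on its arguments.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper (not part of the Haiku sources): read a 32-bit
     * value in native byte order from an address of any alignment. */
    static uint32_t
    load_u32_unaligned(const void* address)
    {
        uint32_t value;

        assert(address != NULL);

        /* memcpy takes untyped pointers, so the compiler cannot assume
         * 4-byte alignment here and has to use accesses that are safe. */
        memcpy(&value, address, sizeof(value));
        return value;
    }

    /* By contrast, the following pattern may be compiled to a single 32-bit
     * load, which traps on SPARC when `address` is not a multiple of 4:
     *
     *     uint32_t value = *(const uint32_t*)address;
     */

Whether this is preferable to declaring the containing struct as packed depends
on the code in question. The sketch only shows the general idea.
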
``-Wcast-align`` helps find such pointer casts, but it also raises a lot of
false positives (where the alignment is preserved even when casting to other
types). So we enable it only as a warning for now. We will need to check the
SIGBUS handler to identify places where we do a lot of misaligned accesses that
trigger it, and rework the code as needed. But in general, except for these
cases, we're fine.

The Ultrasparc MMUs
===================

First, a word of warning: the MMU was different in SPARCv8 (32-bit)
implementations, and it was changed again on newer CPUs.

The Ultrasparc-II we are supporting for now is documented in the Ultrasparc
user manual. There were some minor changes in the Ultrasparc-III to accommodate
larger physical addresses. This was then standardized as JPS1, and Fujitsu
also implemented it.

Later on, the design was changed again. For example, the Ultrasparc T2 (UA2005
architecture) uses a different data structure format to enlarge, again, the
physical and virtual address tags.

For now the implementation is focused on Ultrasparc-II because that's what I
have at hand. Later on we will need support for the more recent systems.

Ultrasparc-II MMU
-----------------

There are actually two separate units for the instruction and data address
spaces, known as I-MMU and D-MMU. They each implement a TLB (translation
lookaside buffer) for the recently accessed pages.

This is pretty much all there is to the MMU hardware. No hardware page table
walk is provided. However, there is some support for implementing a TSB
(Translation Storage Buffer), in the form of a way to compute the address in
that buffer where the data for a missing page could be.

It is up to software to manage the TSB (globally or per-process) and in general
keep track of the mappings.
This means we are relatively free to manage things
however we want, as long as eventually we can feed the iTLB and dTLB with the
relevant data from the MMU trap handler.

To make sure we can handle the fault without recursing, we need to pin a few
items in place:

In the TLB:

- TLB miss handler code
- TSB and any linked data that the TLB miss handler may need
- asynchronous trap handlers and data

In the TSB:

- TSB-miss handling code
- Interrupt handlers code and data

So, from a given virtual address (assuming we are using only 8K pages and a
512 entry TSB to keep things simple):

- VA63-44 are unused and must be a sign extension of bit 43
- VA43-22 are the 'tag' used to match a TSB entry with a virtual address
- VA21-13 are the offset in the TSB at which to find a candidate entry
- VA12-0 are the offset in the 8K page, and used to form PA12-0 for the access

Inside the TLBs, VA63-13 is stored, so there can be multiple entries matching
the same tag active at the same time, even when there is only one in the TSB.
The entries are rotated using a simple LRU scheme, unless they are locked of
course. Be careful to not fill a TLB with only locked entries! Also, one must
take care not to insert a new mapping for a given VA without first removing
any possible previous one (no need to worry about this when handling a TLB
miss however, as in that case we obviously know that there was no previous
entry).

Entries also have a "context". This could for example be mapped to the process
ID, making it easy to clear all entries related to a specific context.

TSB entries format
------------------

Each entry is composed of two 64-bit values: "Tag" and "Data". The data uses the
same format as the TLB entries, however the tag is different.

They are as follows:

Tag
***

- Bit 63: 'G' indicating a global entry, the context should be ignored.
- Bits 60-48: context ID (13 bits)
- Bits 41-0: VA63-22 as the 'tag' to identify this entry

Data
****

- Bit 63: 'V' indicating a valid entry, if it's 0 the entry is unused.
- Bits 62-61: size: 8K, 64K, 512K, 4MB
- Bit 60: NFO, indicating No Fault Only
- Bit 59: Invert Endianness of accesses to this page
- Bits 58-50: reserved for use by software
- Bits 49-41: reserved for diagnostics
- Bits 40-13: Physical Address<40-13>
- Bits 12-7: reserved for use by software
- Bit 6: Lock in TLB
- Bit 5: Cacheable physical
- Bit 4: Cacheable virtual
- Bit 3: Access has side effects (HW is mapped here, or DMA shared RAM)
- Bit 2: Privileged
- Bit 1: Writable
- Bit 0: Global

TLB internal tag
****************

- Bits 63-13: VA<63-13>
- Bits 12-0: context ID

Conveniently, a 512-entry TSB fits exactly in an 8K page, so it can be locked
in the TLB with a single entry there. However, it may be a wise idea to instead
map 64K (or more) of RAM locked as a single entry for all the things that need
to be accessed by the TLB miss trap handler, so we minimize the use of TLB
entries.

Likewise, it may be useful to use 64K pages instead of 8K whenever possible.
The hardware provides some support for mixing the two sizes but it makes things
a bit more complex. Let's start out with simpler things.

Software floating-point support
===============================

The SPARC instruction set specifies instructions for handling long double
values; however, no hardware implementation actually provides them. They
generate a trap, which is expected to be handled by the softfloat library.

Since traps are slow, and gcc knows better, it will never generate those
instructions. Instead it directly calls into the C library, to functions
specified in the ABI and used to do long double math using softfloats.

The support code for this is, in our case, compiled into both the kernel and
libroot.
It lives in src/system/libroot/os/arch/sparc/softfloat.c (and other
support files). This code was extracted from FreeBSD, rather than glibc,
because that made it much easier to get it building in the kernel.

Openboot bootloader
===================

Openboot is Sun's implementation of Open Firmware. So we should be able to share
a lot of code with the PowerPC port. There are some differences however.

Executable format
-----------------

PowerPC uses COFF. Sparc uses a.out, which is a lot simpler. According to the
spec, some fields should be zeroed out, but it says implementations may choose
to allow other values, so a standard a.out file works as well.

It used to be possible to generate one with objcopy, but support was removed,
so we now use elf2aout (imported from FreeBSD).

The file is first loaded at 4000, then relocated to its load address (we use
202000) and executed there.

Openfirmware prompt
-------------------

To get the prompt on display, use STOP+A at boot until you get the "ok" prompt.
On some machines, if no keyboard is detected, the ROM will assume it is set up
in headless mode, and will expect a BREAK+A on the serial port.

STOP+N resets all variables to default values (in case you messed up input or
output, for example).

Useful commands
---------------

Disable autoboot to get to the openboot prompt and stop there

.. code-block:: text

    setenv auto-boot? false

Configuring for keyboard/framebuffer io

.. code-block:: text

    setenv screen-#columns 160
    setenv screen-#rows 49
    setenv output-device screen:r1920x1080x60
    setenv input-device keyboard

Configuring openboot for serial port

.. code-block:: text

    setenv ttya-mode 38400,8,n,1,-
    setenv output-device ttya
    setenv input-device ttya
    reset

Boot from network
-----------------

static ip
*********

This currently works best, because rarp does not let the called binary know the
IP address. We need the IP address if we want to mount the root filesystem using
the remote_disk server.

.. code-block:: text

    boot net:192.168.1.2,somefile,192.168.1.89

The first IP is the server from which to download (using TFTP), the second is
the client IP to use. Once the bootloader starts, it will detect that it is
booted from network and look for the remote_disk_server on the same machine.

rarp
****

This needs a reverse ARP server (easy to set up on any Linux system). You need
to list the MAC address of the SPARC machine in /etc/ethers on the server. The
machine will get its IP, and will use TFTP to the server which replied, to get
the boot file from there.

.. code-block:: text

    boot net:,somefile

(net is an alias to the network card and also sets the load address: /pci@1f,4000/network@1,1)

dhcp
****

This needs a DHCP/BOOTP server configured to send the info about where to find
the file to load and boot.

.. code-block:: text

    boot net:dhcp

Debugging
---------

.. code-block:: text

    202000 dis (disassemble starting at 202000 until next return instruction)
    4000 1000 dump (dump 1000 bytes from address 4000)
    .registers (show global registers)
    .locals (show local/windowed registers)
    %pc dis (disassemble code being executed)
    ctrace (backtrace)
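
When looking at a faulting address while debugging MMU code, it can help to
split it by hand into the fields described in the Ultrasparc-II MMU section.
The following is only a sketch of that split in C, assuming 8K pages and a
512 entry TSB as in that section. All macro and function names are hypothetical
and do not come from the Haiku sources.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical names; the bit ranges follow the 8K page, 512 entry TSB
     * layout described in the Ultrasparc-II MMU section. */
    #define PAGE_SHIFT      13  /* VA12-0 is the offset in the page */
    #define TSB_INDEX_BITS  9   /* VA21-13 selects one of 512 entries */
    #define TSB_TAG_SHIFT   22  /* VA63-22 is stored in the TSB tag */
    #define TSB_ENTRY_SIZE  16  /* two 64-bit words: tag and data */

    /* A 512 entry TSB fits exactly in one 8K page. */
    static_assert(TSB_ENTRY_SIZE * (1 << TSB_INDEX_BITS) == (1 << PAGE_SHIFT),
        "TSB does not fill exactly one page");

    /* Offset inside the 8K page (copied to PA12-0). */
    static uint64_t
    va_page_offset(uint64_t va)
    {
        return va & ((UINT64_C(1) << PAGE_SHIFT) - 1);
    }

    /* Index of the candidate TSB entry (VA21-13). */
    static uint64_t
    va_tsb_index(uint64_t va)
    {
        return (va >> PAGE_SHIFT) & ((UINT64_C(1) << TSB_INDEX_BITS) - 1);
    }

    /* Byte offset of that entry from the start of the TSB. */
    static uint64_t
    va_tsb_entry_offset(uint64_t va)
    {
        return va_tsb_index(va) * TSB_ENTRY_SIZE;
    }

    /* Value compared against bits 41-0 of the TSB tag (VA63-22). */
    static uint64_t
    va_tsb_tag(uint64_t va)
    {
        return va >> TSB_TAG_SHIFT;
    }

    /* Check that VA63-44 are a sign extension of bit 43. */
    static int
    va_is_sign_extended(uint64_t va)
    {
        uint64_t top = va >> 43;  /* bits 63-43, 21 bits in total */
        return top == 0 || top == 0x1FFFFF;
    }

The context ID and the 'G' bit of the tag are left out here, since they do not
come from the virtual address itself.
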