The SPARC port
##############

The SPARC port targets various machines from the Sun product lineup. The initial
effort is on the Ultra 60 and Ultra 5, with plans to later add the Sun T5120 and
its newer CPU. This may change depending on hardware donations and developer
interest.

Support for 32-bit versions of SPARC is currently not planned.

SPARC ABI
=========

The SPARC architecture has 32 integer registers, divided as follows:

- global registers (g0-g7)
- input (i0-i7)
- local (l0-l7)
- output (o0-o7)

Parameter passing and return are done using the output registers, which are
generally considered scratch registers and can be corrupted by the callee. The
caller must take care of preserving them.

The input and local registers are callee-saved, but we have hardware assistance
in the form of a register window. There is an instruction to shift the registers
so that:

- o registers become i registers
- local and output registers are replaced with fresh sets, for use by the
  current function
- global registers are not affected

Note that as a side effect, o6 is moved to i6. This is convenient because these
are the stack and frame pointers, respectively, so shifting the window sets the
frame pointer for free. Likewise, the address saved in o7 by the call
instruction (used to return to the caller) ends up in i7.

Simple enough functions may end up using just the o registers. In that case,
no window shift is necessary.

When shifting the register window, the extra registers come from the register
stack in the CPU. This stack is not infinite, however: most implementations of
SPARC only have 8 windows available. When the internal stack is full, an
overflow trap is raised, and the handler must free up old windows by storing
them on the stack. Likewise, when the internal stack is empty, an underflow
trap handler must fill it back from the data saved on the stack.
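
The window shift can be modelled in a few lines of C. The sketch below is only
an illustration, not kernel code: the window_file structure and the helper
names are hypothetical, the number of windows is fixed at 8, and the overflow
and underflow traps are not modelled. It shows that a value written to an o
register by the caller is visible in the matching i register of the callee
after the shift.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   #define NWINDOWS 8          /* assumed window count */
   #define REGS_PER_WINDOW 16  /* 8 "in" + 8 "local" registers per window */

   /* Hypothetical model of the windowed part of the register file. */
   struct window_file {
       uint64_t regs[NWINDOWS * REGS_PER_WINDOW];
       unsigned cwp;           /* current window pointer */
   };

   /* i registers of the current window */
   uint64_t *in_reg(struct window_file *file, unsigned n)
   {
       assert(n < 8);
       return &file->regs[file->cwp * REGS_PER_WINDOW + n];
   }

   /* l registers of the current window */
   uint64_t *local_reg(struct window_file *file, unsigned n)
   {
       assert(n < 8);
       return &file->regs[file->cwp * REGS_PER_WINDOW + 8 + n];
   }

   /* o registers of the current window: in this model they are the same
      storage as the i registers of the next window */
   uint64_t *out_reg(struct window_file *file, unsigned n)
   {
       unsigned next = (file->cwp + 1) % NWINDOWS;
       assert(n < 8);
       return &file->regs[next * REGS_PER_WINDOW + n];
   }

   /* Function entry: shift to the next window */
   void window_save(struct window_file *file)
   {
       file->cwp = (file->cwp + 1) % NWINDOWS;
   }

   /* Function exit: shift back to the caller's window */
   void window_restore(struct window_file *file)
   {
       file->cwp = (file->cwp + NWINDOWS - 1) % NWINDOWS;
   }

For example, after the caller stores its stack pointer through out_reg(&file, 6)
and calls window_save(&file), in_reg(&file, 6) refers to the same storage: the
caller's stack pointer has become the callee's frame pointer without any copy.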

Misaligned memory access
========================

The SPARC CPU is not designed to gracefully handle misaligned accesses.
You can access a single byte at any address, but 16-bit accesses are only
possible at even addresses, 32-bit accesses at addresses that are multiples
of 4, and so on.

On x86, by comparison, such accesses are not a problem: they are allowed and
handled directly by the instructions doing the access, so there is little to no
performance cost.

On SPARC, however, such accesses will cause a SIGBUS. To let the application
continue, a trap handler has to catch the misaligned access, perform it in
software byte by byte, and then give control back to the application. This is,
of course, very slow, so we should avoid it when possible.

Fortunately, gcc knows about this, and will normally do the right thing:

- For usual variables and structures, it will make sure to lay them out so that
  they are aligned. It relies on stack alignment, as well as malloc returning
  sufficiently aligned memory (as required by the C standard).
- For packed structures, gcc knows the data may be misaligned, and will
  automatically use an appropriate way to access it (most likely, byte by byte).

This leaves us with two undesirable cases:

- Pointer arithmetic and casting. When computing addresses manually, it's
  possible to generate a misaligned address and cast it to a type with a
  stricter alignment requirement. In this case, gcc may access the pointer
  using a multi-byte instruction and cause a SIGBUS. Solution: make sure the
  struct is aligned, or declare it as packed so that unaligned accesses are
  used instead.
- Access to hardware. It is a common pattern to declare a struct as packed,
  and map it to hardware registers. If the alignment isn't known, gcc will use
  byte-by-byte access. It seems that volatile causes gcc to use the proper
  access width, on the assumption that a volatile value is necessarily aligned
  as it should be.

In the end, we just need to be careful about pointer math resulting in
unaligned accesses. The -Wcast-align option helps with that, but it also raises
a lot of false positives (where the alignment is preserved even when casting to
other types), so we enable it only as a warning for now. We will need to check
the SIGBUS handler to identify places where we do a lot of misaligned accesses
that trigger it, and rework the code as needed. But in general, except for these
cases, we're fine.
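
As an illustration of the pointer-casting case, the following sketch shows two
ways of reading a 32-bit value from an address that may be misaligned, without
casting it to a plain uint32_t pointer. The helper names are hypothetical and
not taken from the Haiku sources; the packed attribute is the gcc extension
discussed above.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>
   #include <string.h>

   /* Copy the bytes into an aligned local variable. The compiler is free to
      turn this into a single load when it can prove the address is aligned. */
   uint32_t read_u32_unaligned(const void *address)
   {
       uint32_t value;
       assert(address != NULL);
       memcpy(&value, address, sizeof(value));
       return value;
   }

   /* Wrap the value in a packed struct: gcc then knows the field may be
      misaligned and generates a safe access sequence. */
   struct unaligned_u32 {
       uint32_t value;
   } __attribute__((packed));

   uint32_t read_u32_packed(const void *address)
   {
       assert(address != NULL);
       return ((const struct unaligned_u32 *)address)->value;
   }

Both helpers return the same result for any address. By contrast, dereferencing
the address after casting it directly to a uint32_t pointer is the pattern that
can end in a SIGBUS when the address is not a multiple of 4.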

The UltraSPARC MMUs
===================

First, a word of warning: the MMU was different in SPARCv8 (32-bit)
implementations, and it was changed again on newer CPUs.

The UltraSPARC-II we are supporting for now is documented in the UltraSPARC
user manual. There were some minor changes in the UltraSPARC-III to accommodate
larger physical addresses. This was then standardized as JPS1, and Fujitsu
also implemented it.

Later on, the design was changed again. For example, the UltraSPARC T2 (UA2005
architecture) uses a different data structure format to enlarge, again, the
physical and virtual address tags.

For now the implementation is focused on the UltraSPARC-II because that's what
I have at hand. Later on we will need support for the more recent systems.

UltraSPARC-II MMU
-----------------

There are actually two separate units for the instruction and data address
spaces, known as I-MMU and D-MMU. They each implement a TLB (translation
lookaside buffer) for the recently accessed pages.

This is pretty much all there is to the MMU hardware. No hardware page table
walk is provided. However, there is some support for implementing a TSB
(Translation Storage Buffer): the hardware provides a way to compute the
address in that buffer where the data for a missing page could be.

It is up to software to manage the TSB (globally or per-process) and in general
keep track of the mappings. This means we are relatively free to manage things
however we want, as long as eventually we can feed the iTLB and dTLB with the
relevant data from the MMU trap handler.

To make sure we can handle the fault without recursing, we need to pin a few
items in place:

In the TLB:

- TLB miss handler code
- TSB and any linked data that the TLB miss handler may need
- asynchronous trap handlers and data

In the TSB:

- TSB-miss handling code
- Interrupt handler code and data

So, from a given virtual address (assuming we are using only 8K pages and a
512-entry TSB to keep things simple):

- VA63-44 are unused and must be a sign extension of bit 43
- VA43-22 are the 'tag' used to match a TSB entry with a virtual address
- VA21-13 are the offset (entry index) in the TSB at which to find a candidate
  entry
- VA12-0 are the offset in the 8K page, and used to form PA12-0 for the access
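
The split described above can be written down as a few helpers. This is a
minimal sketch under the same assumptions (8K pages, 512-entry TSB); the
function and macro names are made up for the example and do not come from the
Haiku sources.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   #define PAGE_SHIFT      13   /* 8K pages */
   #define TSB_INDEX_BITS  9    /* 512-entry TSB */
   #define TSB_ENTRY_SIZE  16   /* two 64-bit words per entry */

   /* A TSB of this size covers exactly one 8K page. */
   static_assert((TSB_ENTRY_SIZE << TSB_INDEX_BITS) == (1 << PAGE_SHIFT),
       "TSB does not fit in a single page");

   /* VA12-0: offset inside the 8K page */
   uint64_t va_page_offset(uint64_t va)
   {
       return va & ((UINT64_C(1) << PAGE_SHIFT) - 1);
   }

   /* VA21-13: index of the candidate entry in the TSB */
   unsigned va_tsb_index(uint64_t va)
   {
       return (unsigned)((va >> PAGE_SHIFT) & ((1u << TSB_INDEX_BITS) - 1));
   }

   /* Byte offset of that entry from the start of the TSB */
   uint64_t va_tsb_byte_offset(uint64_t va)
   {
       return (uint64_t)va_tsb_index(va) * TSB_ENTRY_SIZE;
   }

   /* VA43-22: tag used to match the TSB entry */
   uint64_t va_tag(uint64_t va)
   {
       return (va >> (PAGE_SHIFT + TSB_INDEX_BITS)) & ((UINT64_C(1) << 22) - 1);
   }

   /* VA63-44 must all be copies of bit 43 */
   bool va_is_valid(uint64_t va)
   {
       uint64_t upper = va >> 43;   /* bits 63-43, 21 bits in total */
       return upper == 0 || upper == (UINT64_C(1) << 21) - 1;
   }

Given a missing address, a TLB miss handler would look at the entry found at
va_tsb_byte_offset(va) from the TSB base and compare its tag before loading it
into the TLB.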

Inside the TLBs, VA63-13 is stored, so there can be multiple entries matching
the same tag active at the same time, even when there is only one in the TSB.
The entries are rotated using a simple LRU scheme, unless they are locked, of
course. Be careful not to fill a TLB with only locked entries! Also, one must
take care not to insert a new mapping for a given VA without first removing
any possible previous one. There is no need to worry about this when handling a
TLB miss, however, as in that case we know that there was no previous entry.

Entries also have a "context". This could for example be mapped to the process
ID, making it easy to clear all entries related to a specific context.

TSB entries format
------------------

Each entry is composed of two 64-bit values: "Tag" and "Data". The data uses the
same format as the TLB entries, however the tag is different.

They are as follows:

Tag
***

- Bit 63: 'G', indicating a global entry for which the context should be
  ignored
- Bits 60-48: context ID (13 bits)
- Bits 41-0: VA63-22 as the 'tag' to identify this entry

Data
****

- Bit 63: 'V', indicating a valid entry; if it is 0 the entry is unused
- Bits 62-61: size: 8K, 64K, 512K, 4MB
- Bit 60: NFO, indicating No Fault Only
- Bit 59: Invert Endianness of accesses to this page
- Bits 58-50: reserved for use by software
- Bits 49-41: reserved for diagnostics
- Bits 40-13: Physical Address<40-13>
- Bits 12-7: reserved for use by software
- Bit 6: Lock in TLB
- Bit 5: Cacheable physical
- Bit 4: Cacheable virtual
- Bit 3: Access has side effects (HW is mapped here, or DMA shared RAM)
- Bit 2: Privileged
- Bit 1: Writable
- Bit 0: Global

TLB internal tag
****************

- Bits 63-13: VA<63-13>
- Bits 12-0: context ID

Conveniently, a 512-entry TSB fits exactly in an 8K page, so it can be locked
in the TLB with a single entry there. However, it may be a wise idea to instead
map 64K (or more) of RAM locked as a single entry for all the things that need
to be accessed by the TLB miss trap handler, so we minimize the use of TLB
entries.
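
The layout above translates fairly directly into C. The following is a sketch
only: the macro, structure and function names are invented for the example,
the page size encodings are assumed to follow the order in which the sizes are
listed (0 for 8K up to 3 for 4MB), and the lookup ignores global entries for
brevity.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   #define TSB_ENTRY_COUNT     512

   /* Tag word */
   #define TSB_TAG_GLOBAL      (UINT64_C(1) << 63)
   #define TSB_TAG_CTX_SHIFT   48
   #define TSB_TAG_CTX_MASK    UINT64_C(0x1fff)              /* 13 bits */
   #define TSB_TAG_VA_MASK     ((UINT64_C(1) << 42) - 1)     /* bits 41-0 */

   /* Data word */
   #define TTE_VALID           (UINT64_C(1) << 63)
   #define TTE_SIZE_8K         (UINT64_C(0) << 61)
   #define TTE_SIZE_64K        (UINT64_C(1) << 61)
   #define TTE_SIZE_512K       (UINT64_C(2) << 61)
   #define TTE_SIZE_4M         (UINT64_C(3) << 61)
   #define TTE_PA_MASK         ((UINT64_C(1) << 41) - (UINT64_C(1) << 13))
   #define TTE_LOCK            (UINT64_C(1) << 6)
   #define TTE_CACHE_PHYS      (UINT64_C(1) << 5)
   #define TTE_CACHE_VIRT      (UINT64_C(1) << 4)
   #define TTE_SIDE_EFFECT     (UINT64_C(1) << 3)
   #define TTE_PRIVILEGED      (UINT64_C(1) << 2)
   #define TTE_WRITABLE        (UINT64_C(1) << 1)
   #define TTE_GLOBAL          (UINT64_C(1) << 0)

   struct tsb_entry {
       uint64_t tag;
       uint64_t data;
   };

   /* 512 entries of 16 bytes: exactly one 8K page */
   static_assert(sizeof(struct tsb_entry) * TSB_ENTRY_COUNT == 8192,
       "TSB does not fit in a single 8K page");

   /* Build the tag word for a non-global mapping of va in the given context */
   uint64_t tsb_make_tag(uint64_t va, unsigned context)
   {
       return ((context & TSB_TAG_CTX_MASK) << TSB_TAG_CTX_SHIFT)
           | ((va >> 22) & TSB_TAG_VA_MASK);
   }

   /* Build the data word for a valid 8K mapping to physical address pa */
   uint64_t tsb_make_data(uint64_t pa, uint64_t flags)
   {
       return TTE_VALID | TTE_SIZE_8K | (pa & TTE_PA_MASK) | flags;
   }

   /* Return the entry mapping va in this context, or NULL on a TSB miss */
   const struct tsb_entry *tsb_lookup(const struct tsb_entry *tsb, uint64_t va,
       unsigned context)
   {
       const struct tsb_entry *entry = &tsb[(va >> 13) % TSB_ENTRY_COUNT];
       if ((entry->data & TTE_VALID) == 0
           || entry->tag != tsb_make_tag(va, context))
           return NULL;
       return entry;
   }

A miss handler built along these lines would load entry->data into the TLB when
tsb_lookup succeeds, and fall back to the slower software mapping structures
when it returns NULL.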

Likewise, it may be useful to use 64K pages instead of 8K whenever possible.
The hardware provides some support for mixing the two sizes but it makes things
a bit more complex. Let's start out with simpler things.

Software floating-point support
===============================

The SPARC instruction set specifies instructions for handling long double
values. However, no hardware implementation actually provides them. They
generate a trap, which is expected to be handled by the softfloat library.

Since traps are slow, and gcc knows better, it will never generate those
instructions. Instead it directly calls into the C library, to functions
specified in the ABI and used to do long double math using softfloats.
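
As a small illustration, the function below (a hypothetical name) adds two long
double values. On 64-bit SPARC, gcc is expected to compile the addition into a
call to a support function instead of a quad-precision instruction. The helper
is assumed here to be _Qp_add, following the naming used by the SPARC V9 ABI;
inspecting the output of gcc -S is the reliable way to confirm which function
is actually called. On other architectures the same source simply uses whatever
long double support is available.

.. code-block:: c

   /* Plain C addition: nothing SPARC specific appears in the source. */
   long double quad_add(long double a, long double b)
   {
       /* On sparc64, gcc emits a library call here rather than a trapping
          quad-precision instruction (assumption: _Qp_add from the ABI). */
       return a + b;
   }

This is why the softfloat code has to be linked into anything that uses long
double, including the kernel.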

The support code for this is, in our case, compiled into both the kernel and
libroot. It lives in src/system/libroot/os/arch/sparc/softfloat.c (and other
support files). This code was extracted from FreeBSD, rather than glibc,
because that made it much easier to get it building in the kernel.

OpenBoot bootloader
===================

OpenBoot is Sun's implementation of Open Firmware, so we should be able to share
a lot of code with the PowerPC port. There are some differences, however.

Executable format
-----------------

PowerPC uses COFF. SPARC uses a.out, which is a lot simpler. According to the
spec, some fields should be zeroed out, but it also says that implementations
may choose to allow other values, so a standard a.out file works as well.

It used to be possible to generate one with objcopy, but support was removed,
so we now use elf2aout (imported from FreeBSD).

The file is first loaded at 4000, then relocated to its load address (we use
202000) and executed there.
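
To give an idea of how simple the format is, the sketch below decodes a classic
a.out header: eight 32-bit words, stored big-endian on SPARC. The structure and
function names are hypothetical, and the field names follow the traditional
Unix a.out layout rather than anything taken from the Haiku or OpenBoot
sources. Treating the low 16 bits of the first word as the magic number is an
assumption that should be checked against the Open Firmware SPARC binding.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   #define AOUT_HEADER_SIZE 32
   #define AOUT_OMAGIC      0407   /* classic a.out magic number (octal) */

   struct aout_header {
       uint32_t magic;    /* magic number and machine information */
       uint32_t text;     /* size of the text segment */
       uint32_t data;     /* size of the data segment */
       uint32_t bss;      /* size of the bss segment */
       uint32_t syms;     /* size of the symbol table */
       uint32_t entry;    /* entry point */
       uint32_t trsize;   /* size of the text relocations */
       uint32_t drsize;   /* size of the data relocations */
   };

   /* Read one big-endian 32-bit word, one byte at a time */
   static uint32_t read_be32(const uint8_t *p)
   {
       return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
           | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
   }

   /* Decode the header; returns 0 on success, -1 if the magic is unexpected */
   int aout_parse_header(const uint8_t *raw, struct aout_header *header)
   {
       assert(raw != NULL && header != NULL);
       header->magic = read_be32(raw + 0);
       header->text = read_be32(raw + 4);
       header->data = read_be32(raw + 8);
       header->bss = read_be32(raw + 12);
       header->syms = read_be32(raw + 16);
       header->entry = read_be32(raw + 20);
       header->trsize = read_be32(raw + 24);
       header->drsize = read_be32(raw + 28);
       return (header->magic & 0xffff) == AOUT_OMAGIC ? 0 : -1;
   }

Reading the words byte by byte also keeps the parser independent of the
alignment of the buffer, which matters on SPARC.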

Open Firmware prompt
--------------------

To get the prompt on display, use STOP+A at boot until you get the "ok" prompt.
On some machines, if no keyboard is detected, the ROM will assume it is set up
in headless mode, and will expect a BREAK+A on the serial port.

STOP+N resets all variables to default values (in case you messed up input or
output, for example).

Useful commands
---------------

Disable autoboot to get to the OpenBoot prompt and stop there:

.. code-block:: text

   setenv auto-boot? false

Configuring for keyboard/framebuffer I/O:

.. code-block:: text

   setenv screen-#columns 160
   setenv screen-#rows 49
   setenv output-device screen:r1920x1080x60
   setenv input-device keyboard

Configuring OpenBoot for serial port:

.. code-block:: text

   setenv ttya-mode 38400,8,n,1,-
   setenv output-device ttya
   setenv input-device ttya
   reset

Boot from network
-----------------

Static IP
*********

This currently works best, because RARP does not let the called binary know the
IP address. We need the IP address if we want to mount the root filesystem using
the remote_disk_server.

.. code-block:: text

    boot net:192.168.1.2,somefile,192.168.1.89

The first IP is the server from which to download (using TFTP), the second is
the client IP to use. Once the bootloader starts, it will detect that it is
booted from network and look for the remote_disk_server on the same machine.

RARP
****

This needs a reverse ARP server (easy to set up on any Linux system). You need
to list the MAC address of the SPARC machine in /etc/ethers on the server. The
machine will get its IP, and will use TFTP to the server which replied, to get
the boot file from there.

.. code-block:: text

    boot net:,somefile

(net is an alias to the network card and also sets the load address: /pci@1f,4000/network@1,1)

DHCP
****

This needs a DHCP/BOOTP server configured to send the info about where to find
the file to load and boot.

.. code-block:: text

    boot net:dhcp

Debugging
---------

.. code-block:: text

   202000 dis (disassemble starting at 202000 until next return instruction)
   4000 1000 dump (dump 1000 bytes from address 4000)
   .registers (show global registers)
   .locals (show local/windowed registers)
   %pc dis (disassemble code being executed)
   ctrace (backtrace)