RTEMS 6.1-rc7
BSP support middleware for 'new-exception' style PPC.
T. Straumann, 12/2007
In this README we refer to exceptions and sometimes to 'interrupts'. Interrupts are simply asynchronous exceptions such as 'external' exceptions or 'decrementer'/'timer' exceptions.
Traditionally (in the libbsp/powerpc/shared implementation), synchronous exceptions are handled entirely in the context of the interrupted task: the exception handlers use the task's stack and leave thread-dispatching enabled, so scheduling is allowed to happen 'in the middle' of an exception handler.
Asynchronous exceptions/interrupts, on the other hand, use a dedicated interrupt stack and defer scheduling until after the last nested ISR has finished.
The 'new-exception' processing API works at a rather low level. It provides functions for installing low-level code (which must be written in assembly) directly into the PPC vector area. It is left entirely to the BSP to implement the low-level exception handlers, an API for C-level exception handlers and the RTEMS interrupt API defined in cpukit/include/rtems/irq.h.
The result has been a Darwinian evolution of variants of this code which is very hard to maintain. Mostly, the four files
- libbsp/powerpc/shared/vectors/vectors.S (low-level handlers for 'normal' or 'synchronous' exceptions. This code saves all registers on the interrupted task's stack and calls a 'global' C (high-level) exception handler.)
- libbsp/powerpc/shared/vectors/vectors_init.c (default implementation of the 'global' C exception handler and initialization of the vector table with trampoline code that ends up calling the 'global' handler.)
- libbsp/powerpc/shared/irq/irq_asm.S (low-level handlers for 'IRQ'-type or 'asynchronous' exceptions. This code is very similar to vectors.S but does slightly more: after saving (only the minimal set of) registers on the interrupted task's stack it disables thread-dispatching, switches to a dedicated ISR stack (if not already there, which is possible for nested interrupts) and then executes the high-level (C) interrupt dispatcher 'C_dispatch_irq_handler()'. After 'C_dispatch_irq_handler()' returns, the stack is switched back (if not a nested IRQ), thread-dispatching is re-enabled, signals are delivered and a context switch is initiated if necessary.)
- libbsp/powerpc/shared/irq/irq.c (implementation of the RTEMS ('new') IRQ API defined in cpukit/include/rtems/irq.h.)
have been copied and modified by a myriad of BSPs leading to many slightly different variants.
The code in this directory is an attempt to provide the functionality implemented by the aforementioned files in a more generic way so that it can be shared by more BSPs rather than being copied and modified.
Another important goal was eliminating all conditional compilation which tested for specific CPU models by means of C-preprocessor symbols (#ifdef ppcXYZ). Instead, appropriate run-time checks for features defined in cpuIdent.h are used.
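The difference between compile-time and run-time selection can be sketched as follows. This is a minimal illustration only: the 'cpu_features' record, the 'exc_flavor' values and select_flavor() are hypothetical stand-ins and not the actual cpuIdent.h interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature record; the real checks are declared in cpuIdent.h. */
typedef struct {
    bool is_booke;      /* CPU implements the bookE exception model */
    bool has_critical;  /* CPU provides 'critical' class exceptions */
} cpu_features;

/* Hypothetical flavor names, used for illustration only. */
typedef enum { FLAVOR_CLASSIC, FLAVOR_BOOKE_CRITICAL } exc_flavor;

/*
 * Old style: the decision is frozen into the binary by the preprocessor
 * (#ifdef ppcXYZ ... #endif).
 * New style: one binary, and the decision is taken at run time from the
 * features of the CPU that was actually detected.
 */
exc_flavor select_flavor(const cpu_features *f)
{
    assert(f != NULL);
    if (f->is_booke && f->has_critical) {
        return FLAVOR_BOOKE_CRITICAL;
    }
    return FLAVOR_CLASSIC;
}
```

The same object code can then serve several CPU models, at the cost of a few tests during initialization.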
The assembly code has been (almost completely) rewritten and it tries to address a few problems while deliberately trying to live with the existing APIs and semantics (how these could be improved is beyond the scope but that they could is beyond doubt...):
The middleware uses exception 'categories' or 'flavors' as defined in raw_exception.h.
The middleware consists of the following parts:
1 small 'prologue' snippets that encode the vector information and jump to appropriate 'flavored-wrapper' code for further handling. Some PPC exceptions are spaced only 16 bytes apart, so the generic prologue snippets are only 16 bytes long. Prologues for synchronous and asynchronous exceptions differ.
2 flavored-wrappers which set up a stack frame and do things that are specific to different 'flavors' of exceptions, which currently are
Assembler macros are provided and they can be expanded to generate prologue templates and flavored-wrappers for different flavors of exceptions. Currently, there are two prologues for all aforementioned flavors: one for synchronous exceptions, the other for interrupts.
3 generic assembly-level code that does the bulk of saving register context and calling C-code.
4 C-code (ppc_exc_hdl.c) for dispatching BSP/user handlers.
5 Initialization code (vectors_init.c). All valid exceptions for the detected CPU are determined and a fitting prologue snippet for the exception category (classic, critical, synchronous or IRQ, ...) is generated from a template and the vector number and then installed in the vector area.
The user/BSP only has to deal with installing high-level handlers but by default, the standard 'C_dispatch_irq_handler' routine is hooked to the external and 'decrementer' exceptions.
6 RTEMS IRQ API is implemented by 'irq.c'. It relies on a few routines to be provided by the BSP.
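To illustrate how a 16-byte prologue could be generated from a template and a vector number (parts 1 and 5 above), here is a minimal sketch. The layout of the template is an assumption made for this example: two placeholder instructions, an 'li r3,<vector>' that encodes the vector, and a 'ba' to the wrapper. The actual templates are produced by the assembler macros and may differ; make_prologue() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define PROLOGUE_WORDS 4u           /* 4 instructions x 4 bytes = 16 bytes */
#define PPC_NOP        0x60000000u  /* ori r0,r0,0 */
#define PPC_LI_R3      0x38600000u  /* li r3,0 (i.e., addi r3,0,0) */
#define PPC_BA         0x48000002u  /* ba 0 (branch to absolute address) */

/*
 * Fill 'out' with a prologue for 'vector' that jumps to the wrapper at
 * 'wrapper_addr'. Both the slot that carries the vector number and the
 * use of r3 are assumptions of this sketch.
 */
void make_prologue(uint32_t out[PROLOGUE_WORDS], uint16_t vector,
                   uint32_t wrapper_addr)
{
    /* 'li' takes a signed 16-bit immediate; keep the value positive. */
    assert(vector < 0x8000u);
    /* 'ba' reaches word-aligned targets in the low 32 MB only. */
    assert((wrapper_addr & 3u) == 0u && wrapper_addr < 0x02000000u);

    out[0] = PPC_NOP;                /* placeholder, e.g. for saving a register */
    out[1] = PPC_NOP;                /* placeholder */
    out[2] = PPC_LI_R3 | vector;     /* encode the vector number */
    out[3] = PPC_BA | wrapper_addr;  /* jump to the flavored wrapper */
}
```

Initialization code would then copy the four words to the vector's address in the vector area and synchronize the instruction cache.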
BSP writers must provide the following routines (declared in irq_supp.h): Interrupt controller (PIC) support:
Note that BSP_rtems_irq_mngt_set() hooks C_dispatch_irq_handler() to the external and decrementer exceptions (on bookE this is the PIT exception; a decrementer emulation is activated) for backwards compatibility reasons. C_dispatch_irq_handler() must therefore be able to support these two exceptions. However, the BSP implementor is free to disconnect C_dispatch_irq_handler() from either of these exceptions, to connect other handlers (e.g., for SYSMGMT exceptions) or to hook C_dispatch_irq_handler() to yet more exceptions etc. *after* BSP_rtems_irq_mngt_set() has executed.

Hooking exceptions:

The API defined in vectors.h declares routines for connecting a C-handler to any exception. Note that the execution environment of the C-handler depends on the exception being synchronous or asynchronous:
- synchronous exceptions use the task stack and do not disable thread dispatching.
- asynchronous exceptions use a dedicated stack and defer thread dispatching until handling has (almost) finished.

By inspecting the vector number stored in the exception frame, the nature of the exception can be determined: asynchronous exceptions have the most significant bit(s) set.

Any exception for which no dedicated handler is registered ends up being handled by the routine addressed by the (traditional) 'globalExcHdl' function pointer.

Makefile.am:
- make sure the Makefile.am does NOT use any of the files vectors.S, vectors.h, vectors_init.c, irq_asm.S, irq.c from 'libbsp/powerpc/shared'. The BSP must NOT implement any functionality that is provided by those files (and now by the middleware) either.
- (probably) remove 'vectors.rel' and anything related
- add
to 'include_bsp_HEADERS'
add
to 'libbsp_a_LIBADD'
(irq.c is in a separate '.rel' so that you can get support for exceptions only).
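The rules for hooking exceptions described above (asynchronous exceptions are flagged in the vector number, and unhandled exceptions fall back to 'globalExcHdl') can be sketched as follows. This is a simplified model, not the code in ppc_exc_hdl.c: the frame layout, the flag value EXC_ASYNC_FLAG, the table size and all function names are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXC_ASYNC_FLAG  0x80000000u  /* assumption: top bit marks 'asynchronous' */
#define EXC_MAX_VECTORS 32u          /* assumption: size of the handler table */

typedef struct { uint32_t vector; } exc_frame;  /* hypothetical, reduced frame */
typedef int (*exc_handler)(exc_frame *);

static exc_handler handler_table[EXC_MAX_VECTORS];
static exc_handler global_handler;   /* plays the role of 'globalExcHdl' */

/* Asynchronous exceptions carry the flag in the stored vector number. */
bool exc_is_async(const exc_frame *f)
{
    return (f->vector & EXC_ASYNC_FLAG) != 0u;
}

uint32_t exc_vector_number(const exc_frame *f)
{
    return f->vector & ~EXC_ASYNC_FLAG;
}

/* Call the dedicated handler if one is registered, else the global one. */
int exc_dispatch(exc_frame *f)
{
    uint32_t v = exc_vector_number(f);

    assert(global_handler != NULL);
    if (v < EXC_MAX_VECTORS && handler_table[v] != NULL) {
        return handler_table[v](f);
    }
    return global_handler(f);
}

/* Sample handlers for illustration. */
static int sample_dedicated_handler(exc_frame *f) { (void)f; return 1; }
static int sample_global_handler(exc_frame *f)    { (void)f; return 0; }
```

A handler that needs to know its execution environment (task stack or dedicated stack) would call exc_is_async() on the frame it receives.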
On classic PPCs, early (and late) parts of the low-level exception handling code run with the MMU disabled, which means that the default caching attributes (write-back) are in effect (thanks to Thomas Doerfler for bringing this up). The code currently assumes that the MMU translations for the task and interrupt stacks as well as some variables in the data-area MATCH THE DEFAULT CACHING ATTRIBUTES (this assumption also holds for the old code in libbsp/powerpc/shared/vectors ../irq).
During initialization of exception handling, a crude test is performed to check if memory seems to have the write-back attribute. The 'dcbz' instruction should - on most PPCs - cause an alignment exception if the tested cache-line does not have this attribute.
BSPs which entirely disable caching (e.g., by physically disabling the cache(s)) should set the variable ppc_exc_cache_wb_check = 0 prior to calling initialize_exceptions(). Note that this check does not catch all possible misconfigurations (e.g., on the 860, the default attribute is AFAIK [libcpu/powerpc/mpc8xx/mmu/mmu_init.c] set to 'caching-disabled' which is potentially harmful but this situation is not detected).
The problematic race condition is as follows:
Usually, ISRs are allowed to use certain OS primitives such as, e.g., releasing a semaphore. In order to prevent a context switch from happening immediately (this would result in the ISR being suspended), thread-dispatching must be disabled around execution of the ISR. However, on the PPC architecture it is neither possible to atomically disable ALL interrupts nor is it possible to atomically increment a variable (the thread-dispatch-disable level). Hence, the following sequence of events could occur:

1) low-priority interrupt (LPI) is taken
2) before the LPI can increase the thread-dispatch-disable level or disable high-priority interrupts, a high-priority interrupt (HPI) happens
3) HPI increases dispatch-disable level
4) HPI executes high-priority ISR which, e.g., posts a semaphore
5) HPI decreases dispatch-disable level and realizes that a context switch is necessary
6) context switch is performed since LPI had not gotten to the point where it could increase the dispatch-disable level.

At this point, the LPI has been effectively suspended, which means that the low-priority ISR will not be executed until the task interrupted in 1) is scheduled again!
The solution to this problem is letting the first machine instruction of the low-priority exception handler write a non-zero value to a variable in memory:
   ee_vector_offset:

         stw r1, ee_lock@sdarel(r13)
         .. save some registers etc..
         .. increase thread-dispatch-disable-level
         .. clear 'ee_lock' variable
After the HPI decrements the dispatch-disable level it checks 'ee_lock' and refrains from performing a context switch if 'ee_lock' is nonzero. Since the LPI will complete execution subsequently it will eventually do the context switch.
For the single-instruction write operation we must

a) write a register that is guaranteed to be non-zero (e.g., R1 (stack pointer) or R13 (SVR4 short-data area)).
b) use an addressing mode that doesn't require loading any registers. The short-data area pointer R13 is appropriate.
CAVEAT: unfortunately, this method by itself is NOT enough because raising a low-priority exception and executing the first instruction of the handler is NOT atomic. Hence, the following could occur:
1) LPI is taken
2) PC is saved in SRR0, PC is loaded with address of 'locking instruction'
      stw r1, ee_lock@sdarel(r13)
3) ==> critical interrupt happens
4) PC (containing address of locking instruction) is saved in CSRR0
5) HPI is dispatched
For the HPI to correctly handle this situation it does the following:
a) increase thread-dispatch disable level
b) do interrupt work
c) decrease thread-dispatch disable level
d) if ( dispatch-disable level == 0 )
   d1) check ee_lock
   d2) check instruction at *CSRR0
   d3) do a context switch if necessary ONLY IF ee_lock is NOT set AND *CSRR0 is NOT the 'locking instruction'

This works because the address of 'ee_lock' is embedded in the locking instruction 'stw r1, ee_lock@sdarel(r13)' and because the registers r1/r13 have a special purpose (stack-pointer, SDA-pointer). Hence it is safe to assume that the particular instruction 'stw r1, ee_lock@sdarel(r13)' never occurs anywhere else.

Another note: this algorithm also makes sure that ONLY nested ASYNCHRONOUS interrupts which enable/disable thread-dispatching and check if thread-dispatching is required before returning control engage in this locking protocol. It is important that when a critical, asynchronous interrupt interrupts a 'synchronous' exception (which does not disable thread-dispatching) the thread-dispatching operation upon return of the HPI is NOT deferred (because the synchronous handler would not, eventually, check for a dispatch requirement).

And one more note: We never want to disable machine-check exceptions, in order to avoid a checkstop. This means that we cannot use enabling/disabling this type of exception for protection of critical OS data structures. Therefore, calling OS primitives from an asynchronous machine-check handler is ILLEGAL and not supported. Since machine-checks can happen anytime, it is not legal to test if a deferred context switch should be performed when the asynchronous machine-check handler returns (since _Context_Switch_is_necessary could have been set by an IRQ-protected section of code that was hit by the machine-check). Note that synchronous machine-checks can legally use OS primitives and currently there are no asynchronous machine-checks defined.
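The decision the HPI takes in step d) can be modelled in C as shown below. This is a sketch under stated assumptions, not the assembly code of the middleware: the 'hpi_state' structure, both function names and the SDA offset of 'ee_lock' are hypothetical. Only the opcode and register fields of the 'stw r1, <offset>(r13)' encoding are fixed by the PowerPC instruction format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of what the HPI inspects before returning. */
typedef struct {
    uint32_t dispatch_disable_level;  /* value after the HPI decremented it */
    uint32_t ee_lock;                 /* nonzero while an LPI holds the lock */
    uint32_t insn_at_csrr0;           /* instruction word the HPI interrupted */
    bool     switch_necessary;        /* a context switch has been requested */
} hpi_state;

/*
 * Encoding of 'stw r1, offset(r13)': opcode 36, RS = r1, RA = r13, plus a
 * 16-bit displacement. The displacement of 'ee_lock' in the short-data
 * area is chosen by the linker; callers pass an assumed value here.
 */
uint32_t locking_insn_encoding(int16_t sda_offset)
{
    return 0x902D0000u | (uint16_t)sda_offset;
}

/* Steps d), d1), d2) and d3) of the protocol described above. */
bool hpi_may_dispatch(const hpi_state *s, uint32_t locking_insn)
{
    assert(s != NULL);
    if (s->dispatch_disable_level != 0u) {
        return false;  /* d)  still nested: an outer handler will dispatch */
    }
    if (s->ee_lock != 0u) {
        return false;  /* d1) the LPI has already executed its locking store */
    }
    if (s->insn_at_csrr0 == locking_insn) {
        return false;  /* d2) the LPI was taken but its store has not run yet */
    }
    return s->switch_necessary;  /* d3) */
}
```

In both deferred cases the LPI completes later, finds the dispatch-disable level at zero and performs the context switch itself.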
You have to disable all asynchronous exceptions which may cause a context switch before restoring the SRRs and executing the RFI. Reason:
Suppose we are in the epilogue code of an EE between the move to SRRs and the RFI. Here EE is disabled but CE is enabled. Now a CE happens. The handler decides that a thread dispatch is necessary. The CE checks if this is possible:
o The thread dispatch disable level is 0, because the EE has already decremented it.
o The EE lock variable is cleared.
o The EE is not executing its first instruction.
Hence a thread dispatch is allowed. The CE issues a context switch to a task with EE enabled (for example a task waiting for a semaphore). Now an EE happens and the current content of the SRRs is lost.