=============================================================================
Date: 26 Feb 1999 19:11:11 +0300
From: Sergei Organov <osv@Javad.RU>
To: rtems-list@oarcorp.com
Subject: Possible bug in RTEMS core.

Hello;

I think there is a bug in the '_Thread_Yield_processor()' routine. I
encountered a problem (the program just hangs) that disappears if I
comment out the '_ISR_Flash(level)' call in the middle of the routine.
The problem description follows. It would be good if somebody
experienced in RTEMS internals took a close look at
'_Thread_Yield_processor()'.

The problem occurs when there are two tasks with equal priorities that
are just ordinary forever loops calling
rtems_task_wake_after(RTEMS_YIELD_PROCESSOR) at every loop
iteration. To run into trouble it's enough to have one of these two
tasks wait on, let's say, a semaphore that is released by an _interrupt
service routine_ that handles an asynchronous interrupt. In my program
the semaphore is inside the RS232 driver and the task just calls
'printf', which eventually results in waiting on the semaphore:

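#include <stdio.h>
#include <rtems.h>

/* Both tasks are created with the same priority. */
rtems_task task1(rtems_task_argument unused) {
  for (;;) {
    printf("Task1\n");  /* ends up waiting on the RS232 driver semaphore */
    rtems_task_wake_after(RTEMS_YIELD_PROCESSOR);
  }
}

rtems_task task2(rtems_task_argument unused) {
  for (;;) {
    rtems_task_wake_after(RTEMS_YIELD_PROCESSOR);
  }
}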

Commenting out any of the calls makes the problem go away. Making the
priorities of 'task1' and 'task2' different makes the problem go
away. Commenting out '_ISR_Flash(level)' inside
'_Thread_Yield_processor' makes the problem go away.

I used RTEMS 3.6.0 and my own BSP for custom hardware when the problem
occurred. Unfortunately I have no way to check whether the problem
exists when one of the BSPs provided with RTEMS is used. All tests
provided with RTEMS run smoothly with my configuration, though. I
looked into the RTEMS 4.0 code; it seems nothing significant has
changed in this area since 3.6.

BTW, do you think it is really critical for maximum interrupt latency
to have the _ISR_Flash(level) call in '_Thread_Yield_processor()'?

Also, I'd like to say that none of this is critical for me - I
just want to inform you that there may be a bug here.

Regards,
Sergei Organov.

=============================================================================
Date: 26 Feb 1999 21:59:29 +0300
From: Sergei Organov <osv@Javad.RU>
To: joel@OARcorp.com
Subject: Re: Possible bug in RTEMS core.


Unfortunately the fix doesn't help. I still need to remove _ISR_Flash
to make the test work. :-(

I must say that I'm not at all sure that I didn't break something
in RTEMS while playing with it (e.g. I put under ifdefs all the code
that deals with multiprocessing to see how much space it takes). I
tried to be careful, but who knows. That's why it would be good if
somebody who has access to hardware natively supported by RTEMS ran a
similar test.

Please let me know if you have some other things for me to try.

Sergei.


joel@OARcorp.com writes:
> I am replying privately to make sure that my fix is right before posting
> it to the general list.
> 
> The problem is actually that a dispatch occurs at the flash.  Dispatching
> is supposed to be disabled but is not.  
> 
> See sched_yield() in posix/src/sched.c for the right way to call
> _Thread_Yield_processor() based on its current design.
> 
> So the code in rtems_task_wake_after() should be:
> 
> if ( ticks == 0 ) {
>   _Thread_Disable_dispatch();
>   _Thread_Yield_processor();
>   _Thread_Enable_dispatch();
> } else ....
> 
> Please try this fix and see if it resolves the problem.  I agree that the
> Flash point is probably not that critical but I think the root of the bug
> is that rtems_task_wake_after() is not disabling dispatching before
> calling yield.
> 
> If this fixes the problem, I will post a fix to the list.  This is a fix
> that would also go in a 4.0.1 branch. I have a couple of BSP specific
> problems on that branch now but this is more serious than any other
> reported problem. :(
> 
> We do appreciate your report.  Please help us close this problem.
> 
> Out of curiosity, what are you doing with RTEMS?
> 
> --joel
> Joel Sherrill                    Director of Research & Development
> joel@OARcorp.com                 On-Line Applications Research
> Ask me about RTEMS: a free RTOS  Huntsville AL 35805
>    Support Available             (256) 722-9985
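
The mechanism Joel describes above (a dispatch sneaking in at the flash
point unless dispatching has been disabled first) can be illustrated
with a small stand-alone model. This is only a sketch under stated
assumptions: every name below (disable_dispatch, enable_dispatch,
interrupt_at_flash_point, yield_processor, reset_model) is hypothetical
and is not the RTEMS implementation, and the thread does not confirm
that this was the actual cause of the hang.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a dispatch-disable counter.
 * None of these names are real RTEMS internals. */
static int  disable_level;    /* > 0 means context switches are deferred */
static bool dispatch_needed;  /* a switch has been requested             */
static int  dispatch_count;   /* number of switches actually performed   */

void reset_model(void) {
    disable_level = 0;
    dispatch_needed = false;
    dispatch_count = 0;
}

static void dispatch(void) {
    dispatch_needed = false;
    dispatch_count++;
}

void disable_dispatch(void) { disable_level++; }

/* Returns the number of deferred switches performed now (0 or 1). */
int enable_dispatch(void) {
    assert(disable_level > 0);
    if (--disable_level == 0 && dispatch_needed) {
        dispatch();
        return 1;
    }
    return 0;
}

/* Models an ISR that readies a task while interrupts are briefly
 * re-enabled; on exit it switches only if dispatching is allowed. */
static void interrupt_at_flash_point(void) {
    dispatch_needed = true;
    if (disable_level == 0)
        dispatch();
}

/* Models a yield whose queue update is split by a flash point.
 * Returns how many switches happened in the middle of the update. */
int yield_processor(void) {
    int before = dispatch_count;
    /* ... first half of the ready-queue update ... */
    interrupt_at_flash_point();
    /* ... second half of the ready-queue update ... */
    dispatch_needed = true;   /* the yield itself requests a switch */
    return dispatch_count - before;
}
```

In this model, calling yield_processor() directly lets one switch happen
in the middle of the update, while bracketing it with disable_dispatch()
and enable_dispatch() defers the switch until the update is complete.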


=============================================================================
Date: 27 Feb 1999 14:00:44 +0300
From: Sergei Organov <osv@Javad.RU>
To: joel@OARcorp.com
Subject: Re: Possible bug in RTEMS core.

joel@OARcorp.com writes:
> On 26 Feb 1999, Sergei Organov wrote:
> 
> > Unfortunately the fix doesn't help. Still need to remove _ISR_Flash 
> > to make test work. :-(
> 
> Did it make it worse?  When you tinker with dispatch enable, doing things
> wrong tends to break the entire system.

I didn't notice any difference. The system just hangs one to ten
seconds after start.

> 
> Do you have a simple test case that I can try locally?  Or is this just
> something you have to be very unlucky to run into.

The test itself is very simple. However, it does require that the
driver for the device that 'printf' sends data to works using
interrupts. It's indeed as simple as I wrote in the initial message.
The only difference is that it blinks the on-board LEDs so I can see
when it hangs. I can send you this test code and/or the entire RTEMS
sources if you wish, but the hardware is a proprietary board using the
Motorola MPC509 (PowerPC) chip.

> 
> > I must say that I'm absolutely not sure that I didn't break something
> > in RTEMS playing with it (e.g. I put under ifdefs all code that deal
> > with multiprocessing to see how much space it takes). I tried to be
> > careful, but who knows. That's why it'd be fine if somebody who has
> > access to hardware natively supported by RTEMS makes similar test.
> 
> 
> An admirable goal and one that will be in the next major release.  There
> is now a --disable-multiprocessing option to configure.  It did not save
> all that much code space BUT it is some.

Yes, it did not save much space/time for me either.
=============================================================================
Date: 20 Apr 1999 17:45:25 +0400
From: Sergei Organov <osv@Javad.RU>
To: joel@OARcorp.com
Subject: Re: Possible bug in RTEMS core.

Joel,

I'm sorry, the hang of your test case was caused by a stack size that
was too small. The configured RTEMS_MINIMUM_STACK_SIZE is too small
when 'printf' is used in a task.

After I fixed that, your test works.

However, turning on the clock driver (by adding
#define CONFIGURE_TEST_NEEDS_CLOCK_DRIVER to your code) makes it hang
the same way as my code.

I'll take a close look at my clock driver now.

Regards,
Sergei.

=============================================================================
Date: 20 Apr 1999 16:59:36 +0400
From: Sergei Organov <osv@Javad.RU>
To: joel@OARcorp.com
Subject: Re: Possible bug in RTEMS core.


I just ran your test case and it behaves the same as mine (i.e. it
hangs after outputting approximately 1700 chars at 115200 baud).

I did change your test a little bit, because I need
RTEMS_FLOATING_POINT to be set for the task for 'printf' to work (I
think it is a known problem with 'printf' and friends, isn't it?).

Attached is 'test.c' with my changes for you to check.

Do you think the problem is PowerPC specific?

Regards,
Sergei.


=============================================================================
Date: 20 Apr 1999 12:48:02 +0400
From: Sergei Organov <osv@Javad.RU>
To: joel@OARcorp.com
Subject: Re: Possible bug in RTEMS core.

Joel,

I'll try your test case today and let you know the results. Below are
some answers:

joel@OARcorp.com writes:
> I am sorry that it has taken so long to get back to you on this.  It was
> not a critical problem from your perspective and was not reported by
> anyone else so I wanted to wait until I thought I could explain exactly
> what was happening.  On top of that, I have been trying to finish my PhD
> so non-critical items have really not gotten much attention.  I defended
> in March and turned in all required copies last week so it is all downhill
> now. :)

No problem.

> 
> Anyway... attached you will find my test program based on your report.
> This did not reproduce the problem on the couple of targets I tried.  I
> could have a couple of test setup things different from you:
> 
> 1.  is a clock tick enabled?  If so, what is the clock tick
> configured at?

Yes, the clock tick is enabled. It is configured to occur 100 times
per second.

> 
> 2.  Is there another interrupt source not mentioned?

I don't think there is. The MPC505 has only one external interrupt
input, and only the RS232 interrupt is enabled by the external
interrupt controller mask.

> 
> 3.  How long does this take to occur?

A few seconds.

> 
> 4.  Anything else I might have gotten wrong in the test
>     construction.

We'll see after I try your test. The only difference I see just now is
that in your test case the clock tick is apparently disabled.

> 
> 5.  What CPU are you using?  I don't seem to have noticed this in the
>     email we have exchanged.  There is always a chance that there is 
>     some register being clobbered in the ISR that is causing this.  Then
>     it would not be a general design issue but CPU specific.  And the
>     _ISR_Flash would fix it.

MPC505 (MPC509 actually). PowerPC-603 core, I believe.

=============================================================================
