EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: CA subscription synchronisation shutdown problem
From: <[email protected]>
To: <[email protected]>, <[email protected]>
Date: Thu, 2 May 2013 07:52:04 +0000
Hi Jeff,

This e-mail is really a follow-up to this thread from a year ago: http://www.aps.anl.gov/epics/tech-talk/2012/msg00584.php .  (Alas, I can't check this link because the APS web site seems to be unwell this morning.)

Back then I was seeing signs that CA subscription callbacks were being called after returning from ca_clear_subscription ... in this e-mail I have what looks like a definitive demonstration!

In the attached test IOC I repeatedly create 500 subscriptions to 500 locally published PVs, pause a few hundred microseconds, and then proceed to tear them all down again.  The context pointer I pass (args.usr) just contains a validity flag which I reset after ca_clear_subscription returns -- and which I test in the callback.

Below is a typical run:

$ ./test 10 500
dbLoadDatabase("dbd/TEST.dbd", NULL, NULL)
TEST_registerRecordDeviceDriver(pdbbase)
dbLoadRecords("db/TEST.db", NULL)
iocInit()
Starting iocInit
############################################################################
## EPICS R3.14.11 $R3-14-11$ $2009/08/28 18:47:36$
## EPICS Base built Nov  4 2011
############################################################################
iocRun: All initialization complete
All channels connected
Testing 10 cycles, interval 500 us
[........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................whoops!
][


The two arguments to `test` are number of times to try and how long to pause between create and clear (in microseconds, passed to usleep(3)). [ and ] are printed at the start and end of a cycle (so [ is immediately followed by a burst of ca_create_subscription() calls) and each . represents a successful callback.  An unsuccessful (invalid) callback is shown by 'whoops!' which is followed by an exit() call.

This test can be very delicate and difficult to reproduce, and may need to be run many times with slightly different pause intervals before being even partially repeatable -- the fault only appears to show when there isn't time for all 500 PVs to complete their initial updates, but there has to be enough time for them all to make the effort.

Another interesting detail follows from some locking I'm doing.  Here is an extract of the relevant code (LOCK() is just pthread_mutex_lock(3p) on a global mutex):

1	static void on_update(struct event_handler_args args)
2	{
3	    struct event *event = args.usr;
4	    LOCK();
5	    bool valid = event->valid;
6	    UNLOCK();
7	    if (valid) ...
8	}

	...

9	    LOCK();		// This should trigger deadlock
10	    ca_clear_subscription(event->event_id);
11	    event->valid = false;
12	    UNLOCK();

It seems to me that if ca_clear_subscription() is correctly doing what we discussed a year ago, which is to say, if it is waiting for all outstanding callbacks to complete before returning, then the LOCK() on line 9 should trigger a deadlock when ca_clear_subscription() is called with its associated callback still only on line 3 (or earlier).  But I never see my test deadlock.
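The deadlock argument above can be modelled without any CA code at all. The following is only a sketch under stated assumptions: `model_callback()` and `clear_while_locked()` are hypothetical names, `pthread_join()` stands in for a clear operation that waits for its outstanding callback, and `pthread_mutex_timedlock()` replaces the plain LOCK() in the callback so that the demonstration terminates instead of hanging. It says nothing about what ca_clear_subscription() actually does internally.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the global mutex behind LOCK()/UNLOCK(). */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for on_update(): tries to take the global lock, but gives up
 * after 200 ms so the model reports the deadlock rather than hanging. */
static void *model_callback(void *arg)
{
    bool *blocked = arg;
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 200L * 1000 * 1000;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    int rc = pthread_mutex_timedlock(&global_lock, &deadline);
    if (rc == 0)
        pthread_mutex_unlock(&global_lock);
    *blocked = (rc == ETIMEDOUT);   /* A plain lock would wait forever here */
    return NULL;
}

/* Models lines 9 to 12 with a clear that waits for its callback.
 * Returns true if the callback could not obtain the lock, i.e. if the
 * real code would have deadlocked. */
static bool clear_while_locked(void)
{
    bool blocked = false;
    pthread_t callback_thread;

    pthread_mutex_lock(&global_lock);           /* "line 9" */
    int rc = pthread_create(&callback_thread, NULL, model_callback, &blocked);
    assert(rc == 0);
    pthread_join(callback_thread, NULL);        /* clear waits for callback */
    pthread_mutex_unlock(&global_lock);         /* "line 12" */
    return blocked;
}
```

In this model `clear_while_locked()` returns true: a clear that really waits for an in-flight callback cannot complete while the caller holds the lock the callback needs. That the real test never hangs is therefore consistent with the clear not waiting.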

I'm seeing this problem in test code that repeatedly creates and destroys subscriptions, but I've previously reported it on CA client shutdown, so it does look to me like there is a general synchronisation problem here.  I believe I have a workaround, which is to delay releasing the callback context to give time for outstanding callbacks to complete, but this is a bit worrisome...
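One possible shape for the workaround of delaying the release of the callback context is sketched below. This is an illustration only, not the code in the attached test: the `struct event` layout, `retire_event()`, `reap_events()` and the grace period are all assumptions, and the caller is assumed to hold the same global lock used in the callback whenever it touches the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical callback context; only the valid flag comes from the test. */
struct event {
    bool valid;             /* Tested by the callback under the lock */
    time_t retired_at;      /* When the subscription was cleared */
    struct event *next;     /* Link in the list of retired contexts */
};

static struct event *retired_list = NULL;

/* Call after the subscription has been cleared, in place of free(): the
 * context is marked invalid but its memory stays alive for late callbacks. */
static void retire_event(struct event *event, time_t now)
{
    assert(event != NULL);
    event->valid = false;
    event->retired_at = now;
    event->next = retired_list;
    retired_list = event;
}

/* Frees every context retired at least grace_seconds ago and returns the
 * number freed.  Younger contexts are left on the list. */
static int reap_events(time_t now, double grace_seconds)
{
    int freed = 0;
    struct event **link = &retired_list;
    while (*link != NULL) {
        struct event *event = *link;
        if (difftime(now, event->retired_at) >= grace_seconds) {
            *link = event->next;    /* Unlink before freeing */
            free(event);
            freed++;
        } else {
            link = &event->next;
        }
    }
    return freed;
}
```

A late callback then finds `valid` false in memory that still exists instead of dereferencing a freed pointer. The grace period only narrows the window, though; it does not prove that no callback can arrive after it expires.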



Attachment: test-ioc.tgz
