EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: CA client (on IOC) question
From: "Jeff Hill" <[email protected]>
To: "'Jeff Hill'" <[email protected]>, "'Ralph Lange'" <[email protected]>, "'EPICS Core Talk'" <[email protected]>
Date: Tue, 8 Aug 2006 10:26:41 -0600
PS: You might also run casw to see if there are beacon anomalies routinely
occurring on your network (which can also impact search rates). At the SNS I
did see at one point a high rate of beacon anomalies occurring due to lost
frames resulting from mismatched full/half duplex mode between the IOC and
its switch.

Jeff

> -----Original Message-----
> From: Jeff Hill [mailto:[email protected]]
> Sent: Tuesday, August 08, 2006 10:22 AM
> To: 'Ralph Lange'; 'EPICS Core Talk'
> Subject: RE: CA client (on IOC) question
> 
> 
> Ralph,
> 
> > looking at bad things happening and logs from our network switches it
> > seems that the CA client that runs on the IOC does a name resolve
> > request whenever any record with a link pointing into nirvana (aka an
> > unconnected link) is being processed.
> > Example: on IOC1, there are 100 records (scanned at 10 Hz) pointing to
> > 100 other records sitting on IOC2. As soon as IOC2 is down, IOC1
> > broadcasts a name resolution request for those 100 channels 10 times a
> > second.
> 
> The name resolution request rate shouldn't be tied to the record
> processing rate in any way whatsoever unless the channel (in earlier
> versions, any channel) is being deleted and then recreated by DBCA
> whenever the record is being processed. I assume that DBCA doesn't do
> that.
> 
> After a new channel is created, its name resolution requests are sent
> with an exponential back off controlling the delay between each
> subsequent search request.
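
As a rough illustration of the exponential back off described above, the sketch below lists the delay that would precede each successive search request for a single unresolved channel. The function name, the initial delay, the doubling factor, and the cap are all assumed values chosen for illustration; none of them is taken from the CA client library.

```python
def search_delays(initial=0.03, factor=2.0, max_delay=5.0, count=10):
    """Return the delay (seconds) preceding each of `count` search requests.

    All parameter defaults are illustrative assumptions, not CA's real values.
    """
    delays = []
    delay = initial
    for _ in range(count):
        delays.append(delay)
        # Grow the delay geometrically, but never beyond the cap.
        delay = min(delay * factor, max_delay)
    return delays


if __name__ == "__main__":
    for n, d in enumerate(search_delays(), start=1):
        print(f"search {n}: wait {d:.2f} s")
```

The point of the schedule is that a channel which stays unresolved is searched for less and less often, independent of how often the record that owns the link is processed.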
> 
> The behavior described above differs only subtly between versions, but
> it *is* more robust for large systems in the very latest versions of
> R3.14. For example, in the latest versions of R3.14 the following are
> true.
> 
> 1) Creating a new channel does not cause the search rate for
> preexisting unresolved channels to be set to the search rate of the new
> channel.
> 
> 2) When a circuit is detected to be unresponsive the client application
> receives a disconnect notify callback, but the circuit itself is not
> disconnected.
> 
> To put this in context, what version of EPICS is running in the IOC
> that is doing all of the searching?
> 
> > (trying to find out why a single IOC going halfway down drives _all_ our
> > IOCs into 95+ percent of cpu usage)
> 
> I have not seen that on any of the projects I have worked on. Is it
> possible that a gateway there is forwarding (and infinitely looping) a
> search request?
> 
> > As soon as IOC2 is down, IOC1 broadcasts a name resolution
> > request for those 100 channels 10 times a second.
> 
> I do see that you are stating that the trouble is coming from the IOC
> which, if correct, admittedly makes my GW loop guess off the mark.
> 
> > As soon as IOC2 is down, IOC1 broadcasts a name resolution
> > request for those 100 channels 10 times a second.
> 
> It shouldn't be searching at that rate, but nevertheless I am skeptical
> that this search rate would slam the CPU to 95+ percent. Perhaps that's
> true with old iron. One could easily write a test program that creates
> and then almost immediately deletes 100 channels at a 10 Hz rate. This
> program might be useful for demonstrating what the load impacts might
> be. Was the IOC already substantially loaded before it transitioned to
> a 95% loading?
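
Before writing such a test program, the expected search traffic can be estimated with a small model. The sketch below is not the CA test program described above and does not touch the network; it only counts requests under two assumed scenarios: channels that persist and back off, and channels that are recreated on every scan. The back off parameters and the assumption that a newly created channel sends one search request immediately are hypothetical.

```python
def searches_with_backoff(channels, duration, initial=0.03, factor=2.0,
                          max_delay=5.0):
    """Requests sent in `duration` seconds if channels persist and back off.

    The timing parameters are illustrative assumptions, not CA's real values.
    """
    per_channel = 0
    t = 0.0
    delay = initial
    while t < duration:
        per_channel += 1          # one search request at time t
        t += delay                # wait before the next request
        delay = min(delay * factor, max_delay)
    return channels * per_channel


def searches_with_recreation(channels, duration, scan_hz):
    """Requests sent if every channel is recreated (one search) on each scan."""
    return int(channels * scan_hz * duration)


if __name__ == "__main__":
    # 100 links scanned at 10 Hz for one minute, as in the example above.
    print("persistent channels:", searches_with_backoff(100, 60.0))
    print("recreated per scan: ", searches_with_recreation(100, 60.0, 10))
```

Under these assumptions the recreated case yields 100 x 10 = 1000 requests per second, which matches the rate reported in the original question, while the persistent case falls off quickly once the back off reaches its cap.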
> 
> In summary, an IOC should _not_ behave the way that you are describing.
> After I know what version is running I will have a closer look. I am also
> willing to log in remotely and debug the issue in an IOC that might be
> behaving this way if you would like.
> 
> Jeff
> 
> > -----Original Message-----
> > From: Ralph Lange [mailto:[email protected]]
> > Sent: Tuesday, August 08, 2006 4:57 AM
> > To: EPICS Core Talk
> > Subject: CA client (on IOC) question
> >
> > Hello Core,
> >
> > looking at bad things happening and logs from our network switches it
> > seems that the CA client that runs on the IOC does a name resolve
> > > request whenever any record with a link pointing into nirvana (aka an
> > unconnected link) is being processed.
> > Example: on IOC1, there are 100 records (scanned at 10 Hz) pointing to
> > 100 other records sitting on IOC2. As soon as IOC2 is down, IOC1
> > broadcasts a name resolution request for those 100 channels 10 times a
> > second.
> >
> > Is that true? Is that smart?
> >
> > Confused,
> > Ralph
> > (trying to find out why a single IOC going halfway down drives _all_ our
> > IOCs into 95+ percent of cpu usage)

