PS: You might also run casw to see if there are beacon anomalies routinely
occurring on your network (which can also impact search rates). At the SNS I
did see at one point a high rate of beacon anomalies occurring due to lost
frames resulting from mismatched full/half duplex mode between the IOC and
its switch.
Jeff
> -----Original Message-----
> From: Jeff Hill [mailto:[email protected]]
> Sent: Tuesday, August 08, 2006 10:22 AM
> To: 'Ralph Lange'; 'EPICS Core Talk'
> Subject: RE: CA client (on IOC) question
>
>
> Ralph,
>
> > looking at bad things happening and logs from our network switches it
> > seems that the CA client that runs on the IOC does a name resolve
> > request whenever any record with a link pointing into nirvana (aka an
> > unconnected link) is being processed.
> > Example: on IOC1, there are 100 records (scanned at 10 Hz) pointing to
> > 100 other records sitting on IOC2. As soon as IOC2 is down, IOC1
> > broadcasts a name resolution request for those 100 channels 10 times a
> > second.
>
> The name resolution request rate shouldn't be tied to the record
> processing rate in any way whatsoever unless the channel (in earlier
> versions any channel) is being deleted and then recreated by DBCA
> whenever the record is being processed. I assume that DBCA doesn't do
> that.
>
> After a new channel is created its name resolution requests are sent
> with an exponential back off controlling the delay between each
> subsequent search request.
>
> The above mentioned behavior is only subtly different between different
> versions, but it *is* more robust for large systems with the very latest
> versions of R3.14. For example, in the latest versions of R3.14 the
> following are true.
>
> 1) When creating a new channel this does not cause the search rate for
> preexisting unresolved channels to be set to the search rate of the new
> channel.
>
> 2) When a circuit is detected to be unresponsive the client application
> receives a disconnect notify callback, but the circuit itself is not
> disconnected.
>
> To put this in context, what version of EPICS is running in the IOC
> that is doing all of the searching?
>
> > (trying to find out why a single IOC going halfway down drives _all_ our
> > IOCs into 95+ percent of cpu usage)
>
> I have not seen that on any of the projects I have worked on. Is it
> possible that there is an issue there with a gateway forwarding (and
> infinitely looping) a search request?
>
> > As soon as IOC2 is down, IOC1 broadcasts a name resolution
> > request for those 100 channels 10 times a second.
>
> I do see that you are stating that the trouble is coming from the IOC
> which, if correct, admittedly makes my GW loop guess off the mark.
>
> > As soon as IOC2 is down, IOC1 broadcasts a name resolution
> > request for those 100 channels 10 times a second.
>
> It shouldn't be searching at that rate, but nevertheless, I am skeptical
> that this search rate would slam the CPU to 95+ percent. Perhaps that's
> true with old iron. One could easily write a test program that creates
> and then almost immediately deletes 100 channels at a 10 Hz rate. This
> program might be useful for demonstrating what the load impacts might
> be. Was the IOC already substantially loaded before it transitioned to
> a 95% loading?
>
> In summary, an IOC should _not_ behave the way that you are describing.
> After I know what version is running I will have a closer look. I am also
> willing to log in remotely and debug the issue in an IOC that might be
> behaving this way if you would like.
>
> Jeff
>
> > -----Original Message-----
> > From: Ralph Lange [mailto:[email protected]]
> > Sent: Tuesday, August 08, 2006 4:57 AM
> > To: EPICS Core Talk
> > Subject: CA client (on IOC) question
> >
> > Hello Core,
> >
> > looking at bad things happening and logs from our network switches it
> > seems that the CA client that runs on the IOC does a name resolve
> > > request whenever any record with a link pointing into nirvana (aka an
> > unconnected link) is being processed.
> > Example: on IOC1, there are 100 records (scanned at 10 Hz) pointing to
> > 100 other records sitting on IOC2. As soon as IOC2 is down, IOC1
> > broadcasts a name resolution request for those 100 channels 10 times a
> > second.
> >
> > Is that true? Is that smart?
> >
> > Confused,
> > Ralph
> > (trying to find out why a single IOC going halfway down drives _all_ our
> > IOCs into 95+ percent of cpu usage)
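
To make the difference between the two behaviours discussed in this thread concrete, here is a rough sketch (not taken from the CA client source) comparing a search schedule with exponential back off against one search per record scan. The function names and the numbers (0.125 s initial delay, doubling, 4 s cap, 10 s window) are assumptions chosen only for illustration; the real CA client uses its own timing constants, which may differ between releases.

```python
def search_times(initial_delay=0.125, factor=2.0, max_delay=4.0, horizon=10.0):
    """Return the send times (seconds) of search requests for ONE unresolved
    channel when the delay between requests grows exponentially.

    All parameter values are hypothetical, not the real CA client constants.
    """
    times = []
    t = 0.0
    delay = initial_delay
    while t <= horizon:
        times.append(t)
        t += delay
        # Back off: grow the delay, but never beyond the cap.
        delay = min(delay * factor, max_delay)
    return times


def per_scan_count(scan_rate_hz, horizon):
    """Number of searches for ONE channel if a search were sent on every
    record scan (the behaviour Ralph suspects)."""
    return int(scan_rate_hz * horizon)


if __name__ == "__main__":
    channels = 100
    horizon = 10.0
    backoff_total = channels * len(search_times(horizon=horizon))
    per_scan_total = channels * per_scan_count(10, horizon)
    print("with back off:", backoff_total, "searches in", horizon, "s")
    print("one per scan :", per_scan_total, "searches in", horizon, "s")
```

Under these assumed numbers, each channel sends only a handful of searches in the first ten seconds with back off, versus 100 if it searched on every 10 Hz scan, which is why a search rate that tracks the scan rate would point to channels being destroyed and recreated rather than to normal search behaviour.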