EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: caRepeater question
From: Torsten Bögershausen via Core-talk <core-talk at aps.anl.gov>
To: "J. Lewis Muir" <jlmuir at imca-cat.org>, Zimoch Dirk <dirk.zimoch at psi.ch>
Cc: "core-talk at aps.anl.gov" <core-talk at aps.anl.gov>
Date: Fri, 2 Feb 2024 11:36:01 +0100


On 2024-02-01 21:12, J. Lewis Muir wrote:
> On 02/01, Zimoch Dirk wrote:
>> Normally I am running it as a service. I gave that simpler scenario because it
>> shows the critical points and is simpler to reproduce.
>> Our actual problem was that [snip].

> Ah, OK, thanks for that explanation; makes sense.

>> I tested with casw:
>> 0. caRepeater.service is running.
>> 1. Start casw.
>> 2. Start an IOC. casw shows the beacon anomaly.
>> 3. sudo systemctl restart caRepeater.service
>> 4. Start an IOC. casw does not show any beacon anomalies anymore.
>> 5. Restart casw. It works again.

>> Unfortunately, casw (or any CA client) cannot find out that the caRepeater it
>> had registered with has died. Thus it never tries to reconnect.

> Ouch.  That seems like a major problem to me.  It seems like that means
> that to upgrade caRepeater, you have to restart all CA clients as well,
> which would include IOCs that are CA clients.  If you don't do that, the
> CA clients (including IOCs that are CA clients, for example, via a CA
> link) will stop working correctly.  Is that right?  If so, that's rough.

Well, I somewhat wonder why people update the repeater.
The code used inside the repeater has been stable for years;
at least, there are no day-to-day improvements.
So we could say: do not update the repeater every night, only
when things have really changed, i.e. when there was a real problem
that has now been solved.

What is the repeater good for?
To forward the beacons from the different IOCs (on different hosts) to the different clients (all on the same host as the repeater).
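As a rough mental model of this fan-out, a relay keeps an in-memory table of registered local clients and copies each incoming beacon to every entry. This is a hypothetical Python sketch, not the caRepeater source and not the CA wire format, and the names `BeaconRelay`, `register` and `forward` are invented for illustration. The model also shows why Dirk's restart test leaves casw deaf: a freshly started process begins with an empty table, so a client that registered only with the old process receives nothing.

```python
from dataclasses import dataclass, field

# Hypothetical model only: not the real caRepeater implementation or CA wire format.
Address = tuple[str, int]  # (host, UDP port) of a client on the repeater's host


@dataclass
class BeaconRelay:
    # Registrations live only in process memory; a restarted relay starts empty.
    clients: list[Address] = field(default_factory=list)

    def register(self, client: Address) -> None:
        """Remember a local client that asked to receive beacons."""
        if client not in self.clients:
            self.clients.append(client)

    def forward(self, beacon: bytes) -> list[tuple[Address, bytes]]:
        """Return one (destination, payload) pair per registered client.

        A real relay would call sendto() for each pair; returning them keeps
        the sketch free of sockets.
        """
        return [(client, beacon) for client in self.clients]


old = BeaconRelay()
old.register(("127.0.0.1", 50001))  # e.g. casw registering at startup
print(len(old.forward(b"beacon from IOC A")))  # 1

new = BeaconRelay()  # "systemctl restart": a fresh process with an empty table
print(len(new.forward(b"beacon from IOC A")))  # 0: casw is silently forgotten
```

Returning destination pairs instead of sending keeps the sketch deterministic; the point is only that the registration state dies with the process.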

Right now it seems that a missing beacon (via UDP) from one IOC
makes the client (camonitor in my case) go out with a CA message (?) via TCP, so things do continue to work.
Depending on the number of clients, any gateways, the network load,
and the CPU capacity, this may be a workable solution.
But that is my limited understanding.



> I know hardly anything about the CA protocol, so what I'm about
> to say may not be possible or may not even make sense, but I wonder
> if caRepeater could be changed to send some kind of CA message to all
> registered clients when it's about to exit?  That wouldn't work for
> the case of caRepeater being sent a SIGKILL or SIGSTOP signal (or the
> equivalent on Windows), nor the case of caRepeater crashing, but it
> would work for the case of signals that can be caught.  Still, such a
> solution doesn't seem particularly robust since it wouldn't work if the
> CA message didn't get delivered to all clients for whatever reason.
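Lewis's exit-notification idea could be sketched roughly as follows, assuming a relay that already holds a list of client addresses. The `GOODBYE` payload and both function names are hypothetical, and nothing in this thread says CA has such a message. The sketch also makes his caveats visible. The handler runs only for a catchable signal such as SIGTERM, which is what systemd sends by default on stop or restart. It never runs for SIGKILL, SIGSTOP, or a crash. Each UDP datagram is best effort.

```python
import signal
import socket
import sys

# Hypothetical payload: not a CA protocol message.
GOODBYE = b"REPEATER_GOODBYE"

Client = tuple[str, int]  # (host, UDP port) of a registered local client


def send_goodbye(sock: socket.socket, clients: list[Client]) -> int:
    """Send one best-effort notice to every client; return how many were sent.

    "Sent" is not "delivered": UDP can still drop any of these datagrams.
    """
    sent = 0
    for client in clients:
        try:
            sock.sendto(GOODBYE, client)
            sent += 1
        except OSError:
            pass  # one unreachable client must not stop the others being told
    return sent


def install_goodbye_handler(sock: socket.socket, clients: list[Client]) -> None:
    """On SIGTERM, notify the clients and exit.

    Must be called from the main thread. SIGKILL and SIGSTOP cannot be caught,
    and a crash never reaches this handler, so clients would still need
    another way to notice a vanished repeater.
    """
    def on_sigterm(signum, frame):
        send_goodbye(sock, clients)
        sys.exit(0)

    signal.signal(signal.SIGTERM, on_sigterm)
```

A client receiving `GOODBYE` would then start registering again, which is the step Dirk observed casw never takes.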

Does the repeater ever exit? Normally not, unless it is terminated
by a signal.


> I wonder if the CA protocol could be extended to support some kind of
> mechanism to allow clients to detect when the caRepeater has died,
> stopped working, or restarted?  For example, maybe CA clients could
> periodically poll for a unique caRepeater ID that would change when a
> new caRepeater process is started?
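The instance-ID proposal needs very little client-side state. In this hypothetical sketch, each repeater process is assumed to pick a fresh ID at startup and to report it in reply to some poll. No such CA message is implied to exist. The client compares the reported ID with the last one it saw and re-registers when it changes. A missing reply is reported separately, because on its own it cannot distinguish a dead repeater from a lost datagram.

```python
from typing import Optional


class RepeaterWatch:
    """Hypothetical client-side tracker for a per-process repeater ID.

    Assumes every new repeater process picks a new ID (e.g. random at start),
    so a restart always shows up as a changed ID.
    """

    def __init__(self) -> None:
        self.known_id: Optional[int] = None

    def on_poll_result(self, repeater_id: Optional[int]) -> str:
        """Decide what to do after one poll; None means no reply arrived."""
        if repeater_id is None:
            return "unreachable"  # dead repeater, or just a lost UDP datagram
        if self.known_id is None:
            self.known_id = repeater_id
            return "register"     # first contact
        if repeater_id != self.known_id:
            self.known_id = repeater_id
            return "re-register"  # new process: its client table is empty
        return "ok"


w = RepeaterWatch()
print([w.on_poll_result(x) for x in (17, 17, None, 42, 42)])
# ['register', 'ok', 'unreachable', 're-register', 'ok']
```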

There is maybe no need to extend the protocol.
The client(s) will notice that the beacons are missing.
And that can mean a lot of things:
IOC down.
Network down.
Repeater down.
It might be possible to fiddle a "repeater, are you alive?" check
into the code. How much sense that makes, I don't know.
Uff.
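A "repeater, are you alive?" check could be as small as one UDP probe with a timeout. Because the repeater runs on the same host as its clients, an answered probe could help separate "repeater down" from "IOC down" or "network down", which missing beacons alone cannot do. The `ARE_YOU_ALIVE`/`ALIVE` payloads, the function names, and the toy responder are all hypothetical, and the demo runs entirely on localhost.

```python
import socket
import threading

# Hypothetical probe/reply payloads, not CA protocol messages.
PING = b"ARE_YOU_ALIVE"
PONG = b"ALIVE"


def repeater_alive(addr: tuple[str, int], timeout: float = 0.5) -> bool:
    """Send one UDP probe to addr and wait up to `timeout` s for the reply.

    False only means "no answer in time": the repeater may be gone, or a
    datagram may have been lost, so a real client would retry before acting.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(PING, addr)
            reply, _ = sock.recvfrom(64)
        except OSError:  # timeouts and, on some platforms, ICMP errors
            return False
        return reply == PONG


def serve_one_probe(sock: socket.socket) -> None:
    """Toy stand-in for a live repeater: answer a single probe."""
    data, peer = sock.recvfrom(64)
    if data == PING:
        sock.sendto(PONG, peer)


# Demo on localhost.
responder = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
responder.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
addr = responder.getsockname()
worker = threading.Thread(target=serve_one_probe, args=(responder,))
worker.start()
print(repeater_alive(addr))  # True
worker.join()
responder.close()            # simulate the repeater going away
print(repeater_alive(addr))  # False
```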


>> Having used TCP instead of UDP to connect to the caRepeater would not have this
>> problem, I think.

Patches welcome. Or is this too harsh? Just trying to be helpful.
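Dirk's TCP remark relies on a general property of stream sockets. When one end is closed, `recv()` on the other end returns an empty result (end of file). The operating system closes a process's sockets however it exits, including SIGKILL or a crash. A UDP registration gives the client no equivalent signal. This minimal Python illustration uses `socket.socketpair()` as a stand-in for a client-to-repeater connection and is not how CA talks to caRepeater today.

```python
import socket


def peer_closed(sock: socket.socket, timeout: float = 1.0) -> bool:
    """Return True if the stream peer has closed its end of the connection.

    For illustration only: if data is pending, recv(1) consumes one byte.
    """
    sock.settimeout(timeout)
    try:
        return sock.recv(1) == b""  # b"" means end of file: the peer is gone
    except ConnectionResetError:
        return True   # peer vanished with unread data (TCP RST)
    except socket.timeout:
        return False  # peer still connected, just quiet


client_end, repeater_end = socket.socketpair()
print(peer_closed(client_end, timeout=0.1))  # False: repeater end still open
repeater_end.close()  # the OS does this for any exiting process, even on SIGKILL
print(peer_closed(client_end))  # True
client_end.close()
```

A real client would detect end of file in its normal read path rather than with a separate check. The design point is that the kernel, not the dying process, delivers the notification.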



> Interesting.
>
> Lewis

References:
caRepeater question Zimoch Dirk via Core-talk
Re: caRepeater question Torsten Bögershausen via Core-talk
Re: caRepeater question Zimoch Dirk via Core-talk
Re: Re: caRepeater question J. Lewis Muir via Core-talk
Re: Re: caRepeater question Zimoch Dirk via Core-talk
Re: caRepeater question J. Lewis Muir via Core-talk

ANJ, 05 Feb 2024