EPICS

ParseCASW Users Guide



Kenneth Evans, Jr.

August 2005


Advanced Photon Source
Argonne National Laboratory
9700 South Cass Avenue
Argonne, IL 60439


Table of Contents

Introduction

Overview

Channel Access servers send beacons, which are UDP broadcast packets, to indicate their health. When a server is healthy, it sends a beacon every EPICS_CA_BEACON_PERIOD (15 sec. by default). This is like a heartbeat. When it is coming up, each server broadcasts a startup sequence of beacons. This sequence starts with a small interval (25 ms, 75 ms for VxWorks) that doubles each time until it reaches EPICS_CA_BEACON_PERIOD and then stays there, becoming the heartbeat. It takes about 10 beacons and 40 s to get to steady state with the defaults. Note that UDP packets are one-way and unreliable. There is no guarantee that they will be received.

The beacons actually get sent to CaRepeater, and CaRepeater sends them to the clients. This is to overcome the problem in early implementations of UDP that all recipients on a given host (i.e. workstation) were not guaranteed to receive all the broadcasts.

A beacon anomaly is any change from the normal beacon interval. This could be no beacons (indicating a disconnect), the startup sequence (indicating a server has come online), or an abnormal interval (perhaps owing to an unhealthy server or to network problems).

Clients monitor the beacons to determine their connection status. If a Channel Access client has not received any beacons for EPICS_CA_CONN_TMO (30 sec. by default) from a server and has not received any other communication from the server for that time, then it sends an "Are you there?" message over its TCP connection to the server. If there is no reply for 5 sec, then the client decides that it is disconnected. MEDM screens turn white at that time. Before EPICS Base 3.14.5 the clients then reissued searches for any unresolved process variables. Starting with Base 3.14.5, that is not done. A client can also decide it is disconnected if it receives a hangup message over the TCP connection and in that case does reissue searches.

Clients also monitor the beacons to see if a server has come online. If they detect the startup sequence, then they reissue searches for any unresolved process variables. Thus, beacon anomalies are associated with searches, which can impact the servers and possibly the network.

When debugging Channel Access network problems, it may be useful to keep track of beacon anomalies. There is a routine, CASW (standing for Channel Access Server Watcher) that comes with EPICS Base and which prints a line with the server name and a timestamp whenever it detects a beacon anomaly. The anomaly logic used should be the same as that used by the client for that version of EPICS Base. This is useful information, but it can be hard to interpret by just looking at the output lines. The output from multiple servers is mixed together and the intervals, which are important, are hard to easily determine from the timestamps.

Description

ParseCASW was written to help interpret the CASW data. It can be used to interpret a file of lines output by CASW, or CASW can be piped into ParseCASW so that the CASW lines are interpreted in real time as they occur.

Beacon anomalies come in groups. ParseCASW defines a group to have ended when there are no more anomalies from that server for 60 sec. (This default interval can be changed via a command line option.) ParseCASW puts these groups into categories as follows:

Single Anomaly Only one anomaly in the group. Could be owing to a delay in the server or a delay caused by network traffic.
Short Sequence 5 or fewer anomalies.
Regular Beacons Network coming back. The sequence is not short and the difference between the maximum and minimum intervals is < 0.25 sec. Note that the server is sending out regular beacons in this case, but the average beacon rate estimate is initially very slow because none have been received recently. It typically takes on the order of ten beacons for the average rate to become the same as the received rate.
Server Coming Up A sequence that is monotonic and not short with the ratio of the maximum interval to the minimum interval being > 25.
Probably Server Coming Up Same as a server coming up except that 2 intervals are allowed to be non-increasing. Owing to network delays the short intervals, especially, may not be received at the same intervals at which they were sent.
Medium Long Sequence None of the above and 15 or fewer anomalies. This case may be caused by a server coming up or may not.
Long Sequence None of the above and 50 or fewer anomalies.
Very Long Sequence More than 50 anomalies.
Anomalies Out of Order The time stamps of the anomalies are out of order. (Should not happen unless there is an error in the data.)

Note that since ParseCASW has to wait until 60 sec. after the last anomaly in a sequence to determine its category, the output when CASW is piped into ParseCASW happens some time after the sequence started. In other words, if you start up a server, do not expect any output to happen until 60 sec. after the average interval gets close to the beacon period.

The image below shows the output from CASW and the output from ParseCASW for the same data as saved and viewed in an editor (Emacs).

CASW - ParseCASW Comparison

(Click here for full-size image)

First note that the CASW output file is much larger, as indicated by the size of the handles on the scroll bars to the right of the windows. The red line shows how the data from iocid04b gets converted to a line in ParseCASW. This is a medium long sequence. Note that there are three medium long sequences from this same IOC in the lower window. This is most likely an unhealthy IOC. The green line shows how the data from ioclid5 gets converted. The line shown in the upper window is part of a short sequence, the rest of which is not visible. This IOC also has three other medium or short sequences and is also most likely unhealthy. (Healthy servers do not have beacon anomalies unless the network is causing delays.) The blue line shows how the lines for iocid1 are interpreted as a server coming up.

The output in the lower window is the default output. The terse output for these three sequences would be:

The verbose output would be:

For information on obtaining ParseCASW, consult the EPICS Documentation. There is another diagnostic, CaSnooper, that monitors search requests instead of beacons. It can be obtained in the same place. It is often useful to use these two routines together, along with CASW.

Starting ParseCASW

ParseCASW is started by typing the following on a command line:

parsecasw [options] [filename]

or to pipe CASW into ParseCASW:

casw | parsecasw [options]

The options are:

-help Help.
-echo Echo the input lines as they are read.
-int <integer> Do checking and output every <integer> seconds.
-oag Use OAG data format. This is the format for CASW archival data at the Advanced Photon Source.
-server Sort output by server. The default is by group.
-terse Terse output. The default is between terse and verbose.
-verbose Verbose output.

The first character of the options is sufficient. (That is, -h is the same as -help.)

A typical command line is:

% parsecasw -v casw.txt > parsecasw.txt

Output

ParseCASW writes its output to standard output (stdout). You have a choice of terse, default, or verbose output. The following is an example of CASW output (part of the file shown in the editor above) and how it is converted with each of these three options:

CASW Output

iocid03:5064                             2004-05-18 14:40:53.126354159
iocid03:5064                             2004-05-18 14:41:49.537067355
iocid03:5064                             2004-05-18 14:42:04.537256674
iocid03:5064                             2004-05-18 14:42:19.538077826
iocid03:5064                             2004-05-18 14:42:34.536262728
iocid03:5064                             2004-05-18 14:42:49.535585714
iocid03:5064                             2004-05-18 14:45:36.613654888
iocid03:5064                             2004-05-18 14:45:51.927365873
iocid03:5064                             2004-05-18 14:46:07.344241109
iocid03:5064                             2004-05-18 14:46:22.328719845
iocid03:5064                             2004-05-18 14:46:37.328110497
ioclid5:5064                             2004-05-18 15:00:50.452589267
ioclid5:5064                             2004-05-18 15:01:05.453590670
ioclid5:5064                             2004-05-18 15:01:20.455975488
ioclid5:5064                             2004-05-18 15:04:17.460584986
ioclid5:5064                             2004-05-18 15:04:35.429871969
ioclid5:5064                             2004-05-18 15:04:53.013039452
ioclid5:5064                             2004-05-18 15:05:08.042793438
ioclid5:5064                             2004-05-18 15:05:42.011020322
ioclid5:5064                             2004-05-18 15:06:35.505764438
ioclid5:5064                             2004-05-18 15:06:50.512081340
iocid04b:5064                            2004-05-18 15:07:02.382328579
iocid04b:5064                            2004-05-18 15:07:02.699825995
iocid04b:5064                            2004-05-18 15:07:02.820376078
iocid04b:5064                            2004-05-18 15:07:02.949850578
iocid04b:5064                            2004-05-18 15:07:03.227257161
iocid04b:5064                            2004-05-18 15:07:03.773111244
iocid04b:5064                            2004-05-18 15:07:04.860120243
iocid04b:5064                            2004-05-18 15:07:06.983292074
iocid04b:5064                            2004-05-18 15:07:11.248919487
ioclid5:5064                             2004-05-18 15:08:02.752896604
iocid01:5064                             2004-05-18 15:08:06.104330352
iocid01:5064                             2004-05-18 15:08:06.106132852
iocid01:5064                             2004-05-18 15:08:06.138858851
iocid01:5064                             2004-05-18 15:08:06.208655601
iocid01:5064                             2004-05-18 15:08:06.338916851
iocid01:5064                             2004-05-18 15:08:06.605788351
iocid01:5064                             2004-05-18 15:08:07.145271684
iocid01:5064                             2004-05-18 15:08:08.219170849
iocid01:5064                             2004-05-18 15:08:10.339235597
iocid01:5064                             2004-05-18 15:08:14.606061676
ioclid5:5064                             2004-05-18 15:08:50.501317309

ParseCASW Terse Output

iocid03:5064 May 18 14:40:53 Medium long sequence
iocid03:5064 May 18 14:45:36 Short sequence
ioclid5:5064 May 18 15:00:50 Short sequence
ioclid5:5064 May 18 15:04:17 Medium long sequence
iocid04b:5064 May 18 15:07:02 Probably server coming up
ioclid5:5064 May 18 15:08:02 Short sequence
iocid01:5064 May 18 15:08:06 Server coming up

ParseCASW Default Output

iocid03:5064
 Medium long sequence
 May 18 14:40:53 6 event(s) for 116.41 sec = 1.94 min = 0.03 hours

iocid03:5064
 Short sequence
 May 18 14:45:36 5 event(s) for 60.71 sec = 1.01 min = 0.02 hours

ioclid5:5064
 Short sequence
 May 18 15:00:50 3 event(s) for 30.00 sec = 0.50 min = 0.01 hours

ioclid5:5064
 Medium long sequence
 May 18 15:04:17 7 event(s) for 153.05 sec = 2.55 min = 0.04 hours

iocid04b:5064
 Probably server coming up
 May 18 15:07:02 9 event(s) for 8.87 sec = 0.15 min = 0.00 hours

ioclid5:5064
 Short sequence
 May 18 15:08:02 2 event(s) for 47.75 sec = 0.80 min = 0.01 hours

iocid01:5064
 Server coming up
 May 18 15:08:06 10 event(s) for 8.50 sec = 0.14 min = 0.00 hours

ParseCASW Verbose Output

iocid03:5064
 Medium long sequence
 6 event(s)
 May 18 14:40:53 to May 18 14:42:49 (116.41 sec = 1.94 min = 0.03 hours)
 Mean=23.28 Sigma=16.56 Min=15.00 Max=56.41 Increasing=3

iocid03:5064
 Short sequence
 5 event(s)
 May 18 14:45:36 to May 18 14:46:37 (60.71 sec = 1.01 min = 0.02 hours)
 Mean=15.18 Sigma=0.19 Min=14.98 Max=15.42 Increasing=3

ioclid5:5064
 Short sequence
 3 event(s)
 May 18 15:00:50 to May 18 15:01:20 (30.00 sec = 0.50 min = 0.01 hours)
 Mean=15.00 Sigma=0.00 Min=15.00 Max=15.00 Increasing=2 Monotonically increasing

ioclid5:5064
 Medium long sequence
 7 event(s)
 May 18 15:04:17 to May 18 15:06:50 (153.05 sec = 2.55 min = 0.04 hours)
 Mean=25.51 Sigma=14.11 Min=15.01 Max=53.49 Increasing=3

iocid04b:5064
 Probably server coming up
 9 event(s)
 May 18 15:07:02 to May 18 15:07:11 (8.87 sec = 0.15 min = 0.00 hours)
 Mean=1.11 Sigma=1.35 Min=0.12 Max=4.27 Increasing=7

ioclid5:5064
 Short sequence
 2 event(s)
 May 18 15:08:02 to May 18 15:08:50 (47.75 sec = 0.80 min = 0.01 hours)
 Mean=47.75 Sigma=0.00 Min=47.75 Max=47.75 Increasing=1 Monotonically increasing

iocid01:5064
 Server coming up
 10 event(s)
 May 18 15:08:06 to May 18 15:08:14 (8.50 sec = 0.14 min = 0.00 hours)
 Mean=0.94 Sigma=1.34 Min=0.00 Max=4.27 Increasing=9 Monotonically increasing

Channel Access

Channel Access is the part of EPICS that governs communication between servers and clients. You can find out more information by looking at the EPICS Channel Access Reference Manual. There is a version included with each EPICS release. They can be found by starting at IOC Software in the EPICS Documentation and following links to the release and point release of the desired EPICS base.

Acknowledgements

Jeff Hill of Los Alamos National Laboratory, the person responsible for Channel Access, was of great help in understanding the operation of Channel Access.

Copyright

ParseCASW is governed by the EPICS Open License.


Valid HTML 4.01!