Monitoring

Monitoring Checks track the performance of the services your app depends on to function. Are all API Endpoints reachable? Are they returning the correct response code? Is the data returned to your app in the expected format? And how long does each response take? Monitoring Checks alert you to any problems before they impact your audience.

Select the App, expand "Diagnostics" and select "Monitoring Checks" in the left-hand menu.

Monitoring Checks

If you have a lot of endpoints you want to monitor, click "Contact Us" to arrange to send us a Postman collection or Swagger/OpenAPI spec and we'll import them for you.

Add a Monitoring Check¶

To start monitoring an API Endpoint, click the primary action button.

Give the endpoint a descriptive name, select the HTTP Method (GET, POST, PUT, DELETE or PATCH), then enter the URL and any query parameters (as key-value pairs). Click "Next" when done.

Add Check Endpoint Details

Add any HTTP headers to the Request such as an Authorization header and click "Next".

Add Check Request Details
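As a rough illustration of what the first two steps describe, the sketch below assembles an equivalent HTTP request using Python's standard library. The function name `build_check_request`, the example URL and the placeholder token are all hypothetical. The sketch builds the request without sending it and is not how the product itself issues requests.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_check_request(method, url, params=None, headers=None):
    """Assemble the HTTP request that a check describes, without sending it."""
    # Query parameters are entered as key-value pairs and appended to the URL.
    full_url = f"{url}?{urlencode(params)}" if params else url
    return Request(full_url, method=method, headers=headers or {})


request = build_check_request(
    "GET",
    "https://api.example.com/v1/articles",  # hypothetical endpoint
    params={"limit": "10"},
    headers={"Authorization": "Bearer <token>"},  # placeholder credential
)
print(request.get_method(), request.full_url)
```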

In addition to testing that the API Endpoint is Reachable and returning a successful 2xx Response Code, the Monitoring Check will test how long the API Endpoint takes to respond to the HTTP Request from each Data Center. Specify a Warning and a Critical threshold for Response Time in milliseconds.

Add Check Expectations
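The two thresholds split response times into three bands. A minimal sketch of that classification follows. The function name is hypothetical, and it assumes that a response time exactly equal to a threshold does not exceed it, which the documentation does not specify. The 3500 ms and 5000 ms values are the example thresholds used in the Tests table.

```python
def response_time_status(elapsed_ms, warning_ms, critical_ms):
    """Classify one response time against the Warning and Critical thresholds."""
    if elapsed_ms > critical_ms:
        return "Critical"
    if elapsed_ms > warning_ms:
        return "Warning"
    return "OK"


# Example thresholds: Warning at 3500 ms, Critical at 5000 ms.
for elapsed in (1200, 4100, 6300):
    print(elapsed, response_time_status(elapsed, warning_ms=3500, critical_ms=5000))
```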

If you also want the Monitoring Check to test that the payload returned validates against a JSON schema, click the "Add Schema" button.

You can enter a JSON schema manually into the right-hand pane, but it is simpler to infer the JSON schema from the actual response body of the API Endpoint itself. Click "Fetch Payload" to execute the HTTP Request and then "Infer Schema" to derive the JSON schema from the response. You can manually edit the schema if required. Click "Save JSON Schema" when done and then "Next".

Add Check Schema
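To give a feel for what inferring a schema from a response involves, here is a simplified sketch. It is not the algorithm behind "Infer Schema". The function `infer_schema` and the sample payload are hypothetical, every observed key is assumed to be required, and the first array element is assumed to be representative of the rest.

```python
import json


def infer_schema(value):
    """Derive a minimal JSON Schema describing the shape of a decoded JSON value."""
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(item) for key, item in value.items()},
            "required": sorted(value),  # assumption: every observed key is required
        }
    if isinstance(value, list):
        # Assumption: the first element is representative of the whole array.
        return {"type": "array", "items": infer_schema(value[0])} if value else {"type": "array"}
    if isinstance(value, bool):  # test bool before int: True is also an int in Python
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if value is None:
        return {"type": "null"}
    return {"type": "string"}


payload = json.loads('{"id": 7, "title": "Hello", "tags": ["news"], "draft": false}')
print(json.dumps(infer_schema(payload), indent=2))
```

A schema inferred from a single response can be stricter than intended (for example, when an optional field happened to be present), which is why editing the inferred schema by hand is sometimes necessary.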

You can now review everything entered and click "Back" to make any changes. Click "Save" to add the Monitoring Check.

Add Check Review

The Monitoring Check will now be added and results will appear within the next 5 minutes.

View Detailed Results¶

To view detailed results, click on the Endpoint in the table on either the Diagnostics Dashboard or the Monitoring Checks page (expand "Diagnostics" and select "Monitoring Checks" in the left-hand menu).

Detailed Results

The results will automatically refresh every 5 minutes.

Endpoint Status¶

The overall Endpoint Status, and how long the Endpoint has been in that status, are shown at the top.

Endpoint Status

For the Endpoint Status to be 'Available' (green), all tests from all Data Centers must be OK.

If a test from one Data Center is Critical, the Endpoint Status will be 'Partial Disruption' (yellow) as the problem is localised to one region (i.e. the other Data Centers are OK for the same test).

If a test from two or more Data Centers is Critical, the problem is more widespread and the Endpoint Status will be 'Outage' (red).

Endpoint Status	Description
Available	All Tests from all Data Centers are OK
Partial Disruption	One Data Center is Critical (for any Test), or one or more Data Centers are Warning (for Response Time)
Outage	Two or more Data Centers are Critical (for the same Test)
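The rules in the table can be sketched as a small function. The function name, the Data Center names and the input shape (a mapping from Data Center to per-test status) are hypothetical. The case where two Data Centers are Critical for different tests is not covered by the table, so treating it as Partial Disruption here is an assumption.

```python
from collections import Counter


def endpoint_status(results):
    """Combine per-Data-Center test statuses into one Endpoint Status.

    `results` maps a Data Center name to a {test name: status} dict.
    """
    # Count, for each test, how many Data Centers report it as Critical.
    critical_per_test = Counter(
        test
        for tests in results.values()
        for test, status in tests.items()
        if status == "Critical"
    )
    if any(count >= 2 for count in critical_per_test.values()):
        return "Outage"
    any_not_ok = any(
        status != "OK" for tests in results.values() for status in tests.values()
    )
    return "Partial Disruption" if any_not_ok else "Available"


results = {
    "dc-east": {"Reachable": "OK", "Response Code": "Critical", "Response Time": "OK"},
    "dc-west": {"Reachable": "OK", "Response Code": "OK", "Response Time": "Warning"},
}
print(endpoint_status(results))
```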

Uptime¶

Uptime shows the percentage of time that the Endpoint Status was Available during the selected time period, as well as the absolute downtime (when the Endpoint Status was Outage) over the same period.

Endpoint Uptime

By default, this will show the last 24 hours, but it can be changed to the last 7 days (1 week), 30 days (1 month) or 90 days (1 quarter). Alternatively, select Custom and enter a start and end date to see this over a period of up to 180 days (6 months).
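As a worked example of the two Uptime figures, the sketch below computes them from a list of status periods. The function name and the sample durations are hypothetical. Following the definitions above, time spent in Partial Disruption counts neither as uptime nor as downtime.

```python
from datetime import timedelta


def uptime_summary(periods):
    """Return (uptime percentage, absolute downtime) for (status, duration) periods."""
    total = sum((duration for _, duration in periods), timedelta())
    available = sum((d for status, d in periods if status == "Available"), timedelta())
    downtime = sum((d for status, d in periods if status == "Outage"), timedelta())
    return 100 * available / total, downtime


# Hypothetical 24-hour period.
last_24_hours = [
    ("Available", timedelta(hours=23)),
    ("Partial Disruption", timedelta(minutes=30)),
    ("Outage", timedelta(minutes=30)),
]
uptime, downtime = uptime_summary(last_24_hours)
print(f"Uptime: {uptime:.2f}%  Downtime: {downtime}")
```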

Tests¶

Detailed results for each Test from each Data Center are shown.

Test Results

Data Centers are ordered in the table by Status with those Data Centers that have a Critical test or a Warning test appearing first. For each Data Center, the detailed results of each test will be shown.

Test	Status	Description
Reachable	OK	HTTP Request successfully made
	Critical	Network error prevented HTTP Request
Response Code	OK	Response Code was 2xx (Success)
	Critical	Response Code not 2xx e.g. 4xx (Client Error) or 5xx (Server Error)
Response Time	OK	Response received before Warning Threshold (e.g. 3500 ms)
	Warning	Response received after Warning but before Critical Threshold
	Critical	Response received after Critical Threshold (e.g. 5000 ms)
Payload	OK	Response body passed validation against saved JSON schema
	Critical	Response body failed validation against saved JSON schema

Any tests that are Critical will be shown in red. Hover over a Critical test to see a tooltip with the reason, such as the network error (e.g. TLS handshake failed) or the JSON path that failed validation.

Validation Failed

Response Time¶

The chart shows how long on average it took each Data Center to receive a response to the HTTP Request from the Endpoint along with the worst Endpoint Status for each interval on the chart.

Response Time

By default, the chart will show the last 24 hours at 15 minute intervals. This can be changed to the last 7 days (with hourly intervals) or 30 and 90 days (with daily intervals). Alternatively, select Custom and enter a start and end date to see this over a period of time up to 180 days (6 months).

Hover over any interval on the chart to see the average response time for each Data Center during that interval as well as the worst Endpoint Status during that interval.

Soon you will be able to drill down and see how the Response Time is broken down, i.e. how much time was spent on DNS Resolution, Transfer, etc., and how this varies over time.

Because the response times shown are averages for each interval, the Endpoint Status can be Partial Disruption or Outage because a single slow response exceeded a threshold, even though the average for that interval is under the threshold. In that case, look at the corresponding Incident for more details.
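A short arithmetic example shows how an average can hide a single slow response. The sample values are hypothetical, and the thresholds are the 3500 ms and 5000 ms examples from the Tests table.

```python
from statistics import mean

WARNING_MS, CRITICAL_MS = 3500, 5000  # example thresholds

# Hypothetical response times (ms) from one Data Center within a single chart interval.
samples = [1200, 1500, 5600]

average = mean(samples)
worst = max(samples)

print(f"average: {average:.0f} ms, under Warning threshold: {average < WARNING_MS}")
print(f"slowest: {worst} ms, over Critical threshold: {worst > CRITICAL_MS}")
```

Here the interval average is roughly 2767 ms, well under the Warning threshold, yet one response took 5600 ms and would have been Critical.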

Incidents¶

Incidents show any current problems as well as a history of all past problems with an API Endpoint.

If a Test is not OK, an Incident will be opened and will remain open until that Test is OK for all Data Centers. If the same Test is not OK from multiple Data Centers, only one Incident will be opened. Click on the Incident to see a timeline for each Data Center and hover over it to see the exact failure reason.

Incidents

New incidents will be opened for the most important test only. For example: if an Incident is opened because Response Code is Critical, an Incident will not be opened for Payload validation etc.

However, if an Incident is already open for one Test e.g. Response Time and later a more important Test e.g. Response Code becomes Critical, the Response Time Incident will remain Open and a new Incident for Response Code will also be opened. The Response Code Incident will be closed when Response Code is OK for all Data Centers and then the Response Time Incident will be closed when Response Time is OK for all Data Centers.

Open Incidents will show the current worst Test Status for any Data Center whereas Closed Incidents will show the worst Test Status at any point during the Incident.
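The opening and closing rules above can be modelled roughly as follows. This is an illustrative sketch, not the product's implementation. The function name and Data Center names are hypothetical, and the full priority order is an assumption taken from the order of the Tests table. The text only states that Response Code is more important than Response Time and Payload.

```python
# Assumed priority order, most important first.
PRIORITY = ["Reachable", "Response Code", "Response Time", "Payload"]


def update_incidents(open_incidents, results):
    """Return the set of tests that have an open Incident after one round of results.

    `results` maps a Data Center name to a {test name: status} dict.
    """
    failing = {
        test
        for tests in results.values()
        for test, status in tests.items()
        if status != "OK"
    }
    # An Incident stays open only while its Test is still not OK somewhere.
    still_open = {test for test in open_incidents if test in failing}
    # Open at most one new Incident, for the most important failing Test.
    for test in PRIORITY:
        if test in failing:
            still_open.add(test)
            break
    return still_open


open_incidents = set()
rounds = [
    {"dc-east": {"Response Code": "OK", "Response Time": "Warning"}},
    {"dc-east": {"Response Code": "Critical", "Response Time": "Warning"}},
    {"dc-east": {"Response Code": "OK", "Response Time": "Warning"}},
    {"dc-east": {"Response Code": "OK", "Response Time": "OK"}},
]
history = []
for results in rounds:
    open_incidents = update_incidents(open_incidents, results)
    history.append(sorted(open_incidents))
print(history)
```

Because Incidents are keyed by Test in a set, the same Test failing from several Data Centers still yields a single Incident.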

Edit a Monitoring Check¶

To edit any of the Endpoint details (such as request headers), the Warning and Critical thresholds or the JSON schema, first select the App, expand "Diagnostics" and select "Monitoring Checks" in the left-hand menu. Expand the context menu for that Monitoring Check and select "Edit".

Edit Check

Remember to save your changes, or click the back arrow at any point to discard them.

Delete a Monitoring Check¶

To stop monitoring an API Endpoint, select the App, expand "Diagnostics" and select "Monitoring Checks" in the left-hand menu. Expand the context menu for that Monitoring Check, select "Delete" and then click "OK" when prompted.