Network SLA

Greetings
I am curious to know about any tools/techniques that a service provider uses
to assess an SLA before signing it. That is to say, how does an
administrator know if he/she can meet what he is promising. Is it based on
experience? Are there commonly used tools for this?
Thanks and best regards

Availability cannot be calculated in advance. It is typically estimated
from historical component failure information. Sound design ensures
redundancy and eliminates single points of failure.
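
As a rough illustration of how historical failure data and redundancy feed an availability estimate, here is a minimal sketch. It assumes independent failures and uses the usual steady-state formula MTBF / (MTBF + MTTR); the function names and the MTBF/MTTR figures are hypothetical, not taken from any vendor data.

```python
from math import prod

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one component: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def series(*parts: float) -> float:
    """Every part is a single point of failure: multiply availabilities."""
    return prod(parts)

def parallel(*parts: float) -> float:
    """Redundant parts: the group is down only if every part is down."""
    return 1.0 - prod(1.0 - a for a in parts)

# Hypothetical figures for illustration only.
router = availability(mtbf_hours=50_000, mttr_hours=4)
link = availability(mtbf_hours=8_000, mttr_hours=8)

single_path = series(router, link, router)
dual_path = parallel(single_path, single_path)

HOURS_PER_YEAR = 8760
for name, a in (("single path", single_path), ("dual path", dual_path)):
    print(f"{name}: {a:.5%} available, ~{(1 - a) * HOURS_PER_YEAR:.2f} h/year down")
```

The independence assumption is the weak spot: two "redundant" paths that share a conduit or a power feed will do worse than the parallel formula suggests.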

As for the rest (CIR, latency, jitter, loss), this can be tested
prior to customer handover with any number of tools and protocols,
including IEEE 802.1ag/802.3ah, ITU-T Y.1731, and IETF RFC 2544. Hand-helds are
typically not cost effective.

Rich Andreas
Comcast Network Engineering

Maybe the best way to address this is to know exactly what we need to measure: IP traffic, services, or processes. Is it the timescale of a process (e.g. MTTR) and/or procedure, or just data and/or voice traffic from point A to B? Or are the measurements scoped to the performance of the core network, or only to usage-based services? That takes us to the TMN model and to the bottom-up approach, starting with FCAPS.

You have freeware, shareware, and licensed tools, or most likely vendor-specific tools tied to one vendor or one type of equipment. I am sure you've heard of RRD/MRTG, and a few others like them, which normally sit on the bottom tier and have an upstream chain correlating the events. Most of the time the choice comes down to suitability and to what the software version is prepared to report on, so that they are seen as more "suitable" by customers.

IME, the administrators don't have anything to do with what is signed. The "company" chooses what SLAs to sign with customers (typically whatever the customer requests, possibly with various levels of pricing for different agreements), but the operational staff are not involved.

If you're lucky, you have this information before you build and can -try- to build to suit. But most times, the SLAs are signed after you've built, and everyone just crosses their fingers.

IME.

..david

We use the BRIX active measurement instrumentation product to measure
round-trip, jitter, and packet loss SLA conformity.
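
To make the three metrics concrete, here is a minimal sketch of how probe results could be reduced to loss, round-trip time, and jitter and compared with SLA thresholds. It is not how BRIX computes them. Jitter here is simplified to the mean absolute difference between consecutive round-trip times, and the thresholds and sample values are hypothetical.

```python
from statistics import mean

def summarize(rtts_ms):
    """rtts_ms: one entry per probe sent; None marks a lost probe."""
    if not rtts_ms:
        raise ValueError("need at least one probe")
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    avg_rtt = mean(received) if received else None
    # Simplified jitter: mean absolute difference between consecutive replies.
    deltas = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter = mean(deltas) if deltas else 0.0
    return {"loss_pct": loss_pct, "avg_rtt_ms": avg_rtt, "jitter_ms": jitter}

def conforms(summary, max_rtt_ms, max_jitter_ms, max_loss_pct):
    """True only if all three metrics are within their thresholds."""
    return (summary["avg_rtt_ms"] is not None
            and summary["avg_rtt_ms"] <= max_rtt_ms
            and summary["jitter_ms"] <= max_jitter_ms
            and summary["loss_pct"] <= max_loss_pct)

# Hypothetical samples: five probes sent, one lost.
samples = [20.0, 22.0, None, 21.0, 25.0]
s = summarize(samples)
print(s, conforms(s, max_rtt_ms=30, max_jitter_ms=5, max_loss_pct=1))
```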

Saqib Ilyas wrote:

> Greetings
> I am curious to know about any tools/techniques that a service provider uses
> to assess an SLA before signing it. That is to say, how does an
> administrator know if he/she can meet what he is promising. Is it based on
> experience? Are there commonly used tools for this?
> Thanks and best regards

Not necessarily a direct answer (I am pretty sure others on this list will give details on specific tools and standards), but I think this question, especially considering your end concern of *signing* the SLA, is equally applicable to your legal department. In the environment we live in nowadays, the SLA could (should?!? ... unfortunately) be "refined" and, at the receiving end, "interpreted" by the lawyers, with possibly the same effects (mostly financial, and in overall impact on the business) as the tools we technical people would use to measure latency, uptime, bandwidth, jitter, etc.

Stefan

As I gather, there is a mix of answers, ranging from "building the resources
according to requirements and HOPE for the best" to "use of arguably
sophisticated tools and perhaps sharing the results with the legal
department".

I would be particularly interested in hearing the service providers'
viewpoint on the following situation.

Consider a service provider with MPLS deployed within its own network.

(A) When the SP enters into a relation with the customer, does the SP
establish new MPLS paths based on customer demands (this is perhaps similar
to "building" based on requirements as pointed out by David)? If yes,
between what sites/POPs? I assume the answer may be different depending upon
a single-site customer or a customer with multiple sites.

(B) For entering into the relationship for providing X units of bandwidth
(to another site of same customer or to the Tier-1 backbone), does the SP
use any wisdom (in addition to MRTG and the likes)? If so, what scientific
parameters are kept in mind?

(C) How does the customer figure out that a promise for X units of bandwidth
is maintained by the SP? I believe customers may install some measuring
tools but is that really the case in practice?

Thanks,
Zartash

I must thank everyone who has answered my queries. Just a couple more
short questions.
For instance, if one is using MRTG, and wants to check if we can meet
a 1 Mbps end-to-end throughput between a couple of customer sites, I
believe you would need to use some traffic generator tools, because
MRTG merely imports counters from routers and plots them. Is that
correct?
We've heard of the BRIX active measurement tool in replies to my
earlier email. Also, I've found Cisco IP SLA that also sends traffic
into the service provider network and measures performance. How many
people really use IP SLA feature?
Thanks and best regards
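
The point about MRTG can be illustrated with a minimal sketch of the arithmetic behind a counter-based graph: two polls of an interface octet counter are turned into an average rate. The function name and sample values are hypothetical, and the 300-second gap assumes a 5-minute polling interval. The result is the traffic that was actually carried, not what the path could carry, which is why a traffic generator is needed for a throughput test.

```python
def rate_bps(octets_then, octets_now, seconds, counter_bits=32):
    """Average bits/s between two polls of an interface octet counter."""
    if seconds <= 0:
        raise ValueError("polls must be taken at increasing times")
    modulus = 1 << counter_bits
    # Modular subtraction tolerates a single counter wrap between polls.
    delta = (octets_now - octets_then) % modulus
    return delta * 8 / seconds

# Hypothetical samples taken 300 s apart.
print(rate_bps(1_000_000, 38_500_000, 300))
# Same calculation when a 32-bit counter wrapped once between polls.
print(rate_bps(4_294_000_000, 1_000_000, 300))
```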

Saqib,

> I must thank everyone who has answered my queries. Just a couple more
> short questions.
> For instance, if one is using MRTG, and wants to check if we can meet
> a 1 Mbps end-to-end throughput between a couple of customer sites, I
> believe you would need to use some traffic generator tools, because
> MRTG merely imports counters from routers and plots them. Is that
> correct?

Yes, if you want to test bandwidth, iperf should probably be your first stop.
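
As a sketch of how such a test result could be checked against a committed rate, the snippet below parses a throughput figure from JSON output. It assumes iperf3 (the thread only says iperf) run as a client with the `-J` flag against a server at the far site, and it assumes the receiver-side figure for a TCP test sits under `end.sum_received.bits_per_second`; verify that against your own output. The helper names and the 5% tolerance are hypothetical.

```python
import json

def received_mbps(iperf3_json_text):
    """Extract receiver-side TCP throughput (Mbit/s) from `iperf3 -J` output."""
    report = json.loads(iperf3_json_text)
    return report["end"]["sum_received"]["bits_per_second"] / 1e6

def meets_commitment(measured_mbps, committed_mbps, tolerance=0.05):
    """True if the measurement is within `tolerance` of the committed rate."""
    return measured_mbps >= committed_mbps * (1 - tolerance)

# Trimmed, hypothetical report; a real one has many more fields.
sample = '{"end": {"sum_received": {"bits_per_second": 987654.0}}}'
mbps = received_mbps(sample)
print(f"{mbps:.3f} Mbit/s, meets 1 Mbit/s: {meets_commitment(mbps, 1.0)}")
```

Some tolerance is needed because TCP goodput sits below the line rate once protocol headers are accounted for.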

> We've heard of the BRIX active measurement tool in replies to my
> earlier email. Also, I've found Cisco IP SLA that also sends traffic
> into the service provider network and measures performance. How many
> people really use IP SLA feature?

I know a lot of people who use IPSLA. Remember that you set it up between two routers or higher-end switches and it constantly tests that connection. However, IPSLA is the wrong tool for a one-off test of whether you can push 1 Mbps from site A to site B, because you need to saturate the link to do that test. IPSLA is great for monitoring things like jitter.

HTH,

Chris

> Saqib,
>
>> I must thank everyone who has answered my queries. Just a couple more
>> short questions.
>> For instance, if one is using MRTG, and wants to check if we can meet
>> a 1 Mbps end-to-end throughput between a couple of customer sites, I
>> believe you would need to use some traffic generator tools, because
>> MRTG merely imports counters from routers and plots them. Is that
>> correct?
>
> Yes, if you want to test bandwidth, iperf should probably be your
> first stop.

Or, for more sophisticated matrices of spot-checks, BWCTL
(http://www.nanog.org/meetings/nanog43/presentations/Boote_tools_N43.pdf)

>> We've heard of the BRIX active measurement tool in replies to my
>> earlier email. Also, I've found Cisco IP SLA that also sends traffic
>> into the service provider network and measures performance. How many
>> people really use IP SLA feature?
>
> I know a lot of people who use IPSLA. Remember that you set it up
> between two routers or higher-end switches and it constantly tests
> that connection. However, IPSLA is the wrong tool for a one-off test
> of whether you can push 1 Mbps from site A to site B, because you need
> to saturate the link to do that test. IPSLA is great for monitoring
> things like jitter.

While BRIX is awesome and a Cisco-heavy site certainly should use
rtr/ipsla in their mix, don't underestimate the value of a lightweight
system built on SmokePing. Choose
the right set of tools for your budget and environment.

Cheers!

Joe

We use BRIX for SLAs, measuring round-trip times, jitter, and packet
loss across all of our backbone links. In conjunction with a traffic
generator to add background traffic, and potentially invoke queueing on
interfaces, we have found that BRIX enables us to accurately predict the
behavior of new applications, particularly multicast and HD video,
without the need to implement elaborate QoS configurations. BRIX is now
owned by EXFO, a fiber optic test equipment manufacturer. Low values for
rtt, jitter, and packet loss imply a relatively queue-free network,
which makes confident predictions about network behavior easier.
When we last looked at the technology, the Cisco IP SLA probes did not
capture a random distribution of network events, as the probes are
triggered every N minutes. BRIX randomizes the probes within a
configurable window, so that, over time, all time intervals are covered
by the accumulated probes.
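
The difference between fixed-interval and randomized probing can be sketched as follows. This is only an illustration of the idea, not BRIX's or Cisco's actual scheduler; the function names and the 300-second window are hypothetical.

```python
import random

def fixed_schedule(start, interval_s, count):
    """Probe at the start of every interval, always at the same phase."""
    return [start + i * interval_s for i in range(count)]

def randomized_schedule(start, window_s, count, rng=random):
    """One probe per window, at a uniformly random offset inside it."""
    return [start + i * window_s + rng.random() * window_s for i in range(count)]

rng = random.Random(42)  # seeded only to make the illustration repeatable
for t in randomized_schedule(0, window_s=300, count=5, rng=rng):
    print(f"probe at t={t:7.1f}s (offset {t % 300:5.1f}s into its window)")
```

With the fixed schedule, any event that recurs in phase with the probe interval is either always seen or never seen. With random offsets, the accumulated probes eventually sample every part of the window.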

What products/services do you use for traffic generation? Also, what sort of testing methodology do you use? As for random probes, that certainly seems like a nice feature.

Holmes,David A wrote:

I have found that Cisco IPSLA is heavily used in the MSO/Service
Provider Space. Juniper has equivalent functionality via RPM.

Rich

Anyone interested in setting up their own IP SLA probes by hand and then
collecting the measurements into a database can use a Perl tool we developed
in 2005:

http://sourceforge.net/projects/saa-collector

It's rather old (SAA was renamed to IP SLA in the meantime) and, in
retrospect, the code is a little rough around the edges, but it's
nevertheless usable.

Regards,
Athanasios

I'm back! Thanks again to all those who replied. I am wondering how a
service provider might assess availability or reliability figures using
active measurements. Granted that one could set up traffic generators
between the two PoPs which will be connected to a customer's sites, and then
after a day of test traffic, I can look for downtimes and restoration times.
But a one-day estimate is not a good estimate for what the service provider
is promising, which is usually "maximum of 10 hours downtime in a year", is
it not?
Thanks and best regards

> I'm back! Thanks again to all those who replied. I am wondering how a
> service provider might assess availability or reliability figures using
> active measurements. Granted that one could set up traffic generators
> between the two PoPs which will be connected to a customer's sites, and then
> after a day of test traffic, I can look for downtimes and restoration times.

This is an exact description of IPSLA. Of course you don't know whether a maximum bandwidth was in fact available, because you don't want to saturate the link.

> But a one-day estimate is not a good estimate for what the service provider
> is promising, which is usually "maximum of 10 hours downtime in a year", is
> it not?

You need a year of measurement.
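
For scale, 10 hours of downtime in a 365-day year corresponds to roughly 99.886% availability. A minimal sketch of checking a year of measured outages against that budget follows; the outage log is hypothetical and the helpers assume the outages do not overlap.

```python
from datetime import datetime, timedelta

def total_downtime(outages):
    """outages: list of (start, end) datetimes, assumed non-overlapping."""
    return sum((end - start for start, end in outages), timedelta())

def availability_pct(downtime, period):
    """Availability over `period`, as a percentage."""
    return 100.0 * (1 - downtime / period)

year = timedelta(days=365)
budget = timedelta(hours=10)
print(f"10 h/year budget = {availability_pct(budget, year):.3f}% availability")

# Hypothetical outage log from a year of active probing.
log = [
    (datetime(2009, 2, 3, 1, 0), datetime(2009, 2, 3, 4, 30)),
    (datetime(2009, 8, 19, 14, 0), datetime(2009, 8, 19, 15, 15)),
]
down = total_downtime(log)
print(f"observed downtime {down}, within budget: {down <= budget}")
```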

I talked to the NOC personnel at a small (by North American
standards) ISP in Pakistan. They said that their core links operate at
less than 50% utilization most of the time. Under such conditions, violating
SLA conditions in the core is unlikely. If that is also the case with most
service providers in North America, then why would they even use
active measurement tools such as iPerf, BRIX, or Cisco IP SLA before signing an
SLA?
Thanks and best regards

Hmmm. Good point. Perhaps the Internet traffic gets only a small share of
the link capacity and the rest is reserved for corporate clients' VPN
traffic etc. I was thinking more along the lines of corporate SLAs, not for
Internet traffic.

For a private point-to-point line, I agree with a previous posting on the
subject:

"As for the rest, CIR, Latency, Jitter, Loss ..... this can be tested
prior to customer handover with any number of tools and protocols
including IEEE 802.1ag/802.3ah, ITU-T Y.1731, IETF RFC 2544. " -Rich Andreas

Asking to receive the testing report as part of an acceptance process is not
unusual.

For corporate IP service, you may want to measure end-to-end performance and
not get too specific about the core. Writing an SLA against city-pair
performance is a responsible way to do this, e.g. "Islamabad->Kabol: no
more than 1ms". That should encompass everything along the required
path(s) and hopefully incent your provider to keep their network up to
snuff and their MTTR low. You may also consider codifying the MTTR, i.e. MTTR
<= 2 hours "or" service credit. (Again, this depends on your economic power.)
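
A codified MTTR clause of this kind could be evaluated along these lines. This is a minimal sketch with hypothetical incident durations. The 2-hour limit follows the example above, and how credits are actually calculated would be set by the contract.

```python
from datetime import timedelta

def mttr(repair_times):
    """Mean time to repair over a list of timedelta durations."""
    if not repair_times:
        return timedelta(0)
    return sum(repair_times, timedelta()) / len(repair_times)

def credit_due(repair_times, mttr_limit=timedelta(hours=2)):
    """True if the measured MTTR exceeds the contractual limit."""
    return mttr(repair_times) > mttr_limit

# Hypothetical incidents for one billing period.
incidents = [timedelta(minutes=45), timedelta(hours=3), timedelta(minutes=90)]
print(mttr(incidents), credit_due(incidents))
```

Note that a mean can hide a single long outage, as with the 3-hour incident here, so a contract may also want a per-incident ceiling.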

Don't forget that your power to negotiate SLAs with service credits is
proportionate to the size of the purchase. Buying 10 Mb/s vs. 10 Gb/s
of service involves two different kinds of economics when it comes to SLAs.

Best,

Martin

From the network operators' standpoint, designing a network that
operates at 50% utilization (without using ponderous QoS schemes)
assumes that there is no random queuing behavior in the network that can
result in dropped packets and large variations in packet arrival jitter.
An active measurement tool such as BRIX gathers empirical data for
packet drops and jitter from which accurate predictions about network
behavior can be made. Think of active measurement tools as a means of
implementing a scientific approach to determining network behavior.

From the users' standpoint, BRIX can be used to validate the service
providers' contractual SLA, and provide empirical data to support SLA
violation penalties.