Make VoIP work on your LAN

I have a client whose local area network serves about forty client PCs. A few months ago, a third-party phone vendor added an Elastix “PBX” phone server, as well as a new workgroup router. Elastix provides a graphical administration interface to its underlying Asterisk private branch exchange (PBX) software. Elastix and Asterisk run on a Linux server.

The client has complained that callers’ voices randomly distort, incoming calls randomly terminate, and desktop phones’ handsets randomly go dead after ringing. The phone vendor assured my client that the new router (an Asus RT-N66U) was not at fault, and suspected that the problems were caused by faulty cabling or a misconfigured Elastix server.

After months of frustration, the client asked me to have a look. I began with the workgroup router. I noticed that it was configured to use cut-through switching. (Asus calls this “NAT Acceleration”, which sounds like a good thing, doesn’t it? NAT is Network Address Translation. The router apparently defaults to cut-through switching mode.) VoIP (Voice over Internet Protocol) carries its voice streams over UDP (User Datagram Protocol), rather than TCP. UDP is used for streaming real-time audio and video because of its low overhead and potentially reduced latency. UDP does not retransmit lost or damaged packets, though, so it requires that the network underneath it be rock solid.

Cut-through switching does NOT provide a rock solid network! Cut-through switching is fast because it starts forwarding a frame before the whole frame has arrived. That means it cannot verify the checksum at the end of the frame, so it forwards damaged frames along with good ones. The more conservative store and forward method buffers each frame, verifies its checksum, and forwards only undamaged frames. Damaged frames are dropped. Result? A cleaner network.
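The difference between the two methods can be sketched in a few lines of Python. This is a simplified model, not the router's firmware: the function names are my own, and zlib's CRC-32 merely stands in for the frame check sequence that Ethernet appends to each frame.

```python
import zlib


def add_fcs(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32, standing in for the Ethernet frame check sequence."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def fcs_ok(frame: bytes) -> bool:
    """Return True if the trailing checksum matches the rest of the frame."""
    if len(frame) < 4:
        return False
    payload, fcs = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == fcs


def cut_through(frame: bytes):
    """Forward at once; the checksum at the end of the frame is never examined."""
    return frame


def store_and_forward(frame: bytes):
    """Buffer the whole frame, verify it, and drop it (return None) if damaged."""
    return frame if fcs_ok(frame) else None


good = add_fcs(b"voice sample 0001")
damaged = bytes([good[0] ^ 0x01]) + good[1:]  # one bit flipped in transit

print(cut_through(damaged) == damaged)   # the damaged frame is passed along
print(store_and_forward(damaged))        # the damaged frame is dropped
print(store_and_forward(good) == good)   # the good frame is forwarded
```

In this model, the cut-through path hands the damaged frame to the next hop, while the store and forward path discards it and passes only the good frame.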

Onion layer 1

Troubleshooting system problems is like peeling an onion. You remove one layer at a time and look for changes.

For our first layer, I reconfigured the workgroup router so that it employed store and forward, rather than cut-through switching. Then I waited for user reports. Users reported that we’d fixed the distortion problem, but calls still occasionally dropped or failed to connect.

Onion layer 2

Next, I activated the workgroup router’s QoS (Quality of Service) feature. I assigned highest priority to all traffic in and out of all active ports on the Elastix server. Then I waited. Users reported that all phones now worked as they should.
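To show what that priority rule does, here is a minimal Python sketch of strict-priority scheduling. It is an illustration only: I don't know how the router implements QoS internally, and the IP addresses, the packet dictionaries, and the PriorityScheduler class are all hypothetical.

```python
import heapq
from itertools import count

PBX_IP = "192.168.1.10"  # hypothetical address of the Elastix server

HIGHEST, DEFAULT = 0, 1  # the lower number is served first


def classify(packet: dict) -> int:
    """Give top priority to anything sent to or from the PBX."""
    if PBX_IP in (packet["src"], packet["dst"]):
        return HIGHEST
    return DEFAULT


class PriorityScheduler:
    """Strict-priority queue: high-priority packets always leave first."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker keeps arrival order within a class

    def enqueue(self, packet: dict) -> None:
        heapq.heappush(self._heap, (classify(packet), next(self._order), packet))

    def dequeue(self) -> dict:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


sched = PriorityScheduler()
sched.enqueue({"src": "192.168.1.50", "dst": "203.0.113.7", "note": "backup upload"})
sched.enqueue({"src": "192.168.1.51", "dst": "203.0.113.8", "note": "web page"})
sched.enqueue({"src": PBX_IP, "dst": "192.168.1.60", "note": "voice"})

sent = [sched.dequeue()["note"] for _ in range(len(sched))]
print(sent)  # ['voice', 'backup upload', 'web page']
```

The voice packet arrived last but leaves first, which is the effect I wanted on a busy link: phone traffic no longer waits behind bulk transfers.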

Problem solved.

Think before adding boxes

Adding boxes to a network often works with little tweaking. Eventually, though, services begin to fail and users complain of slow response as traffic jams the network’s pipes. At that point, someone must reduce unnecessary traffic and assign priorities to the different classes of network traffic. I recommend doing this before problems occur, not after.

