Thursday, December 12, 2019

Why Adding More APs isn't Always Better

Many Wi-Fi experts work on networks with hundreds of APs and thousands of devices, and thus rely heavily upon a vendor's auto-channel and auto-power features (a.k.a. radio resource management or RRM). Such algorithms shouldn't be trusted, though understanding why often requires stepping back to relatively simple examples.

In this case, the example is a small warehouse fulfillment center with 6' tall shelving, as well as some refrigerators, coolers, and walk-in freezers. The facility is approximately 5,000 square feet, and we have deployed four Meraki MR33 APs (802.11ac Wave 2, 2x2:2, with internal 3 dBi antennas). The APs are set to static 20 MHz channels (non-DFS) on both 2.4 GHz and 5 GHz (one AP has its 2.4 GHz radio disabled), and auto-power has been restricted to 8 dBm +/- 3 dB on the 2.4 GHz band and 14 dBm +/- 3 dB on the 5 GHz band. The site has two SSIDs: one for the facility, used by Android-based dual-band barcode scanners and other facility PCs and devices, and one for guests, primarily serving the personal cell phones and tablets of the employees who work at the facility. Originally, both SSIDs were set up for dual-band operation with band steering (i.e. dual-band clients are “encouraged” to associate at 5 GHz).
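To put those auto-power windows in concrete terms, here is a minimal Python sketch that turns each target +/- tolerance into minimum and maximum power, in both dBm and mW. It assumes the dashboard values are conducted power at the radio, so the 3 dBi internal antenna gain is added on top to get EIRP; if the vendor already reports EIRP, pass a gain of zero. `PowerWindow` and `dbm_to_mw` are hypothetical helpers for illustration, not part of any Meraki API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerWindow:
    """A hypothetical auto-power window: target power +/- tolerance."""
    band: str
    target_dbm: float
    tolerance_db: float

    def bounds(self, antenna_gain_dbi: float = 0.0) -> tuple[float, float]:
        """Return (min, max) power in dBm, optionally adding antenna gain.

        ASSUMPTION: target_dbm is conducted power, so gain is added for EIRP.
        """
        low = self.target_dbm - self.tolerance_db + antenna_gain_dbi
        high = self.target_dbm + self.tolerance_db + antenna_gain_dbi
        return (low, high)


def dbm_to_mw(dbm: float) -> float:
    """Convert dBm to milliwatts."""
    return 10 ** (dbm / 10)


# The original windows described in the text.
windows = [
    PowerWindow("2.4 GHz", 8.0, 3.0),
    PowerWindow("5 GHz", 14.0, 3.0),
]

ANTENNA_GAIN_DBI = 3.0  # MR33 internal antenna gain quoted in the text

if __name__ == "__main__":
    for w in windows:
        lo, hi = w.bounds(ANTENNA_GAIN_DBI)
        print(f"{w.band}: EIRP {lo:.0f} to {hi:.0f} dBm "
              f"({dbm_to_mw(lo):.1f} to {dbm_to_mw(hi):.1f} mW)")
```

Even at the top of the 5 GHz window, this works out to well under the regulatory limits, which is worth keeping in mind when the APs are only a few dozen feet apart.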

The general manager at one particular site is complaining that the main order tracking PC, as well as a tablet used for employee time tracking, are having frequent disconnects.  Both devices are located in the Dispatch area in the upper left side of the floor plan.




This type of complaint is usually indicative of interference.  Since our own APs are set to static non-overlapping channels (which is in our control), I immediately started looking for external interference (which is out of our control).  

An initial Wi-Fi scan from the APs (using Meraki's “Air Marshal” feature, though every AP vendor offers something similar) detected over 120 distinct SSIDs from third-party APs in the area. Most of these are only on 2.4 GHz, and while most were at fairly low signal levels (< 5 dB), enough were at significantly higher levels (20 – 30 dB) to indicate that the 2.4 GHz band is quite saturated with external APs. Hence, my initial troubleshooting step was to set the facilities SSID to 5 GHz only, as all of its devices are dual-band. This also eliminates the need for band steering on the facilities network, which can cause some delays and issues in roaming. (In fairness to Meraki, I have not seen any problems directly related to band steering across approximately 120 such sites, so I’m inclined to believe Meraki band steering is working appropriately. Nonetheless, it seemed prudent to remove a potential problem source.)
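A scan with 120+ neighbors is easier to reason about once it is bucketed by band and signal level. The sketch below assumes a simple hypothetical `Neighbor` record (this is not the Air Marshal export format) and uses made-up sample data; the < 5 dB and >= 20 dB thresholds mirror the ones above. Each SSID is counted once per band, using its strongest reading.

```python
from collections import Counter
from typing import NamedTuple


class Neighbor(NamedTuple):
    """Hypothetical scan record; field names are assumptions."""
    ssid: str
    band: str          # "2.4" or "5"
    signal_db: float   # signal level as reported by the scanning AP


def bucket(signal_db: float) -> str:
    """Rough buckets mirroring the thresholds used in the text."""
    if signal_db < 5:
        return "weak"
    if signal_db >= 20:
        return "strong"
    return "moderate"


def summarize(neighbors: list[Neighbor]) -> Counter:
    """Count distinct SSIDs per (band, bucket), keeping each SSID's strongest reading."""
    strongest: dict[tuple[str, str], float] = {}
    for n in neighbors:
        key = (n.ssid, n.band)
        strongest[key] = max(strongest.get(key, float("-inf")), n.signal_db)
    return Counter((band, bucket(sig)) for (_, band), sig in strongest.items())


# Made-up sample data for illustration only.
sample = [
    Neighbor("CoffeeShopGuest", "2.4", 3),
    Neighbor("CoffeeShopGuest", "2.4", 4),
    Neighbor("Office-Net", "2.4", 24),
    Neighbor("Office-Net", "5", 12),
    Neighbor("PrinterDirect", "2.4", 28),
]

if __name__ == "__main__":
    for (band, level), count in sorted(summarize(sample).items()):
        print(f"{band} GHz {level}: {count}")
```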

Alas, this change did not make any difference at all to the reported issues.   

Looking at the 5 GHz band, there is a fair amount of non-DFS interference from multiple cable routers (i.e. neighboring cable modems or local public hotspots). We have already checked that our own cable router has its Wi-Fi disabled, but we cannot do anything about neighboring businesses. Surprisingly, I am also seeing a lot of strong 5 GHz interference from SSIDs that correspond to Wi-Fi hotspots inside vehicles. According to Google Earth, there is an auto dealership within 200 - 300 feet, although for all I know the APs are simply picking up passing cars, as the facility is located on a fairly busy road in a commercial area. The resolution and history of such external APs on the Meraki Dashboard are limited, making further diagnostics difficult.

My knee-jerk reaction at this point was to switch the APs over to DFS channels, which seemed to have no activity. I temporarily did this, but then thought better of it. I have been avoiding DFS channels because devices take a lot longer to roam on them: a device must do a passive scan on the DFS channels (52-64, 100-140), whereas active scans on UNII-1 (36-48) and UNII-3 (149-161) are 10x – 20x faster. Thus, I’d rather not use DFS channels at these facilities unless it is totally unavoidable. To verify whether DFS was justified, I switched to a spectrum analysis from the APs, which showed, somewhat surprisingly, that despite the large number of APs in the area, channel utilization on both bands was actually quite low (under 10 - 15%). Thus, something else entirely is going on.
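The DFS roaming penalty can be roughed out by multiplying channel count by per-channel dwell time. The dwell times below are assumptions picked to fall within the 10x – 20x gap mentioned above (a passive dwell of roughly one 102.4 ms beacon interval vs. a short active probe); real clients vary by chipset and firmware, and many won't sweep every channel on every roam.

```python
# Channel lists taken from the text (20 MHz channels).
NON_DFS_CHANNELS = [36, 40, 44, 48, 149, 153, 157, 161]    # UNII-1 + UNII-3
DFS_CHANNELS = [52, 56, 60, 64] + list(range(100, 141, 4))  # 52-64, 100-140

ACTIVE_DWELL_MS = 10    # ASSUMED: send a probe, wait briefly for responses
PASSIVE_DWELL_MS = 110  # ASSUMED: listen for at least one ~102.4 ms beacon interval


def scan_time_ms(channels: list[int], dwell_ms: float) -> float:
    """Total scan time if the client visits each channel once."""
    return len(channels) * dwell_ms


active_total = scan_time_ms(NON_DFS_CHANNELS, ACTIVE_DWELL_MS)
passive_total = scan_time_ms(DFS_CHANNELS, PASSIVE_DWELL_MS)

if __name__ == "__main__":
    print(f"non-DFS active scan: {active_total:.0f} ms "
          f"over {len(NON_DFS_CHANNELS)} channels")
    print(f"DFS passive scan:    {passive_total:.0f} ms "
          f"over {len(DFS_CHANNELS)} channels")
```

Under these assumptions the DFS sweep takes over a second and a half versus well under a tenth of a second for the non-DFS channels, which is the kind of gap a moving barcode scanner would notice.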

The other common cause of intermittent client disconnects is transmit power on one's own network being set far too high. We build predictive models in Ekahau to optimize AP placement, but detailed on-site post-deployment active surveys with Ekahau are not being done, due to both cost and time limitations. Until recently, we were not able to get the wall materials from the customer in advance, so we have generally assumed the interior walls are cinder block. This has proven to be the most common indoor wall material at most sites, making it a reasonably conservative assumption. If the walls are drywall, signal penetration will be better than expected; conversely, if the walls are poured concrete, signal penetration could be worse than expected.
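To see how much the wall-material assumption matters, here is a minimal free-space path loss model with a fixed loss per wall crossed. The per-wall values are assumed round numbers for illustration, not Ekahau's material library, and free space ignores shelving, coolers, and other clutter, so absolute numbers will be optimistic; the useful part is the relative difference between materials.

```python
import math

# ASSUMED illustrative per-wall attenuation at 5 GHz; real loss depends on
# thickness, density, moisture, and rebar.
WALL_LOSS_DB = {"drywall": 3.0, "cinder block": 10.0, "poured concrete": 15.0}


def fspl_db(distance_m: float, freq_mhz: float) -> float:
    """Free-space path loss in dB (distance in metres, frequency in MHz)."""
    return 20 * math.log10(distance_m) + 20 * math.log10(freq_mhz) - 27.55


def predicted_rssi_dbm(eirp_dbm: float, distance_m: float,
                       freq_mhz: float, walls: list[str]) -> float:
    """EIRP minus free-space loss minus the loss of each wall crossed."""
    wall_loss = sum(WALL_LOSS_DB[w] for w in walls)
    return eirp_dbm - fspl_db(distance_m, freq_mhz) - wall_loss


if __name__ == "__main__":
    # 14 dBm + 3 dBi antenna, 15 m away, channel 36 (5180 MHz), two walls.
    for material in WALL_LOSS_DB:
        rssi = predicted_rssi_dbm(17, 15, 5180, [material] * 2)
        print(f"{material:>15}: {rssi:6.1f} dBm")
```

With these assumed values, the same two walls swing the predicted RSSI from comfortably above -67 dBm (drywall) to well below it (cinder block), so guessing the material wrong can easily mean planning for several more APs than the space needs.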

If the signals from the other APs on the network are really strong, client devices will hear multiple APs on our own network, all at very good signal levels (>> -67 dBm). In such environments, if a client’s roaming algorithm is not very smart, the device can wind up roaming between multiple APs just due to minor signal fluctuations. Alas, (a) most client devices are fairly dumb when it comes to roaming, (b) roaming behavior can change from one firmware version to the next, and (c) as network engineers we ultimately have no control over how a client device roams.
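The roaming problem is easy to reproduce with a toy model. Both rules below are hypothetical client behaviors, not any vendor's algorithm: with a zero margin the client always jumps to the strongest AP, while a hysteresis margin makes it move only when a new AP is better by more than that margin. The readings are made up, with two APs in the low-to-mid -50s swapping places by a dB or two.

```python
def count_roams(samples: list[dict[str, float]], hysteresis_db: float = 0.0) -> int:
    """Count AP changes over a sequence of {ap_name: rssi_dbm} readings.

    The client starts on the strongest AP in the first sample and only moves
    when another AP beats the current one by more than hysteresis_db.
    """
    current = max(samples[0], key=samples[0].get)
    roams = 0
    for reading in samples[1:]:
        best = max(reading, key=reading.get)
        # If the current AP vanished from this reading, treat it as unheard.
        current_rssi = reading.get(current, float("-inf"))
        if best != current and reading[best] - current_rssi > hysteresis_db:
            current = best
            roams += 1
    return roams


# Made-up readings: AP01 and AP02 both strong, swapping by a dB or two.
readings = [
    {"AP01": -52, "AP02": -54},
    {"AP01": -55, "AP02": -53},
    {"AP01": -53, "AP02": -54},
    {"AP01": -56, "AP02": -54},
    {"AP01": -52, "AP02": -55},
]

if __name__ == "__main__":
    print("greedy roams:       ", count_roams(readings))
    print("5 dB hysteresis roams:", count_roams(readings, hysteresis_db=5))
```

On these readings the greedy rule roams on every one of the four transitions, while a 5 dB margin never roams. The point is simply that when every AP is strong, a client without sensible hysteresis has nothing to anchor it.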

Fortunately, the spectrum analysis tool on the Meraki Dashboard reports all of the other APs it sees, including, and especially, our own. From AP01 (closest to the Dispatch room), I’m seeing signals from AP02 and AP04 in the mid -50s dBm, which is really strong. From AP02, which is reasonably centered in the facility, I can see all of our other APs on 5 GHz in the low -50s to high -60s dBm.




Based on the signal levels of both our own APs and external APs, I must conclude that the walls at this facility are really thin, at least from an RF perspective. In the predictive model, changing the internal walls from cinder block to drywall indicates that I could ostensibly cover most of the 5,000 sq. ft. facility with an RSSI of -67 dBm or better from a single AP, and we have four of them in this space!
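The single-AP claim can be sanity-checked with a log-distance path loss model, solving for the distance at which RSSI falls to -67 dBm. The path loss exponent (3.0 here), the 3 dB for a single drywall wall, and the 17 dBm EIRP (14 dBm plus the 3 dBi antenna) are all assumptions, and a circle around one AP is an idealization of a real floor plan; treat the output as an order-of-magnitude check, not a survey.

```python
import math

SQFT_PER_M2 = 10.7639


def max_range_m(eirp_dbm: float, target_rssi_dbm: float, freq_mhz: float,
                wall_loss_db: float = 0.0, exponent: float = 3.0) -> float:
    """Distance at which a log-distance model reaches the target RSSI.

    PL(d) = FSPL(1 m) + 10 * n * log10(d), with n = exponent.
    ASSUMPTION: n = 3.0 as a rough indoor value (n = 2.0 is free space).
    """
    fspl_1m = 20 * math.log10(freq_mhz) - 27.55  # FSPL at 1 m, f in MHz
    budget_db = eirp_dbm - wall_loss_db - target_rssi_dbm - fspl_1m
    return 10 ** (budget_db / (10 * exponent))


def circle_area_sqft(radius_m: float) -> float:
    """Area of an idealized circular coverage cell, in square feet."""
    return math.pi * radius_m ** 2 * SQFT_PER_M2


if __name__ == "__main__":
    r = max_range_m(17, -67, 5180, wall_loss_db=3.0)
    print(f"range to -67 dBm: {r:.1f} m, area: {circle_area_sqft(r):,.0f} sq. ft.")
```

With these assumptions, the range comes out around 14 m, or a circle of roughly 6,500 sq. ft., so the predictive result is at least plausible even before accounting for the fact that there are four APs.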

The transmit power on the APs was already turned down fairly low, but I’ve now gone even lower on both 2.4 GHz and 5 GHz to see if that resolves the issue. I’ve lowered the auto-power ranges to 5 dBm +/- 3 dB on 2.4 GHz and 8 dBm +/- 3 dB on 5 GHz, down from the prior settings of 8 dBm +/- 3 dB on 2.4 GHz and 14 dBm +/- 3 dB on 5 GHz. (Unfortunately, Meraki's auto-power algorithm has proven to be surprisingly poor, generally driving one AP to its maximum and the surrounding APs to their minimum.) If this does not materially improve things, my next option is to disable auto-power by setting fixed power values and to switch off one or two of the APs entirely.
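As a rough check on what the new auto-power ranges buy, the sketch below assumes each AP settles at the center of its window (an assumption; as noted above, Meraki's algorithm often pushes APs to the edges instead) and that a neighbor's received signal drops by exactly the transmitter's power change. The dictionaries and helper names are hypothetical.

```python
OLD_TARGET_DBM = {"2.4 GHz": 8.0, "5 GHz": 14.0}
NEW_TARGET_DBM = {"2.4 GHz": 5.0, "5 GHz": 8.0}


def power_ratio(delta_db: float) -> float:
    """Linear power ratio for a change in dB (e.g. -3 dB is about 0.5x)."""
    return 10 ** (delta_db / 10)


def shifted_rssi(observed_dbm: float, band: str) -> float:
    """Neighbor RSSI after the transmitting AP's target power changes.

    ASSUMPTION: received signal moves dB-for-dB with the transmitter's power.
    """
    return observed_dbm + (NEW_TARGET_DBM[band] - OLD_TARGET_DBM[band])


if __name__ == "__main__":
    for band in OLD_TARGET_DBM:
        delta = NEW_TARGET_DBM[band] - OLD_TARGET_DBM[band]
        print(f"{band}: {delta:+.0f} dB ({power_ratio(delta):.2f}x the transmit power)")
    # e.g. an AP previously heard at -55 dBm on 5 GHz
    print(f"-55 dBm on 5 GHz becomes about {shifted_rssi(-55, '5 GHz'):.0f} dBm")
```

Under these assumptions, an AP heard in the mid -50s dBm on 5 GHz before the change would land around the low -60s dBm afterwards, still above -67 dBm.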