[Commotion-dev] Allied Media Conference 2013 MagicNet Deployment Report

Dan Staples danstaples at opentechinstitute.org
Tue Jun 25 22:19:12 UTC 2013


Here's an initial reportback from the AMC conference. A more detailed
report will likely follow. Thanks Andy for the help on this!


tl;dr: it had a rough start, but turned out awesome!
<https://docs.google.com/file/d/0B0hY7epZlXFaYjhaRVBPZGpxeUU/edit?usp=sharing>



-Dan


----


At the 2013 Allied Media Conference (AMC), the Open Technology Institute
partnered with local Detroit Digital Stewards and Red Hook Digital
Stewards to construct a conference-wide mesh network. The purpose was to
provide network and Internet access to the conference attendees, provide
an additional training opportunity for the Detroit and Red Hook Digital
Stewards, and serve as a testing opportunity for Commotion DR1.1.


Process

The construction process for the network was reminiscent of the process
from the previous AMC, as detailed in the AMC 2012 Commotion Deployment
Report
<https://docs.google.com/document/d/1rf8VETPlJwbgM9p9cU2evIPPjwqOdhP_S9xlsxdGFFo/edit?usp=sharing>.

  * Detroit Digital Stewards spent Wednesday afternoon, from around 3PM
    until 5PM, installing Commotion DR1.1 on PicoStation M2 (PS) and
    NanoStation M2 (NS) units.

      o Basic information was recorded for each node: its name, mesh IP
        address, and label.

      o Several customizations were made: changing the administrator
        password, changing the splash page timeout to 24 hours, updating
        the splash text, and changing the node name and SSID to match
        its location. Nodes were otherwise left "stock". (A sketch of
        scripting these changes follows below.)
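
    These customizations can also be scripted over SSH with OpenWrt's
    standard uci tool. A minimal sketch, in which the node name
    "mcgregor-2f" is a hypothetical example and the splash-page settings
    (which live elsewhere) are not shown:

        # Set the node name and SSID to match the install location.
        uci set system.@system[0].hostname='mcgregor-2f'
        uci set wireless.@wifi-iface[0].ssid='mcgregor-2f'
        uci commit system && uci commit wireless
        wifi    # reload wireless so the new SSID takes effect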

  * Installations were conducted on Thursday, the day before the
    official start of the conference sessions. This began with a tour of
    the installation sites by the team, which included OTI staff and
    Detroit Digital Stewards. The team expanded in the afternoon when
    the Red Hook Digital Stewards arrived.

      o Four nodes were installed in the McGregor conference center, two
        in the Education building, two in the Arts building, two in the
        Hilberry Student Center, and one in the Auditorium.

  * Jonathan set up a Tidepools instance for the conference that
    included a map, session browser, and Twitter feed. It was added to
    the nodes' application portal and received excellent feedback!



Challenges Discovered


Several challenges were discovered through the process of building the
network.

 1. Changing a node's WiFi channel in the presence of other ad-hoc nodes
    does not work.

     1. This was discovered by Will Hawkins and Dan Staples while
        preparing for the AMC.

     2. If two nodes are meshing on channel A and one of them is
        switched to channel B, it will move to channel B and then jump
        right back to channel A a few seconds later. This is due to the
        way ad-hoc cells try to converge. We were unable to coerce iw
        into staying on a channel when bringing up the ad-hoc interface.
        This is unsolved. (The behavior can be reproduced with plain iw,
        as sketched below.)
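
    A rough reproduction with plain iw (interface name, mesh SSID, and
    frequency are examples; Commotion's own scripts may differ):

        # Try to move the ad-hoc interface to channel 11 (2462 MHz):
        iw dev wlan0-1 ibss leave
        iw dev wlan0-1 ibss join commotionwireless.net 2462
        # A few seconds later the ad-hoc cells re-converge and the node
        # is back on its original channel:
        iw dev wlan0-1 info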

 2. We found immediately that the nodes in McGregor, which were all
    connected via wire to the WSU LAN, were not getting DHCP leases from
    the gateway.

     1. This issue was solved by increasing the DHCP timeout in the
        /lib/netifd/proto/commotion.sh script. We had to increase it to
        60 seconds in order to get DHCP leases from the WSU gateway. A
        feature has been added to the LuCI interface to set the DHCP
        timeout. (The idea is sketched below.)
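
    The underlying idea, assuming busybox udhcpc is handling the lease
    request (a sketch of the concept, not the actual commotion.sh
    change):

        # Stretch the DHCP discovery window to roughly 60 seconds:
        # 6 discover attempts, 10 seconds apart; -n exits on failure.
        udhcpc -i eth0 -t 6 -T 10 -n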

 3. The first major issue discovered during the conference was high
    packet loss and long ping times to the access points.

     1. This was first seen when connecting to any of the access points
        in the McGregor building, but was not seen in the other
        buildings.

     2. At first we thought this was RF interference. We tried turning
        down the TX power on all of the nodes, and sometimes even
        turning off nodes. This would help for a short period of time,
        but eventually the problems returned.

     3. Finally, we turned off the wireless ad-hoc interfaces on all the
        McGregor nodes, put all their APs on different channels, and had
        them mesh over Ethernet. This solved the latency and packet loss
        issues within McGregor. (A rough uci sketch of the workaround
        follows below.)
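
    In uci terms, the workaround on each McGregor node looked roughly
    like the following (the interface index and channel are per-node
    examples, and the knob that enables meshing over Ethernet is not
    shown):

        # Disable the wireless ad-hoc interface and give this node's AP
        # its own channel:
        uci set wireless.@wifi-iface[1].disabled='1'
        uci set wireless.radio0.channel='6'
        uci commit wireless && wifi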

 4. Another major problem was establishing good wireless mesh links
    between the buildings. This proved extremely challenging and
    consumed several hours of troubleshooting.

     1. The initial task was to get two NanoStations in McGregor to
        wirelessly mesh with PicoStations in the adjacent buildings. The
        NanoStations were aimed from second-floor windows towards the
        adjacent buildings. The signal strength at the neighboring
        buildings was marginal or nonexistent. This was first thought to
        be an aiming problem, or interference between the nodes.

     2. To combat the possible interference or RF saturation, one of the
        NS units was powered off. We then optimized the output power of
        the remaining NS and PS units nearby, so that each would show a
        received signal strength of around -40 to -50 dBm (very strong).
        We then replaced the NanoStation with a PicoStation, thinking
        DR1 might have compatibility problems with the NS units, or
        issues with MIMO streams. After this change and the TX power
        optimization, the packet loss and ping times (~5 seconds) were
        still excessively high.

     3. Andy Gunn determined that the building's windows through which
        we were trying to mesh were shielded and were attenuating the RF
        signal. To verify, a NanoStation was extended to the outside of
        the building, and the link immediately improved. One consistent
        way to diagnose this was discussed: watch your cell phone's
        reception as you go from outside the building to inside; if
        reception drops significantly, even next to a window, the
        windows are likely shielded.

     4. The underlying problem was linked to the NS or PS node that was
        meshing via both Ethernet and ad-hoc wireless to bridge from
        McGregor to the adjacent buildings. After meshing on Ethernet
        was disabled, the packet loss and latency disappeared. Will
        Hawkins suggested the olsrd-dnssd plugin was also causing
        problems; once that was disabled on all the nodes, meshing over
        Ethernet was re-enabled and everything worked fine. It is still
        unclear what exactly about the dnssd plugin caused the problems.
        (One way to disable the plugin is sketched below.)
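
    One way to disable the plugin, assuming Commotion's olsrd package
    follows the standard OpenWrt convention of an "ignore" option on
    LoadPlugin sections in /etc/config/olsrd (the section index below is
    an example; check the output of the first command):

        # Find the dnssd LoadPlugin section, then mark it ignored:
        uci show olsrd | grep -i dnssd
        uci set olsrd.@LoadPlugin[2].ignore='1'
        uci commit olsrd
        /etc/init.d/olsrd restart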

 5. Since the McGregor nodes were connected to a LAN, the default
    firewall rules blocked incoming connections to their web interfaces
    from the LAN.

     1. Firewall exceptions allowing WAN connections to port 80 had to
        be added manually in order to access the nodes' web interfaces
        from the WSU LAN (see the sketch below). Since ssh connections
        from the WAN zone are allowed by default, it is probably no less
        secure to allow http connections from the WAN zone as well. It
        is worth considering changing this in the default firewall
        config.
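
    The manual exception, expressed as standard OpenWrt firewall uci
    commands (the rule name is our own label):

        # Allow inbound HTTP to the node from the WAN zone:
        uci add firewall rule
        uci set firewall.@rule[-1].name='Allow-WAN-http'
        uci set firewall.@rule[-1].src='wan'
        uci set firewall.@rule[-1].proto='tcp'
        uci set firewall.@rule[-1].dest_port='80'
        uci set firewall.@rule[-1].target='ACCEPT'
        uci commit firewall
        /etc/init.d/firewall restart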

 6. It was not possible to disable the AP while leaving the wireless
    ad-hoc interface in place. Disabling the ad-hoc interface while
    leaving the AP up does work, however.

     1. When the AP is disabled from the web interface, a 'disabled'
        flag is added to the AP entry in /etc/config/wireless (see the
        snippet below), and the wifi interfaces are restarted. This has
        the effect of taking down wlan0-1 (the ad-hoc interface) and
        leaving wlan0 (the AP) up, the opposite of the intended effect.
        It also produces an invalid olsrd config (via commotiond), and
        the wifi interfaces are displayed incorrectly in the web
        interface. Unresolved.
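
    For reference, this is roughly what the web interface ends up
    writing to /etc/config/wireless (abridged; device name and SSID are
    examples):

        config wifi-iface
                option device   'radio0'
                option mode     'ap'
                option ssid     'MagicNet'
                option disabled '1'    # the flag added by the web UI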

 7. One of the nodes would spew tons of duplicate packets when pinged
    (see the example output below).

     1. This may have been a hardware malfunction. It was replaced with
        a newly flashed PS, and the problem disappeared.
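
    The symptom from ping looked like this (illustrative output with an
    invented address and timings; note the repeated icmp_seq):

        $ ping 10.23.5.1
        64 bytes from 10.23.5.1: icmp_seq=1 ttl=64 time=4.21 ms
        64 bytes from 10.23.5.1: icmp_seq=1 ttl=64 time=4.87 ms (DUP!)
        64 bytes from 10.23.5.1: icmp_seq=1 ttl=64 time=5.02 ms (DUP!)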

 8. Some people were confused by the splash page.

     1. Some of the more tech-savvy folks at the AMC gave feedback that
        the port-80-only captive portal was potentially confusing,
        especially if a user has an SSL-encrypted homepage in their
        browser.



Recommendations for Future Development


  * Fix the ad-hoc wifi channel-hopping issue.

  * Implement dynamic gain control so nodes don't drown each other out.

  * Increase the default DHCP timeout and add an option to change it in
    the web interface (these fixes have already been submitted).

  * Check for shielded windows when doing a deployment.

  * Refactor the olsrd-dnssd plugin.

  * Add a firewall rule to allow port 80 connections from the WAN zone.

  * Fix the bug that prevents disabling the AP while leaving the ad-hoc
    interface up.

  * Have the captive portal handle port 443 traffic with a self-signed
    cert by default.

  * Set up network monitoring before doing a deployment. It would have
    been immensely useful to know how many folks used the MagicNet, and
    whether they used the local apps at all. It would also have been
    useful to solicit feedback on the network and what issues users ran
    into.


-- 
Dan Staples

Open Technology Institute
https://commotionwireless.net
