[Commotion-dev] meshing over ethernet

Dan Staples danstaples at opentechinstitute.org
Wed Jul 9 19:35:53 EDT 2014


Is this something that would affect the olsrd-secure plugin as well?

Dan

On 07/08/2014 10:36 PM, Will Hawkins wrote:
> A fix for the control-message-as-the-first-message-in-an-olsr-packet
> problem has been conceived and tested. There is now only one thing left to fix:
> 
> When a node sends a challenge control message to a neighbor (a) and then
> receives a challenge message from (a) before it receives the challenge
> response control message, the whole thing goes to pot. This is incredibly
> common in the scenario when nodes are meshing over multiple interfaces. Once
> this problem is resolved, I think we will have a complete solution.
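
A minimal sketch of the crossed-challenge case described above, in Python for brevity. It is not the plugin's code: the class, its method names and the bare nonce echo are all hypothetical, and signing is left out entirely. It only shows one way to keep the record of our own outstanding challenge separate from the handling of an incoming one, so the two cannot clobber each other.

```python
import os


class NeighborHandshake:
    """Hypothetical per-neighbor handshake state (no signing, illustration only)."""

    def __init__(self):
        self.pending_nonce = None  # nonce of the challenge we sent, if any
        self.verified = False

    def send_challenge(self):
        # Remember what we asked so the eventual response can be matched.
        self.pending_nonce = os.urandom(8)
        return ("challenge", self.pending_nonce)

    def on_challenge(self, nonce):
        # The neighbor challenged us, possibly while our own challenge is
        # still outstanding. Answer it without touching pending_nonce.
        return ("challenge-response", nonce)

    def on_challenge_response(self, nonce):
        # Accept only a response that matches the challenge we actually sent.
        if self.pending_nonce is not None and nonce == self.pending_nonce:
            self.verified = True
            self.pending_nonce = None
        return self.verified
```

With this layout, a challenge that arrives between send_challenge() and on_challenge_response() leaves the outstanding nonce in place, so the later response still matches.
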
> 
> Will
> 
> On 07/07/2014 06:25 PM, Will Hawkins wrote:
>> Okay!
>>
>> So, please (apparently) disregard my previous messages. The root of the
>> problem is conceptually simpler (although I haven't yet started
>> thinking about the fix):
>>
>> It appears that MDP expects
>> challenge/challenge-response/response-response messages (i.e. MDP
>> control packets) to be the very first message in any OLSR packet. That
>> means that if any of those messages is not first, then they will get
>> missed. This is obviously a problem.
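
To make the "first message only" problem concrete, here is a rough Python sketch that walks every message in an OLSR packet instead of looking only at the first one. The header layout follows the IPv4 packet format in RFC 3626. The message-type numbers in CONTROL_TYPES are placeholders I made up for illustration (the real values are not given in this thread), and the build_* helpers exist only to construct a test packet.

```python
import struct

# Hypothetical type numbers for the three control messages (assumption).
CONTROL_TYPES = {11: "challenge", 12: "challenge-response", 13: "response-response"}

PACKET_HEADER = struct.Struct("!HH")         # packet length, packet sequence number
MESSAGE_HEADER = struct.Struct("!BBH4sBBH")  # type, vtime, size, originator, ttl, hops, seqno


def iter_messages(packet: bytes):
    """Yield (msg_type, body) for every message in an IPv4 OLSR packet."""
    packet_len, _seq = PACKET_HEADER.unpack_from(packet, 0)
    end = min(packet_len, len(packet))
    offset = PACKET_HEADER.size
    while offset + MESSAGE_HEADER.size <= end:
        msg_type, _vtime, size, _orig, _ttl, _hops, _mseq = MESSAGE_HEADER.unpack_from(packet, offset)
        if size < MESSAGE_HEADER.size or offset + size > end:
            break  # malformed size field; stop rather than read past the packet
        yield msg_type, packet[offset + MESSAGE_HEADER.size:offset + size]
        offset += size  # the size field covers the message header plus its body


def find_control_messages(packet: bytes):
    """Return every control message in the packet, wherever it appears."""
    return [(CONTROL_TYPES[t], body) for t, body in iter_messages(packet) if t in CONTROL_TYPES]


def build_message(msg_type: int, body: bytes) -> bytes:
    """Test helper: wrap a body in a message header with dummy field values."""
    size = MESSAGE_HEADER.size + len(body)
    return MESSAGE_HEADER.pack(msg_type, 0, size, bytes(4), 1, 0, 0) + body


def build_packet(messages) -> bytes:
    """Test helper: concatenate messages behind a packet header."""
    payload = b"".join(messages)
    return PACKET_HEADER.pack(PACKET_HEADER.size + len(payload), 0) + payload
```

For example, build_packet([build_message(1, b"abcd"), build_message(11, b"nonce")]) puts a HELLO (type 1) ahead of the placeholder challenge, and find_control_messages() still returns the challenge.
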
>>
>> The pico stations never seem to append any packets before an MDP control
>> packet. The Buffalo router does. I think it's more of a timing/packet
>> size issue, but the Buffalo router is a good test case because it
>> exercises this little "gem".
>>
>> Now, on to the fix. I am hoping to get something going tonight. I will
>> keep everyone posted!
>>
>> Will
>>
>> On 07/03/2014 06:18 PM, Will Hawkins wrote:
>>> Further debugging seems to indicate that if one of the two nodes is
>>> meshing over a single interface, the other node can still mesh over
>>> two interfaces without trouble. In other words, the problem seems to
>>> exist only when both nodes are meshing over multiple interfaces.
>>>
>>> Go figure?
>>>
>>> I'm getting more and more flummoxed by what is going on, but we are
>>> working hard at fixing the problem.
>>>
>>> Will
>>>
>>> On 07/02/2014 08:49 PM, Will Hawkins wrote:
>>>> Miles,
>>>>
>>>> We have uncovered the root of the problem and wanted to share the findings.
>>>>
>>>> First of all, thank you for your patience with us as we debugged this
>>>> issue. Without your input, we would never have realized that this was a
>>>> problem.
>>>>
>>>> In cases like yours, olsrd is meshing over two different interfaces.
>>>> There is a primary interface address that labels the node throughout the
>>>> network and there are other, secondary, addresses that label the
>>>> individual interfaces.
>>>>
>>>> In the Serval route signing plugin, we use those labels to index a table
>>>> of timeouts/timestamps. The values from this table are used to locate
>>>> the proper key, the proper timestamp skew, etc.
>>>>
>>>> When a node has multiple interfaces, the plugin gets confused about
>>>> which label to use to index that table. As a result, the skews never
>>>> converge and the routes cannot be signed.
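
A small Python sketch of the indexing problem, assuming the table should always be keyed by a node's primary address. The class and method names are hypothetical, and nothing in the thread says the eventual fix works this way. The idea is just that an interface (secondary) address gets mapped back to the primary address before any lookup, so both labels reach the same entry.

```python
class SkewTable:
    """Hypothetical per-node state table keyed by primary address."""

    def __init__(self):
        self._main_of = {}  # interface address -> primary address
        self._entries = {}  # primary address -> per-node state (key, skew, ...)

    def add_alias(self, interface_addr, main_addr):
        # Record that interface_addr belongs to the node known as main_addr.
        self._main_of[interface_addr] = main_addr

    def _canonical(self, addr):
        # Unknown addresses are treated as primary addresses already.
        return self._main_of.get(addr, addr)

    def set(self, addr, state):
        self._entries[self._canonical(addr)] = state

    def get(self, addr):
        return self._entries.get(self._canonical(addr))
```

One caveat: an entry stored under an interface address before its alias is known would be left behind under the wrong key, so the aliases need to be in place first.
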
>>>>
>>>> We are going to start looking at possible solutions for this problem as
>>>> soon as possible. We cannot promise a fix before the start of Toorcamp
>>>> next week, but we are going to do our best. We will keep you posted on
>>>> our progress and send you any fixes.
>>>>
>>>> In the meantime, the only way to work around the problem is to mesh on a
>>>> single interface per node.
>>>>
>>>> I hope this information helps. As I said, we will keep you posted!
>>>>
>>>> Thanks again for all the input you've given us!
>>>> Will
>>>>
>>>> On 07/02/2014 05:29 PM, Dan Staples wrote:
>>>>> Hey Miles,
>>>>>
>>>>> That sounds like a good plan B to me, if we can't fix this issue. But we
>>>>> (and by that I mean folks at the office other than me) did some testing
>>>>> today to see if we could figure out the problem you're seeing. Here's
>>>>> what they found:
>>>>>
>>>>> Serval route signing between Buffalo and Ubiquiti routers causes
>>>>> commotiond and olsrd to seg fault (but works fine in Ubiq-only meshes).
>>>>> Debugging it indicates that it's a memory-related architecture-specific
>>>>> problem in commotiond. The hardware we used to replicate the issue was a
>>>>> Ubiquiti Picostation and a Buffalo WZR-HP-G300NH.
>>>>>
>>>>> We already have one open memory-related fix for commotiond that may or
>>>>> may not solve the problem:
>>>>> https://github.com/opentechinstitute/commotiond/pull/103. We'll do some
>>>>> more testing today and tomorrow and let you know anything else we find.
>>>>>
>>>>> Thanks for your patience with this and hopefully we'll be able to
>>>>> resolve the problem.
>>>>>
>>>>> Dan
>>>>>
>>>>> On 07/02/2014 11:44 AM, Myles wrote:
>>>>>> So plan B for meshing in production is to use WPA on the mesh interface and firewall OLSR to be unreachable from non-mesh interfaces. Right?
>>>>>>
>>>>>> Sent from my mobile
>>>>>>
>>>>>>> On Jul 2, 2014, at 7:25 AM, Chris Ritzo <critzo at opentechinstitute.org> wrote:
>>>>>>>
>>>>>>> Miles,
>>>>>>> I was discussing this thread with some other team members this morning,
>>>>>>> and we think you've confirmed a bug that we found in our 1.1rc2
>>>>>>> connectivity tests.
>>>>>>>
>>>>>>> Those tests confirm that two nodes meshed via ethernet will work when
>>>>>>> not signed and fail when signed. Your report that turning off Serval
>>>>>>> signing makes the center Buffalo node work properly is consistent with that.
>>>>>>>
>>>>>>> Our team is still debugging this and will be pushing feedback to Serval
>>>>>>> about it. In the interim, turning off route signing via Serval
>>>>>>> should solve this for you.
>>>>>>>
>>>>>>> I'm sure Josh and Will can weigh in on more specifics related to the bug.
>>>>>>>
>>>>>>> -Chris
>>>>>>>
>>>>>>>> On Wed 02 Jul 2014 07:06:52 AM EDT, Dan Staples wrote:
>>>>>>>> The current master branch is now using an upgraded version of olsrd,
>>>>>>>> version 0.6.6, but doing a diff between the versions doesn't show anything
>>>>>>>> that would affect the route signing. So it should be fully compatible.
>>>>>>>>
>>>>>>>> Is your setup something like this?
>>>>>>>>
>>>>>>>> [ubiquiti]---wifi---[ubiquiti]---ethernet---[buffalo center]---wifi---[buffalo]
>>>>>>>>
>>>>>>>> I can try to recreate a similar setup and test it tomorrow when I have
>>>>>>>> access to a test network. I'm not sure if we've extensively tested mixed
>>>>>>>> wifi/ethernet meshing and route signing together.
>>>>>>>>
>>>>>>>> Did you see any log output from the center or ubiquiti devices when
>>>>>>>> route signing was turned on that could indicate what the problem was?
>>>>>>>>
>>>>>>>> Also CCing a couple other folks that might have some good
>>>>>>>> troubleshooting ideas.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>> On 07/02/2014 03:24 AM, miles wrote:
>>>>>>>>> This is giving me no end of trouble. I've now tested, and with all
>>>>>>>>> firewalls turned off:
>>>>>>>>>
>>>>>>>>> 3 ubiquiti nodes will mesh using serval over wifi. As you said, it takes
>>>>>>>>> a few minutes (but not more than 5) to settle.
>>>>>>>>> 2 Buffalo nodes will mesh over wifi. 
>>>>>>>>> 1 buffalo node "Center" is connected to one ubiquiti over ethernet.
>>>>>>>>> Turning off serval signing makes everything work as expected through
>>>>>>>>> node Center.
>>>>>>>>>
>>>>>>>>> Turn on serval, and center sees buffalos, but will not communicate with
>>>>>>>>> the ubiquiti device.  
>>>>>>>>>
>>>>>>>>> Thoughts for what to test/debug next? 
>>>>>>>>>
>>>>>>>>> The buffalos were built using master last week. Ubiquitis are 1.1rc2.
>>>>>>>>> Does master play nicely with 1.1 right now?  The next thing I can think
>>>>>>>>> of to try is to rebuild with commotion feed as 1.1 and see if getting
>>>>>>>>> the same olsrd version will magically fix things. 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Jul 1, 2014, at 7:48 AM, Dan Staples
>>>>>>>>> <danstaples at opentechinstitute.org> wrote:
>>>>>>>>>
>>>>>>>>>> Serval signed routes will work without a gateway/NTP. However, it will
>>>>>>>>>> definitely take up to 5 minutes for the timestamps to converge. They
>>>>>>>>>> *will* converge though, even if the starting clocks on the nodes are
>>>>>>>>>> days or months apart. Give it a few minutes and see if it starts working
>>>>>>>>>> again.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Commotion-dev mailing list
>>>>>>>>> Commotion-dev at lists.chambana.net
>>>>>>>>> https://lists.chambana.net/mailman/listinfo/commotion-dev

-- 
Dan Staples

Open Technology Institute
https://commotionwireless.net
OpenPGP key: http://disman.tl/pgp.asc
Fingerprint: 2480 095D 4B16 436F 35AB 7305 F670 74ED BD86 43A9

