[Commotion-dev] meshing over ethernet

Will Hawkins hawkinsw at opentechinstitute.org
Wed Jul 9 19:39:51 EDT 2014



On 07/09/2014 07:35 PM, Dan Staples wrote:
> Is this something that would affect the olsrd-secure plugin as well?

Yes. All of the "fixes" that I put in my pull request affect the
olsrd-secure plugin too. That's a big ol' :-(

Will

> 
> Dan
> 
> On 07/08/2014 10:36 PM, Will Hawkins wrote:
>> A fix for the control-message-as-the-first-message-in-an-olsr-packet has
>> been conceived and tested. There is now only one thing left to fix:
>>
>> When a node sends a challenge control message to a peer (a), if it
>> receives a challenge message from (a) before it receives the challenge
>> response control message, the whole thing goes to pot. This is incredibly common
>> in the scenario when nodes are meshing over multiple interfaces. Once
>> this problem is resolved, I think we will have a complete solution.
>>
>> Will
>>
>> On 07/07/2014 06:25 PM, Will Hawkins wrote:
>>> Okay!
>>>
>>> So, please (apparently) disregard my previous messages. The root of the
>>> problem is conceptually simpler (although I haven't yet started
>>> thinking about the fix):
>>>
>>> It appears that MDP expects
>>> challenge/challenge-response/response-response messages (i.e. MDP
>>> control packets) to be the very first message in any OLSR packet. That
>>> means that if any of those messages is not first, then it will get
>>> missed. This is obviously a problem.
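
To illustrate the point about message position, here is a minimal Python sketch (not the plugin's actual C code) that walks every message in an OLSR packet instead of inspecting only the first one. It assumes the IPv4 packet and message header layout from RFC 3626. The CHALLENGE type number and all helper names are hypothetical, chosen only for this illustration.

```python
import struct

# RFC 3626 IPv4 layout: packet header, then one or more messages.
PACKET_HEADER = struct.Struct("!HH")          # packet length, packet seqno
MESSAGE_HEADER = struct.Struct("!BBH4sBBH")   # type, vtime, size, originator,
                                              # ttl, hop count, message seqno

HELLO = 1        # HELLO message type from RFC 3626
CHALLENGE = 11   # hypothetical type number for a challenge control message


def iter_messages(packet: bytes):
    """Yield (msg_type, body) for every message in the packet, in order."""
    if len(packet) < PACKET_HEADER.size:
        return
    packet_len, _seq = PACKET_HEADER.unpack_from(packet, 0)
    end = min(packet_len, len(packet))
    offset = PACKET_HEADER.size
    while offset + MESSAGE_HEADER.size <= end:
        fields = MESSAGE_HEADER.unpack_from(packet, offset)
        msg_type, size = fields[0], fields[2]
        # The size field covers the message header plus its body.
        if size < MESSAGE_HEADER.size or offset + size > end:
            break  # malformed size field, stop parsing
        yield msg_type, packet[offset + MESSAGE_HEADER.size:offset + size]
        offset += size


def find_control(packet: bytes, control_types):
    """Return every control message, wherever it sits in the packet."""
    return [(t, b) for t, b in iter_messages(packet) if t in control_types]


def build_message(msg_type: int, body: bytes,
                  originator: bytes = b"\x0a\x00\x00\x01") -> bytes:
    """Helper for the example: pack one message with a dummy header."""
    size = MESSAGE_HEADER.size + len(body)
    return MESSAGE_HEADER.pack(msg_type, 0, size, originator, 1, 0, 0) + body


def build_packet(messages) -> bytes:
    """Helper for the example: wrap messages in a packet header."""
    payload = b"".join(messages)
    return PACKET_HEADER.pack(PACKET_HEADER.size + len(payload), 0) + payload
```

If a HELLO message is packed ahead of the challenge, a check that looks only at the first message sees just the HELLO, while find_control still returns the challenge.
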
>>>
>>> The pico stations never seem to append any packets before an MDP control
>>> packet. The Buffalo router does. I think it's more of a timing/packet
>>> size issue, but the Buffalo router is a good test case because it
>>> exercises this little "gem".
>>>
>>> Now, on to the fix. I am hoping to get something going tonight. I will
>>> keep everyone posted!
>>>
>>> Will
>>>
>>> On 07/03/2014 06:18 PM, Will Hawkins wrote:
>>>> Further debugging seems to indicate that if one of the two nodes is
>>>> meshing over a single interface, the other node may be set to mesh over
>>>> two interfaces. In other words, the problem seems to exist only when
>>>> both nodes are meshing over multiple interfaces.
>>>>
>>>> Go figure?
>>>>
>>>> I'm getting more and more flummoxed by what is going on, but we are
>>>> working hard at fixing the problem.
>>>>
>>>> Will
>>>>
>>>> On 07/02/2014 08:49 PM, Will Hawkins wrote:
>>>>> Miles,
>>>>>
>>>>> We have uncovered the root of the problem and wanted to share the findings.
>>>>>
>>>>> First of all, thank you for your patience with us as we debugged this
>>>>> issue. Without your input, we would never have realized that this was a
>>>>> problem.
>>>>>
>>>>> In cases like yours, olsrd is meshing over two different interfaces.
>>>>> There is a primary interface address that labels the node throughout the
>>>>> network and there are other, secondary, addresses that label the
>>>>> individual interfaces.
>>>>>
>>>>> In the Serval route signing plugin, we use those labels to index a table
>>>>> of timeouts/timestamps. The values from this table are used to locate
>>>>> the proper key, the proper timestamp skew, etc.
>>>>>
>>>>> When a node has multiple interfaces, the plugin gets confused about
>>>>> which label to use to index that table. As a result, the skews never
>>>>> converge and the routes cannot be signed.
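
One way to picture the indexing problem is a table that must give the same entry no matter which of a node's addresses is used for the lookup. The following is a minimal Python sketch under the assumption that the mapping from interface addresses to a node's primary address is known (for example, from OLSR MID messages). The class and method names are hypothetical and this is not the plugin's code.

```python
class TimestampTable:
    """Per-node timestamp skews, always keyed by the node's primary address."""

    def __init__(self):
        self._main_of = {}   # interface address -> primary address
        self._entries = {}   # primary address -> timestamp skew (seconds)

    def _key(self, addr):
        # Fall back to the address itself when no alias is known.
        return self._main_of.get(addr, addr)

    def add_alias(self, main_addr, iface_addr):
        """Record that iface_addr belongs to the node labelled main_addr."""
        self._main_of[iface_addr] = main_addr
        # Move any entry that was stored under the interface address
        # before the alias was known.
        if iface_addr in self._entries:
            self._entries.setdefault(main_addr, self._entries.pop(iface_addr))

    def set_skew(self, addr, skew):
        self._entries[self._key(addr)] = skew

    def get_skew(self, addr):
        return self._entries.get(self._key(addr))
```

With this normalization, a skew recorded for a packet that arrived from a secondary interface address is found again when the same node is later looked up by its primary address.
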
>>>>>
>>>>> We are going to start looking at possible solutions for this problem as
>>>>> soon as possible. We cannot promise a fix before the start of Toorcamp
>>>>> next week, but we are going to do our best. We will keep you posted on
>>>>> our progress and send you any fixes.
>>>>>
>>>>> In the meantime, the only way to work around the problem is to mesh on a
>>>>> single interface per node.
>>>>>
>>>>> I hope this information helps. As I said, we will keep you posted!
>>>>>
>>>>> Thanks again for all the input you've given us!
>>>>> Will
>>>>>
>>>>> On 07/02/2014 05:29 PM, Dan Staples wrote:
>>>>>> Hey Miles,
>>>>>>
>>>>>> That sounds like a good plan B to me, if we can't fix this issue. But we
>>>>>> (and by that I mean folks at the office other than me) did some testing
>>>>>> today to see if we could figure out the problem you're seeing. Here's
>>>>>> what they found:
>>>>>>
>>>>>> Serval route signing between Buffalo and Ubiquiti routers causes
>>>>>> commotiond and olsrd to seg fault (but works fine in Ubiq-only meshes).
>>>>>> Debugging it indicates that it's a memory-related architecture-specific
>>>>>> problem in commotiond. The hardware we used to replicate the issue was
>>>>>> a Ubiquiti Picostation and a Buffalo WZR-HP-G300NH.
>>>>>>
>>>>>> We already have one open memory-related fix for commotiond that may or
>>>>>> may not solve the problem:
>>>>>> https://github.com/opentechinstitute/commotiond/pull/103. We'll do some
>>>>>> more testing today and tomorrow and let you know anything else we find.
>>>>>>
>>>>>> Thanks for your patience with this and hopefully we'll be able to
>>>>>> resolve the problem.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>> On 07/02/2014 11:44 AM, Myles wrote:
>>>>>>> So plan B for meshing in production is to use WPA on the mesh interface and firewall OLSR to be unreachable from non-mesh interfaces. Right?
>>>>>>>
>>>>>>> Sent from my mobile
>>>>>>>
>>>>>>>> On Jul 2, 2014, at 7:25 AM, Chris Ritzo <critzo at opentechinstitute.org> wrote:
>>>>>>>>
>>>>>>>> Miles,
>>>>>>>> I was discussing this thread with some other team members this morning,
>>>>>>>> and we think you've confirmed a bug that we found in our 1.1rc2
>>>>>>>> connectivity tests.
>>>>>>>>
>>>>>>>> Those tests confirm that two nodes meshed via ethernet will work when
>>>>>>>> not signed and fail when signed. Your report that turning off Serval
>>>>>>>> signing makes the center Buffalo node work properly matches that result.
>>>>>>>>
>>>>>>>> Our team is still debugging this and will be pushing feedback to Serval
>>>>>>>> about it, however in the interim, turning off route signing via Serval
>>>>>>>> should solve this for you.
>>>>>>>>
>>>>>>>> I'm sure Josh and Will can weigh in on more specifics related to the bug.
>>>>>>>>
>>>>>>>> -Chris
>>>>>>>>
>>>>>>>>> On Wed 02 Jul 2014 07:06:52 AM EDT, Dan Staples wrote:
>>>>>>>>> The current master branch is now using an upgraded version of olsrd,
>>>>>>>>> version 0.6.6, but doing a diff b/w the versions doesn't show anything
>>>>>>>>> that would affect the route signing. So it should be fully compatible.
>>>>>>>>>
>>>>>>>>> Is your setup something like this?
>>>>>>>>>
>>>>>>>>> [ubiquiti]---wifi---[ubiquiti]---ethernet---[buffalo
>>>>>>>>> center]---wifi---[buffalo]
>>>>>>>>>
>>>>>>>>> I can try to recreate a similar setup and test it tomorrow when I have
>>>>>>>>> access to a test network. I'm not sure if we've extensively tested mixed
>>>>>>>>> wifi/ethernet meshing and route signing together.
>>>>>>>>>
>>>>>>>>> Did you see any log output from the center or ubiquiti devices when
>>>>>>>>> route signing was turned on that could indicate what the problem was?
>>>>>>>>>
>>>>>>>>> Also CCing a couple other folks that might have some good
>>>>>>>>> troubleshooting ideas.
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>>> On 07/02/2014 03:24 AM, miles wrote:
>>>>>>>>>> This is giving me no end of trouble. I've now tested, and with all
>>>>>>>>>> firewalls turned off:
>>>>>>>>>>
>>>>>>>>>> 3 ubiquiti nodes will mesh using serval over wifi. As you said, it takes
>>>>>>>>>> a few minutes (but not more than 5) to settle.
>>>>>>>>>> 2 Buffalo nodes will mesh over wifi. 
>>>>>>>>>> 1 buffalo node "Center" is connected to one ubiquiti over ethernet.
>>>>>>>>>> Turning off serval signing makes everything work as expected through
>>>>>>>>>> node Center.
>>>>>>>>>>
>>>>>>>>>> Turn on serval, and center sees buffalos, but will not communicate with
>>>>>>>>>> the ubiquiti device.  
>>>>>>>>>>
>>>>>>>>>> Thoughts for what to test/debug next? 
>>>>>>>>>>
>>>>>>>>>> The buffalos were built using master last week. Ubiquitis are 1.1rc2.
>>>>>>>>>> Does master play nicely with 1.1 right now?  The next thing I can think
>>>>>>>>>> of to try is to rebuild with commotion feed as 1.1 and see if getting
>>>>>>>>>> the same olsrd version will magically fix things. 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Jul 1, 2014, at 7:48 AM, Dan Staples
>>>>>>>>>> <danstaples at opentechinstitute.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Serval signed routes will work without a gateway/NTP. However, it will
>>>>>>>>>>> definitely take up to 5 minutes for the timestamps to converge. They
>>>>>>>>>>> *will* converge though, even if the starting clocks on the nodes are
>>>>>>>>>>> days or months apart. Give it a few minutes and see if it starts working
>>>>>>>>>>> again.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Commotion-dev mailing list
>>>>>>>>>> Commotion-dev at lists.chambana.net
>>>>>>>>>> https://lists.chambana.net/mailman/listinfo/commotion-dev

