
Synchronization bug in libexactLE/hci driver?

After streaming data over BLE for many hours, my device ends up in a hung state in which the hciDrvWrite routine is stuck spinning on the reading flag.

I don't have the source for libexactLE.a, but I've tracked this down to the following lines in the disassembled code (which I have annotated):

    867c:	4b43      	ldr	r3, [pc, #268]	; (878c <hciDrvWrite+0x124>)   ; r3 <= irqObj
    867e:	6818      	ldr	r0, [r3, #0]                                 ; r0 <= [irqObj]
    8680:	f7fd ff68 	bl	6554 <gpio_irq_disable>                      ; interrupt disabled here
    8684:	4942      	ldr	r1, [pc, #264]	; (8790 <hciDrvWrite+0x128>)   ; r1 <= reading
    8686:	4b43      	ldr	r3, [pc, #268]	; (8794 <hciDrvWrite+0x12c>)   ; r3 <= writing
    8688:	680a      	ldr	r2, [r1, #0]                                 ; r2 <= [reading]
    868a:	2a00      	cmp	r2, #0
    868c:	d1fc      	bne.n	8688 <hciDrvWrite+0x20>                   ; SPIN while reading != 0
    868e:	681a      	ldr	r2, [r3, #0]                                 ; r2 <= [writing]
    8690:	2a00      	cmp	r2, #0
    8692:	d1f9      	bne.n	8688 <hciDrvWrite+0x20>                   ; ...and while writing != 0
    ...
    8790:	200021b0 	.word	0x200021b0
    8794:	20002194 	.word	0x20002194

and gdb shows that 0x200021b0 is the reading flag, and that it is set to 1:

(gdb) x/x 0x200021b0
0x200021b0 <reading>:	0x01

So, for some unknown reason, the reading flag is left set and the write routine spins forever.
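For readers who don't read Thumb assembly, the annotated loop corresponds roughly to the C below. This is a hypothetical reconstruction from the disassembly, not Maxim's source: `reading`, `writing`, `irqObj` and `gpio_irq_disable` are the symbols shown above, but the `gpio_irq_t` type and the body of `gpio_irq_disable` are stubs invented for illustration, and `hci_drv_write_wait` is a made-up name for this fragment of hciDrvWrite. It assumes, as suspected above, that this GPIO interrupt is what drives a read to completion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: the real gpio_irq_t and gpio_irq_disable() live in
 * the HAL / libexactLE.a; this stub only records that the IRQ was disabled. */
typedef struct { int enabled; } gpio_irq_t;

static gpio_irq_t hci_irq = { 1 };
static gpio_irq_t *irqObj = &hci_irq;          /* 867c/867e: r0 <= [irqObj] */

static void gpio_irq_disable(gpio_irq_t *obj) {
    assert(obj != NULL);
    obj->enabled = 0;
}

/* The flags at 0x200021b0 (reading) and 0x20002194 (writing). */
static volatile int reading;
static volatile int writing;

/* Approximate C for hciDrvWrite+0x14 .. +0x2a (reconstruction, not real source). */
static void hci_drv_write_wait(void) {
    gpio_irq_disable(irqObj);                  /* 8680: bl gpio_irq_disable */
    while (reading != 0 || writing != 0) {     /* 8688-8692: spin           */
        /* If `reading` is only ever cleared by the interrupt that was just
         * disabled, this loop never exits once `reading` is 1. */
    }
}
```

With both flags clear the function returns immediately with the interrupt left disabled; with `reading` stuck at 1 it reproduces the hang seen in gdb.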

If I had the source I would try to work through this myself (is it possible to get the source for libexactLE.a?). My first thought, though, is that disabling the interrupt before spinning on the reading flag seems like a bad idea, because it could prevent any pending read from ever finishing.
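The ordering concern can be illustrated with a toy single-threaded model. Everything in it (`rx_busy`, `rx_bytes_left`, `simulate_rx_isr`, the two wait functions and the spin bound) is hypothetical and not libexactLE code: `simulate_rx_isr()` stands in for the RX interrupt and only makes progress while it is enabled. The first wait function mirrors the order in the disassembly; the second is a common alternative pattern: wait with the interrupt enabled, then disable it and re-check before writing.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not libexactLE code. rx_busy plays the role of `reading`. */
static volatile int rx_busy;
static bool rx_irq_enabled = true;
static int rx_bytes_left;              /* bytes the "ISR" still has to receive */

static void rx_irq_enable(void)  { rx_irq_enabled = true; }
static void rx_irq_disable(void) { rx_irq_enabled = false; }

/* Called once per spin iteration to stand in for an asynchronous interrupt;
 * it clears rx_busy when the last outstanding byte arrives. */
static void simulate_rx_isr(void) {
    if (rx_irq_enabled && rx_bytes_left > 0 && --rx_bytes_left == 0) {
        rx_busy = 0;
    }
}

/* Order seen in the disassembly: disable first, then wait. Bounded so the
 * model can report the hang instead of actually hanging. */
static bool wait_disable_then_spin(int max_spins) {
    assert(max_spins > 0);
    rx_irq_disable();
    for (int i = 0; i < max_spins; i++) {
        if (!rx_busy) return true;
        simulate_rx_isr();             /* does nothing: the IRQ is off */
    }
    return false;                      /* would spin forever on hardware */
}

/* Alternative: wait with the IRQ enabled, then disable and re-check, backing
 * off if a new read started between the check and the disable. */
static bool wait_spin_then_disable(int max_spins) {
    assert(max_spins > 0);
    for (int i = 0; i < max_spins; i++) {
        if (!rx_busy) {
            rx_irq_disable();
            if (!rx_busy) return true; /* idle with IRQ off: safe to write */
            rx_irq_enable();           /* a read slipped in; let it finish */
        }
        simulate_rx_isr();
    }
    return false;
}
```

With a read in progress (`rx_busy = 1`, three bytes outstanding), `wait_disable_then_spin(100)` returns false while `wait_spin_then_disable(100)` returns true and leaves the interrupt disabled for the write. In this single-threaded model the window between the check and the disable can't actually be hit; it is kept to show the pattern. Reordering only helps if a read is genuinely still in progress: if the receive state machine is waiting for bytes that will never arrive, the flag stays set either way, and a bounded wait that reports an error would also be needed.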

Could someone at Maxim take a look at the HCI driver source and let me know whether this condition is possible and, if so, whether a fix would be easy?

Thanks, Kevin

Question relating to:

MAXREFDES100# Health Sensor Platform

I've traced this a little further, and it looks like a corrupted/misaligned hdrRx is the root cause. During normal operation, with a breakpoint set just after the header has been read, I typically see the following:

Breakpoint 1, 0x00007f46 in hciTrRxIncoming ()
(gdb) x/5bx &hdrRx
0x20002175 <hdrRx>:	0x04	0x13	0x05	0x09	0x00
(gdb) x/b &stateRx
0x20002174 <stateRx>:	0x03
(gdb) p readLen
$2 = 0
(gdb) p/x readPtr
$3 = 0x20002178
(gdb) p readReceived
$4 = 3

whereas when the write is stuck waiting on the reading flag I see:

Program received signal SIGINT, Interrupt.
[Switching to Thread 536879808]
0x00008680 in hciDrvWrite ()
(gdb) x/5b &hdrRx
0x20002175 <hdrRx>:	0x13	0x05	0x01	0x09	0x00
(gdb) x/b &stateRx
0x20002174 <stateRx>:	0x01
(gdb) p readLen
$4 = 0
(gdb) p/x readPtr
$5 = 0x20002178
(gdb) p readReceived
$6 = 3

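It looks to me like hdrRx is offset by one byte, i.e. the 0x04 indicating the packet type was lost, so every later header byte sits one slot too early (0x13 where the packet type should be, 0x05 where the event code should be), and things fall apart from there.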

posted by Kevin Peterson 25 May 2019