Widcomm


Summary

One of the things on my to-do list for our Widcomm support is that on some device platforms the stack needs to be restarted once the device has gone through a power-down-up cycle. On my Asus device, for instance, we find that some operations fail after such a cycle. Once we re-create the Widcomm objects they work again.

What fails when

On my older iPAQ with Widcomm version 1.7.1.1424 there is no problem after a power-down-up: things work OK.

On my Asus with Widcomm version “1.8.0 build 4800”, the stack needs to be reset. If we don’t restart the stack then, for instance, Device Discovery (CBtIf::StartInquiry) and Service Discovery (StartDiscovery) fail, and the latter affects BtCli.Connect.

What we tried

So the obvious thing to do in this case is to close the existing Widcomm C++ objects and initialise new ones. So I implemented that: when the Stack-Down event has been seen, the next time we do an operation we dispose our instance of the CBtIf object and ‘new’ a new instance.
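The "re-create on the next operation" idea can be sketched roughly as follows. This is only an illustration in Python, not the library's code: RestartingStack and FakeStack are hypothetical names, FakeStack stands in for the native CBtIf wrapper, and no locking is shown.

```python
class RestartingStack:
    """Holds one stack object and re-creates it lazily after a stack-down event."""

    def __init__(self, factory):
        self._factory = factory        # hypothetical: callable that creates a new stack object
        self._instance = factory()
        self._needs_restart = False

    def on_stack_down(self):
        # Called from the stack-status event handler. It only records the fact;
        # the actual dispose/re-create is deferred to the next operation.
        self._needs_restart = True

    def get(self):
        # Called at the start of every operation.
        if self._needs_restart:
            self._instance.dispose()
            self._instance = self._factory()
            self._needs_restart = False
        return self._instance


class FakeStack:
    """Stand-in for the native object so that the sketch runs anywhere."""

    def __init__(self):
        self.disposed = False

    def dispose(self):
        self.disposed = True
```

Deferring the re-create to the next operation keeps the event handler trivial, but it also means the new instance is used immediately after it is created.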

What works and doesn’t

On the Asus that works fine: StartInquiry etc. work successfully after the dispose/re-create.

On the iPAQ that’s the case also, but only if we leave some time after the restart before using the stack again. If instead we access the new instance too soon after the restart we get a crash. Well, it looks like a crash in that the application just disappears, but it doesn’t look like a crash in that we get no crash dialog nor apparently any crash dump. Why not?? And we can’t inspect it with the debugger, as we can’t keep a debugger attached over the power-down-up, and it takes too long for ActiveSync and/or a debugger to re-attach to be able to catch the crash.

Any suggestions about investigating this crash are welcomed…

Other thoughts

The other thought was to check the IsStackServerUp and IsDeviceReady methods provided by the Widcomm CBtIf class. We would check these and delay things until the stack was in a safe state. However, these methods were added in a later version than the problematic device has: they were added in 1.7.1.2700 but the iPAQ has 1.7.1.1424. So they’re not available on the very platform that we need them on.
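As a rough illustration of the version test involved, here is a sketch in Python. The function names are made up for the example, and writing the Asus version as "1.8.0.4800" is an assumed normalisation of “1.8.0 build 4800”.

```python
def parse_version(text):
    """Turn '1.7.1.1424' into (1, 7, 1, 1424) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Version in which IsStackServerUp / IsDeviceReady were added, per the text above.
STATUS_METHODS_ADDED_IN = parse_version("1.7.1.2700")


def has_status_methods(stack_version):
    """True if the given stack version is new enough to have the status methods."""
    return parse_version(stack_version) >= STATUS_METHODS_ADDED_IN
```

Comparing tuples of integers rather than strings matters once a component reaches two digits: as strings, "1.10" sorts before "1.9".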

I had hoped anyway that closing the instance and creating a new one would make everything right, that internally it would do a wait if necessary. What we see of course is that the new instance is returned, but then something unknown fails…

Events

One begins to consider whether we need to add some sort of version check hack: if (version < 1.8.0.xxxx) then delay-new-instance-creation. Or something like that…

Or, looking at the logs, maybe we can use the different events produced to fingerprint the version.

iPAQ log
At 15:50:20: StackStatusChange: Unloaded
At 15:50:27: StackStatusChange: Reloaded

Asus log
At 15:52:44: StackStatusChange: Down
At 15:53:11: StackStatusChange: Unloaded

Note the difference: iPAQ: Unloaded+Reloaded, Asus: Down+Unloaded with no Reloaded.

Out of interest: Firstly, why no Reloaded in the latter case?  Because the stack is detached on that platform??  Secondly the Widcomm docs say: “DEVST_DOWN indicates that the Bluetooth stack server has been shut down and is not expected to be restarted.”, but that’s not the case here.  Hmm, they seem a bit confused… 🙂

I think we’ll have a think about using the Down+Unloaded case to detect that the platform needs a reset, and not doing the reset in the other cases. I’ll try and test this on other devices and see what I find.
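A minimal sketch of that fingerprinting, in Python for illustration only. The event names are the ones from the two logs above; the function name and the rule itself (a Down followed at some point by an Unloaded) are assumptions that would need checking on other devices.

```python
def needs_stack_reset(events):
    """Return True if the status-change history contains Down followed by Unloaded.

    'events' is the sequence of StackStatusChange values seen so far,
    for example ["Down", "Unloaded"].
    """
    seen_down = False
    for event in events:
        if event == "Down":
            seen_down = True
        elif event == "Unloaded" and seen_down:
            return True
    return False
```

With the logs above, the Asus sequence (Down, Unloaded) asks for a reset and the iPAQ sequence (Unloaded, Reloaded) does not.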

Other thoughts welcome…


I’ve just checked in support for Widcomm’s COM port creation classes. Call WidcommSerialPort.CreateClient and store the result, and once you’ve finished with the COM port Dispose it. Do store the result: it’ll be finalized if it is not referenced. I’ll look at integrating the functionality into the BluetoothSerialPort class sometime.

I’ve tested it on my Asus WM Classic device and it works fine there. The Widcomm documentation implies that it should work on Win32 too, but it doesn’t work on my WinXP Widcomm v3 installation.

Download the library code (revision 85315 or later) from http://32feet.codeplex.com/SourceControl/list/changesets and compile the CF2 project to get the library code to use in your project. You’ll also need the new native Widcomm mapping DLL, get it from http://32feet.codeplex.com/releases/view/61443

Testing shows, however, that if the connected devices go out of range then the connection is lost (we see a DISCONNECT event in the code, and the remote socket server sees a close). Presumably we need to implement code to repeatedly retry the connection?
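One possible shape for such retry code, sketched in Python with made-up names. The connect argument is a hypothetical callable that opens the port and raises OSError on failure; the doubling delay is just one reasonable policy, not something the library does today.

```python
import time


def connect_with_retry(connect, attempts=5, initial_delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds, waiting longer after each failure.

    Re-raises the last OSError if every attempt fails.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == attempts:
                raise
            sleep(delay)   # give the remote device time to come back into range
            delay *= 2     # back off: 1 s, 2 s, 4 s, ...
```

The sleep function is a parameter only so that the loop can be exercised without real waiting.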

I’ve done about a day’s work on that. I’d be glad of a wee ‘reward’ for adding this new feature, particularly if it saves you time on your projects. 🙂 I have an Amazon wishlist or PayPal, for instance.

In 2.5 on Win32 we made all calls into the Widcomm API come from one thread. This was due to the note in the Widcomm documentation (discussed at https://32feetnetdev.wordpress.com/2009/12/04/widcomm-single-thread-access/). However, since then two people have noted in the forums that on Windows Mobile some operations failed when called from an async callback. One noted that calling BeginDiscoverDevices from a DiscoverDevices callback failed, and another noted that a Connect failed when called from there.

So what do we learn from this?  If a library doesn’t state its threading requirements then assume the worst.  This is unlike the managed code I’ve created where everything is safe when called from multiple threads and from a callback.  I ensure that all invariants are correct and am holding no locks etc when I raise the callback etc.  So nothing is forbidden, and no special main thread is required.

So, since then in the Widcomm support code on both platforms we’ve made all user callbacks occur on a thread-pool thread.  Hopefully that’ll solve all the threading issues once and for all.  On the downside it could make a small change in behaviour, for instance the callbacks could take a slightly longer period to start after an event, and if there were multiple reads or writes released by a stack event then they could now run in parallel.
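The idea of raising user callbacks on a pool thread can be sketched like this. It is a Python illustration, not the library's code: raise_user_callback and demo are hypothetical names, and the pool size is arbitrary.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)


def raise_user_callback(callback, *args):
    """Queue a user callback on a pool thread instead of running it inline.

    The caller (for example the stack's event thread) returns at once, so user
    code that calls back into the library never runs on that thread.
    """
    return _pool.submit(callback, *args)


def demo():
    """Run one callback and report which thread it ran on."""
    event_thread = threading.get_ident()
    seen = []

    def on_complete(result):
        seen.append((result, threading.get_ident()))

    raise_user_callback(on_complete, "inquiry done").result()
    return event_thread, seen
```

With more than one worker, two callbacks queued by the same stack event may run at the same time, which is the change in behaviour mentioned above.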

Since I produced the 2.4 release there has been an update to the VC++ runtime files, presumably as part of one of the ATL security patches.  This means that the new 32feetWidcomm DLL on Win32 references that updated version.  The manifest in the 2.4 version contains:

<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
  <dependency>
    <dependentAssembly>
      <assemblyIdentity type="win32" name="Microsoft.VC80.CRT"
version="8.0.50727.762" processorArchitecture="x86" publicKeyToken="1fc8b3b9a1e18e3b"></assemblyIdentity>
    </dependentAssembly>
  </dependency>
  <dependency>
    <dependentAssembly>
      <assemblyIdentity type="win32" name="Microsoft.VC80.CRT"
version="8.0.50608.0" processorArchitecture="x86" publicKeyToken="1fc8b3b9a1e18e3b"></assemblyIdentity>
    </dependentAssembly>
  </dependency>
</assembly>

but the new version contains:

<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
  <dependency>
    <dependentAssembly>
      <assemblyIdentity type="win32" name="Microsoft.VC80.CRT"
version="8.0.50727.4053" processorArchitecture="x86" publicKeyToken="1fc8b3b9a1e18e3b"></assemblyIdentity>
    </dependentAssembly>
  </dependency>
  <dependency>
    <dependentAssembly>
      <assemblyIdentity type="win32" name="Microsoft.VC80.CRT"
version="8.0.50608.0" processorArchitecture="x86" publicKeyToken="1fc8b3b9a1e18e3b"></assemblyIdentity>
    </dependentAssembly>
  </dependency>
</assembly>

So we need VC runtime version 8.0.50727.4053 rather than 8.0.50727.762.  If you run this on a machine that doesn’t have that version it will fail with e.g.

Exception creating factory ‘InTheHand.Net.Bluetooth.Widcomm.WidcommBluetoothFactory, ex: System.DllNotFoundException: Unable to load DLL ’32feetWidcomm’: The application has failed to start because its side-by-side configuration is incorrect. Please see the application event log or use the command-line sxstrace.exe tool for more detail. (Exception from HRESULT: 0x800736B1)

And there should be a couple of events in the event log, like the following (though these two are from a fault with the DEBUG build):

Log Name:      Application
Source:        SideBySide
Date:          14/12/2009 09:42:30
Event ID:      33
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      alanlt2w
Description:
Activation context generation failed for "E:\Users\alan\Documents\32feet Merge\32feetWidcomm.DLL". Dependent Assembly Microsoft.VC80.DebugCRT,processorArchitecture="x86",publicKeyToken="1fc8b3b9a1e18e3b",type="win32",version="8.0.50727.4053" could not be found. Please use sxstrace.exe for detailed diagnosis.

Log Name:      Application
Source:        SideBySide
Date:          14/12/2009 09:42:30
Event ID:      33
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      alanlt2w
Description:
Activation context generation failed for "E:\Users\alan\Documents\32feet Merge\32feetWidcomm.DLL". Dependent Assembly Microsoft.VC80.DebugCRT,processorArchitecture="x86",publicKeyToken="1fc8b3b9a1e18e3b",type="win32",version="8.0.50727.4053" could not be found. Please use sxstrace.exe for detailed diagnosis.

So how to fix this? Install vcredist.exe from “Microsoft Visual C++ 2005 Service Pack 1 Redistributable Package ATL Security Update” http://www.microsoft.com/downloads/details.aspx?familyid=766a6af7-ec73-40ff-b072-9112bab119c2&displaylang=en

After I install that I get five new items in the SxS cache (%windir%\winsxs\):

x86_microsoft.vc80.atl_1fc8b3b9a1e18e3b_8.0.50727.4053_none_d1c738ec43578ea1
x86_microsoft.vc80.crt_1fc8b3b9a1e18e3b_8.0.50727.4053_none_d08d7da0442a985d
x86_microsoft.vc80.mfcloc_1fc8b3b9a1e18e3b_8.0.50727.4053_none_03ca5532205cb096
x86_microsoft.vc80.mfc_1fc8b3b9a1e18e3b_8.0.50727.4053_none_cbf21254470d8752
x86_microsoft.vc80.openmp_1fc8b3b9a1e18e3b_8.0.50727.4053_none_3b0e32bdc9afe437

The second one being the required component.
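To check a machine without waiting for the load to fail, something like the following sketch could be used. It is illustrative Python with made-up function names. It reads the dependency out of a manifest and looks for a matching folder name, assuming the folder naming pattern visible in the list above (architecture, name, public key token, version). It ignores any version redirection that side-by-side policies may apply, so a miss is a hint rather than a verdict.

```python
import xml.etree.ElementTree as ET

NS = {"asm": "urn:schemas-microsoft-com:asm.v1"}

# First dependency from the new manifest shown earlier.
MANIFEST = """<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
  <dependency>
    <dependentAssembly>
      <assemblyIdentity type="win32" name="Microsoft.VC80.CRT"
version="8.0.50727.4053" processorArchitecture="x86" publicKeyToken="1fc8b3b9a1e18e3b"></assemblyIdentity>
    </dependentAssembly>
  </dependency>
</assembly>"""


def required_assemblies(manifest_xml):
    """List (architecture, name, token, version) for each assemblyIdentity element."""
    root = ET.fromstring(manifest_xml)
    return [
        (ident.get("processorArchitecture"), ident.get("name"),
         ident.get("publicKeyToken"), ident.get("version"))
        for ident in root.iterfind(".//asm:assemblyIdentity", NS)
    ]


def is_installed(identity, winsxs_folder_names):
    """True if some folder name starts with arch_name_token_version_."""
    arch, name, token, version = identity
    prefix = "_".join([arch, name.lower(), token, version]) + "_"
    return any(folder.lower().startswith(prefix) for folder in winsxs_folder_names)
```

On a real machine the folder names would come from listing %windir%\winsxs; here they are passed in so that the sketch runs anywhere.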

So, I found a little time to continue investigations after my debugging session (see “Investigating Widcomm when MSFT present”). Of the possibilities listed there, I’ve emailed Broadcom for #1 and also tried #3. (We’ll leave #2 for now.)

So I downloaded version 5.1.0.3101 of the SDK — that’s the last version of the SDK before v6 added the Vista support (wrapping the MSFT stack).  And guess what, it works!  As simple as that…

Running the multi-stack tests in ConsoleMenuTesting is successful. That’s running a listener on one stack and the client on the other stack, both on the same PC. Both cases work, listener on Widcomm and listener on MSFT; see the logs below. Running two ObexWebRequests to the opposite stack is also successful.

All that’s needed is a new copy of our native DLL, 32feetWidcomm.dll.  I’ve posted a copy to http://32feet.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=37167.  Please download this, replace your current copy and test test test.  Let me know if it’s all successful, or not…

option>12
 1 -- Quit
 2 -- <-Back
 3 -- ListenerOnStack1ClientOnStack2
 4 -- ListenerOnStack2ClientOnStack1
...
option>3
IBtIf using WidcommStBtIf.
BtIf_Create
Num factories: 2, Primary Factory: WidcommBluetoothFactory
BtIf_GetLocalDeviceVersionInfo
BtIf_GetLocalDeviceName
BtIf_IsDeviceConnectableDiscoverable
1)  Radio, address: 00:0A:3A:68:65:BB
Mode: Discoverable
Name: ALANPC1b, LmpSubversion: 777
ClassOfDevice: 0, device: Miscellaneous / service: None
Software: Broadcom,  Hardware: Broadcom, status: Running
Remote: ''2)  Radio, address: 00:80:98:24:4C:A4
Mode: Connectable
Name: ALANPC1, LmpSubversion: 524
ClassOfDevice: 20104, device: DesktopComputer / service: Network
Software: Microsoft,  Hardware: CambridgeSiliconRadio, status: Unknown
Remote: ''IRfCommIf using WidcommStRfCommIf.
RfCommIf_Create
RfCommIf_GetScn
Server GetScn returned port: 10
True; False,False-> NONE
Listening on 000000000000:10
WidcommRfcommPort.Create'd: 1212FC8
NativeMethods.RfcommPort_OpenServer ret: SUCCESS=0x00000000
OpenServer ret: SUCCESS=0x00000000
StartOneNewListenerPort 1212FC8.
Started 1 new port(s).
SdpService_Create
SdpService_AddServiceClassIdList, num: 1, p_service_guids: 0135CC90
SdpService_AddRFCommProtocolDescriptor
Listener active on SCN: 10
BeginAccept Enqueued
Gonna connect to: 000A3A6865BB:10
release to connect!>
HandleEvent: 512=0x200=CONNECTED
1212FC8: CONNECTED (New)CONNECTED 1212FC8; m_state: New; m_arConnect (set), IsCompleted: False.
NOT RfcommPort_IsConnected on our Widcomm SINGLE thread!
RfcommPort_IsConnected on Widcomm callback thread!
Client Connected to : 'ALANPC1b' 000A3A6865BB:10
waiting for Lsnr.Accept completion event.
PortAccepted Dequeued a caller
WidcommRfcommPort.Create'd: 1215EF8
NativeMethods.RfcommPort_OpenServer ret: SUCCESS=0x00000000
OpenServer ret: SUCCESS=0x00000000
StartOneNewListenerPort 1215EF8.
Started 1 new port(s).
HandleEvent: 7224=0x1C38=CTS, DSR, RLSD, CTSS, DSRS, RLSDS
1212FC8: CTS, DSR, RLSD, CTSS, DSRS, RLSDS (Connected)All success

….

1 -- Quit
2 -- <-Back
3 -- ListenerOnStack1ClientOnStack2
4 -- ListenerOnStack2ClientOnStack1
...
Invalid number>4
BtIf_GetLocalDeviceVersionInfo
BtIf_GetLocalDeviceName
BtIf_IsDeviceConnectableDiscoverable
1) Radio, address: 00:0A:3A:68:65:BB
Mode: Discoverable
Name: ALANPC1b, LmpSubversion: 777
ClassOfDevice: 0, device: Miscellaneous / service: None
Software: Broadcom, Hardware: Broadcom, status: Running
Remote: ''2) Radio, address: 00:80:98:24:4C:A4
Mode: Connectable
Name: ALANPC1, LmpSubversion: 524
ClassOfDevice: 20104, device: DesktopComputer / service: Network
Software: Microsoft, Hardware: CambridgeSiliconRadio, status: Unknown
Remote: ''Listener active on SCN: 1
IRfCommIf using WidcommStRfCommIf.
WidcommRfcommPort.Create'd: 1215A78
RfCommIf_Create
Gonna connect to: 008098244CA4:1
release to connect!>
BeginFillInPortState
BeginFillInPort, has port -> Completed Syncronously
NativeMethods.RfcommPort_OpenClient ret: SUCCESS=0x00000000
OpenClient ret: SUCCESS=0x00000000
HandleEvent: 512=0x200=CONNECTED
1215A78: CONNECTED (New)CONNECTED 1215A78; m_state: New; m_arConnect (set), IsCompleted: False.
NOT RfcommPort_IsConnected on our Widcomm SINGLE thread!
RfcommPort_IsConnected on Widcomm callback thread!
HandleEvent: 7736=0x1E38=CTS, DSR, RLSD, CONNECTED, CTSS, DSRS, RLSDS
Client Connected to : 'ALANPC1' 008098244CA4:00000000000000000000000000000000
waiting for Lsnr.Accept completion event.
All success1215A78: CTS, DSR, RLSD, CONNECTED, CTSS, DSRS, RLSDS (Connected)
CONNECTED 1215A78; m_state: Connected; m_arConnect (null), IsCompleted: n/a.
HandleEvent: 512=0x200=CONNECTED
1215A78: CONNECTED (Connected)
CONNECTED 1215A78; m_state: Connected; m_arConnect (null), IsCompleted: n/a.
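The HandleEvent lines in these logs show a bit mask next to its decoded names. A small decoder can be sketched as below, in Python for illustration. The bit values are inferred from the three masks in the logs (0x200, 0x1C38 and 0x1E38), on the assumption that the names are printed in ascending bit order; they should be checked against the SDK header before being relied on.

```python
# Bit values inferred from the log lines above (an assumption, see lead-in).
PORT_EVENTS = {
    0x0008: "CTS",
    0x0010: "DSR",
    0x0020: "RLSD",
    0x0200: "CONNECTED",
    0x0400: "CTSS",
    0x0800: "DSRS",
    0x1000: "RLSDS",
}


def decode_port_event(mask):
    """Return the names of the known bits set in mask, lowest bit first.

    Any bits not in the table are appended as one hex number.
    """
    names = [name for bit, name in sorted(PORT_EVENTS.items()) if mask & bit]
    unknown = mask & ~sum(PORT_EVENTS)   # sum of the keys is the mask of all known bits
    if unknown:
        names.append(hex(unknown))
    return ", ".join(names)
```

For example, decode_port_event(7224) gives the same list of names as the 0x1C38 line in the first log.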

I’ve previously run the Widcomm stack under a debugger but never with enough time to investigate it thoroughly. Since changing the threading model didn’t do anything to help Widcomm work when the Microsoft stack is present, I wanted to get stuck in to fixing this.

In short what I found was that when the Widcomm stack loads (actually when I create a BluetoothListener on the Widcomm stack) I see calls to the Microsoft stack API and to Winsock!  First it calls the BluetoothFindFirstRadio/etc API and then it calls: socket(AF_BTH, SOCK_STREAM, BTHPROTO_RFCOMM).  Then when the BluetoothListener is active if one breaks into the debugger and lists the threads one thread is sitting waiting on WSAAccept.

So it’s all very clear now. It’s not some unintentional conflict between Widcomm and the Microsoft Bluetooth stack that breaks some threading or message passing system; instead it’s fully intentional behaviour on Widcomm’s part. For Vista, instead of porting their whole stack to the new platform, they just provide a wrapper around the Microsoft stack. And this is what’s causing the problems here. We want the Widcomm API to use the Widcomm stack but it’s finding the Microsoft stack and assuming that its own stack isn’t present. So we need a way to tell Widcomm, “Don’t use the Microsoft stack”.

In order to test whether that’s all that is required, I used the debugger to make it appear as if the Microsoft stack wasn’t present. I set a breakpoint on ws2_32.dll!socket and the breakpoint was hit with Widcomm doing the call above. I arranged for the socket creation to fail and thus forced Widcomm to use its own stack. Doing this made the two stacks coexist. 🙂 I was able to get the two stacks to connect to each other on the same PC: a BluetoothClient on Widcomm connecting to a BluetoothListener on the Microsoft stack, and then also with the roles reversed.

So how can we get Widcomm to ignore the Microsoft stack:

  1. Find if there’s a Registry key or similar that Widcomm provides to disable the wrapping behaviour.
  2. Do some cunning interception of the socket API call and cause it to fail when Widcomm attempts to open a Bluetooth socket.
  3. Try an older version of the Widcomm SDK to see whether the wrapping behaviour is absent from it. There’s the danger that we lose some features present in up-to-date SDKs.

Don’t know when I’ll next have some time to do some more investigation so feel free to carry out your own investigations in the meantime.

Following on from Widcomm single thread access, I’ve now moved all operations onto the ‘special thread’ and not just the connection-oriented ones.  So now StartInquiry, and StartDiscovery for instance are run on the main thread.  There’s still some tidying-up to do at some time.
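The general pattern of funnelling every call onto one dedicated thread can be sketched as follows. This is a Python illustration with a hypothetical class name, not the library's implementation.

```python
import queue
import threading
from concurrent.futures import Future


class SingleThreadDispatcher:
    """Runs every submitted call on one dedicated worker thread."""

    def __init__(self):
        self._work = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            func, args, future = self._work.get()
            if func is None:              # sentinel queued by close()
                break
            try:
                future.set_result(func(*args))
            except Exception as exc:      # hand the error back to the caller
                future.set_exception(exc)

    def invoke(self, func, *args):
        """Run func(*args) on the worker thread and wait for its result."""
        if threading.current_thread() is self._thread:
            return func(*args)            # already on the thread: avoid deadlock
        future = Future()
        self._work.put((func, args, future))
        return future.result()

    def close(self):
        self._work.put((None, (), None))
        self._thread.join()
```

Each operation is then wrapped in a call to invoke, so the underlying API only ever sees the one thread.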

This has been followed by a big round of testing. I repeated my testing from “Widcomm faults in OBEX operations”, where Win32 stopped sending after many megabytes. I wasn’t able to reproduce this, so it seemed to have been fixed. However I then tried repeating the test on the 2.4 version and couldn’t reproduce it either… Until I switched user (XP’s fast-user switching) and then the data transfer stopped! So it seems that the problem is not actually “buffer-empty” events stopping because calls lack thread affinity, but instead that Widcomm keeps state in the current logon session and thus breaks if the session changes. Yikes! So be aware of this. I wonder what happens when calling the API from an NT Service…

Also the change doesn’t seem to have had any effect on the conflict with the Microsoft stack.  Oh well…
