Siri may be specific to the iPhone 4S, but very little of the actual processing takes place on the iPhone. Rather, Siri streams the voice data to Apple, then renders the response that Apple sends back. So, even to simply transcribe your voice into text, Siri needs to be able to contact Apple.
Siri requires authentication to connect to Apple. This is likely to prevent unauthorized use: I’m sure Siri takes a lot of computing power to run, as voice transcription, looking up responses, and bandwidth aren’t free. However, if your goal is to run Siri on a device which isn’t officially supported, you need to bypass this authentication requirement in some way.
The authentication is based on what I’m going to call “tokens”, which are signed by Apple. If I remember correctly (I haven’t looked at this for a month or so, and this is from memory), Siri (through the assistantd binary) first asks Apple for certificate data. This is then used to sign a blob of data generated by the iPhone and encrypted using AES. That signed data is then sent back to Apple and processed. If it is found to be valid, the device receives the “token” (called sessionInfo in the code) and an expiration date (the token is generally renewed daily).
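To make the token lifecycle a bit more concrete, here is a minimal Python sketch of the client-side logic as I understand it: keep the sessionInfo until it expires, then ask for a new one. The names SessionToken, get_token and request_new are all hypothetical and do not come from assistantd; the real request (certificate fetch, signing, submission) is deliberately left as a callback because I have not documented its wire format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass
class SessionToken:
    """Hypothetical container for what Apple returns after validation."""
    session_info: bytes      # the opaque "token" (sessionInfo in the code)
    expires_at: datetime     # expiration date sent alongside the token

    def is_valid(self, now: datetime) -> bool:
        # The token is usable only strictly before its expiration date.
        return now < self.expires_at


def get_token(
    cached: Optional[SessionToken],
    now: datetime,
    request_new: Callable[[], SessionToken],
) -> SessionToken:
    """Reuse the cached token while it is valid, otherwise request a new one.

    request_new stands in for the whole exchange with Apple (assumed, not
    reverse-engineered here): fetch certificate data, sign the encrypted
    blob, submit it, and receive sessionInfo plus an expiration date.
    """
    if cached is not None and cached.is_valid(now):
        return cached
    return request_new()


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Pretend Apple handed out a token that lasts one day.
    token = get_token(None, now, lambda: SessionToken(b"opaque", now + timedelta(days=1)))
    print(token.is_valid(now))
```

Since the token is generally renewed daily, a proxy only has to run the expensive exchange about once per day per device.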
The interesting part here is the AES-encrypted and signed data that is submitted to Apple for validation. The code that generates this is obfuscated (similar to FairPlay), but the general gist of what it does is reasonably simple. Firstly, assistantd calls out to the obfuscated absinthed, a part of the iPhone’s FairPlay subsystem. That then asks libMobileGestalt for both the UniqueDeviceID (the same UDID used for provisioning) and SerialNumber (the device’s serial number), and reads four bytes from a shared memory region. I currently do not know the source of these four bytes (although I suspect the FairPlay daemon) or their purpose. This data is then AES-encrypted and sent back to assistantd to send to Apple, and (if valid) is exchanged for the session info.
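As a rough illustration of the three inputs described above, the following Python sketch assembles them into a single plaintext buffer. The layout (length-prefixed fields followed by the four raw bytes) is purely my assumption for the sake of the example; I have not recovered the real layout, and the function names are hypothetical. The AES step is left out on purpose, since the key and mode live inside the obfuscated code.

```python
import struct


def pack_field(data: bytes) -> bytes:
    """Prefix a field with its length as a 2-byte big-endian integer.

    Assumed framing for illustration only, not the real format.
    """
    return struct.pack(">H", len(data)) + data


def build_plaintext(udid: str, serial: str, shared_bytes: bytes) -> bytes:
    """Combine the three inputs absinthed gathers into one buffer.

    udid         -- UniqueDeviceID from libMobileGestalt
    serial       -- SerialNumber from libMobileGestalt
    shared_bytes -- the four bytes read from the shared memory region
    """
    if len(shared_bytes) != 4:
        raise ValueError("expected exactly four bytes from the shared memory region")
    return (
        pack_field(udid.encode("ascii"))
        + pack_field(serial.encode("ascii"))
        + shared_bytes
    )


if __name__ == "__main__":
    # Placeholder values, not a real device.
    blob = build_plaintext("0" * 40, "SERIAL000000", b"\x00\x00\x00\x00")
    print(len(blob), "bytes of plaintext, before the (omitted) AES step")
```

The point is only that every input is tied to one specific device, which is why a submitted blob identifies the iPhone it came from.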
As Apple can simply blacklist any device ID used for mass distribution of Siri, there is no way for a widespread and popular distribution of Siri to piggyback on one valid iPhone 4S identifier. However, a more distributed approach may be possible. A fork of SiriProxy, available here, allows everyone with an iPhone 4S to run their own proxy for their own devices. Or, it may be possible to replace Siri entirely, using something like Google’s speech “API” for speech transcription and logic like Hubot to create something usable for at least simple tasks like dictation.
The above technical info was discovered through a combination of static and dynamic analysis. If anyone would like to see my (not well documented, sadly) .idb or to contribute more to the investigation, just let me know on IRC. Thanks to anyone who helps, in addition to Steven Troughton-Smith and Aman Gupta.