Twitter Me Xerces! – Kier J. Dugan

Following from the spirit of yesterday’s post, little victories…

Yesterday I managed to download the front page of my website using libcurl. As good as that was as a learning experience, it wasn’t interesting or useful in the slightest. Today, however, I decided to see if I could fetch my status updates from Twitter and display them in a program. So I had a look at the API documentation and it looks quite easy to use, with the exception of OAuth which I’m yet to get my head around. Thankfully, for now, basic authentication is still supported.

The Twitter API uses the REST (REpresentational State Transfer) paradigm, which means the server keeps no session state between requests; i.e., each transaction is considered separately. In practice it also means that it uses HTTP, which is pretty simple to understand. Basically, in a REST protocol the URIs are objects in the system and the HTTP verbs are how you interact with them. So a GET on a http://server/article?name=REST resource would download an article named “REST”. Simple eh? Check this article if you’re interested.
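To make the "URIs are objects, verbs are actions" idea a bit more concrete, here is a toy sketch in plain C++. It has nothing to do with Twitter's real servers: ResourceStore, Handle and DemoRest are names I've made up for illustration, and the status codes are just the familiar HTTP ones.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory "server": every URI names exactly one resource.
class ResourceStore
{
public:
    // Returns an HTTP-style status code. For GET, 'body' receives the
    // resource; for PUT, 'body' supplies the new content.
    int Handle (const std::string& verb, const std::string& uri,
                std::string& body)
    {
        if (verb == "GET") {
            std::map<std::string, std::string>::const_iterator it =
                m_resources.find (uri);
            if (it == m_resources.end ())
                return 404;              // no such resource
            body = it->second;
            return 200;
        }
        if (verb == "PUT") {             // create or replace the resource
            m_resources[uri] = body;
            return 200;
        }
        if (verb == "DELETE")
            return m_resources.erase (uri) ? 200 : 404;

        return 405;                      // verb not supported
    }

private:
    std::map<std::string, std::string> m_resources;
};

// Usage sketch: every call carries the verb and the URI, so no call
// depends on a previous one having set up a "session".
void DemoRest ()
{
    ResourceStore store;
    std::string body = "All about REST";
    assert (store.Handle ("PUT", "/article?name=REST", body) == 200);

    std::string fetched;
    assert (store.Handle ("GET", "/article?name=REST", fetched) == 200);
    assert (fetched == "All about REST");
}
```

The store obviously remembers the resources themselves; what it doesn't remember is anything about who asked for them, which is the stateless part.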

Anyway, onto the meat ‘n’ taters. Data in a REST transaction is typically exchanged as XML or JSON. I considered downloading LibYAML and taking the JSON route (JSON is, near enough, a subset of YAML) but a) I already had Xerces, b) I understand XML better than JSON, and c) I couldn’t be bothered to learn yet another new thing.

Xerces is incredibly well written. If you look at the class listings of Xerces or Xalan you’ll appreciate they’re both enormous and support basically everything. In fact, right out of the box Xerces supports fetching XML documents over the internet using HTTP GET. I chose not to use this purely because I wanted to use libcurl. Thankfully libcurl is surprisingly easy to use:

static size_t _CurlWriteCB (char* ptr, size_t cbElem, size_t nElems,
                            void* pUser)
{
    // libcurl hands back whatever pointer was given to CURLOPT_WRITEDATA.
    CMemFile* pFile = static_cast<CMemFile*> (pUser);
    size_t cbSizeAtStart;
    size_t cbSizeAtEnd;

    // Write data to file, but measure buffer size before and after.
    cbSizeAtStart = (size_t)pFile->GetLength ();
    pFile->Write (ptr, (UINT)(cbElem * nElems));
    cbSizeAtEnd   = (size_t)pFile->GetLength ();

    // Return the difference in buffer size, i.e. number of bytes written.
    return (cbSizeAtEnd - cbSizeAtStart);
}

BYTE* GetStatusesFromTwitter (char* szUserName, UINT& uiSize)
{
    // Attempt to initialise curl
    CURL* curl = curl_easy_init ();
    if (curl != NULL) {
        // Set up the http target
        CString strFmt;
        strFmt.Format (IDS_TWITTER_STATUS, szUserName);
        curl_easy_setopt (curl, CURLOPT_URL, strFmt.GetString ());

        // Save the result into memory for now.
        CMemFile buffer;
        curl_easy_setopt (curl, CURLOPT_WRITEFUNCTION, _CurlWriteCB);
        curl_easy_setopt (curl, CURLOPT_WRITEDATA,     &buffer);

        // Attempt to grab the data from Twitter.
        CURLcode res = curl_easy_perform (curl);
        curl_easy_cleanup (curl);

        // Return the data.
        if (res == CURLE_OK) {
            uiSize = (UINT)buffer.GetLength ();
            return buffer.Detach ();
        }
    }

    return NULL;
}

The above listing will download a user’s tweets and store them in a growable buffer (see CMemFile). But now we have to present this to Xerces in a way that it will understand. Thankfully we can supply an arbitrary InputSource to a DOMParser, including one that will wrap a piece of memory.
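On the download side, the growable-buffer trick doesn't actually depend on MFC. As a minimal sketch, assuming you'd rather use std::string than CMemFile, a write callback with the same shape looks like this. AppendToString is a name I've made up, and the function is exercised here without libcurl at all, just by calling it the way libcurl would.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Same parameter layout as a libcurl write callback: 'size' is the size of
// one element (libcurl documents it as always 1) and 'nmemb' is the number
// of elements in this chunk. 'userdata' is the CURLOPT_WRITEDATA pointer.
size_t AppendToString (char* ptr, size_t size, size_t nmemb, void* userdata)
{
    assert (userdata != NULL);
    std::string* pOut = static_cast<std::string*> (userdata);

    const size_t cb = size * nmemb;
    pOut->append (ptr, cb);

    // Returning any other value tells libcurl that the write failed.
    return cb;
}
```

With libcurl it would be registered exactly like _CurlWriteCB, passing the address of a std::string as the CURLOPT_WRITEDATA pointer. A response usually arrives in several chunks, so the callback gets called repeatedly and the string simply keeps growing.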

bool DoGetStatuses (char* szUserName)
{
    // Query Twitter
    UINT  uiSize;
    BYTE* pbData = GetStatusesFromTwitter (szUserName, uiSize);
    if (pbData == NULL)
        return false;

    // Move the memory into an object Xerces understands.
    MemBufInputSource* pDataSrc = new MemBufInputSource
        (pbData, uiSize, L"TwitterXML", true);

    // Parse the data
    XercesDOMParser parser;
    parser.setValidationScheme (XercesDOMParser::Val_Never);
    parser.setDoNamespaces (false);
    parser.setDoSchema (false);
    parser.setDoValidation (false);
    parser.parse (*pDataSrc);

    // Get the root node
    DOMDocument* pDoc = parser.getDocument ();

    //...

    // Free memory.
    delete pDataSrc;
    return true;
}

I was especially lazy in that last listing actually, because I told Xerces to adopt my buffer which means it’ll free it for me when it’s finished with it. Curiously though, even a memory object needs a system ID which is the purpose L"TwitterXML" serves. With a DOMDocument in memory it was trivial to add the statuses to a list box.

I was quite surprised at how complex a task I’d achieved given the effort I’d put in; hats off to both Xerces and libcurl. Now that I’d managed to list my tweets, naturally the next step was to try and submit one! So I made a new dialog for the occasion.

Clicking OK causes some magic to happen:

int CPostStatusDlg::DoStatusUpdate ()
{
    static const char* cszUrl =
        "http://api.twitter.com/1/statuses/update.xml";

    // Attempt to initialise cURL.
    CURL* curl = curl_easy_init ();
    if (curl == NULL)
        return 0;

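    // Configure the authentication. libcurl wants a narrow "user:password"
    // string, so convert explicitly in case this is a Unicode build.
    CStringA strAuth;
    strAuth.Format ("%s:%s", (LPCSTR)CStringA (m_strUserName),
        (LPCSTR)CStringA (m_strPassword));
    curl_easy_setopt (curl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
    curl_easy_setopt (curl, CURLOPT_USERPWD,  strAuth.GetString ());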

    // URL-escape the status text and build the POST body as a C string.
    CStringA strStatus (m_strStatus);
    char* szStatus = curl_easy_escape (curl, strStatus.GetString (),
        strStatus.GetLength ());
    char szPostBody[BUFSIZ];
    snprintf (szPostBody, sizeof (szPostBody), "status=%s",
        szStatus != NULL ? szStatus : "");
    curl_free (szStatus);

    // Set up the HTTP connection and use the POST method
    curl_easy_setopt (curl, CURLOPT_POST,           1L);
    curl_easy_setopt (curl, CURLOPT_POSTFIELDSIZE,  (long)strlen (szPostBody));
    curl_easy_setopt (curl, CURLOPT_POSTFIELDS,     szPostBody);

    // Finally, set the callback function and the URL.
    CMemFile buffer;
    curl_easy_setopt (curl, CURLOPT_WRITEFUNCTION, _CurlWriteCB);
    curl_easy_setopt (curl, CURLOPT_WRITEDATA,     &buffer);
    curl_easy_setopt (curl, CURLOPT_URL,           cszUrl);

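    // Now we can execute at last!
    long nResponse = 0;    // CURLINFO_RESPONSE_CODE expects a long*
    CURLcode res = curl_easy_perform (curl);
    if (res == CURLE_OK)
        curl_easy_getinfo (curl, CURLINFO_RESPONSE_CODE, &nResponse);
    curl_easy_cleanup (curl);

    // Return the HTTP status code, or 0 if the transfer itself failed.
    return (int)nResponse;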
}

Boom! Tweet submitted!

In the above listing, the growable buffer and the callback are there largely just to eat the output from libcurl, because we don’t really care about it. CMemFile will also free the memory it allocated when the function returns, which saves hassle. I originally wrote all the code listings with Unicode in mind, which is why they might appear a bit odd. libcurl is a C library that takes narrow char strings, so you may need to convert your strings for it to work. Thankfully Xerces includes some basic transcoding support (see XMLString::transcode) because it uses Unicode (UTF-16) internally.
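One step in DoStatusUpdate that's easy to gloss over is the escaping of the status text before it goes into the POST body. As far as I understand it, curl_easy_escape percent-encodes every byte apart from letters, digits and the four characters - . _ ~, so a stand-alone sketch of the same idea might look like the following. PercentEncode, MakeStatusBody and DemoStatusBody are names I've made up for illustration; this is not libcurl's implementation.

```cpp
#include <cassert>
#include <string>

// Percent-encode every byte that is not an RFC 3986 "unreserved" character.
std::string PercentEncode (const std::string& in)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;

    for (std::string::size_type i = 0; i < in.size (); ++i) {
        const unsigned char c = static_cast<unsigned char> (in[i]);
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';

        if (unreserved) {
            out += static_cast<char> (c);
        } else {
            out += '%';
            out += hex[c >> 4];      // high nibble
            out += hex[c & 0x0F];    // low nibble
        }
    }
    return out;
}

// Build the form body the same way the listing does with sprintf.
std::string MakeStatusBody (const std::string& status)
{
    return "status=" + PercentEncode (status);
}

// Usage sketch: without escaping, the '&' would start a second form field.
void DemoStatusBody ()
{
    assert (MakeStatusBody ("a&b") == "status=a%26b");
}
```

Using std::string here also sidesteps the fixed-size buffer, since the body just grows to whatever length the escaped status needs.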

Little victory indeed.