Say you work for an online media company that periodically sends email newsletters to subscribers. Your boss asks you to analyze last quarter's web event data in order to optimize user engagement. How do you go about doing this?
The sad truth is, you can't really do what he asks (sorry, boss). It's nearly impossible to establish a causal relationship by looking at data retrospectively: there are simply too many external influences that were not controlled for. So while you might like to attribute that upswing in email opens last January to that snazzy new font you chose, you really can't.
So should you just delete three months of GET requests and call it a day? Absolutely not! While you can't determine exactly what contributes to user engagement (or lack thereof) using previous event data, you can develop hypotheses to test more rigorously down the line. When designing experiments, looking back is often the best way to move forwards.
Let's consider a specific example.
Financial News Web Event Analysis
I recently analyzed web event data for a fictitious financial news company to optimize user engagement (you can read the full technical report of this analysis here). The company periodically emailed financial news article links to subscribers, and wanted to maximize their click-through rate (CTR), calculated as:
CTR = Total Number of Unique Clicks / Total Number of Article Links Sent
During the first quarter of 2015, the company emailed 10,738,006 links to 20,000 users. Users clicked on 1,753,249 of these links, yielding a CTR of 16.3%. As shown in the second panel of the below figure, CTR oscillates cyclically between ~15-17%. Could clicks be related to day of week or time of day an article was sent?
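The quarter-level CTR above is a one-line calculation. A minimal sketch, using the totals reported in this post:

```python
# CTR from the quarter's reported totals
unique_clicks = 1_753_249
links_sent = 10_738_006

ctr = unique_clicks / links_sent
print(f"CTR: {ctr:.1%}")  # CTR: 16.3%
```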
No! As shown below, when we split out CTR by day of week or time of day, the CTRs remain closely clustered together. If users were more active on specific days or at specific times, we would expect the CTRs for those days and times to be consistently higher across at least part of the 3-month timeframe.
Let's pivot to explore whether certain types and topics of content are more popular than others. The table below shows CTR split out by article topic (left) and medium (right) for the most commonly-sent article topics and types.
Topic correlates strongly with user engagement: popular topics such as advertising garner CTRs around 25-30%, while the least popular topics command only 5-10% click-through. Also, CTRs for all 8 topics declined by ~2-5% over the three-month period. Medium does not appear to affect click-through.
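Splitting CTR by a categorical field like topic or medium is a straightforward group-by aggregation. A minimal pandas sketch, assuming hypothetical event-level data (one row per link sent, with a clicked flag; the column names and values are illustrative, not the company's actual schema):

```python
import pandas as pd

# Illustrative event-level data: one row per link sent.
events = pd.DataFrame({
    "topic":   ["advertising", "advertising", "bonds", "bonds", "stocks", "stocks"],
    "medium":  ["text", "video", "text", "text", "video", "text"],
    "clicked": [1, 0, 0, 0, 1, 0],
})

# CTR split out by topic and by medium: the mean of a 0/1
# clicked flag within each group is that group's CTR.
ctr_by_topic = events.groupby("topic")["clicked"].mean()
ctr_by_medium = events.groupby("medium")["clicked"].mean()
print(ctr_by_topic)
```

The same pattern works for day of week or time of day: derive the grouping column from the send timestamp, then take the group-wise mean of the clicked flag.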
Now that we've completed our descriptive analysis, it's time to look at the patterns we've uncovered and hypothesize what might be driving them.
Here's what I came up with (but your insights may be different):
- CTRs oscillate over time, but do not vary by day of week or time of day. This cycle may indicate the company is sending out their newsletter too frequently, resulting in periods of relatively low user engagement following periods of higher engagement.
- CTRs for the most frequently distributed topics steadily decrease over time. This may indicate users are becoming bored with the most commonly issued content topics.
Now we have a couple of hypotheses about which factors drive click-through that we can test experimentally. For example, we could randomly split the user base into two samples of 10,000 users each, sending one group (the treatment) the newsletter less frequently and the second group (the control) the newsletter at the current standard frequency. Assuming no other changes are made to the newsletter over this period and users are randomly assigned to groups, a significantly higher CTR in the treatment group than in the control group would indicate that decreased newsletter frequency actually increases user engagement. Similarly, the company could test the effect of newsletter content on engagement by sending newsletters with different topic coverage to different randomly assigned user groups and measuring differences in CTR between groups.
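The randomized split and the comparison of the two groups' CTRs can both be sketched in a few lines. The click counts below are made up for illustration; one standard way to check whether a CTR difference is larger than chance would produce is a two-proportion z-test with a pooled standard error:

```python
import math
import random

def two_proportion_ztest(clicks_a, n_a, clicks_b, n_b):
    """Z-test for a difference between two CTRs (pooled standard error)."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Randomly split the 20,000 users into treatment and control
users = list(range(20_000))
random.seed(42)
random.shuffle(users)
treatment, control = users[:10_000], users[10_000:]

# Hypothetical post-experiment click counts (not real results):
# treatment received the newsletter less frequently.
z, p = two_proportion_ztest(1_800, 10_000, 1_630, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up counts the treatment CTR (18.0%) exceeds the control CTR (16.3%) by well more than the pooled standard error, so the test would reject the null hypothesis of equal CTRs at conventional significance levels.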
Retrospective web event data analysis won't provide your organization with the perfect way to increase engagement, retain customers, and increase click-through right off the bat. But it will provide you with a launching pad to develop hypotheses you can rigorously test later on. So before you toss out last year's web data, consider looking back to look forwards.
You can also check out my GitHub repository for this project, including a full technical write-up, here.