(You can also read this article in Spanish here)
Update: Since October 2017, Google has offered a partial solution to this problem. Learn more about it here.
Also, in November 2018 Google announced a new technology called «signed exchanges», which allows browsers to show the original URL when serving AMP pages from Google’s cache. It is not live yet, but it could potentially solve the problems detailed in this post.
If you have jumped on the AMP bandwagon and taken a look at the metrics, you may be asking yourself why your AMP pages have a higher than normal bounce rate.
Take a look at this example, comparing SEO traffic to blog posts, both AMP and non-AMP (in both cases it is organic traffic; this website does not appear in the «Top News» carousel). As you can see, the behaviour metrics are much worse for the AMP pages:
Weird, isn’t it? We could argue that non-AMP traffic includes more returning visitors, so the quality of those visits is better. But that’s not the case:
What’s happening is waaaay simpler (and complex, at the same time). Google Analytics (or the analytics tool you are using) is giving you incorrect data: inflation of unique visitors and sessions, higher than normal bounce rates, increase in referral traffic…
AMP brings with it an ecosystem in which you are no longer the one serving the content, with lots of implications for measuring it. In this post, I explain in as much detail as possible what the problem is, what causes it and how it can be solved.
Table of contents
- 1 How AMP works, inside Google
- 2 How AMP works, regarding Analytics?
- 3 Why AMP analytics data lies?
- 4 How to avoid Analytics duplicating users with AMP?
- 5 What happens with historical data?
- 6 How to (partially) solve the problem with the solution provided by Google
- 7 TL/DR
How AMP works, inside Google
What is AMP?
Before going deeper, let’s understand what AMP is all about.
AMP (Accelerated Mobile Pages) is an open-source project, led by Google but supported by other partners (Pinterest, Twitter, LinkedIn…), that aims to define a standard for creating faster pages for mobile devices.
It was, in part, an answer to Facebook’s Instant Articles (which is not open-source), and both projects share the same goal: when a user clicks on a link inside their platforms, the content loads instantly.
Companies, in general, haven’t really tried to build faster mobile web pages, even when they had the resources, and the main causes of slow load times are ads and JavaScript. So Google (and Facebook) decided to define new standards with lots of restrictions so that, whatever you do, pages load really fast.
In both cases, besides limiting the HTML tags and elements you can include (JS, CSS, ads…), both platforms save a copy of your AMP/Instant Article content so they can serve it directly from their own servers.
How are AMP pages served, after performing a search on Google?
In Google’s specific case, when a user clicks on any of the AMP results (those with the AMP icon and label), the content loads immediately, without leaving Google’s SERP.
What Google does, is:
- Once the crawler detects the amphtml tag in the «common» URL, it crawls the AMP URL and validates whether it complies with the AMP guidelines.
- If it does, the AMP page is cached and assigned to the «common» URL.
- Besides that, as the AMP URL has (or should have) a canonical tag pointing to the «common» URL, it shouldn’t be indexed separately.
When a user performs a search from a mobile device and a result doesn’t have an AMP version, clicking the result opens the «common» URL in the browser. The domain’s server is contacted and the client’s browser receives the page (the usual behaviour for any web page).
If a result does have an AMP version, Google highlights it in the result (with the AMP icon and label). When the user clicks on the result, instead of opening the URL in the browser, this is what happens:
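To make the two tags in the steps above concrete, here is a minimal Python sketch that checks whether a «common» page and its AMP page point at each other (amphtml in one direction, canonical in the other). It assumes both pages have already been downloaded and it is only a sanity check, not how Googlebot validates AMP. The function names are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser


class LinkRelCollector(HTMLParser):
    """Collect the href of the first <link> tag seen for each rel value."""

    def __init__(self) -> None:
        super().__init__()
        self.links: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        rel = (attr_map.get("rel") or "").lower()
        href = attr_map.get("href")
        if rel and href and rel not in self.links:
            self.links[rel] = href


def link_rels(html: str) -> dict[str, str]:
    parser = LinkRelCollector()
    parser.feed(html)
    return parser.links


def check_amp_pairing(canonical_url: str, canonical_html: str, amp_html: str) -> list[str]:
    """Return a list of problems; an empty list means the two tags look consistent."""
    problems = []
    if "amphtml" not in link_rels(canonical_html):
        problems.append('canonical page has no <link rel="amphtml">')
    back_link = link_rels(amp_html).get("canonical")
    if back_link != canonical_url:
        problems.append(f"AMP page canonical is {back_link!r}, expected {canonical_url!r}")
    return problems
```

The check only compares the two tags with each other. It does not fetch the amphtml target or validate the AMP markup itself.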
1. The SERP’s html already contains a div element that will display the content of any AMP result.
2. After clicking on an AMP result, several elements are loaded inside that div. One of them is an iFrame. This entire layer will load the content directly inside it, without leaving the SERP.
3. That iFrame points to Google’s AMP cache URL for that specific page (e.g., http://cdn.ampproject.org/v/s/randomtrip.es/tarjeta-sim-portugal-internet-smartphone/amp/?amp_js_v=6 ). The same happens with the elements of the «Top News» carousel.
4. If the user, after opening an AMP result, closes the article using the X in the upper left corner, the layer is closed and he/she can continue browsing the results.
5. In the specific case of the «Top News» carousel, if the user swipes, the next article is loaded (the next li element with its corresponding iFrame).
6. If the user clicks on any link inside the content of an AMP result, he/she leaves this environment and the «common» URL is loaded inside the browser, as always (the non-AMP URLs are the ones loaded, even if the link has a corresponding AMP version)
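The cache URL in step 3 follows a recognizable pattern. The sketch below rebuilds it from the AMP URL, assuming the pattern seen in the example: /v/ for the viewer, /s/ for an HTTPS origin, plus the amp_js_v parameter. The pattern is inferred from that single example, not taken from an official spec. The sketch also uses https:// for the cache host, whereas the example above shows http://.

```python
from urllib.parse import urlsplit


def amp_viewer_cache_url(amp_url: str, amp_js_v: int = 6) -> str:
    """Build a cdn.ampproject.org viewer URL for an AMP page.

    Pattern inferred from the example in step 3 (an assumption, not a spec):
      /v/  -> the page is displayed inside a viewer (the SERP iFrame)
      /s/  -> the origin is served over HTTPS (left out for plain HTTP)
    """
    parts = urlsplit(amp_url)
    https_marker = "s/" if parts.scheme == "https" else ""
    path = parts.path or "/"
    extra = f"amp_js_v={amp_js_v}"
    query = f"?{parts.query}&{extra}" if parts.query else f"?{extra}"
    return f"https://cdn.ampproject.org/v/{https_marker}{parts.netloc}{path}{query}"
```

Under the same assumption, an http:// origin simply loses the /s/ segment.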
How AMP works, regarding Analytics?
Now that we understand how Google serves the AMP pages inside their SERPs, there are two main consequences that we need to explore, in terms of Analytics:
- AMP pages are served by Google, not by our server. Our server doesn’t even know that a page has been served to a user.
- The domain that serves the AMP page is cdn.ampproject.org, whose content is loaded inside an iFrame, in Google’s corresponding domain (e.g., www.google.es)
AMP pages are served by Google, not by our server
If we analyze our server logs, filtering for AMP URLs, we’ll probably see that the traffic pattern has nothing to do with what Google Analytics (or your analytics tool) shows. In fact, most of the hits will be from Googlebot.
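If you want to reproduce that log analysis, a minimal sketch could look like this. It makes three assumptions:

- Your logs use the Apache/Nginx «combined» format.
- Your AMP URLs contain /amp/, as in the examples in this post.
- Matching «Googlebot» in the user agent is good enough. This is only a rough heuristic, because user agents can be spoofed.

```python
import re
from collections import Counter

# Combined Log Format: host ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def amp_hits_by_agent(lines, amp_marker="/amp/"):
    """Count hits on AMP URLs, split into Googlebot vs. everything else."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or amp_marker not in match["path"]:
            continue  # unparseable line or non-AMP URL
        label = "googlebot" if "Googlebot" in match["agent"] else "other"
        counts[label] += 1
    return counts
```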
The only ways (right now) to reach an AMP page are (unless you don’t have a responsive / mobile version and you are sending all mobile traffic to your AMP pages):
- Through Google (or other 3rd parties adopting the technology, like LinkedIn, Pinterest…)
- Accessing an AMP URL directly in your browser.
- Through links that shouldn’t be out there, pointing to AMP pages (email, social networks, 3rd party websites…)
The first case is the most common (at least for now). In that case, the visit never reaches our server. Google serves the page directly, along with the images and other resources, through cdn.ampproject.org and its static domains:
AMP pages are served from a different domain
As Google is the one serving the pages from its servers, the domain serving them is not our original domain, but one Google chooses. Right now Google is using cdn.ampproject.org (sometimes also yourwebsite.cdn.ampproject.org), through an iFrame loaded directly inside the SERPs.
This has a lot of implications regarding analytics, as the analytics code is being executed in a different domain and so, it can’t read the cookies of the original domain.
Some of the analytics issues this causes, which make the data unreliable:
- A user that visits one of our AMP pages through Google, and a non-AMP page through any method, is counted as two visitors.
- A visit that starts in an AMP page through Google and continues with an internal link to a non-AMP page, is counted as two visits.
- The bounce rate (measured by default) of an AMP page is 100%, unless there are several AMP results from our domain in the SERP and the user visits several of them in the same session.
- A visit coming from an AMP page through Google to our website (through internal links) is considered «Referral traffic» from cdn.ampproject.org
Below, all of these cases are reproduced and investigated step by step.
Why AMP analytics data lies?
The term «unique visitor» is becoming less accurate every day. People browsing with different browsers, different devices and private/incognito modes (or manually deleting their cookies) make it impossible to identify a real user through a cookie.
This means that, even without AMP, we constantly suffer from «duplicated» unique visitors.
So what happens with AMP? There are different cases, depending on where the content is loaded, and, in some cases, the order of visiting each of the scenarios (all of the cases explained here refer to the use of a mobile device).
1. User accesses our website through their mobile browser. This is the most typical way of accessing our website: someone wants to check it and types our URL directly in the browser (or uses a bookmark, or Google Search).
To decide whether a user is new or not, Google Analytics checks the «_ga» cookie. If it exists, the JS reads the «clientID» value from it and sends it to GA’s servers. If not, it creates the cookie along with a new «clientID» in order to identify the user in future interactions.
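The lookup described above can be sketched in Python as follows. It assumes the usual _ga value layout GA1.<domain depth>.<random number>.<timestamp>, where the clientID is the last two fields. The helper names and the way a new ID is minted are illustrative, not analytics.js’s actual code.

```python
import random
import time


def client_id_from_ga_cookie(value: str):
    """Extract the clientID from a _ga value such as 'GA1.2.1234567890.1500000000'.

    Returns None if the value doesn't look like a _ga cookie.
    """
    parts = value.split(".")
    if len(parts) < 4 or not parts[0].startswith("GA"):
        return None
    return ".".join(parts[-2:])  # random number + first-visit timestamp


def get_or_create_client_id(cookies: dict, domain_depth: int = 2):
    """Reuse the clientID stored in _ga, or mint a new one and store it.

    Returns (client_id, created). The new-ID format is an assumption modelled
    on what _ga values typically look like.
    """
    existing = cookies.get("_ga")
    if existing:
        client_id = client_id_from_ga_cookie(existing)
        if client_id:
            return client_id, False
    client_id = f"{random.randint(1, 2**31 - 1)}.{int(time.time())}"
    cookies["_ga"] = f"GA1.{domain_depth}.{client_id}"
    return client_id, True
```

The key point for what follows: this only works while the same cookie jar (same domain, same storage) is available. Every AMP scenario below breaks that assumption in a different way.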
In this case, while doing the test, we are using a new «user» in Chrome so we don’t have any cookie set yet, and so a new one is created.
Our clientID for this domain:
2. User accesses an AMP page directly: in general, this case will not be common (unless we share/link our AMP URLs, or build our mobile website directly with AMP).
We are still in our domain (randomtrip.es), and so, the GA code should be able to access the previous cookie (_ga) and know that we are still the same user, right?
Well, this is where «the party starts», because that’s not the case:
What happens here is that Google Analytics’ AMP JS uses a cookie indeed, but the one it uses (AMP_ECID_GOOGLE) is different than the cookie GA uses in non-AMP pages (_ga).
So the clientID is different in this case, and these two sequential interactions would result in two unique visitors with two separate sessions.
3. User performs a search on Google and clicks an AMP result: this is the most common use case for AMP.
Content is loaded from Google’s AMP Cache, inside Google’s domain (let’s say google.es), through an iFrame that points to cdn.ampproject.org. The user identification information (the clientID) is saved in the LocalStorage for the google.es domain instead of in a cookie (it uses neither the _ga cookie nor the AMP_ECID_GOOGLE cookie). The value stored there is unique and, of course, different from the values in cases 1 and 2.
Funny, huh? :D (notice the sarcasm here). So that would be our third interaction with this website, and this is our third clientID:
So I already count as three unique visitors, with three different sessions!
4. User accesses the Google AMP Cache URL directly: this case is becoming common with third-party apps adopting AMP (LinkedIn, Pinterest…). It could also happen in the unusual case that a user reaches one of those URLs directly by any other means. The URL looks like this: http://cdn.ampproject.org/v/s/randomtrip.es/barco-flores-lombok-komodo/amp/?amp_js_v=6
In this case, GA uses the same method as in the previous case (LocalStorage, not a cookie), but it uses the LocalStorage for the cdn.ampproject.org subdomain instead. So the stored clientID is, again, a new one, different from all the other cases.
We have our fourth clientID for the same website, using the same browser and without deleting cookies.
5. User directly accesses the URL Google shows after an AMP result is clicked: this case is common because it happens when a user reaches an AMP page through Google and decides to share it (by email, social networks, etc.), either by copying the visible URL from the browser or by using the native «send to» option.
Google is trying to avoid this behaviour, but IMO in an ineffective way. The visible URL is still their own ;)
The URL would be like http://www.google.es/amp/s/randomtrip.es/barco-flores-lombok-komodo/amp/ (you can’t access this kind of URL from desktop – you will be redirected to the original one. You can access it from a mobile device).
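When analyzing shared links or referrers, it can help to collapse all of these AMP URL forms back to the publisher URL. This minimal sketch assumes the patterns shown in cases 3 to 5: the google.es/amp/s/ form, the cdn.ampproject.org/v/s/ form, and the yourwebsite.cdn.ampproject.org subdomain variant. Two further assumptions are labelled in the code:

- Other Google country domains are matched with a simple www.google. prefix check.
- A /c/ path prefix is accepted alongside /v/.

```python
from urllib.parse import urlsplit


def origin_from_amp_url(url: str):
    """Map the AMP URL forms from cases 3-5 back to the publisher's own URL.

    Assumed forms (taken from the examples in this post):
      https://www.google.es/amp/s/randomtrip.es/post/amp/
      https://cdn.ampproject.org/v/s/randomtrip.es/post/amp/?amp_js_v=6
      https://yourwebsite.cdn.ampproject.org/v/s/randomtrip.es/post/amp/
    Returns None for URLs that don't go through Google's AMP layer.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    segments = parts.path.lstrip("/").split("/")
    is_cache = host == "cdn.ampproject.org" or host.endswith(".cdn.ampproject.org")
    # Accepting "c" next to "v" is an assumption; only /v/ appears in this post.
    if is_cache and segments[0] in ("v", "c"):
        rest = segments[1:]
    elif host.startswith("www.google.") and segments[0] == "amp":
        rest = segments[1:]
    else:
        return None
    scheme = "http"
    if rest[:1] == ["s"]:  # "s" marks an HTTPS origin
        scheme, rest = "https", rest[1:]
    if not rest or not rest[0]:
        return None
    # The query string (e.g. amp_js_v=6) belongs to the viewer, so it is dropped.
    return f"{scheme}://{'/'.join(rest)}"
```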
In this case, we are in the google.es domain, and the clientID value is saved in the LocalStorage.
Now something interesting happens.
If different values for the clientID exist in both the LocalStorage of google.es (or the corresponding Google domain) and cdn.ampproject.org, the second one is the one sent. So, once we access cdn.ampproject.org, cases 3, 4 and this one (5) should send the same clientID value (not bad :D)
In this case, this is the clientID sent to GA servers (same as in case 4):
So the clientID generated in case 3 will never be used again, because the one generated in case 4 will now be chosen for both cases. Depending on the order in which the user interacts with our website, both IDs may or may not be generated, and we will have more or less duplication of «unique visitors» and visits.
6. User accesses the AMP version through a third-party app: this is one last case, which is just starting to appear.
LinkedIn, Pinterest, Flipboard… already use the AMP version for links shared on their platforms instead of the «common» ones. They send the traffic straight to Google’s AMP Cache URL (cdn.ampproject.org or yourwebsite.cdn.ampproject.org), which would be the same as case 4.
I haven’t been able to test it yet, but as these apps open links in an HTML viewer inside the apps themselves, each one of them might generate a new clientID. Then again, some of them use the Chrome engine, so this may not be the case. If you happen to test it, please say so in the comments so we can complete the information for this case! :)
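The interplay between cases 1 to 5 is easier to see in a toy model. This Python sketch is not how GA or AMP store anything internally. It just encodes the rules described above: each scenario reads or creates an ID in its own storage location, and in the google.es contexts an existing cdn.ampproject.org value takes precedence. The storage labels and ID strings are made up.

```python
# Where each scenario keeps its clientID, following cases 1-5 above.
# Labels are illustrative; they are not real GA/AMP storage keys.
STORAGE_FOR_SCENARIO = {
    "1_site": ("cookie _ga", "randomtrip.es"),
    "2_amp_on_site": ("cookie AMP_ECID_GOOGLE", "randomtrip.es"),
    "3_serp_click": ("localStorage", "google.es"),
    "4_cache_url": ("localStorage", "cdn.ampproject.org"),
    "5_google_amp_url": ("localStorage", "google.es"),
}
CDN_SLOT = ("localStorage", "cdn.ampproject.org")


def visit(scenario: str, storage: dict) -> str:
    """Return the clientID sent for one visit, creating one if needed."""
    slot = STORAGE_FOR_SCENARIO[scenario]
    # Rule from case 5: in a google.es context, an existing
    # cdn.ampproject.org value wins over the google.es one.
    if slot[1] == "google.es" and CDN_SLOT in storage:
        return storage[CDN_SLOT]
    if slot not in storage:
        storage[slot] = f"cid-{len(storage) + 1}"
    return storage[slot]


def unique_visitors(scenarios) -> int:
    """Count distinct clientIDs for one person visiting in the given order."""
    storage: dict = {}
    return len({visit(s, storage) for s in scenarios})
```

In this model, visiting cases 1 to 5 in order yields 4 distinct IDs. Visiting case 4 before case 3 yields 3, which is the order dependence described above.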
So, we have 6+ scenarios, 4 of which (at least) generate a different clientID, which means that what before AMP was (ideally) one unique visitor, now can be counted as up to 4 unique visitors:
All of this is also explained here by Google itself, and confirmed as a known issue by the creator and tech lead of AMP.
As you can imagine, the consequence of these «errors» is that lots of analytics metrics become inaccurate, so we need to be careful when analyzing data that involves AMP traffic.
- Unique visitors: as we have just seen, a user on the same device, using the same browser and without deleting cookies, can be counted as up to 4 unique visitors depending on how he/she accesses our AMP pages.
- Sessions: if a user visits an AMP page in any of these environments and leaves, there is no problem in terms of sessions. But if he/she continues from the AMP page to any non-AMP page on our website, a new session is started on top of the new unique visitor, so the visit is counted as two sessions.
- Bounce rate: for the same reasons, when a session is split in two, the first session has only one pageview (so it is counted as a bounce). The second session, if it only views one page, is also considered a bounce. So the bounce rate is inflated: the complete session should not be a bounce, but it ends up as two bounced sessions. And this happens for both AMP and non-AMP pages.
- Pages per session: also as a consequence, instead of 2 pageviews per session, we would get 1 pageview per session.
- Traffic sources: traffic that goes from our AMP pages served by third parties (such as Google’s AMP Cache) to our non-AMP pages is counted as entry traffic (new sessions). Its source is «Referral Traffic» from cdn.ampproject.org (or yourwebsite.cdn.ampproject.org).
These are the most commonly and directly affected metrics. As most other metrics are calculated from these, most of the rest are affected too.
For example, on an e-commerce site, if a user enters through an AMP page from Google and ends up buying something in the same visit, the sale would be attributed to the «second user». The source of that visit would be «Referral Traffic» instead of «Search Engines».
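To see how one real visit turns into the numbers in the list above, here is a toy calculation in Python. It simplifies things by treating each clientID as a single session. The source labels are illustrative «source / medium» strings, not an export from GA.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hit:
    client_id: str
    page: str
    source: str  # illustrative "source / medium" label of the session


def summarize(hits: list[Hit]) -> dict:
    """Toy metrics, assuming one session per clientID (enough for this example)."""
    sessions: dict[str, list[Hit]] = {}
    for hit in hits:
        sessions.setdefault(hit.client_id, []).append(hit)
    count = len(sessions)
    bounces = sum(1 for session_hits in sessions.values() if len(session_hits) == 1)
    return {
        "users": count,
        "sessions": count,
        "bounce_rate": bounces / count,
        "pages_per_session": len(hits) / count,
        "sources": sorted({session_hits[0].source for session_hits in sessions.values()}),
    }


# One real visit: an AMP page opened from Google, then an internal link to a non-AMP page.
one_id = [
    Hit("A", "/barco-flores-lombok-komodo/amp/", "google / organic"),
    Hit("A", "/tarjeta-sim-portugal-internet-smartphone/", "google / organic"),
]
# The same visit once AMP splits it across two clientIDs.
split_ids = [
    Hit("cid-cache", "/barco-flores-lombok-komodo/amp/", "google / organic"),
    Hit("cid-site", "/tarjeta-sim-portugal-internet-smartphone/", "cdn.ampproject.org / referral"),
]
```

With a single clientID, the same two pageviews give 1 user, a 0% bounce rate and 2 pages per session. Once the clientID is split, they give 2 users, a 100% bounce rate, 1 page per session and a new referral source.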
How to avoid Analytics duplicating users with AMP?
Luckily, there is a solution that keeps the clientID the same in all of these cases. It is technically complex, but feasible. It only applies to data collected after it is implemented, not to data already collected.
It’s unbelievable that Google hasn’t done anything yet to fix this situation (nor did they with the spam problem in Google Analytics), when AMP has been live in the SERPs for a year.
The solution, which I haven’t tested yet, can be found in this post from Simo Ahava:
Basically, it consists of creating a proxy on our own domain (we need HTTPS, at least for the proxy URL).
In the amp-analytics component configuration, you can define that before sending the data to Google Analytics servers, more configuration information should be obtained from an external URL.
Unlike most AMP resources (which Google caches on its servers and serves directly), the request to this URL is not cached: the client makes it every time the page finishes loading, while the page has focus.
As the request goes to a URL on our own domain, our domain’s cookies are sent with it. The proxy just checks whether the _ga cookie exists and, if it does, reads the clientID value from it and returns it to the client. The client receives the «correct» clientID and sends the data to GA’s servers.
Besides that, the _ga cookie must be updated the same way Google Analytics normally does.
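Simo Ahava’s post has the actual implementation. What follows is only a minimal Python sketch of the proxy’s core logic as described above, with these assumptions:

- The endpoint itself is hypothetical.
- It returns the clientID as {"vars": {"clientId": ...}} so that amp-analytics can substitute it. This response shape is assumed.
- The GA1.2. prefix assumes a two-part domain.

A real endpoint also has to send the CORS headers AMP requires for cross-origin requests with credentials. Those headers are omitted here.

```python
import json
import random
import time
from http.cookies import SimpleCookie

TWO_YEARS = 2 * 365 * 24 * 3600  # assumed to mirror the usual _ga lifetime


def proxy_response(cookie_header: str, cookie_domain: str = ".randomtrip.es"):
    """Return (headers, body) for a hypothetical clientID proxy endpoint.

    Reuses the clientID from the _ga cookie sent by the browser (or mints a
    new one) and refreshes _ga, as Google Analytics normally would.
    """
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    client_id = None
    if "_ga" in jar:
        parts = jar["_ga"].value.split(".")
        if len(parts) >= 4:
            client_id = ".".join(parts[-2:])
    if client_id is None:
        client_id = f"{random.randint(1, 2**31 - 1)}.{int(time.time())}"

    out = SimpleCookie()
    out["_ga"] = f"GA1.2.{client_id}"  # "2" assumes a two-part domain
    out["_ga"]["domain"] = cookie_domain
    out["_ga"]["path"] = "/"
    out["_ga"]["max-age"] = TWO_YEARS
    out["_ga"]["secure"] = True  # the proxy must be served over HTTPS

    headers = {
        "Content-Type": "application/json",
        "Cache-Control": "no-store",  # this response must never be cached
        "Set-Cookie": out["_ga"].OutputString(),
    }
    body = json.dumps({"vars": {"clientId": client_id}})
    return headers, body
```

Because the browser has to send first-party cookies on this cross-site request, this approach fails wherever third-party cookies are blocked.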
Some caveats of this approach:
- Not a trivial implementation.
- You need HTTPS on your server (at least for the proxy URL)
- It’s not an official solution, so this can lead to problems in the future.
- If you implement it wrong, the results can be worse
- Doesn’t work on Safari, when accessing the content through Google’s AMP cache.
UPDATE (02/03/2017): Malte Ubl, creator and tech lead of the AMP Project, said on Twitter that Simo’s solution is indeed a good one. The caveat is that it won’t work when using Safari and accessing the content through Google’s AMP cache (as Safari blocks 3rd party cookies).
What happens with historical data?
If you manage to implement the previous solution, it will just fix the problem after the implementation is done. The data you already have in your Google Analytics would still be wrong.
What can you do? Not a lot, but if you really need to try to fix the data (at least the basic metrics), you can try the following to approximate what the data should look like:
- Sessions that continue from an AMP page onto our website are easy to isolate: they all appear under «Referral Traffic», from cdn.ampproject.org.
- If you want to correct the bounce rate, take the AMP page visits on one side and the visits from cdn.ampproject.org on the other, and recalculate the value. E.g., if you have 1000 visits to AMP pages with an 89% bounce rate, and 200 visits from cdn.ampproject.org, you would have:
- 110 visits (11%) were not bounces, as they visited several AMP pages in the same session.
- 200 visits have been considered bounces, but they are not, as they come from your AMP pages.
- So, you have 310 visits which were not bounces.
- Aprox. of real bounce rate, corrected: 690 bounces (1000-310) out of 1000 (69% instead of 89%)
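The same arithmetic can be written as a small Python sketch. The rows are assumed to be (source, sessions) pairs exported from the Referrals report, and the function names are hypothetical.

```python
def referral_sessions_from_amp_cache(rows) -> int:
    """Sum sessions whose referral source is Google's AMP cache (including subdomains).

    `rows` is assumed to be (source, sessions) pairs from a Referrals report export.
    """
    return sum(
        sessions
        for source, sessions in rows
        if source == "cdn.ampproject.org" or source.endswith(".cdn.ampproject.org")
    )


def adjusted_bounce_rate(amp_visits: int, amp_bounce_rate: float, referral_visits: int) -> float:
    """Approximate the real AMP bounce rate with the method described above."""
    # Visits that saw several AMP pages were never bounces.
    non_bounces = round(amp_visits * (1 - amp_bounce_rate))
    # Visits that continued to the site were counted as bounces, but were not.
    non_bounces += referral_visits
    bounces = max(amp_visits - non_bounces, 0)
    return bounces / amp_visits
```

With 1000 AMP visits at 89% and 200 referral sessions from the AMP cache, adjusted_bounce_rate(1000, 0.89, 200) comes out at 0.69, the 69% computed above.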
Based on this, you can try to do the same with other metrics.
How to (partially) solve the problem with the solution provided by Google
Google developed a partial solution to this problem, available on all Google domains since October 2017. You can read about it here, and you can also check my talk about it at Searchmetrics Summit 2017:
It’s important to note that this solution is not enabled «by default». If you are using AMP on any of your websites, you need to do the following to get it working:
1. On all your non-AMP pages, set the Google Analytics field ‘useAmpClientId’ to «true».
2. On all your AMP pages, include this tag on the head:
<meta name="amp-google-client-id-api" content="googleanalytics">
3. Use referral exclusion in GA: add cdn.ampproject.org to your referral exclusion list.
More info about how to implement and debug the solution: https://support.google.com/analytics/answer/7486764?hl=en
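Steps 1 and 2 can be spot-checked in your page source with a rough Python sketch. The helper names are hypothetical.

- The meta-tag check parses the AMP page’s HTML.
- The useAmpClientId check is a plain regex over the non-AMP page source. It will miss configurations set through Google Tag Manager or written differently.

Step 3 lives in the GA admin settings and can’t be checked this way.

```python
import re
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs."""

    def __init__(self) -> None:
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        if name:
            self.meta[name.lower()] = attr_map.get("content") or ""


def has_amp_client_id_opt_in(amp_html: str) -> bool:
    """Step 2: does the AMP page carry the opt-in meta tag?"""
    parser = MetaCollector()
    parser.feed(amp_html)
    return parser.meta.get("amp-google-client-id-api") == "googleanalytics"


# Step 1, as a crude text match on the non-AMP page source (analytics.js syntax).
USE_AMP_CLIENT_ID = re.compile(r"""['"]?useAmpClientId['"]?\s*:\s*true""")


def uses_amp_client_id(page_html: str) -> bool:
    return USE_AMP_CLIENT_ID.search(page_html) is not None
```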
Apart from this, Google announced that, using a new technology called Signed Exchanges, they will soon be able to serve AMP pages from Google’s AMP cache while showing your original URL. This could potentially solve the problems described in this post.
TL/DR
We’ve just seen that the same «unique visitor» can end up being counted as up to 4 unique visitors due to how AMP works, producing wrong metrics in our analytics tool.
The worst part is that maybe you didn’t even realize it, and you may have been worried about the high bounce rate on your AMP pages.
For now, unless you manage to try the solution described in this post, there is not much else to do. But it’s important to take this into account before jumping to conclusions with our data when AMP traffic is involved.
Ideally, Google will provide an official solution. For now, there is an issue in AMP’s GitHub, but it’s been there for some months without any specific plans to implement it. Besides that, if AMP adoption by 3rd parties continues to grow, the number of unique visitors generated by the same device can grow even more.
- Accelerated Mobile Pages via Google Tag Manager
- What’s Going On With AMP Analytics?
- How To Set Up Adjusted Bounce Rate on Accelerated Mobile Pages (AMP) Using Google Tag Manager
Hope the post has been useful. If that’s the case, share it! :)