The Web is a tangled mass of interconnected services, where websites import a range of external resources from various third-party domains. These third parties can, in turn, load resources hosted on yet other domains. For each website, this creates a dependency chain underpinned by a form of implicit trust between the first party and transitively connected third parties. The chain can only be loosely controlled, as first-party websites often have little, if any, visibility of where these resources are loaded from.
This work (the dataset is detailed in our paper) performs a large-scale study of dependency chains in the Web and finds that around 50% of first-party websites render content that they did not directly load.
Although the majority (84.91%) of websites have short dependency chains (below 3 levels), we find websites with dependency chains exceeding 30 levels.
Using VirusTotal, we show that 1.2% of these third parties are classified as suspicious. Although seemingly small, this limited set of suspicious third parties has remarkable reach into the wider ecosystem.
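To make the notion of a dependency chain concrete, the following is a minimal sketch (not the scripts used in the paper) of how chain levels could be computed from a record of which domain loaded which. The function name `chain_depths`, the `loads` mapping, and all domain names are hypothetical. The sketch assumes that level 1 denotes resources loaded directly by the first party, and it assigns each domain the shortest level at which it is reached, which is a simplification.

```python
from collections import deque


def chain_depths(first_party, loads):
    """Return {domain: level} for every domain reachable from first_party.

    Assumption: level 0 is the first party itself, level 1 is anything it
    loads directly, and level 2 or more is loaded transitively (implicitly).
    """
    depths = {first_party: 0}
    queue = deque([first_party])
    # Breadth-first traversal, so each domain gets its shortest level.
    while queue:
        parent = queue.popleft()
        for child in loads.get(parent, ()):
            if child not in depths:
                depths[child] = depths[parent] + 1
                queue.append(child)
    return depths


# Hypothetical example data: parent domain -> domains it loads resources from.
loads = {
    "example.com": ["cdn.example.net", "ads.example.org"],
    "ads.example.org": ["tracker.example.io"],
    "tracker.example.io": ["widgets.example.dev"],
}

depths = chain_depths("example.com", loads)

# Length of the longest dependency chain for this website.
max_depth = max(depths.values())

# Domains the first party did not load directly (level 2 and beyond).
implicit = sorted(d for d, level in depths.items() if level >= 2)

print(max_depth)
print(implicit)
```

In this toy example, `implicit` lists the domains whose content the first party renders without having requested it directly, which is the kind of implicit trust discussed above.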
Our paper is to appear in The Web Conference (WWW), May 2019.
A sample of the dataset and the scripts used in this paper is hosted on Google Drive.
Muhammad Ikram: Muhammad.Ikram [at] mq.edu.au or engr.ikram [at] gmail.com
This is a collaborative work of: