Information technology architecture and network architecture
Screenshot One uses a rather classical set-up, as depicted in Figure 2: an internet-facing router with a software firewall, connected to a Local Area Network (LAN) switch to which the physical server is connected.
This server runs Windows 10¹, on which the MariaDB and Internet Information Services (IIS) services run. IIS hosts the Web Application, Certificate, and API Web Services; MariaDB provides the database services.
The firewall is configured to allow inbound traffic only on port 80 for Hypertext Transfer Protocol (HTTP) traffic and port 443 for Secure HTTP (HTTPS) traffic. By default, native Screenshot One clients connect over HTTPS, which is best practice for security and privacy.
HTTPS relies on the Certificate Service to enable traffic encryption using the “Advanced Encryption Standard” with a 256-bit key length (AES-256).
The Web Application serves the content displayed when visiting the website at www.screenshot.one. The API Web Service answers calls from the online, Windows Mobile and Android clients. Both services are therefore internet-facing, whereas the database and certificate services are not.
The MariaDB database consists of two lightweight tables: one keeps a record of requests made by clients, the other keeps track of websites already rendered, to enable caching.
To ensure data type integrity, all columns have defined data types and maximum length constraints. In the interest of service continuity, the requests table records which client made how many requests, and when. This allows a request cap to be enforced in the rare case that a single requester makes a disproportionate number of requests within a defined timespan, e.g. during a Distributed Denial of Service (DDoS) attack.
This rate limit, applied to the number of screenshots generated per half hour, is currently set to thirty per user agent per location. The allowed amount may be raised in the future, depending on the resources available to the Screenshot One API.
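The request cap can be pictured as a sliding window per client. The following is a minimal in-memory sketch, assuming the client key combines the user agent hash and the location; the class name `RequestCap` and the key format are hypothetical, and Screenshot One itself keeps these counts in the MariaDB requests table rather than in memory.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory sketch of the request cap: at most `limit`
// screenshots per client key within a sliding time window.
class RequestCap {
    private final int limit;
    private final Duration window;
    private final Map<String, Deque<Instant>> history = new HashMap<>();

    RequestCap(int limit, Duration window) {
        this.limit = limit;
        this.window = window;
    }

    // Records the request and returns true if the client is still under the cap;
    // returns false (without recording) if the cap has been reached.
    synchronized boolean tryAcquire(String clientKey, Instant now) {
        Deque<Instant> times = history.computeIfAbsent(clientKey, k -> new ArrayDeque<>());
        // Drop timestamps that are no longer inside the window.
        Instant cutoff = now.minus(window);
        while (!times.isEmpty() && !times.peekFirst().isAfter(cutoff)) {
            times.pollFirst();
        }
        if (times.size() >= limit) {
            return false;
        }
        times.addLast(now);
        return true;
    }
}
```

With `new RequestCap(30, Duration.ofMinutes(30))`, a key such as the user agent hash joined with the client location would mirror the “thirty per user agent per location” rule; refused requests are not counted, so a blocked client regains capacity as its oldest requests age out of the window.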
For caching purposes, the API records when each “Uniform Resource Locator” (URL) was requested, at which height and width and in which file format, and where the corresponding cached files are stored locally on the server.
Hence, whenever two requests for a resource with the same parameters are made within a given timespan, the API skips the rendering process and immediately returns the cached screenshot image, saving time and resources.
Currently, this timespan is set to 15 minutes. This principle is widely used, e.g. by Google in its “PageSpeed Insights”² tool.
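The cache lookup can be sketched as a map from the render parameters (URL, width, height, file format) to the time of rendering and the path of the cached file. This is a hypothetical in-memory illustration (`RenderCache` is not part of the original code); Screenshot One keeps these records in its MariaDB table instead.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch of the render cache: identical parameters within
// `maxAge` of the last render return the cached file instead of re-rendering.
class RenderCache {
    // The parameters that must match for a cache hit.
    static final class Key {
        final String url;
        final int width;
        final int height;
        final String format;

        Key(String url, int width, int height, String format) {
            this.url = Objects.requireNonNull(url);
            this.width = width;
            this.height = height;
            this.format = Objects.requireNonNull(format);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return width == k.width && height == k.height
                    && url.equals(k.url) && format.equals(k.format);
        }

        @Override
        public int hashCode() {
            return Objects.hash(url, width, height, format);
        }
    }

    private static final class Entry {
        final Instant renderedAt;
        final String filePath;

        Entry(Instant renderedAt, String filePath) {
            this.renderedAt = renderedAt;
            this.filePath = filePath;
        }
    }

    private final Duration maxAge;
    private final Map<Key, Entry> entries = new HashMap<>();

    RenderCache(Duration maxAge) {
        this.maxAge = maxAge;
    }

    // Remembers where the rendered file for these parameters was stored.
    synchronized void put(Key key, String filePath, Instant renderedAt) {
        entries.put(key, new Entry(renderedAt, filePath));
    }

    // Returns the cached file path if a render with identical parameters is recent enough.
    synchronized Optional<String> lookup(Key key, Instant now) {
        Entry e = entries.get(key);
        if (e == null || now.isAfter(e.renderedAt.plus(maxAge))) {
            return Optional.empty();
        }
        return Optional.of(e.filePath);
    }
}
```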
Because the whole Screenshot One project is modelled on the three-tier architecture (see Figure 4), only the data layer of the API has direct access to the database. Additionally, the database user employed by the API only has the right to execute stored procedures and cannot perform SELECT, INSERT, DROP, or other SQL operations directly. This protects the database from possible SQL injection attacks, which would need more elevated access rights to succeed.
The project is built using a variety of software technologies, mainly .NET components such as ASP.NET Web Forms, Windows Communication Foundation (WCF), ADO.NET, Language Integrated Query (LINQ), Entity Framework, console applications, and the Universal Windows Platform (UWP), alongside other platforms such as Windows, Android and IIS.
phpMyAdmin on Apache was used to set up the MariaDB database using Structured Query Language (SQL). Java and XML were used in Android Studio to develop the Android version of the app.
Client application architecture: Windows Mobile, online and Android apps
As mentioned before, each client application is built using the three-tier architecture and thus consists of a presentation, a logical, and a data layer. In the Android, online and Windows Mobile apps, the data layer requests screenshot data from the remote Screenshot One API using a set of parameters supplied by the logical layer. The logical layer obtains these parameters by cleaning the user input acquired through the user interface (presentation layer). This “cleaning” consists of accuracy and relevance checks, data validation, and so on.
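As an illustration of the “cleaning” step, the sketch below normalizes a URL, checks the image dimensions, and restricts the output format to the four formats Screenshot One supports. The class `RequestValidator`, the default `http://` prefix and the dimension bounds are assumptions made for the example, not values taken from the apps.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of input "cleaning" in a client's logical layer.
// The bounds and the default scheme are assumptions, not values from the apps.
class RequestValidator {
    static final int MIN_DIMENSION = 100;    // assumed lower bound, in pixels
    static final int MAX_DIMENSION = 20000;  // assumed upper bound, in pixels
    static final List<String> FORMATS = Arrays.asList("png", "gif", "jpeg", "tiff");

    // Trims the input, adds http:// to bare host names and accepts only http(s) URLs with a host.
    static String cleanUrl(String input) {
        if (input == null || input.trim().isEmpty()) {
            throw new IllegalArgumentException("URL is empty");
        }
        String s = input.trim();
        if (!s.contains("://")) {
            s = "http://" + s;
        }
        URI uri;
        try {
            uri = new URI(s);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Malformed URL: " + input, e);
        }
        String scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
        if (!scheme.equals("http") && !scheme.equals("https")) {
            throw new IllegalArgumentException("Only http and https URLs can be rendered");
        }
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("URL has no host");
        }
        return uri.toString();
    }

    // Rejects widths or heights outside the assumed bounds.
    static int checkDimension(int pixels) {
        if (pixels < MIN_DIMENSION || pixels > MAX_DIMENSION) {
            throw new IllegalArgumentException("Dimension out of range: " + pixels);
        }
        return pixels;
    }

    // Normalizes common aliases and accepts only PNG, GIF, JPEG and TIFF.
    static String cleanFormat(String input) {
        String f = input.trim().toLowerCase(Locale.ROOT);
        if (f.equals("jpg")) f = "jpeg";
        if (f.equals("tif")) f = "tiff";
        if (!FORMATS.contains(f)) {
            throw new IllegalArgumentException("Unsupported format: " + input);
        }
        return f;
    }
}
```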
Before the actual call to the Screenshot One API, a fitness-for-purpose check is performed. This check consists of a request to Google’s SafeBrowsing APIs for information concerning malware, unwanted software, potentially harmful applications, and social engineering, in order to assess whether the application should proceed and ask the Screenshot One API to render the website.
If a website is flagged by Google’s SafeBrowsing APIs, the Screenshot One application visually notifies the end-user that it refuses to render the screenshot. This shields the Screenshot One API from potentially harmful or unnecessary requests. Communication with the SafeBrowsing APIs is initiated using an HTTPS POST request; both request and response are formatted as JSON.
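The SafeBrowsing request might be assembled as follows. This sketch assumes version 4 of the Safe Browsing Lookup API (`threatMatches:find`), whose request lists the threat types mentioned above and whose response is an empty JSON object when no match is found. The class name, client ID and API key are placeholders, and the string-based response check is a simplification of proper JSON parsing.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch of a Safe Browsing v4 Lookup request (version is an assumption).
class SafeBrowsingCheck {
    static final String ENDPOINT = "https://safebrowsing.googleapis.com/v4/threatMatches:find?key=";

    // JSON body asking about the four threat categories named in the text.
    static String requestBody(String clientId, String clientVersion, String url) {
        return "{"
                + "\"client\":{\"clientId\":\"" + esc(clientId) + "\",\"clientVersion\":\"" + esc(clientVersion) + "\"},"
                + "\"threatInfo\":{"
                + "\"threatTypes\":[\"MALWARE\",\"UNWANTED_SOFTWARE\",\"POTENTIALLY_HARMFUL_APPLICATION\",\"SOCIAL_ENGINEERING\"],"
                + "\"platformTypes\":[\"ANY_PLATFORM\"],"
                + "\"threatEntryTypes\":[\"URL\"],"
                + "\"threatEntries\":[{\"url\":\"" + esc(url) + "\"}]"
                + "}}";
    }

    // Builds (but does not send) the HTTPS POST request.
    static HttpRequest buildRequest(String apiKey, String body) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT + apiKey))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // An empty object means no match; a "matches" array means the URL is flagged.
    // Simplified check: a production client would parse the JSON properly.
    static boolean isFlagged(String responseJson) {
        return responseJson.contains("\"matches\"");
    }

    // Minimal JSON string escaping for quotes, backslashes and control characters.
    private static String esc(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

The request would then be sent with `java.net.http.HttpClient`, and the app would only proceed to the Screenshot One API when `isFlagged` returns false.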
After a Screenshot One client app validates the user’s input in its logical layer, the client’s data layer sends a request to the Screenshot One API (server). The request uses the HTTP GET method over HTTPS and carries URL-encoded parameters such as width, height, IP address, user agent hash, URL, desired render type, etc.
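Encoding the GET parameters could look like this. The base URL and the parameter names (`url`, `width`, `height`, `type`, `ua`) are hypothetical, since the text does not list the API’s actual names; `URLEncoder` produces form encoding, in which a space becomes `+`.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of building the GET request URI for the Screenshot One API.
class ApiRequestBuilder {
    // Parameter names are assumptions; insertion order is kept for readability.
    static Map<String, String> screenshotParams(String url, int width, int height,
                                                String format, String userAgentHash) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("url", url);
        p.put("width", Integer.toString(width));
        p.put("height", Integer.toString(height));
        p.put("type", format);
        p.put("ua", userAgentHash);
        return p;
    }

    // Produces baseUrl?k1=v1&k2=v2..., URL-encoding every key and value.
    static URI build(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return URI.create(baseUrl + "?" + query);
    }
}
```

Encoding each value separately matters because the target URL itself may contain `?`, `&` or `=`, which would otherwise be mistaken for parameter delimiters.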
The clients are programmed to do this in a multi-threaded fashion, so that processing within the logical and data layers does not block the presentation layer’s thread. In other words, none of the client applications ‘hangs’ while requesting and receiving screenshots.
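A minimal sketch of this non-blocking pattern in Java, using `CompletableFuture` with a background executor. `BackgroundFetcher` is a hypothetical name; on Android the callback would additionally have to hand its result back to the UI thread, e.g. with `Activity.runOnUiThread`, before touching any views.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch: run a blocking screenshot request on a worker thread
// so that the calling (presentation) thread never waits for the network.
class BackgroundFetcher {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Returns immediately; both callbacks run on the worker thread, never on the caller's.
    CompletableFuture<Void> fetch(Supplier<byte[]> blockingRequest,
                                  Consumer<byte[]> onDone,
                                  Consumer<Throwable> onError) {
        return CompletableFuture.supplyAsync(blockingRequest, worker)
                .handleAsync((result, err) -> {
                    if (err != null) {
                        onError.accept(err);
                    } else {
                        onDone.accept(result);
                    }
                    return null;
                }, worker);
    }

    void shutdown() {
        worker.shutdown();
    }
}
```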
Server application architecture: Screenshot One API
Upon receiving a client’s request, the API’s logical layer queries the SafeBrowsing APIs again to ensure that the website can, within reason, be assumed safe. After all, the request passed through a public medium, the internet, and could therefore have been altered, e.g. by a man-in-the-middle (MitM) attack. However, this is very unlikely because the communication takes place over HTTPS.
For the same reason, the API’s logical layer again performs nearly all of the data integrity checks the client already performed. If everything is validated, its data layer then queries the database to determine whether an existing render should be sent back, a new render should be made, the requester has exceeded the maximum number of requests, etc.
The data layer only passes the requested information from the database back to the logical layer, which then judges whether rendering can proceed. If a new render is needed, the data layer instructs the Screenshot One API Renderer, which again consists of three layers, to render the screenshot.
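The logical layer’s judgement can be summarized as a small decision function. The order of the checks (SafeBrowsing, then rate limit, then cache) and all names are assumptions for illustration; the text only lists the possible outcomes.

```java
// Hypothetical outcomes of the server-side decision.
enum RenderOutcome { DUMMY_UNSAFE, DUMMY_RATE_LIMITED, RETURN_CACHED, RENDER_NEW }

// Hypothetical sketch of the logical layer deciding, from the data layer's answers,
// what to send back. The check order is an assumption.
class RenderDecision {
    static RenderOutcome decide(boolean flaggedBySafeBrowsing,
                                int requestsInWindow, int limit,
                                boolean cachedRenderAvailable) {
        if (flaggedBySafeBrowsing) {
            return RenderOutcome.DUMMY_UNSAFE;        // return a dummy image
        }
        if (requestsInWindow >= limit) {
            return RenderOutcome.DUMMY_RATE_LIMITED;  // cap reached: dummy image
        }
        if (cachedRenderAvailable) {
            return RenderOutcome.RETURN_CACHED;       // skip rendering
        }
        return RenderOutcome.RENDER_NEW;              // hand off to the Renderer
    }
}
```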
The renderer performs Component Object Model (COM) calls to interfaces of the Trident rendering engine built into the Microsoft Windows operating system to generate the screenshot.
Dummy images are returned, e.g. in case of internal server errors or whenever the Google SafeBrowsing APIs indicate that the URL is not deemed ‘safe’ in one sense or another.
After rendering, the API presentation layer responds to the requesting client with an XML-formatted data stream containing the requested image in Base64 form, so that the binary image data can be carried intact inside the text-based XML.
The client’s data layer then decodes this data stream and saves it in the requested file format to local storage, using a user-provided file name. The end-user is then notified and offered further options, e.g. opening the image for viewing.
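Decoding the XML response on the client could be sketched with the JDK’s DOM parser and `java.util.Base64`. The element name `image` and the class name are hypothetical, as the text does not specify the XML schema; DOCTYPE declarations are disabled as a common hardening measure.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical client-side sketch: extract the Base64 image from the XML response and save it.
class ScreenshotResponseReader {
    // Assumed element name; the actual XML schema is not given in the text.
    static final String IMAGE_ELEMENT = "image";

    static byte[] decodeImage(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations to avoid XML external entity (XXE) attacks.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(IMAGE_ELEMENT);
            if (nodes.getLength() == 0) {
                throw new IllegalArgumentException("Response contains no <" + IMAGE_ELEMENT + "> element");
            }
            // Strip line breaks or indentation a serializer may have inserted.
            String base64 = nodes.item(0).getTextContent().replaceAll("\\s", "");
            return Base64.getDecoder().decode(base64);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Unreadable screenshot response", e);
        }
    }

    // Writes the decoded bytes to directory/fileName and returns the resulting path.
    static Path save(byte[] image, Path directory, String fileName) {
        try {
            return Files.write(directory.resolve(fileName), image);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```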
Dummy images, if any, will be recognized by the requesting client and translated by the logical layer into an error message that informs the end-user via the presentation layer that certain content was not rendered, the rate limit was reached, or that an internal error occurred.
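The text does not say how a client recognizes dummy images. One possible approach, shown here purely as an assumption, is to compare a SHA-256 digest of the received image against the digests of the known dummy images, each mapped to its user-facing message; `DummyImageDetector` is a hypothetical name.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: recognize known dummy images by their SHA-256 digest.
class DummyImageDetector {
    private final Map<String, String> messagesByDigest = new HashMap<>();

    // Registers a known dummy image and the error message it stands for.
    void register(byte[] dummyImage, String message) {
        messagesByDigest.put(sha256Hex(dummyImage), message);
    }

    // Returns the error message if the image is a known dummy, otherwise empty.
    Optional<String> messageFor(byte[] image) {
        return Optional.ofNullable(messagesByDigest.get(sha256Hex(image)));
    }

    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

An explicit status field in the XML response would be an equally plausible design; a digest comparison has the advantage of needing no change to the response format.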
About Screenshot One
The purpose of the Screenshot One line of applications is to create full-length, full-page screenshots of webpages that far surpass the typical screen dimensions and resolutions of today’s internet-facing devices.
In short, its goal is to make giant “photos” of websites and webpages that go beyond the normal capabilities and physical boundaries of an end-user’s mobile device. The height and width of web captures are highly customizable. Furthermore, Screenshot One allows the end-user to watermark these captures with a “Portable Network Graphics” (PNG) or “Graphics Interchange Format” (GIF) file from almost anywhere on the web.
Screenshots can be rendered in PNG, GIF, “Joint Photographic Experts Group” (JPEG) and “Tagged Image File Format” (TIFF) output formats. Screenshot One stores these renders in the phone’s picture gallery using a file name reference of the user’s choice.
Screenshot One facilitates image post-processing and other tasks by offering to open the user’s preferred app for sharing or editing the image afterwards. The app requires a working internet connection because it connects to the screenshot.one application programming interface (API) for rendering.
1 The reason for using Windows 10 as the server operating system on Screenshot One’s physical server is that Windows Server 2016 had not yet been released at the time of development.
2 This tool can be found at https://developers.google.com/speed/pagespeed/insights/