Web log analytics

Log analytics is a Python script that lets you import logs from common web servers, such as nginx, Apache and IIS, directly into Piwik PRO. This free software is available under the GPLv3 license and can be found on GitHub and PyPI.

Install the log analytics script

You can install the script using one of the following methods:

Install it from PyPI with pip:

pip install piwikpro-log-analytics

Or download the script and make it executable:

curl https://raw.githubusercontent.com/PiwikPRO/log-analytics/master/piwik_pro_log_analytics/import_logs.py > import_logs.py
chmod +x import_logs.py

Set up log import

You need to run the log importer tool with the correct parameters. Some parameters are required, while others are optional.

Sample command if you installed the script with pip:

piwik_pro_log_analytics --url=https://demo.piwik.pro --client-id=*** --client-secret=*** --enable-static --enable-bots --show-progress --idsite=*** --recorders=2 sample.log

Sample command if you downloaded the script:

./import_logs.py --url=https://demo.piwik.pro --client-id=*** --client-secret=*** --enable-static --enable-bots --show-progress --idsite=*** --recorders=2 sample.log
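If you prefer to launch the import from Python rather than from a shell, the sample command can be assembled as an argument list. This is only a sketch: build_import_command is a hypothetical helper, not part of the log analytics package, and the credential and site ID values are placeholders you would replace with your own.

```python
import shlex


def build_import_command(url, client_id, client_secret, idsite, log_file,
                         recorders=2, executable="piwik_pro_log_analytics"):
    """Return the sample command as an argument list (hypothetical helper)."""
    return [
        executable,
        f"--url={url}",
        f"--client-id={client_id}",
        f"--client-secret={client_secret}",
        "--enable-static",
        "--enable-bots",
        "--show-progress",
        f"--idsite={idsite}",
        f"--recorders={recorders}",
        log_file,
    ]


command = build_import_command(
    url="https://demo.piwik.pro",
    client_id="YOUR_CLIENT_ID",          # placeholder
    client_secret="YOUR_CLIENT_SECRET",  # placeholder
    idsite="YOUR_SITE_ID",               # placeholder
    log_file="sample.log",
)

# Print the command in a form that can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in command))

# To actually run the import you could pass the list to subprocess:
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a value contains spaces or special characters.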

Parameters

--url=https://demo.piwik.pro

This mandatory parameter specifies the location of your Piwik PRO account.

--client-id=***

This is part of your API credentials. Where to find it?

--client-secret=***

This is part of your API credentials. Where to find it?

--idsite=***

This is your site or app ID (example: 99e33528-8da4-46d8-be90-a62bfb3a7bba). Where to find it?

Many other options can be added to this command. They are described in the section Add parameters to log import.

Note: If you plan to import logs regularly, it’s best to set up a scheduled job using a tool like cron.
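As an illustration of what a scheduled job might run, here is a minimal sketch of a wrapper that imports the previous day's log. Everything specific in it is an assumption: the rotated file naming (access.log-YYYYMMDD in /var/log/nginx), the environment variable names, and the helper functions are all hypothetical and should be adapted to your server.

```python
import datetime


def rotated_log_path(day, directory="/var/log/nginx"):
    """Path of the log file rotated on `day` (naming scheme is an assumption)."""
    return f"{directory}/access.log-{day:%Y%m%d}"


def daily_import_command(today, env):
    """Build the import command for yesterday's log (hypothetical helper).

    `env` is a mapping such as os.environ; the variable names are made up
    for this example so that credentials stay out of the script itself.
    """
    yesterday = today - datetime.timedelta(days=1)
    return [
        "piwik_pro_log_analytics",
        f"--url={env['PIWIK_PRO_URL']}",
        f"--client-id={env['PIWIK_PRO_CLIENT_ID']}",
        f"--client-secret={env['PIWIK_PRO_CLIENT_SECRET']}",
        f"--idsite={env['PIWIK_PRO_SITE_ID']}",
        rotated_log_path(yesterday),
    ]


# In the scheduled script itself you could then run:
#   import os, subprocess
#   subprocess.run(daily_import_command(datetime.date.today(), os.environ), check=True)
```

A scheduler such as cron could call a script like this once a day, after log rotation has finished.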

Exclude log lines

There are several methods for excluding specific log lines or visitors from tracking:

  • You can exclude specific IP addresses or IP ranges from tracking. To set this up, go to Piwik PRO > Administration > Sites & apps > Data collection > Don't collect data from these IP addresses.
  • You can exclude visitors based on their User-Agent request header by using the --useragent-exclude parameter.
  • You can restrict the import to a single hostname, which means log lines from other hosts will be ignored. Use the --hostname parameter to set this.
  • You can exclude log lines where the URL path matches a given pattern by using the --exclude-path parameter.

Note: If you need to add multiple paths or hostnames, you will need to add these parameters multiple times.
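The note above can be sketched in Python: each path or hostname becomes its own occurrence of the parameter. The helper exclusion_args is hypothetical, and the paths and hostnames are illustrative values only; run the script with --help to check the pattern syntax that --exclude-path accepts.

```python
def exclusion_args(exclude_paths=(), hostnames=()):
    """Repeat --exclude-path and --hostname once per value (hypothetical helper)."""
    args = []
    for path in exclude_paths:
        args.append(f"--exclude-path={path}")
    for host in hostnames:
        args.append(f"--hostname={host}")
    return args


# Illustrative values, not taken from any real configuration.
args = exclusion_args(
    exclude_paths=["/healthcheck", "/internal/status"],
    hostnames=["example.com", "www.example.com"],
)
print(" ".join(args))
```

The resulting list can be appended to the rest of the import command before the log file name.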

Add parameters to log import

By default, the web log analytics script doesn't track static files, such as JS, CSS and images, and excludes all bot traffic.

Use the following options to enable tracking of these elements and to tune the import:

  • --enable-bots: Tracks search and spam bots in Piwik PRO. A custom variable with the bot’s name is added. The User-Agent field is used to identify whether a log line comes from a bot or a real user.
  • --enable-static: Tracks all static files, such as images, JS and CSS, in Piwik PRO.
  • --enable-http-redirects: Tracks HTTP redirects as page views, assigning a custom title and variable.
  • --enable-reverse-dns: Activates reverse DNS lookups. This is used to generate the following report: Analytics > Reports > Location > ISP. Note: This may lead to a serious drop in performance as reverse DNS is very slow.
  • --recorders=N: Sets the number of threads. We recommend aligning it with the number of CPU cores in your system.
  • --enable-bulk-tracking: Enables bulk tracking mode, where tracking requests are grouped together and sent in a single bulk request.
  • --recorder-max-payload-size=N: When the importer uses Piwik PRO’s bulk tracking feature (the --enable-bulk-tracking option) to improve speed, this sets the maximum number of tracking requests each bulk request can contain. Adjust this number of page views or log lines to find the optimal performance.
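The recommendation to align the number of recorders with the number of CPU cores can be sketched as follows. The helper performance_args is hypothetical, and the payload size of 200 is an arbitrary example value, not a recommended setting.

```python
import os


def performance_args(payload_size=None, cpu_count=None):
    """Build performance-related options (hypothetical helper).

    Uses one recorder per CPU core; os.cpu_count() can return None,
    in which case a single recorder is used.
    """
    cores = cpu_count or os.cpu_count() or 1
    args = [f"--recorders={cores}"]
    if payload_size is not None:
        # The payload size only matters when bulk tracking is enabled.
        args.append("--enable-bulk-tracking")
        args.append(f"--recorder-max-payload-size={payload_size}")
    return args


print(performance_args())                  # recorders matched to this machine
print(performance_args(payload_size=200))  # bulk tracking with an example size
```

Treat the output as a starting point and measure import speed with a few different values before settling on one.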

For more information about log import parameters, use the help parameter:

piwik_pro_log_analytics --help
./import_logs.py --help

Import data using both server log analytics and standard JavaScript simultaneously

You can use both the JavaScript tracking client and web server log file analytics at the same time, as long as you record data from each method in a separate Piwik PRO site.

To prevent double-counting visits, follow these steps:

  1. Create a new site in Piwik PRO. Example: example.com (log files).
  2. Check the site ID of this new site. The site ID will be used for importing log file data. Where to find it?
  3. In the command line, make sure all requests from log files are recorded under this site ID by using the --idsite=X parameter.

Technical requirements

Here are the technical requirements for running web log analytics:

  • Access to the server or server logs, for example via SSH.
  • Python 3.6 or later. Older versions are not supported. Typically, you'll want to import data directly from the server where it is generated. For this, you'll need to run a Python script on the machine that sends the logs to Piwik PRO.
  • Log analytics script. This Python script sends logs to your Piwik PRO account and is available on GitHub.

Here's a list of supported log formats:

  • All default log formats for Nginx, Apache, IIS and Tomcat.
  • All common log formats like NCSA Common log format, Extended log format, W3C Extended log files or Nginx JSON.
  • Log files of some popular cloud SaaS services, such as Amazon CloudFront logs or Amazon S3 logs.
  • Streaming media server log files such as Icecast.
  • Log files with and without the virtual host will be imported.

Note: This script does not directly support importing logs from aggregation tools like Grafana Loki or ELK. To import logs from these tools, you must first download them to your disk.

Performance considerations and rate limiting

The script requires CPU resources to read and parse log files, but the import speed is often limited by network latency to the Piwik PRO server. To improve performance, use the --recorders option to specify the number of parallel threads for importing hits into Piwik PRO. By default, one recorder is used, but you can increase this number to improve speed.
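A back-of-the-envelope model can show why extra recorders help when latency dominates. The sketch below assumes that each recorder sends one request at a time and waits for the response before sending the next. This is a simplification for illustration, not a description of the script's internals, and estimated_hits_per_second is a hypothetical helper.

```python
def estimated_hits_per_second(recorders, round_trip_seconds, hits_per_request=1):
    """Rough upper bound on throughput when network latency is the bottleneck."""
    return recorders * hits_per_request / round_trip_seconds


# With a 250 ms round trip, one recorder sends about 4 hits per second.
print(estimated_hits_per_second(recorders=1, round_trip_seconds=0.25))  # 4.0

# Four recorders working in parallel raise the estimate to 16 hits per second.
print(estimated_hits_per_second(recorders=4, round_trip_seconds=0.25))  # 16.0
```

Real throughput will be lower once CPU time for parsing, server-side processing and any rate limits are taken into account.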

If you are a Piwik PRO Core user, ensure you’re not hitting rate limits by using the --sleep-between-requests-ms flag to slow down the import process if needed.