
Jesper Jarlskov’s blog

February-2017.php

February is coming to a close, and it’s time for a monthly round-up.

Even though February is the shortest month of the year, I doubt it has been the least eventful. The security scene in particular has been on fire this month, and I doubt we’ve seen the last of the debris.

The month gave us two major security related findings from Google.

First, they announced the first practical technique for creating SHA-1 hash collisions, putting the final nail in the coffin for using SHA-1 in any security-related context.

Later in the month, Google’s security research team, Project Zero, announced how Cloudflare’s reverse proxies would, in certain cases, return private data from memory, a bug which came to be known as Cloudbleed. The Google researchers worked with Cloudflare to stop the leak, but according to Cloudflare’s incident report, the issue had been open for a while.

On a slightly different note: Laravel is a popular PHP framework. Articles about the framework seem to be roughly equal parts hype and belittlement. Earlier this month a critical analysis of Laravel was making the rounds in the Twittersphere. I believe it provides a nice description of the pros and cons of Laravel, without falling for either the hype or the hatred that is often displayed in framework discussions in general, and Laravel discussions in particular.

As a lead developer, I spend a lot of time thinking about and making decisions on software architecture, so it’s always nice to get some inspiration and new ideas. Even though it’s a rather old article by now, I believe Uncle Bob makes some nice points when discussing Screaming Architecture, where he points out that the architecture of a piece of software should make it obvious what the software does, rather than which framework it’s built upon.

Developers seem to find incredible performance gains when upgrading to PHP 7, from Tumblr reporting more than a 50% performance improvement to Badoo saving one million dollars per year in hosting and server costs. For the nerds out there, PHP core contributor Julien Pauli did a deep dive into the technical side of PHP 7’s performance improvements.

On the topic of performance, I found Sitespeed.io, a collection of open source performance testing/monitoring tools, that I’d like to look more into.

Want to know more about what’s going on in the PHP community? Here is a nice curated list of PHP podcasts.


Laravel file logging based on severity

By default, anything logged in a Laravel application is written to the file storage/logs/laravel.log. This is fine for getting started, but as your application grows you might want something that’s a bit easier to work with.

Laravel comes with a couple of different loggers:

single
Everything is logged to the same file.
daily
Everything is logged to the same file, but the file is rotated on a daily basis so you have a fresh file every day.
syslog
Log to syslog, to allow the OS to handle the logging.
errorlog
Pass log entries to PHP’s error log, which is typically handled by the web server.

Having different handlers included provides some flexibility, but sometimes you need something more specific to your situation. Luckily, Laravel has outsourced its logging needs to Monolog, which provides all the flexibility you could want.

In our case, logging to files was enough, but having all of our log entries in one file made it impossible to find anything. Monolog implements the PSR-3 specification, which defines 8 severity levels, so we decided to split our logging into one dedicated file per severity level.

To add your own Monolog configuration, you just call Illuminate\Foundation\Application::configureMonologUsing() from your bootstrap/app.php.

To add a file destination for each severity level, we simply loop through all available severity levels and assign a StreamHandler to each level.
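A minimal sketch of how that could look in bootstrap/app.php; the file names and paths are assumptions:

    $app->configureMonologUsing(function ($monolog) {
        foreach (Monolog\Logger::getLevels() as $name => $level) {
            // One dedicated file per severity level, e.g. storage/logs/error.log.
            // Bubbling is disabled, so each entry only ends up in the file
            // matching its own level.
            $monolog->pushHandler(new Monolog\Handler\StreamHandler(
                storage_path('logs/' . strtolower($name) . '.log'),
                $level,
                false
            ));
        }
    });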

Simple as that.

Later we decided that all errors should also be sent to a Slack channel, so we could keep an eye on them and react quickly if required. Luckily Monolog ships with a SlackHandler, so this is easy as well.
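A sketch of how that could look inside the same configureMonologUsing() closure; the token, channel and bot name are placeholders, and the constructor arguments follow Monolog 1.x:

    // Inside the same configureMonologUsing() closure as the file handlers:
    if (! config('app.debug')) {
        $monolog->pushHandler(new Monolog\Handler\SlackHandler(
            'xoxp-your-slack-api-token', // placeholder Slack API token
            '#errors',                   // placeholder channel
            'Logbot',                    // placeholder bot name
            true,                        // use message attachments
            null,                        // no custom icon emoji
            Monolog\Logger::ERROR        // only send errors and above
        ));
    }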

So with just a few registered handlers, we’ve tailored our error logging to a level that suits our current needs.

Notice how we only log to Slack when we’re not in debug mode, to filter out any errors thrown while developing.


HTTPS and HTTP/2 with letsencrypt on Debian and nginx

Introduction

Most web servers today run HTTP/1.1, but HTTP/2 support is growing. I’ve written more in-depth about the advantages of HTTP/2. Most browsers that support HTTP/2 only support it over encrypted HTTPS connections. In this article, I’ll go through setting up NGINX to serve pages over HTTPS and HTTP/2. I’ll also talk a bit about tightening your HTTPS setup to prevent common exploits.

Letsencrypt

HTTPS utilises public-key cryptography to provide end-to-end encryption and authentication for connections. The certificates are signed by a certificate authority, which can then verify that the certificate holder is who it claims to be.

Letsencrypt is a free and automated certificate authority that provides certificate signing at no cost, something which has historically been a costly affair.

Signing and renewing certificates from Letsencrypt is done using their certbot tool. The tool is available in most package managers, which means it’s easy to install. On Debian Jessie, just install certbot like anything else from apt:
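For example (depending on your apt sources, the package may need to come from jessie-backports):

    sudo apt-get install certbot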

Using certbot you can start generating new signed certificates for the domains of your hosted websites:
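The general form using the webroot plugin; the path and domain are placeholders:

    sudo certbot certonly --webroot -w /path/to/webroot -d yourdomain.tld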

For example
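A hypothetical run for a site served from /var/www/example.com:

    sudo certbot certonly --webroot -w /var/www/example.com -d example.com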

Letsencrypt certificates are valid for 90 days, after which they must be renewed by running
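Renewal is handled by the renew subcommand, which renews any certificate that is close to expiry:

    sudo certbot renew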

To automate this process, you can add this to your crontab, and make it run daily or weekly or whatever you prefer. In my setup it runs twice daily:
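A sketch of an entry for root’s crontab; the exact minutes and hours are arbitrary:

    17 5,17 * * * certbot renew --quiet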

Note that Letsencrypt currently doesn’t support wildcard certificates, so if you’re serving your website from both example.com and www.example.com (you probably shouldn’t), you need to generate a certificate for each.

NGINX

NGINX is a free open source asynchronous HTTP server and reverse proxy. It’s pretty easy to get up and running with Letsencrypt and HTTP/2, and it will be the focus of this guide.

HTTPS

To have NGINX serve encrypted data using our previously created Letsencrypt certificate, we have to ask it to listen for HTTPS connections on port 443 and tell it where to find our certificates.

Open your site config, usually found in /etc/nginx/sites-available/<site>. Here you’ll probably see that NGINX is currently listening for HTTP connections on port 80:
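Something along these lines (example.com is a placeholder):

    listen 80;

    server_name example.com;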

So to start listening on port 443 as well, we just add another line:
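With the added line, the relevant part of the server block could look like this (example.com is again a placeholder):

    listen 80;
    listen 443 ssl;

    server_name example.com;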

The domain at the end would, of course, be the domain that your server is hosting.

Notice that we’ve added the ssl statement in there as well.

Next, we need to tell NGINX where to look for our new certificates, so it knows how to encrypt the data. We do this by adding:
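The two relevant directives (the paths are placeholders):

    ssl_certificate     <path to the full certificate chain>;
    ssl_certificate_key <path to the private key>;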

With the default Letsencrypt settings this would translate into something like:
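For a certificate issued for example.com:

    ssl_certificate     /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;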

And that’s really all there is to it.

HTTP/2

Enabling HTTP/2 support in NGINX is even easier: just add an http2 statement to each listen line in your site config. So:
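For instance, taking the HTTPS listen line from before:

    listen 443 ssl;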

Turns into:
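The same line with the http2 keyword added:

    listen 443 ssl http2;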

Now test out your new, slightly more secure, setup:
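NGINX can check the configuration for syntax errors before you reload it:

    sudo nginx -t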

If all tests pass, restart NGINX to publish your changes.

Now your website should also be available using the https:// scheme… unless the port is blocked in your firewall (that could happen to anybody). If your browser supports HTTP/2, your website should also be served over this new protocol.

Improving HTTPS security

With the current setup, we’re running over HTTPS, which is a good start, but a lot of exploits have been discovered in various parts of the implementation, so there are a few more things we can do to harden the security. We do this by adding some additional settings to our NGINX site config.

Firstly, old versions of TLS are insecure, so we should force the server not to fall back to them:
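One way is to only allow the newest TLS version (1.2 at the time of writing):

    ssl_protocols TLSv1.2;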

If you need to support older browsers like IE10, you need to turn on older versions of TLS:
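That could look like this:

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;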

This in effect only turns off the old SSL protocols. It’s not optimal, but sometimes you need to strike a balance, and it’s better than the NGINX default settings. You can see which browsers support which TLS versions on caniuse.

When establishing a connection over HTTPS, the server and the client negotiate which encryption cipher to use. This has been exploited in some cases, like in the BEAST exploit. To lessen the risks, we disable certain old, insecure ciphers:
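A sketch of the relevant directives; the cipher list below is only a placeholder and should be taken from a maintained, up-to-date recommendation:

    ssl_prefer_server_ciphers on;
    ssl_ciphers 'EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH';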

Normally HTTPS certificates are verified by the client contacting the certificate authority. We can improve on this a bit by having the server download the authority’s response and supply it to the client together with the certificate. This saves the client the roundtrip to the certificate authority, speeding up the process. This is called OCSP stapling and is easily enabled in NGINX:
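The directives involved; the certificate path and the resolver addresses are assumptions for this sketch:

    ssl_stapling on;
    ssl_stapling_verify on;
    ssl_trusted_certificate /etc/letsencrypt/live/example.com/chain.pem;
    resolver 8.8.8.8 8.8.4.4;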

Enabling HTTPS is all well and good, but if a man-in-the-middle (MitM) attack actually occurs, the perpetrator can decrypt the connection from the server and relay it to the client over an unencrypted connection. This can’t really be prevented, but it is possible to instruct the client that it should only accept encrypted connections from the domain; this will mitigate the problem any time the client visits the domain after the first time. This is called Strict Transport Security:
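The header can be added like this; the max-age of 31536000 seconds corresponds to the one year mentioned below:

    add_header Strict-Transport-Security "max-age=31536000";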

BE CAREFUL with adding this last setting, since it will prevent clients from connecting to your site for one full year if you later decide to turn off HTTPS, or if an error in your setup causes HTTPS to fail.

Again, we test our setup:
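As before:

    sudo nginx -t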

If all tests pass, restart NGINX to publish your changes (again, consider if you’re actually ready to enable Strict Transport Security).

Now it’s time to test the setup. First we run an SSL test to check the security of our protocol and cipher choices; the changes mentioned here improved this domain to an A+ grade.

We can also check our HTTPS header security. Since I still haven’t set up Content Security Policies and HTTP Public Key Pinning, I’m sadly stuck at a B grade, leaving room for improvement.

Further reading


January-2017.php

As a new feature this year, I’d like to start doing monthly summaries of interesting articles I read throughout the month that I feel are worth returning to. The lists will include a bunch of different articles revolving around web development, but will probably mainly focus on PHP. It might also show that I work with Laravel in my day job.

I will include anything that pops up on my radar during the month, even if some of it might be old news, as long as I feel it’s still relevant.

I believe one of the best ways to learn is to look at how others do the job you’re trying to become better at. I’ve had a hard time finding complete Open Source Laravel projects, but OpenLaravel showcases just that.

Besides diving into full project code bases, reading other people’s experiences with diving into specific areas of a code base can also be highly valuable; that’s why I’ve written about diving into Laravel Eloquent’s getter and setter magic. In the same vein, I found it interesting to read a breakdown of hacking a Concrete5 plugin.

Since we’re already on the topic of security, the people from Paragon IE wrote a PHP security pocket guide.

I like to consider myself a pragmatic programmer, who focuses on getting the right things to work in a proper way, which means not necessarily trying to solve every imaginable scenario that might be a problem someday. In other words, I try to practice YAGNI.

Shawn McCool has gathered a tonne of interesting information regarding the Active Record design pattern (the basis for Laravel’s Eloquent ORM) in his giant all things Active Record article.

The last link for this month is a short story about why a bug fix is called a patch.


Exporting from MySQL to CSV file

We often have to export data from MySQL to other applications; this could be to further analyse the data, gather user emails for a newsletter, or similar. Whatever the receiving application, CSV is probably the most common format for exports like this.

Luckily SQL select output, with its rows and columns, is well suited for CSV output.

You can easily export straight from the MySQL client to a CSV file, by appending the expected CSV format to the end of the query:
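The general form; the columns, table and file path are placeholders:

    SELECT <columns>
    FROM <table>
    INTO OUTFILE '/path/to/file.csv'
    FIELDS TERMINATED BY ','
    ENCLOSED BY '"'
    LINES TERMINATED BY '\n';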

For example:
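A hypothetical export of a users table:

    SELECT id, name, email
    FROM users
    INTO OUTFILE '/var/lib/mysql-files/users.csv'
    FIELDS TERMINATED BY ','
    ENCLOSED BY '"'
    LINES TERMINATED BY '\n';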

The last three lines can be customised based on your needs.

NOTE: The default MySQL settings only allow writing files to the /var/lib/mysql-files/ directory.

If you do not have permissions to write files from the MySQL client, the same can be accomplished from the commandline:
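The general idea is to run the query through the mysql command-line client and reformat the tab-separated output with sed; the user, database, query and file name are placeholders:

    mysql -B -u <user> -p <database> -e "<query>" \
      | sed "s/'/\'/;s/\t/\",\"/g;s/^/\"/;s/$/\"/;s/\n//g" > file.csv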

For example:
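Using the same hypothetical users table as above:

    mysql -B -u root -p mydb -e "SELECT id, name, email FROM users;" \
      | sed "s/'/\'/;s/\t/\",\"/g;s/^/\"/;s/$/\"/;s/\n//g" > users.csv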

The regex is, of course, courtesy of Stackoverflow.

Regex Explanation:

s/// means substitute what’s between the first // with what’s between the second //
the “g” at the end is a modifier that means “all instances, not just the first”
^ (in this context) means beginning of line
$ (in this context) means end of line
So, putting it all together:

s/'/\'/ replaces ' with \'
s/\t/\",\"/g replaces all \t (tab) with ","
s/^/\"/ places a " at the beginning of the line
s/$/\"/ places a " at the end of the line
s/\n//g replaces all \n (newline) with nothing


Lazy loading, eager loading and the n+1 problem

In this article I’ll look into a few data loading strategies, namely lazy loading, eager loading and lazy-eager loading.

I’ll talk a bit about the pros and cons of each strategy and common issues developers might run into, especially the n+1 problem and how to mitigate it.

The examples in the post are written using PHP and Laravel syntax, but the idea is the same for any language.

Lazy Loading

The first loading strategy is probably the most basic one. Lazy loading means postponing loading data until the point where you actually need it.
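A minimal sketch in Laravel syntax, assuming a User model with a posts relation:

    $users = User::all();

    foreach ($users as $user) {
        // The posts relation is only queried here, when it is first accessed,
        // adding one extra query per user.
        echo $user->posts->count();
    }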

Lazy loading has the advantage that you will only load the data you actually need.

The n+1 problem

The problem with lazy loading is what is known as the n+1 problem. Because the data is loaded for each user independently, n+1 database queries are required, where n is the number of users. If you only have a few users this will likely not be a problem, but it means that performance will degrade quickly as the number of users grows.

Eager Loading

In the eager loading strategy, data loading is moved to an earlier part of the code.
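In Laravel this is typically done with the with() method, again assuming a User model with a posts relation:

    // The users and all of their posts are fetched up front:
    // one query for the users and one for the related posts.
    $users = User::with('posts')->get();

    foreach ($users as $user) {
        echo $user->posts->count();
    }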

This will solve the n+1 problem. Since all data is fetched up front using a constant number of queries, the number of queries is independent of the number of items fetched.

One problem with eager loading is that you might end up loading more data than you actually need.

Often data is fetched in a controller and used in a view. In this case, the two become very tightly coupled, and every time the data requirements of the view change, the controller needs to change as well, requiring maintenance in several different places.

Lazy-eager loading

Lazy-eager loading combines both of the strategies above: loading of the data is postponed until it is required, but it is still prepared ahead of time. Let’s see an example.
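A sketch using Laravel’s load() method on an already fetched collection, with the same assumed User/posts relation:

    // Typically done in a controller: fetch the users without related data.
    $users = User::all();

    // Just before the data is needed, load the posts for all of the users
    // in one additional query.
    $users->load('posts');

    foreach ($users as $user) {
        echo $user->posts->count();
    }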

As usual, a simple example like this only shows part of the story, but the $users list would usually be loaded in a controller, away from the iterator.

In this case, we’re keeping related code close together, which means you’ll likely have fewer places to go to handle maintenance. But since the data is often required in templates, we might be introducing logic into our templates, giving the template more responsibilities. This can make both maintenance and performance testing harder.

Which strategy to choose?

This article has introduced three different data loading strategies, namely lazy loading, eager loading and lazy-eager loading.

I’ve tried to outline some pros and cons of each strategy, as well as some of the issues related to each of them.

Which strategy to choose depends on the problem you’re trying to solve, since each strategy makes sense in its own cases.

As with most other problems, I’d usually start by implementing whichever strategy is simplest and fastest to implement, to create a PoC of a solution to my problem. Afterwards I’d go through a range of performance tests to see where the bottlenecks in my solution appear, and then look into which alternative strategy would solve that bottleneck.


HTTP/2 – What and why?

What is HTTP?

HTTP, HyperText Transfer Protocol, is an application layer network protocol for distributing hypermedia data. It is mainly used for serving websites in HTML format for browser consumption.

HTTP/2 and SPDY

The newest version of HTTP is HTTP/2. The protocol originated at Google under the name SPDY. The specification work and maintenance have since been moved to the IETF.

The purpose of HTTP/2 was to develop a better-performing protocol while still maintaining high-level backward compatibility with previous versions of the HTTP protocol. This means keeping the same HTTP methods, status codes, etc.

New in HTTP/2

HTTP/2 maintains backward compatibility in the sense that applications served over the protocol do not require any changes to work with the new version. But the protocol contains a range of new performance-enhancing features that applications can implement on a case-by-case basis.

Header compression

HTTP/2 supports most of the headers supported by earlier versions of HTTP. As something new, HTTP/2 also supports compressing these headers to minimize the amount of data that has to be transferred.

Request pipelining

In earlier versions of HTTP, one TCP connection equaled one HTTP connection. In HTTP/2 several HTTP requests can be sent over the same TCP connection.

This allows HTTP/2 to bypass some of the issues in previous versions of the protocol, like the maximum connection limit. It also means that all of the formalities of TCP, like handshakes and path MTU discovery, only have to be done once.

No HOL blocking

Head-of-line (HOL) blocking occurs when incoming data must be handled in a specific order and the first piece of data takes a long time, forcing all of the following data to wait for the first to be processed.

In HTTP/1 HOL blocking happens because multiple requests over the same TCP connection have to be handled in the order they were received. So if a client makes 2 requests to the same server, the second request won’t be handled before the first request has been completed. If the first request takes a long time to complete, it will hold up the second request as well.

HTTP/2 allows pipelining requests over a single TCP connection, allowing the second request to be received at the same time as, or even before, the first request. This means the server is able to handle the second request even while it’s still receiving the first request.

Server Push

In earlier versions, a typical web page was rendered in a sequential manner. First the browser would request the page to be rendered. The server would then respond with the main content of the page, typically in HTML format. The browser would then start rendering the HTML from the top down. Every time the browser encountered an external resource, like a CSS file, an external JavaScript file, or an image, a new request would be made to the server to fetch the additional resource.

HTTP/2 offers a feature called server push. Server push allows the server to respond to a single request with several different resources. This allows the server to push required external CSS and JavaScript files to the user on a normal page request, so the client already has the resources at hand during rendering, preventing the render-blocking additional requests that would otherwise have to be made during page rendering.

HTTPS-only

The HTTP/2 specification, like earlier versions, allows HTTP to function over a normal unencrypted connection, but also specifies a method for transferring data over a TLS-encrypted connection, namely HTTPS. The major browser vendors, though, have decided to force higher security standards by only accepting HTTP/2 over encrypted HTTPS connections.


PHP regular expression functions causing segmentation fault

We recently had an issue where generating large exports for users to download would suddenly just stop for no apparent reason. Because of the size of the exports, the first thought was that the process was timing out, but the server didn’t return a timeout error to the browser, and the process didn’t really run for long enough to hit the time limit set on the server.

Looking through the Laravel and Apache vhost logs where the errors would normally be logged didn’t provide any hints as to what the issue was. Nothing was logged. After some more digging I found out that I could provoke an error in the general Apache log file.

It wasn’t a lot to go on, but at least I had a reproducible error message. A segmentation fault (or segfault) means that a process tries to access a memory location that it isn’t allowed to access, and for that reason it’s killed by the operating system.

After following along the code to try to identify where the segfault actually happened, I concluded it was caused by a call to preg_match().

Finally I had something concrete to start debugging (aka Googling) for, and I eventually found the Stackoverflow question that contained the answer.

In short, the problem happens because the preg_* functions in PHP build upon the PCRE library. In PCRE, certain regular expressions are matched using a lot of recursive calls, which uses up a lot of stack space. It is possible to set a limit on the number of recursions allowed, but in PHP this limit defaults to 100,000, which requires more stack space than is available. Setting a more realistic limit for pcre.recursion_limit, as suggested in the Stackoverflow thread, solved my issue, and if the limit should prove to be too low for my system’s requirements, at least I will now get a proper error message to work from.
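A minimal sketch of the kind of fix suggested there; the limit of 16000 is an assumption based on roughly an 8 MB stack, and the pattern and subject are made up for illustration:

    // Lower the PCRE recursion limit so PHP fails gracefully instead of
    // exhausting the stack and segfaulting. The value is an assumption;
    // the linked answer suggests deriving it from the available stack size.
    ini_set('pcre.recursion_limit', '16000');

    $pattern = '/^(\w+\W?)+$/';              // a pattern prone to deep recursion
    $subject = str_repeat('word ', 200000);  // a large subject, e.g. an export row

    if (preg_match($pattern, $subject) === false) {
        // Instead of a dead process we now get a proper error to work from.
        var_dump(preg_last_error()); // e.g. PREG_RECURSION_LIMIT_ERROR
    }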

The Stackoverflow answer contains a more in-depth explanation of the problem, the solution and other related issues. Definitely worth a read.


OOP Cheatsheet

This is a small OOP (Object oriented programming) cheatsheet, to give a quick introduction to some of the commonly used OOP terminology.
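The table below refers to line numbers in a small PHP snippet that isn’t reproduced here. The following is a hypothetical reconstruction that matches the line references, with the class, method and property names taken from the table:

     1: function something() {
     2:     $variable = 'something';
     3:
     4:     return $variable;
     5: }
     6:
     7: something();
     8:
     9: class Something {
    10:     protected $somethingElse = 'something else';
    11:
    12:     public function returnSomethingElse() {
    13:         return $this->somethingElse;
    14:     }
    15: }
    16:
    17: $something = new Something();
    18:
    19: echo $something->returnSomethingElse();
    20:
    21: echo $something->somethingElse; // fails: the property is protected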

Line Concept Definition
1 Function A function is defined in the global namespace and can be called from anywhere.
2 Variable The variable is enclosed in the function definition, so it can only be used by that function.
4 Variable access Here the value contained in the variable is accessed.
7 Function call This executes the function.
9 Class definition A class is a blueprint that you can generate objects from. All new objects based on a class will start out with everything that has been defined in the class definition.
10 Property A property is like a variable, but is accessible from the entire object it is defined in. Properties can have different visibilities.
12 Method A method is a function defined inside a class. It is always accessible to all objects of the class, and depending on its visibility it might, or might not, be accessible from outside the class.
13 Property usage An object can reach its own properties using the $this-> syntax.
17 Object instantiation Here an object is created based on a class definition.
19 Method call The method Something::returnSomethingElse() is called on the newly created object. The method has its visibility set to “public”, hence it can be called from outside the object itself.
21 Property access This is how the property Something::$somethingElse would be accessed from outside the object. But in this case the property has the visibility protected, which means it can’t be accessed from outside the object itself, hence this will cause PHP to fail.

Validating email senders

Email is one of the most heavily used platforms for communication online, and has been for a while.

Most people using email expect that they can see who sent the email by looking at the sender field in their email client, but in reality the Simple Mail Transfer Protocol (SMTP), which defines how emails are exchanged (as specified in RFC 5321), does not provide any mechanism for validating that the sender is actually who they claim to be.

Not being able to validate the origin of an email has proven to be a really big problem, and provides avenues for spammers, scammers, phishers and other bad actors to pretend to be someone they are not. A couple of mechanisms have since been designed on top of SMTP to try to add this validation layer, and I’ll try to cover some of the more widely used ones here.

Since we’re talking about online technologies, get ready for abbr galore!

Table of contents:

Sender Policy Framework (SPF)
DomainKeys Identified Mail (DKIM)
Domain Message Authentication Reporting & Conformance (DMARC)

Sender Policy Framework (SPF)

The Sender Policy Framework (SPF) is a simple system that allows mail exchangers to validate that a certain host is authorised to send out emails on behalf of a specific domain. SPF builds on the existing Domain Name System (DNS).

SPF requires the domain owner to add a simple TXT record to the domain’s DNS records, which specifies which hosts and/or IPs are allowed to send out mails on behalf of the domain in question.

An SPF record is a simple line in a TXT record of the form:
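Schematically (the mechanisms and the final qualifier depend on your setup):

    v=spf1 <mechanisms> <qualifier>all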

For example:
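A hypothetical record allowing the domain’s A and MX hosts plus a single IP to send mail, and soft-failing everything else:

    v=spf1 a mx ip4:192.0.2.10 ~all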

When an email receiver receives an email, the process is:

  1. Look up the sending domain’s DNS records
  2. Check for an SPF record
  3. Compare the SPF record with the actual sender
  4. Accept / reject the email based on the output of step 3

For more details check out the SPF introduction or check your domain’s setup with the SPF checker.

DomainKeys Identified Mail (DKIM)

DomainKeys Identified Mail (DKIM) is another mechanism for validating whether the sender of an email is actually allowed to send mails on behalf of the sending domain. Similarly to SPF, DKIM builds on DNS, but DKIM uses public-key cryptography, similar to TLS and SSL which provide the basis for HTTPS.

In practice DKIM works by having the sender add a header to all emails being sent, containing a hashed and signed version of part of the body, as well as some selected headers. The receiver then reads the header, queries the sending domain’s DNS for the public key, and checks the validity of the signature.

For more details, Wikipedia provides a nice DKIM overview.

Domain Message Authentication Reporting & Conformance (DMARC)

Domain Message Authentication Reporting & Conformance (DMARC) works in collaboration with SPF and DKIM. In its simplest form, DMARC provides rules for how to handle messages that fail their SPF and/or DKIM checks. Like the other two, DMARC is specified using a TXT DNS record, of the form:
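Schematically (the policy and report addresses are placeholders):

    v=DMARC1; p=<none|quarantine|reject>; rua=mailto:<aggregate report address>; ruf=mailto:<forensic report address>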

For instance
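A hypothetical record matching the explanation below:

    v=DMARC1; p=none; ruf=mailto:email@example.com; rua=mailto:email@example.com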

This specifies that the record contains DMARC rules (v=DMARC1), that nothing special should be done to emails failing validation (p=none), that any forensic reports should be sent to email@example.com (ruf=mailto:email@example.com) and that any aggregate reports should be sent to email@example.com (rua=mailto:email@example.com).

Putting DMARC in “none” mode is a way of putting it in investigation mode. In this mode the receiver will not change anything regarding the handling of failing mails, but error reports will be sent to the email addresses specified in the DMARC DNS record. This allows domain owners to gather data about where emails sent from their domains are coming from, and lets them make sure all necessary SPF and DKIM settings are aligned properly before moving DMARC into quarantine mode, where receivers are asked to flag mails failing their checks as spam, or reject mode, where failing emails are rejected straight away.

For a bit more detail, check out the DMARC overview or check the validity of your current setup using the DMARC validator.
