Configure HTML Purifier Cache directory

HTML Purifier is a great library for sanitization, but its default cache setting is somewhat strange. It keeps an internal cache of the definitions it builds and serializes them to files on disk. If you don't configure a path, it writes them into its own library directory, which means somewhere inside the vendor folder. In a common setup that folder is not writable, for security reasons. The configuration documentation for Cache.SerializerPath even hints at this:

Absolute path with no trailing slash to store serialized definitions in. Default is within the HTML Purifier library inside DefinitionCache/Serializer. This path must be writable by the webserver.

Without the setting, you will get a warning and the cache is simply not used, just as if you had disabled it intentionally, which costs performance.
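Stripped down to the essentials, the fix is a single configuration directive. The following is a minimal sketch, assuming HTML Purifier was installed through Composer and that the temp directory used here (a placeholder, not a recommendation) is writable. The commented-out line shows the Cache.DefinitionImpl directive, which turns the definition cache off on purpose.

```php
<?php

declare(strict_types=1);

// Assumption: HTML Purifier is installed via Composer and the autoloader lives here.
require __DIR__ . '/vendor/autoload.php';

// Placeholder location: any directory outside vendor/ that the web server may write to.
$cacheDirectory = sys_get_temp_dir() . '/html-purifier';

// The purifier does not create the directory itself.
if (!is_dir($cacheDirectory) && !mkdir($cacheDirectory, 0775, true) && !is_dir($cacheDirectory)) {
    throw new RuntimeException(sprintf('Cannot create "%s"', $cacheDirectory));
}

$config = HTMLPurifier_Config::createDefault();
$config->set('Cache.SerializerPath', $cacheDirectory);

// To disable the definition cache intentionally instead (slower):
// $config->set('Cache.DefinitionImpl', null);

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<p>Hello <script>alert(1)</script>world</p>'), PHP_EOL;
```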

I put the whole sanitization logic into a SanitizationService and set the path there:

<?php

declare(strict_types=1);

namespace App\Service\Sanitization;

use HTMLPurifier;
use HTMLPurifier_Config;

final class SanitizationService
{
    /** @var HTMLPurifier */
    private $htmlPurifier;

    public function __construct(string $cacheDirectory)
    {
        // Make sure the cache directory exists, as the purifier won't create it for you
        if (!is_dir($cacheDirectory) && !mkdir($cacheDirectory, 0777, true) && !is_dir($cacheDirectory)) {
            throw new \RuntimeException(sprintf('HTML purifier directory "%s" cannot be created', $cacheDirectory));
        }

        $config = HTMLPurifier_Config::createDefault();
        $config->set('Core.Encoding', 'UTF-8');
        $config->set('Cache.SerializerPath', $cacheDirectory);

        $allowedElements = [
            'p[style]',
            'br',
            'b',
            'strong',
            'i',
            'em',
            's',
            'u',
            'ul',
            'ol',
            'li',
            'span[class|data-custom-id|contenteditable]',
            'table[border|cellpadding|cellspacing]',
            'tbody',
            'tr',
            'td[valign]',
        ];

        $config->set('HTML.Allowed', implode(',', $allowedElements));

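        // A definition ID lets HTML Purifier cache the customized definition. The ID is an
        // arbitrary name; bump the revision whenever the attributes below change.
        $config->set('HTML.DefinitionID', 'sanitization-service');
        $config->set('HTML.DefinitionRev', 1);

        // Returns null when a cached definition exists, so the setup only runs on a cache miss
        if ($def = $config->maybeGetRawHTMLDefinition()) {
            $def->addAttribute('span', 'data-custom-id', 'Text');
            $def->addAttribute('span', 'contenteditable', 'Text');
        }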

        $this->htmlPurifier = new HTMLPurifier($config);
    }

    public function sanitizeHtml(string $content): string
    {
        return $this->htmlPurifier->purify($content);
    }
}
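The directory check in the constructor could also live in a small standalone helper, which makes it easy to test without HTML Purifier. The function below is a sketch: the name ensureWritableDirectory and the 0775 default mode are my own choices, not part of the library. Unlike the inline check, it also fails early when the directory exists but is not writable, or when the path points at a regular file.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: make sure $directory exists, is a directory, and is writable.
 *
 * @throws RuntimeException if the directory cannot be created or written to
 */
function ensureWritableDirectory(string $directory, int $mode = 0775): void
{
    // mkdir() can lose a race against a parallel request, hence the second is_dir() check.
    // The @ suppresses the PHP warning, because the failure is reported as an exception.
    if (!is_dir($directory) && !@mkdir($directory, $mode, true) && !is_dir($directory)) {
        throw new RuntimeException(sprintf('Directory "%s" cannot be created', $directory));
    }

    if (!is_writable($directory)) {
        throw new RuntimeException(sprintf('Directory "%s" is not writable', $directory));
    }
}
```

In the service, the constructor would then start with a call such as ensureWritableDirectory($cacheDirectory); before building the configuration.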

The cache directory is then injected through the Symfony services configuration:

App\Service\Sanitization\SanitizationService:
  arguments:
    $cacheDirectory: '%kernel.project_dir%/var/cache/html-purifier'