Removing Drupal URL aliases

We use Drupal as the basis for our intranet at work. With time we noticed that it took longer and longer to load individual pages. Part of the problem was tracked down to the number of url aliases in use: around 1500. The majority of these had been generated using the pathauto module.  Apparently the slowdowns associated with url aliases has been fixed in Drupal 4.7 and a patch exists for 4.6, but this was not known and the heavily customised code we use can be difficult to patch. 

I wanted to search for and replace the custom urls across the site with their original Drupal urls. Taking advantage of MySQL's REPLACE command I wrote the following PHP script, designed to run on the command line (PHP-CLI), to get the list of url aliases and systematically replace them in the database. Note that the tables and fields that need to have the replace performed upon them may vary depending on your setup. For instance, we used a custom teaser field in the node table.

BE WARNED! The code below is by no means perfect. I noticed some urls were message up (eg if /node/123 has an alias /science then links with a url of /science/science_links become /node/123/science_links). Now that I know that the problem will go away in Drupal 4.7 I'm not going to bother refining the replace code. I provide her with no warranty just in case someone finds it useful.

<?php
/**
* Command line script to search and replace Drupal url aliases
* @author Andrew Wright <mail@allrite.net>
*/

// Need to add table, field pairs containing the aliases to replace.

$tables = array(
array('table' => 'node', 'field' => 'body'),
array('table' => 'term_data', 'field' => 'description')
)

$rep = new Replace_URL();
$rep->runReplace($tables);
$rep->cleanup();

class Replace_URL
{
var $_db;

function Replace_URL()
{
// Change for your setup
$dbuser = 'username';
$dbpass = 'password';
$dbserver = 'localhost';
$dbname = 'drupal';

$this->_db = mysql_connect($dbserver, $dbuser, $dbpass) or die('Could not connect: ' . mysql_error());
mysql_select_db($dbname) or die('Could not select '.$dbname);
}

function runReplace($tables)
{
$aliases = $this->getAliases();
foreach ($aliases as $alias) {
foreach ($tables as $table) {
$this->replaceInField($alias['dst'], $alias['src'], $table['table'], $table['field']);
}
}
}

function getAliases()
{
$aliases = array();
$sql = 'SELECT src, dst FROM url_alias ORDER BY pid';
$result = mysql_query($sql) or die('Query failed: ' . mysql_error());
while ($row = mysql_fetch_array($result, MYSQL_ASSOC)) {
$aliases[] = $row;
}
mysql_free_result($result);
return $aliases;
}

function replaceInField($string, $replace, $table, $field)
{
$string = mysql_real_escape_string($string);
$replace = mysql_real_escape_string($replace);
$sql = "UPDATE ".$table." SET ".$field." = REPLACE(".$field.", '".$string."', '".$replace."')";
$result = mysql_query($sql);
if ($result) {
return true;
} else {
echo 'Replace failed: ' . mysql_error()."n";
return false;
}
}

function cleanup()
{
mysql_close($this->_db);
}
}

?>
Filed under: