Converting special characters to HTML entities between code tags

by Matthew James Taylor on 2 August 2008

Converting special characters to HTML entities between code tags in PHP

Anyone who runs a web design blog will soon discover that some posts need to contain HTML examples, but how is this done? If you write HTML in your post like <h1>Heading</h1>, the browser will render it as a real heading rather than show the tags as text. There are two things you need to do. First, surround your HTML with <code> tags. Second, convert any special characters to HTML entities by replacing each special character with its corresponding code, as follows.

Special characters and their codes

  • '&' (ampersand) must change to '&amp;'
  • '"' (double quote) must change to '&quot;'
  • ''' (single quote) must change to '&#039;'
  • '<' (less than) must change to '&lt;'
  • '>' (greater than) must change to '&gt;'
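
As a quick illustration of the list above, PHP's built-in htmlspecialchars function performs exactly these five replacements when it is given the ENT_QUOTES flag. This is only a minimal sketch: the sample string is made up, and the 'UTF-8' charset argument assumes your posts are stored as UTF-8.

```php
<?php
// Illustrative input containing all five special characters
$raw = '<a href="page.php?a=1&b=2">Tom\'s page</a>';

// ENT_QUOTES converts single quotes as well as double quotes.
// 'UTF-8' is an assumption about how the post text is encoded.
$encoded = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

echo $encoded, "\n";
// &lt;a href=&quot;page.php?a=1&amp;b=2&quot;&gt;Tom&#039;s page&lt;/a&gt;
```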

If your HTML code runs for several lines then you can further surround your <code> tags with <pre> tags. The <pre> tag will preserve the white space and line breaks in your block of HTML code. Here is an example.

<pre>
<code>
<!-- Put your HTML code here -->
</code>
</pre>

Problems with coded special characters in textareas

If the text of your post is saved in a database and edited in a textarea then you have another problem. Coded special characters lose their coding when placed in a textarea form element: the original character is displayed instead. This can be a real nightmare, because you must manually change every special character back to its HTML entity before you submit any changes to your database. If you save without changing them, they will be rendered as HTML instead of displayed as text.
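
To make the problem concrete, here is a small sketch of what happens to a stored post on its way through a textarea. The browser is what decodes the entities; html_entity_decode is used here only as a stand-in for that behaviour, and the sample post is invented for illustration.

```php
<?php
// Post HTML as stored in the database, with entities inside the code tags
$stored = '<p>Example:</p><code>&lt;h1&gt;Heading&lt;/h1&gt;</code>';

// When $stored is printed straight into a textarea, the browser decodes
// the entities. html_entity_decode() approximates that step here.
$submitted = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');

echo $submitted, "\n";
// <p>Example:</p><code><h1>Heading</h1></code>
```

Saving $submitted back to the database would put a real heading inside the code tags, which is the problem described above.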

PHP to the rescue

The solution to this problem is to use PHP to convert the special characters between all code tags before we place the post HTML into the textarea. I looked everywhere for a PHP function to do this but I just couldn't find one. There were a few plugins for WordPress and other blogging engines, but because I built my blog from scratch in PHP and MySQL there was nothing that I could use. So I had to write something myself.

My PHP function to convert special characters between code tags

The following PHP function runs through the HTML of a blog post and converts all special characters located inside <code> tags to their corresponding HTML entities. It comes in three parts: a main function that loops through each character of the HTML, and two helper functions that check for start and end code tags. All three are required for it to work.

function fixcodeblocks($string) {
	// Create a new array to hold our converted string
	$newstring = array();
	
	// This variable will be true if we are currently between two code tags
	$code = false;
	
	// The total length of our HTML string
	$j = mb_strlen($string);
	
	// Loop through the string one character at a time
	for ($k = 0; $k < $j; $k++) {
		// The current character
		$char = mb_substr($string, $k, 1);
		
		if ($code) {
			// We are between code tags
			// Check for end code tag
			if (atendtag($string, $k)) {
				// We're at the end of a code block
				$code = false;
				
				// Add current character to array
				array_push($newstring, $char);
				
			} else {
				// Change special HTML characters
				$newchar = htmlspecialchars($char, ENT_QUOTES);
				
				// Add character code to array
				array_push($newstring, $newchar);
			}
		} else {
			// We are not between code tags
			// Check for start code tag
			if (atstarttag($string, $k)) {
				// We are at the start of a code block
				$code = true;
			}
			// Add current character to array
			array_push($newstring, $char);
		}
	}
	//Turn the new array into a string
	$newstring = join("", $newstring);
	
	// Return the new string
	return $newstring;
}

function atstarttag($string, $pos) {
	// Only check whether the 6 characters ending at the current position
	// are the start code tag if we are at least 6 characters into the string
	if ($pos > 4) {
		// Get the 6 characters ending at the current position
		$prev = mb_substr($string, $pos - 5, 6);
		
		// Check for a match
		if ($prev == "<code>") {
			return true;
		} else {
			return false;
		}
	} else {
		return false;
	}
}

function atendtag($string, $pos) {
	// Get length of string
	$slen = mb_strlen($string);
	
	// Only check if the next 7 characters are the end code tag
	// if at least 7 characters remain in the string
	if ($pos + 7 <= $slen) {
		// Get next 7 characters
		$next = mb_substr($string, $pos, 7);
		
		// Check for a match
		if ($next == "</code>") {
			return true;
		} else {
			return false;
		}
	} else {
		return false;
	}
}

Function usage

To use it, call the fixcodeblocks function with your HTML string as the input parameter, like this.

$fixedhtmlstring = fixcodeblocks($htmlstring);

A quick word on efficiency

There might be a more efficient way to write this function, but my PHP skills are not advanced enough to simplify it further. I'm not too bothered about this, however, as the function is only ever called from my admin area when I'm saving a post and never from the public-facing front end, so any extra load caused by a slightly inefficient function is negligible. If you know of a better way to do this, please contact me and let me know.
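
One possible shorter approach, offered only as a sketch and not as the function used on this site, is to let a regular expression find each pair of code tags and encode what sits between them. The name fixcodeblocks_regex and the sample string are hypothetical, and the 'UTF-8' argument and the "u" pattern modifier assume the post is valid UTF-8.

```php
<?php
// Hypothetical alternative to fixcodeblocks(); not the original function
function fixcodeblocks_regex($string) {
	// Match each <code>...</code> pair. The "s" modifier lets "." span
	// line breaks and the "u" modifier treats the input as UTF-8.
	return preg_replace_callback(
		'~<code>(.*?)</code>~su',
		function ($match) {
			// Encode only the text between the tags
			return '<code>' . htmlspecialchars($match[1], ENT_QUOTES, 'UTF-8') . '</code>';
		},
		$string
	);
}

$htmlstring = '<p>Use a heading:</p><code><h1>Tom\'s "page" & more</h1></code>';
$fixedhtmlstring = fixcodeblocks_regex($htmlstring);

echo $fixedhtmlstring, "\n";
// <p>Use a heading:</p><code>&lt;h1&gt;Tom&#039;s &quot;page&quot; &amp; more&lt;/h1&gt;</code>
```

Two differences are worth knowing about. This sketch leaves a <code> tag with no matching </code> untouched, whereas the character loop encodes everything after it, and preg_replace_callback returns null if the input is not valid UTF-8. Like the original, it only recognises a plain lowercase <code> tag with no attributes.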

