Introduction
Note: You can try out the code from this article using our online JavaScript Formatter.
Many websites use JavaScript, which runs in the user's browser (client side), to produce a richer browsing experience. While JavaScript can easily be embedded within a web page, larger chunks of script are often placed in a separate file.
Because these files can grow quite large, it is common practice to compact them by removing unneeded whitespace characters. This can significantly reduce the amount of bandwidth required to download them to the user's browser.
Unfortunately, removing that whitespace also makes the code much harder to read. For developers who want to look at the inner workings of websites they didn't create, compacted JavaScript makes the client script much harder to understand.
This article will present a C# class to format JavaScript. While it can be used to format any JavaScript, it is probably most useful for those wanting to browse compacted JavaScript source code.
Writing a JavaScript Formatter
Reformatting a large block of JavaScript by hand would get very tedious, so this is a perfect task for the computer. However, let me just say that the logic to implement a JavaScript formatter is not trivial. My code starts by extracting tokens from the input script. It then outputs those tokens along with appropriate whitespace characters.
The difficult part is tracking state information. For example, if an open parenthesis follows the plus sign, a space should separate the two. But if it follows a symbol name, it's probably the start of an argument list and no space should be added before the open parenthesis. Also, while braces are normally used to wrap an indent block, this is not always the case. And so my code must "unindent" after the first statement in an indent block without braces. That statement might end with a semicolon, it could end with a closing curly brace (which means an open curly brace appeared within the indent block), or (and this is what makes JavaScript so much fun) it might not have either because trailing semicolons are optional in some cases.
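The parenthesis rule described above can be sketched as a small decision function. This is a simplified illustration rather than the code from the attached project; the ParenContext enum and the SpacingRules class are hypothetical names, and the real formatter checks more token types than shown here.

```csharp
using System;

// Hypothetical, simplified token categories (not the article's TokenTypes enum).
public enum ParenContext { Symbol, BinaryOperator, OpenParen, CloseParen, Comma }

public static class SpacingRules
{
    // Decides whether a space belongs before '(' based on the previous token.
    // isBlockKeyword is true when the previous symbol is a keyword such as
    // "if" or "while", which is followed by a condition rather than arguments.
    public static bool SpaceBeforeOpenParen(ParenContext previous, bool isBlockKeyword)
    {
        switch (previous)
        {
            case ParenContext.OpenParen:
            case ParenContext.CloseParen:
                return false;              // "((a" and "f()(a"
            case ParenContext.Symbol:
                return isBlockKeyword;     // "foo(a" but "if (a"
            default:
                return true;               // "a + (b" and "a, (b"
        }
    }
}
```

For example, SpacingRules.SpaceBeforeOpenParen(ParenContext.BinaryOperator, false) returns true, while the same call with ParenContext.Symbol returns false unless the symbol is a block keyword.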
While I can imagine some more sophisticated approaches to tracking state information, I tried to keep the code as simple as possible. So I just used a couple of variables to track basic state information, and then put tests in the code to try and catch special cases.
The result is more of a brute-force approach and isn't quite as clean as I'd prefer. I spent a fair amount of time testing with JavaScript from a variety of sources. And, yes, I suspect there may be a couple of rare constructs out there that don't format exactly right. But, for the most part, the code seems to be working pretty well.
The JavaFormatter Class
Listing 1 shows a partial listing of my JavaFormatter class. The complete source code is too long to list here but is included in the attached test project.
The first thing the code does is break the JavaScript up into tokens. This turned out to be the easy part. As stated, I used a couple of variables to track state information such as the parentheses depth and various flags for the current line. I also used a separate class (the Indents class) to track the current indentation depth, and store flags associated with each indent.
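The Indents class itself is not listed in this article. As a rough idea of what such a class might look like, here is a minimal sketch that keeps one set of flags per indentation level on a stack. The names IndentStack and BlockFlags, and the choice of four spaces per level, are assumptions made for illustration and are not taken from the attached project.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical flags describing why an indent level was opened.
[Flags]
public enum BlockFlags { None = 0, NoBraces = 1, DoBlock = 2, CaseBlock = 4 }

// Hypothetical stand-in for an indent tracker: one stack entry per level.
public class IndentStack
{
    private readonly Stack<BlockFlags> _levels = new Stack<BlockFlags>();

    // Current indentation depth.
    public int Count { get { return _levels.Count; } }

    // Flags of the innermost level, or None at the top level.
    public BlockFlags Current
    {
        get { return _levels.Count > 0 ? _levels.Peek() : BlockFlags.None; }
    }

    // Opens a new indent level with the given flags.
    public void Indent(BlockFlags flags) { _levels.Push(flags); }

    // Closes the innermost indent level, if there is one.
    public void Unindent()
    {
        if (_levels.Count > 0)
            _levels.Pop();
    }

    // Whitespace to emit at the start of a line (four spaces per level here).
    public override string ToString()
    {
        return new string(' ', _levels.Count * 4);
    }
}
```

With a structure like this, leaving every pending block without braces is a loop of the form while (indents.Current.HasFlag(BlockFlags.NoBraces)) indents.Unindent(); which is the same pattern that appears several times in Listing 1.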
The Format() method shown in the listing is the public method called to format a script. It accepts a JavaScript string as an argument and returns the formatted result. The Format() method creates an instance of the Tokenizer class (not shown), and calls it to extract each token.
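Since the Tokenizer class is not shown, the following sketch gives a rough sense of what breaking a script into tokens involves. It is a deliberately minimal, hypothetical SimpleTokenizer, not the Tokenizer from the attached project: it only recognizes identifier or number runs, quoted strings, and single punctuation characters, and it ignores comments, regular expressions, and multi-character operators.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, minimal tokenizer used only for illustration.
public static class SimpleTokenizer
{
    private static bool IsWordChar(char c)
    {
        return char.IsLetterOrDigit(c) || c == '_' || c == '$';
    }

    // Splits a script into word runs, quoted strings, and single
    // punctuation characters, skipping whitespace between tokens.
    public static List<string> Tokenize(string script)
    {
        var tokens = new List<string>();
        int pos = 0;
        while (pos < script.Length)
        {
            char c = script[pos];
            if (char.IsWhiteSpace(c))
            {
                pos++;
                continue;
            }
            int start = pos;
            if (IsWordChar(c))
            {
                // Identifier, keyword, or number
                while (pos < script.Length && IsWordChar(script[pos]))
                    pos++;
            }
            else if (c == '"' || c == '\'')
            {
                // String literal: scan to the matching quote
                pos++;
                while (pos < script.Length && script[pos] != c)
                {
                    if (script[pos] == '\\')
                        pos++; // Skip the escaped character
                    pos++;
                }
                // Include the closing quote without running past the end
                pos = Math.Min(pos + 1, script.Length);
            }
            else
            {
                // Any other character is a one-character token
                pos++;
            }
            tokens.Add(script.Substring(start, pos - start));
        }
        return tokens;
    }
}
```

Calling SimpleTokenizer.Tokenize("if(a){b('x y');}") yields the eleven tokens if, (, a, ), {, b, (, 'x y', ), ; and }. A formatter then only has to decide what whitespace to write between consecutive tokens.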
The code writes the result to a StringBuilder object, inserting whitespace as needed. When finished, the result is converted to a string and returned to the caller.
Listing 1: Partial Listing of the JavaFormatter Class
/// <summary>
/// Formats the given JavaScript string.
/// </summary>
/// <param name="javascript">JavaScript script to format</param>
/// <returns>The formatted string</returns>
public string Format(string javascript)
{
    _builder = new StringBuilder(javascript.Length);
    _indents = new Indents();
    _parenCount = 0;
    _bracketCount = 0;
    _lineFlags = LineFlags.None;
    _nextLineFlags = LineFlags.None;
    Tokenizer tokenizer = new Tokenizer(javascript);
    bool endLine = false; // Cause new line
    bool isLineStart = true; // Current token is first on line
    Token peek = null;

    // Process each token in input string
    while (tokenizer.GetToken())
    {
        // Get current token
        Token token = tokenizer.Token;

        // Test for new line
        if (_builder.Length > 0)
        {
            isLineStart = endLine;
            if (endLine)
            {
                NewLine();
                endLine = false;
            }
        }

        // Process this token
        switch (token.Type)
        {
            case TokenTypes.OpenBrace:
                if (!isLineStart)
                {
                    if (OpenBraceOnNewLine && _builder.Length > 0)
                    {
                        // Put open brace on new line
                        NewLine();
                    }
                    else
                    {
                        // Put open brace on same line
                        if (token.PreviousType != TokenTypes.OpenParen &&
                            token.PreviousType != TokenTypes.OpenBracket)
                            _builder.Append(' ');
                    }
                }
                // Write token
                _builder.Append(token.Value);
                // Start new indent block
                peek = tokenizer.PeekToken();
                if (peek.Type == TokenTypes.CloseBrace)
                {
                    // Special handling for "{}"
                    tokenizer.GetToken();
                    _builder.Append(tokenizer.Token.Value);
                    peek = tokenizer.PeekToken();
                    if (peek.Type != TokenTypes.SemiColon &&
                        peek.Type != TokenTypes.Comma)
                    {
                        // Unindent if in conditional block without braces
                        while (_indents.Current.HasFlag(IndentFlags.NoBraces))
                            _indents.Unindent();
                        endLine = true;
                    }
                    else if (peek.Type == TokenTypes.Comma)
                    {
                        // Normally, we insert a new line after
                        // a closing brace and comma but not here
                        tokenizer.GetToken();
                        _builder.Append(tokenizer.Token.Value);
                    }
                }
                else
                {
                    // Increase indentation
                    IndentFlags flags = IndentFlags.None;
                    if (_lineFlags.HasFlag(LineFlags.DoKeyword))
                        flags |= IndentFlags.DoBlock;
                    else if (_lineFlags.HasFlag(LineFlags.CaseKeyword))
                        flags |= IndentFlags.CaseBlock;
                    _indents.Indent(flags);
                    endLine = true;
                }
                break;

            case TokenTypes.CloseBrace:
                // End indent block
                if (_indents.Current.HasFlag(IndentFlags.CaseBlock))
                {
                    // Extra unindent if in case/default block
                    _indents.Unindent();
                    if (isLineStart)
                        _indents.StripTrailingIndent(_builder);
                }
                // Unindent if in conditional block without braces
                while (_indents.Current.HasFlag(IndentFlags.NoBraces))
                    _indents.Unindent();
                // Regular unindent
                _indents.Unindent();
                if (isLineStart)
                    _indents.StripTrailingIndent(_builder);
                else
                    NewLine();
                _builder.Append(token.Value);
                // Don't unindent without braces for catch/finally
                peek = tokenizer.PeekToken();
                if (peek.Value != "catch" &&
                    peek.Value != "finally" &&
                    peek.Value != ":")
                {
                    // Unindent if in conditional block without braces
                    while (_indents.Current.HasFlag(IndentFlags.NoBraces))
                        _indents.Unindent();
                }
                if (_indents.LastIndent.HasFlag(IndentFlags.DoBlock))
                    _lineFlags |= LineFlags.EndDoBlock;
                // Insert new line after code block
                if (peek.Type != TokenTypes.SemiColon &&
                    peek.Type != TokenTypes.CloseParen &&
                    peek.Type != TokenTypes.CloseBracket &&
                    peek.Type != TokenTypes.Comma &&
                    peek.Type != TokenTypes.OpenParen &&
                    peek.Type != TokenTypes.Colon &&
                    !_lineFlags.HasFlag(LineFlags.EndDoBlock))
                {
                    endLine = true;
                }
                break;

            case TokenTypes.OpenParen:
                if (!isLineStart &&
                    token.PreviousType != TokenTypes.OpenParen &&
                    token.PreviousType != TokenTypes.UnaryPrefix &&
                    token.PreviousType != TokenTypes.CloseBracket &&
                    token.PreviousType != TokenTypes.CloseParen &&
                    token.PreviousType != TokenTypes.CloseBrace &&
                    (token.PreviousType != TokenTypes.Symbol ||
                    (_lineFlags.HasFlag(LineFlags.BlockKeyword) &&
                    _parenCount == 0)))
                    _builder.Append(' ');
                _builder.Append(token.Value);
                _parenCount++;
                break;

            case TokenTypes.CloseParen:
                // Append closing parenthesis
                _builder.Append(token.Value);
                _parenCount = Math.Max(_parenCount - 1, 0);
                // Test for indent block start without braces
                if (_parenCount == 0 &&
                    _lineFlags.HasFlag(LineFlags.BlockKeyword))
                {
                    // Examine next token
                    peek = tokenizer.PeekToken();
                    if (peek.Type != TokenTypes.OpenBrace)
                    {
                        // Single line indent with no conditions or braces
                        _indents.Indent(IndentFlags.NoBraces);
                        endLine = true;
                    }
                }
                break;

            case TokenTypes.OpenBracket:
                if (!isLineStart &&
                    token.PreviousType != TokenTypes.Symbol &&
                    token.PreviousType != TokenTypes.OpenParen &&
                    token.PreviousType != TokenTypes.CloseParen &&
                    token.PreviousType != TokenTypes.CloseBracket)
                    _builder.Append(' ');
                // Special handling for JSON syntax?
                peek = tokenizer.PeekToken();
                if (_lineFlags.HasFlag(LineFlags.JsonColon) &&
                    peek.Type != TokenTypes.CloseBracket &&
                    peek.Type == TokenTypes.OpenBrace &&
                    _parenCount == 0)
                {
                    if (OpenBraceOnNewLine)
                        NewLine();
                    _indents.Indent(IndentFlags.BracketBlock);
                    endLine = true;
                }
                _builder.Append(token.Value);
                _bracketCount++;
                break;

            case TokenTypes.CloseBracket:
                _bracketCount = Math.Max(_bracketCount - 1, 0);
                if (_indents.Current.HasFlag(IndentFlags.BracketBlock))
                {
                    _indents.Unindent();
                    if (isLineStart)
                    {
                        _indents.StripTrailingIndent(_builder);
                        _builder.Append(token.Value);
                    }
                    else
                    {
                        NewLine();
                        _builder.Append(token.Value);
                    }
                }
                else _builder.Append(token.Value);
                break;

            case TokenTypes.Symbol:
                bool blockKeyword = _blockKeywords.Contains(token.Value);
                // Special handling for else without if
                if (token.Value == "else" &&
                    tokenizer.PeekToken().Value != "if")
                    blockKeyword = true;
                // Special handling for switch..case..default
                if (_indents.Current.HasFlag(IndentFlags.CaseBlock) &&
                    (token.Value == "case" ||
                    token.Value == "default"))
                {
                    _indents.StripTrailingIndent(_builder);
                    _indents.Unindent();
                }
                if (_parenCount == 0 && blockKeyword)
                {
                    // Keyword that starts an indented block
                    if (!isLineStart)
                        _builder.Append(' ');
                    // Append this symbol
                    _builder.Append(token.Value);
                    if (!_lineFlags.HasFlag(LineFlags.EndDoBlock) ||
                        token.Value != "while")
                    {
                        // Test for special-case blocks
                        if (token.Value == "do")
                            _lineFlags |= LineFlags.DoKeyword;
                        // Examine next token
                        peek = tokenizer.PeekToken();
                        if (peek.Type == TokenTypes.OpenBrace ||
                            peek.Type == TokenTypes.OpenParen)
                        {
                            // Handle indentation at ')' or '{'
                            _lineFlags |= LineFlags.BlockKeyword;
                        }
                        else
                        {
                            // Single line indent with no conditions or braces
                            IndentFlags flags = IndentFlags.NoBraces;
                            if (_lineFlags.HasFlag(LineFlags.DoKeyword))
                                flags |= IndentFlags.DoBlock;
                            _indents.Indent(flags);
                            endLine = true;
                        }
                    }
                }
                else
                {
                    // All other symbols
                    if (!isLineStart &&
                        token.PreviousType != TokenTypes.OpenParen &&
                        token.PreviousType != TokenTypes.OpenBracket &&
                        token.PreviousType != TokenTypes.UnaryPrefix &&
                        token.PreviousType != TokenTypes.Dot)
                        _builder.Append(' ');
                    // Flag line for case block
                    if (token.Value == "case" || token.Value == "default")
                        _lineFlags |= LineFlags.CaseKeyword;
                    _builder.Append(token.Value);
                }
                break;

            case TokenTypes.String:
            case TokenTypes.Number:
            case TokenTypes.RegEx:
                // Emit constant
                if (!isLineStart &&
                    token.PreviousType != TokenTypes.OpenParen &&
                    token.PreviousType != TokenTypes.OpenBracket &&
                    token.PreviousType != TokenTypes.UnaryPrefix)
                    _builder.Append(' ');
                _builder.Append(token.Value);
                break;

            case TokenTypes.SemiColon:
                _builder.Append(token.Value);
                if (_parenCount == 0)
                {
                    // Unindent if in conditional block without braces
                    while (_indents.Current.HasFlag(IndentFlags.NoBraces))
                        _indents.Unindent();
                    if (_indents.LastIndent.HasFlag(IndentFlags.DoBlock))
                        _nextLineFlags |= LineFlags.EndDoBlock;
                    // Determine if end of single-line indent block
                    peek = tokenizer.PeekToken();
                    if (peek.Type == TokenTypes.LineComment ||
                        peek.Type == TokenTypes.InlineComment)
                    {
                        bool newLine;
                        if (peek.Type == TokenTypes.LineComment)
                            newLine = NewLineBeforeLineComment;
                        else
                            newLine = NewLineBeforeInlineComment;
                        tokenizer.GetToken();
                        if (newLine)
                            NewLine();
                        else
                            _builder.Append(' ');
                        _builder.Append(tokenizer.Token.Value);
                    }
                    endLine = true;
                }
                break;

            case TokenTypes.Comma:
                _builder.Append(token.Value);
                // Append newline if it looks like JSON syntax
                if (token.PreviousType == TokenTypes.CloseBrace ||
                    (_lineFlags.HasFlag(LineFlags.JsonColon) &&
                    _parenCount == 0 &&
                    _bracketCount == 0 &&
                    _indents.Count > 0))
                    endLine = true;
                break;

            case TokenTypes.Colon:
                if (!_lineFlags.HasFlag(LineFlags.CaseKeyword))
                {
                    // Standard colon handling
                    if (!isLineStart &&
                        (_lineFlags.HasFlag(LineFlags.QuestionMark) ||
                        token.PreviousType == TokenTypes.CloseBrace))
                        _builder.Append(' ');
                    _builder.Append(token.Value);
                    // May be JSON syntax
                    if (!_lineFlags.HasFlag(LineFlags.QuestionMark))
                        _lineFlags |= LineFlags.JsonColon;
                }
                else
                {
                    // Special handling for case and default
                    _builder.Append(token.Value);
                    _indents.Indent(IndentFlags.CaseBlock);
                    endLine = true;
                }
                break;

            case TokenTypes.QuestionMark:
                _lineFlags |= LineFlags.QuestionMark;
                if (!isLineStart)
                    _builder.Append(' ');
                _builder.Append(token.Value);
                break;

            case TokenTypes.BinaryOperator:
            case TokenTypes.UnaryPrefix:
                if (!isLineStart &&
                    token.PreviousType != TokenTypes.OpenParen &&
                    token.PreviousType != TokenTypes.OpenBracket &&
                    token.PreviousType != TokenTypes.UnaryPrefix)
                    _builder.Append(' ');
                _builder.Append(token.Value);
                break;

            case TokenTypes.LineComment:
                // Separate line comment from previous token
                if (!isLineStart)
                {
                    if (NewLineBeforeLineComment)
                        NewLine(); // Separate with new line
                    else
                        _builder.Append(' '); // Separate with space
                }
                // Append comment
                _builder.Append(token.Value);
                // Line comment always followed by new line
                endLine = true;
                break;

            case TokenTypes.InlineComment:
                // Separate inline comment from previous token
                if (!isLineStart)
                {
                    if (NewLineBeforeInlineComment)
                        NewLine(); // Separate with new line
                    else
                        _builder.Append(' '); // Separate with space
                }
                // Append comment
                _builder.Append(token.Value);
                // New line after comment
                if (NewLineAfterInlineComment)
                    endLine = true;
                break;

            default:
                _builder.Append(token.Value);
                break;
        }
    }
    _builder.AppendLine();
    return _builder.ToString();
}

/// <summary>
/// Emits a new line to the output string.
/// </summary>
protected void NewLine()
{
    _builder.AppendLine();
    _builder.Append(_indents.ToString());
    _bracketCount = _parenCount = 0;
    _lineFlags = _nextLineFlags;
    _nextLineFlags = LineFlags.None;
}
To use the code, simply create an instance of the JavaFormatter class and call its Format() method.
There are also four Boolean properties that affect how the script is formatted. OpenBraceOnNewLine determines if a new line should be inserted before an opening curly brace. NewLineBeforeLineComment determines if a new line should be inserted before a line comment. And NewLineBeforeInlineComment and NewLineAfterInlineComment determine if a new line should be inserted before and after an inline comment.
Conclusion
I haven't gone into great detail about how the code works. The fact is that there really wasn't any slick algorithm employed here. The core logic is just based on various state variables and tests for special conditions.
The download includes all the source code and a comprehensive test project. You can use the code as is, or you can browse the source code if you want a closer look at how it works.
Update History
12/2/2012: Added support for exponential notation and fixed issues with regular expressions, thanks to feedback from Eric Lawrence.
4/8/2014: Corrected an issue where support for exponential notation caused problems with the 'e' in hexadecimal numbers.
End-User License
Use of this article and any related source code or other files is governed
by the terms and conditions of
.
Author Information
Jonathan Wood
I'm a software/website developer working out of the greater Salt Lake City area in Utah. I've developed many websites including Black Belt Coder, Insider Articles, and others.