String Functions: Absolutely Essential to a data Professional’s Toolbox

The ability to format and manipulate Excel character strings is extremely important for data analysis. SQL String functions are useful for querying and manipulating data. Luckily for you, this article will solidify your string function knowledge in both applications.

Transactional SQL, T-SQL is Microsoft’s version of SQL. Its minor syntax differences from SQL won’t be covered in depth here, but we will encounter a couple of examples.

I’m very excited to share string functions with my audience because this function has so much carryover from Excel to SQL.

The ExceltoSQL series aims to help Excel users learn SQL. String functions effectively bridge the knowledge between these applications.

So What are String functions?

String functions are data handling functions used to format or manipulate data. They make data more approachable for users and are a crucial part of the data analyst’s toolbox.

Microsoft supplies a hefty toolbox of String functions for our disposal but we will review a small sample to solidify our foundation.

Check out the Comprehensive list from Microsoft Support below if you have extra time:

https://docs.microsoft.com/en-us/sql/t-sql/functions/string-functions-transact-sql?view=sql-server-ver15

Here are six of the most frequently used string functions from Excel. Let’s see the effect of each one in Excel and in SQL.

CONCAT – Connects multiple text fields together

UPPER – Converts lower case string to Uppercase

LEFT RIGHT MID – Returns the leftmost, Rightmost, or a specified starting point from a data string

 

LEN – tells us how many characters are in a string

SUBSTITUTE – Replaces a specified character with another one

**In SQL Use replace for the same purpose

TRIM – Removes excessive Spaces in a text field

String Functions are most effective when combined with their other string function friends.

Now let’s check out a couple of string function combos!

LEFT/RIGHT/MID + LEN

What if we want to get the Name and title out of the string below? We can’t simply extract it with RIGHT since our target string varies in length.

 477-251-6544, Jeremy, Chef

However, the leading numbers are fixed so we can direct Excel to pull the rightmost contents of D2 after skipping the first fifteen characters with this string.

=RIGHT(D2,LEN(D2)-15)

Let’s Pretend we have an SQL table with Various contacts. This is the statement to ignore the first fifteen characters, just like we did in Excel.

SELECT RIGHT(TEXT,LEN(TEXT)-15) ABBRV_COL FROM dbo.Comments where PostId = 53071

Notice the text ABBRV_COL. This is simply an alias so that the results of our calculated String function don’t read (no column name).

Also, the full Statement reveals I specified table name and the Post ID number so that only one result was returned.

TEXT is the field name residing in the Comments table.

OK, let’s put it all together and combine even more string functions to clean up a really messy data string.

There are all kinds of Issues with this data:

To use this data meaningfully at any point some serious cleaning is required.

The first part of being an effective Problem Solver is identifying the problem. Data formatting issues are no exception.

List out each data issue individually:

Some Records run into a second column. Really? How gross is that?
Inconsistent Characters
Phone Numbers have Parenthesis, dashes, and asterisks
Comma Usage isn’t consistent
The mixture of Upper and Lowercase Characters
Extra Garbage spacing in our Data

 

Now the most important question you can ever ask yourself…

No, it’s not “What are we having for lunch?”

How about…

“Should we use Excel or SQL to clean up this data?”

How about both?

It’s time to attack all of our data issues one at a time.
Data Running Multiple Columns?
Use the CONCAT function to combine our two ugly columns into the one really ugly column.
(CONCAT(D12:E12)


Inconsistent Characters
(UPPER(CONCAT(D12:E12)), “(“, “”), “)”, “”), “*”, “”), “,”, “”),”-“,””)


Use the Substitute function for each unique character removed. To fix our example, SUBSTITUTE will be needed five times. ( “(”, “)”, “*”, “-“, “,”)

 

Substitute(“*”, “”) This replaces the asterisk with the contents between the quotes. In this, you are replacing an asterisk with Nothing.

Continue and Nest the rest of the characters you want to replace.

(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(UPPER(CONCAT(D12:E12)), “(“, “”), “)”, “”), “*”, “”), “,”, “”),”-“,””))

Many times our environment is not Case Sensitive so using the UPPER function is a matter of formatting preference. If the environment is case-sensitive, UPPER becomes much more important.

The environment we are dealing with is not Case Sensitive but we are aiming to get everything into the same format. Insert an UPPER command into our sequence to convert everything.

And finally, for those funky extra spaces, wrap the entire expression in a nice TRIM command.

Excel
=TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(UPPER(CONCAT(D12:E12)), “(“, “”), “)”, “”), “*”, “”), “,”, “”),”-“,””))

Lets see what we did?

That looks really clean

How do we accomplish the same feat in SQL? Let’s pull a SELECT QUERY STATEMENT and find out.

Start with:
SELECT TRIM(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(UPPER(CONCAT(TEXT, TEXT2)), ‘(‘, ”), ‘)’, ”), ‘*’, ”), ‘,’, ”),’-‘,”))

Remember the simple SELECT ALL FROM TABLE format that shows us all the fields in a table. For this example, we are only concerned with the two fields we are displaying, TEXT and TEXT2.

The rest of the String Function tells us how to format the results of our two text fields.

Did you notice the difference in our Statement from the Excel Version?

How about this part?

(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(

And this part…

‘(‘, ”), ‘)’, ”), ‘*’, ”), ‘,’, ”),’-‘,”

Our SQL statement had two minor exceptions from the Excel syntax.

1)SQL uses single quotes to encapsulate text where Excel would use double-quotes.

2) REPLACE function is needed instead of SUBSTITUTE. Excel has a replace function but it works a little differently.

Here is our entire statement pointed towards a real table. Notice the FROM tells us the table name and the WHERE condition specifies we only want data from 3 different Posts.


SELECT TRIM(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(UPPER(CONCAT(TEXT,TEXT2)), ‘(‘, ”), ‘)’, ”), ‘*’, ”), ‘,’, ”),’-‘,”))
As FORMATTED_TXT
FROM dbo.comments WHERE PostID IN (‘53068’, ‘53070’, ‘53071’)

See what this does.

Holy smokes, it did the same thing as in Excel!

You just learned how to do something really awesome using two different languages. Now go tell all your friends.

keepersecurity.com

Latest Blog Posts

Learn about SQL joins using Excel

Learn about SQL joins using Excel

Why do we join tables in SQL? Tables that logically represent information are the foundation of relational databases. The ancient rules of database normalization remind us to arrange database tables to reduce redundancy and increase flexibility. An adequately...

Super Bowl Squares Template

Updated for Superbowl 57 This Version is unlocked, so be careful with the formulas. You have to press F9 to Regenerate the squares because calculations are set to manual. This spreadsheet is for educational purposes only. Please check your local gambling restrictions...

How to Return Multiple Matches in one Cell Using a lookup?

VLOOKUP is great for looking up a value and returning another corresponding value. But what happens when the first match just isn't enough. I've been tasked with finding a way to put return not just the first but the second, and third match. At first, I tried to...

How to Delete every other Row in Excel

How to Delete every other Row in Excel

Have you ever needed to delete every other row in Excel? You are not alone.   There are a few ways to accomplish it. It's a pretty good candidate for a VBA solution, but I'm going to show you how to quickly delete every other row without using VBA or a plugin. I...

Select the Email Service Provider that is right for you.

Select the Email Service Provider that is right for you.

  Excelshortcut.com commits to continuous improvement. I've wondered if our Communication distribution process is lacking. Is there a more effective tool to help us share valuable content? Mailchimp has been ExcelShortcut.com's tool for email creation and...

Planning Software

Fiverr Freelancers

Share This