SQL Major Concepts
CASE and COALESCE
Perhaps the most commonly used function, which is classified as a system function, is CASE. There are two general uses of CASE. The first one is used to replace occurrences of one value with other values, as specified by the programmer. Syntax for this flavor of CASE is as follows:
SELECT column_name = CASE
WHEN column_name = 'a' THEN 'b'
...
ELSE 'c'
END
For instance, suppose that we want to specify the salary level for each job category within my publishing company. Using the pubs database, we could write the following query:
SELECT job_desc, salary_level =
CASE
WHEN job_desc = 'New Hire - Job Not Specified' THEN '25K'
WHEN job_desc = 'Chief Executive Officer' THEN '500K'
WHEN job_desc = 'Business Operations Manager' THEN '350K'
WHEN job_desc = 'Chief Financial Officier' THEN '400K'
WHEN job_desc = 'Publisher' THEN '75K'
WHEN job_desc = 'Managing Editor' THEN '65K'
WHEN job_desc = 'Marketing Manger' THEN '55K + commissions'
ELSE 'you get the idea'
END
FROM jobs
Results:
job_desc |
salary_level |
New Hire—job not specified |
25K |
Chief Executive Officer |
500K |
Business Operations Manager |
350K |
Chief Financial Officer |
400K |
Publisher |
75K |
Managing Editor |
65K |
Marketing Manager |
You get the idea |
Public Relations Manager |
You get the idea |
The other variation of CASE, which is sometimes referred to as the searched CASE, evaluates a Boolean expression and returns different values accordingly. For instance, we could use the searched CASE to categorize the price of titles as cheap, affordable, expensive, or unknown, as follows:
SELECT title_id, price, category= CASE
WHEN price IS NULL THEN 'unknown'
WHEN price < = 7 THEN 'cheap'
WHEN price BETWEEN 7.1 AND 15 THEN 'affordable'
ELSE 'expensive'
END
FROM titles
Results:
title_id |
price |
Category |
BU1032 |
19.99 |
Expensive |
BU1111 |
11.95 |
Affordable |
BU2075 |
2.99 |
Cheap |
BU7832 |
19.99 |
Expensive |
MC2222 |
19.99 |
Expensive |
MC3021 |
2.99 |
Cheap |
MC3026 |
NULL |
Unknown |
PC1035 |
22.95 |
Expensive |
PC8888 |
20 |
Expensive |
PC9999 |
NULL |
Unknown |
PS1372 |
21.59 |
Expensive |
PS2091 |
10.95 |
Affordable |
PS2106 |
7 |
Cheap |
PS3333 |
19.99 |
Expensive |
PS7777 |
7.99 |
Affordable |
TC3218 |
20.95 |
Expensive |
TC4203 |
11.95 |
Affordable |
TC7777 |
14.99 |
Affordable |
The COALESCE function hardly qualifies as a system function because it doesn't retrieve any system information. COALESCE simply returns the first value out of a list that is not NULL. COALESCE is a powerful tool if you are returning numerous values to a user and want to substitute occurrences of NULL with values from a different column or with an expression.
For instance, suppose that we need to return a list of cities and states. If a publisher is located in a country that does not have a state, we'll substitute a NULL value with an expression:
SELECT
city,
COALESCE(state, 'no state specified') AS state
FROM publishers
Results:
City |
State |
Boston |
MA |
Washington |
DC |
Berkeley |
CA |
Chicago |
IL |
Dallas |
TX |
München |
no state specified |
New York |
NY |
Paris |
no state specified |
The only requirement of COALESCE is that all expressions in the list must have compatible data types. For instance, you can't coalesce DATETIME and INTEGER. Notice that the COALESCE function can accept multiple values.
ISNULL, NULLIF, and GETANSINULL
The ISNULL function is similar to COALESCE, but accepts only two parameters. The first parameter will be checked, and if NULL value is found, it will be replaced with the second parameter. Furthermore, ISNULL requires that both parameters have the same (not just similar) data type.
For instance, we can return 0 instead of NULL for titles that do not have any royalties associated with them, as follows:
SELECT
TOP 10
title_id,
ISNULL(royalty, 0) AS royalty
FROM titles
Results:
title_id royalty
-------- -----------
BU1032 10
BU1111 10
BU2075 24
BU7832 10
MC2222 12
MC3021 24
MC3026 0
PC1035 16
PC8888 10
PC9999 0
The NULLIF function returns a NULL value if the two parameters it accepts are equivalent. NULLIF can be thought of as an opposite of ISNULL; for instance, we could use NULLIF if we wanted to return NULLs whenever royalty is 10:
SELECT
TOP 10
title_id,
NULLIF(royalty, 10) AS royalty
FROM titles
Results:
title_id royalty
-------- -----------
BU1032 NULL
BU7832 NULL
MC2222 12
MC3021 24
MC3026 NULL
PC1035 16
The GETANSINULL function provides a quick way of checking whether column nullability is determined according to the ANSI 92 standard. (I have not found much use for this function; you can refer to online documentation if you want to learn more about GETANSINULL).
CAST and CONVERT
The CAST and CONVERT functions are very similar: Both translate a value from one data type to another. Although their performance is also similar, their syntax and potential usage is slightly different.
Both CAST and CONVERT have a straightforward syntax:
CAST(expression AS new_data_type)
CONVERT(new_data_type, expression, [style])
The expression must already have a data type that is translatable into the new_data_type. For instance, you can't convert an alphanumeric string into an integer.
NOTE
CONVERT has an optional parameter: style. This parameter is allowed only for cases when working with date and time values. SQL Server supports numerous formats for presenting date and time values; the style parameter is used to specify such format.
For example, suppose that we want to retrieve order dates from the sales table. However, we don't care about the time portion; all we need to know is the order date. We could use either CAST or CONVERT to do this, as in the following queries:
SELECT TOP 1 CAST(ord_date AS VARCHAR(12)) FROM sales
or
SELECT TOP 1 CONVERT(VARCHAR(12), ord_date, 109) FROM sales
Both return the same results:
------------
Sep 14 1994
In this example, we retrieved a date value in its default format (on my server). In fact, CONVERT(VARCHAR(12), ord_date) would bring the same result. Now, suppose that we need a date value in which month, day, and year are separated by dashes. In such a case, we have no choice but to resort to CONVERT with the style 110:
SELECT TOP 1 CONVERT(VARCHAR, ord_date, 110) FROM sales
Results:
------------
09-14-1994
IDENTITY- and GUID-Related Functions
The @@IDENTITY, IDENTITY, SCOPE_IDENTITY, IDENT_CURRENT(), and NEWID() functions deal with IDENTITIY values or globally unique identifiers.
The NEWID() function must be used if you want to provide a default value for a column with the UNIQUEIDENTIFIER data type. It returns a new GUID each time it is executed.
The IDENTITY function has a limited use; in rare cases, when you use SELECT…INTO syntax you can supply identity values for the newly created table using the IDENTITY function. For instance, suppose we want to add an identity column to the sales table (within a temporary table). We could use the following statement to copy all rows from sales table into #new_sales and add an identity column, all in one shot:
SELECT IDENTITY(INT, 1,1) AS sales_key, *
INTO #new_sales
FROM sales
The other three IDENTITY-related functions deserve more attention. You will often need to populate multiple related tables, perhaps within a single transaction. For instance, you could be populating the order and order_details tables in one transaction. If the order table has an identity column, you'll have to look up the identity value just inserted into the order table before you can add a related record in order_details. The @@IDENTITY, SCOPE_IDENTITY(), and IDENT_CURRENT() functions help you look up the last identity value inserted, but their behavior is slightly different from each other, as follows:
· @@IDENTITY returns the last IDENTITY value inserted on the current connection. Suppose that you have an INSERT trigger on the order table that populates the audit_trail table, which also has an IDENTITY column. In such a case, the @@IDENTITY function will return the last identity value inserted, which would be the identity inserted in the audit_trail table instead of the identity value added to the order table. Therefore, if you try populating order_details with a value returned by the @@IDENTITY function, your data integrity will be compromised.
· The IDENT_CURRENT() function accepts a table name as the parameter and returns the last identity value generated in that table by any connection. If you're trying to populate order_details with the last identity value inserted into the order table, you might think that IDENT_CURRENT('order') would suffice. Well, it would if you were the only user of the system, but it is possible that some other user might add a row to the order table a few milliseconds after you added the row to the same table. Therefore, using IDENT_CURRENT() in a multiuser system might also compromise your data integrity.
· The SCOPE_IDENTITY() function takes no parameters and returns the last identity value inserted within the current scope. What this means is that if an INSERT statement populating the order table fires a trigger and adds a row to the audit_trail table, SCOPE_IDENTITY() will return the last value added to the order table, whereas @@IDENTITY will return the last value added to audit_trail. These functions have appropriate uses, but they're often used incorrectly
USER- and HOST-Related Functions
The CURRENT_USER, SYSTEM_USER, USER_NAME(), and SESSION_USER functions return information about the current user of the system. The output of these functions is useful for enforcing security—you can allow reading or modifying certain data only to authorized users. These functions are also handy when you want to create an audit trail of data changes: Every time someone modifies the data, you can store the user name in the updated_by column, or perhaps even insert a row in the audit tail table.
CURRENT_USER, SESSION_USER and USER_NAME() are equivalent—all three return the current database user name. Notice, however, that the USER_NAME() function accepts a user_id as the parameter, defaulting to the current user if no parameter is passed. All these functions can also be successfully used for auditing purposes. USER_NAME can also be used to look up a database user name by its identifier.
SYSTEM_USER returns the SQL Server or Windows login name, depending on the authentication method used.
HOST_ID() and HOST_NAME() return workstation identifier and name, respectively. They also can be used to troubleshoot problems specific to a particular computer.
Data Type Validation Functions
The ISDATE and ISNUMERIC functions are similar to each other: They let you know whether the parameter passed is of a valid data type (date and time data type in case of ISDATE and numeric data type in case of ISNUMERIC). Both return a BIT value, as in the following:
SELECT
ISDATE('february 39, 2002') AS 'february 39, 2002',
ISDATE('1/1/2002') AS '1/1/2002',
ISNUMERIC('abc') AS 'abc',
ISNUMERIC('123') AS '123'
Results:
february 39, 2002 |
1/1/2002 |
abc |
123 |
0 |
1 |
0 |
1 |
Other System Functions
The CURRENT_TIMESTAMP function works exactly the same way as GETDATE(): It returns current date and time. Oddly enough, CURRENT_TIMESTAMP is classified as a system function, whereas GETDATE falls into the date and time category.
The DATALENGTH function is similar to the LEN function, which returns the length of a particular string expression. DATALENGTH returns the number of bytes used to represent an expression of any data type.
The @@TRANCOUNT function returns the number of open transactions on a particular connection. If you experience locking problems or SQL Server complains about transactions being opened, you can check the value of @@TRANCOUNT to troubleshoot such issues. @@TRANCOUNT can also be used for error handling; if @@TRANCOUNT returns anything other than 0, something must have gone wrong and you have uncommitted transactions. (Refer to my article about transactions and locking for more information on this topic.)
The @@ERROR function returns the number of the last error encountered on the current connection. If there are no errors, @@ERROR returns 0. Not surprisingly, @@ERROR is used for error handling.
The @@ROWCOUNT function returns the number of rows affected by your last query. This function can be used effectively to find out whether the number of rows modified is the same as what you intended to modify. If the two values are different, something went wrong.
The ROWCOUNT_BIG() function does the same thing as @@ROWCOUNT, but returns a BIGINT instead of an INT data type.
NOTE
It's easy to confuse the functionality of ROWCOUNT and @@ROWCOUNT. The former advises SQL Server to affect only a specified number of rows (similar to the TOP keyword); the latter simply counts the number of rows affected, as shown here:
/* first limit the output to 2 rows */
SET ROWCOUNT 2
/* this query will affect only 2 rows */
SELECT title_id, title FROM titles
/* now use functions to count the number of
affected rows */
SELECT @@ROWCOUNT AS '@@rowcount_output',
ROWCOUNT_BIG() AS 'rowcount_big_output'
Results:
title_id title
-------- ---------------------------------------------------------------
PC1035 But Is It User Friendly?
PS1372 Computer Phobic AND Non-Phobic Individuals: Behavior Variations
@@rowcount_output rowcount_big_output
----------------- --------------------
2 2
The APP_NAME() function returns a string with the name of the application that initiated the database connection. APP_NAME() can be helpful if you're troubleshooting a connection and want to know which app initiated the offending process.
>> Download document for full notes.