SQL extra

  • This section includes various topics that did not fit in SQL basics sets
  • Some of the topics can be useful with course exercise (for example indexes)

Indexes

  • Column specific indexes can be defined in tables
  • The idea of index is to increase performance of data retrieval
  • There are two kinds of indexes:
    • Indexes
    • Unique indexes
  • Examples of using these indexes are presented below

Example 1

Create a new index for postal_code column in Person table.

CREATE INDEX i_postal_code ON Person(postal_code);

Example 2

Create a new unique index for PersonID column in Person table.

CREATE UNIQUE INDEX i_personID ON Person(PersonID);
  • Index can be removed with DROP statement like shown in the example below
DROP INDEX i_postal_code ON Person;

Advantages of indexing

  • Increase performance on SELECT, JOIN, GROUP BY and ORDER BY
  • Forces to identify the rows

Disadvantages of indexing

  • Affects the performance of changes in data (INSERT, UPDATE, DELETE) since changes are made for both the database table and the corresponding index

What you should index

  • Primary and foreign keys
  • Columns which are queried and sorted frequently and which have a great amount of data (thousands of rows at minimum)

What you shouldn't index

  • Columns that are not frequently present in queries
  • Columns containing only a few different values (for example a department column)

Index effect on performance

  • In this section you find a demo where index performance is measured using a bigger data amount

Example

Firstly we create a new table called Person on our database called testing. At this point unique index is defined for the email column.

MariaDB [testing]> CREATE TABLE Person (
ID varchar(6) not null,
lastname varchar(128) not null,
firstname varchar(128) not null,
address varchar(255) not null,
phone varchar(64) not null,
email varchar(128),
PRIMARY KEY (ID),
UNIQUE INDEX (email)
);

Next we show index properties from Person table.

MariaDB [testing]> SHOW INDEX FROM Person;

+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Person |          0 | PRIMARY  |            1 | ID          | A         |           0 |     NULL | NULL   |      | BTREE      |         |               |
| Person |          0 | email    |            1 | email       | A         |           0 |     NULL | NULL   | YES  | BTREE      |         |               |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

Definitions for these columns:

  • Table: Table name
  • Non_unique: Defines whether the information is unique (0 → index cannot contain duplicate values, 1 → index can contain duplicate values)
  • Key_name: Index name (if this is a primary key it will be marked as PRIMARY)
  • Seq_in_index: Column sequential number in index starting from 1
  • Column_name: Column name
  • Collation: Tells how column will be ordered in index (A → Ascending order, NULL = No order)
  • Cardinality: Unique value count in index
  • Sub_part: Index prefix, which tells the amount of characters that is indexed in column (number → the amount of indexed characters in column, NULL → the whole column is indexed)
  • Packed: Tells how the key information is compressed (NULL → no compression for the key information)
  • Null: Tells whether there can be NULL values in column (YES → can have NULL values, "" → cannot have NULL values)
  • Index_type: Indexing type that is in use (BTREE, FULLTEXT, HASH, RTREE)
  • Comment: Additional information for index

Next we'll write a script which prepares an SQL file having one million INSERT statements.

def createFile(rows):
    f = open("file.sql","a+")
    for i in range(rows):
        f.write("INSERT INTO Person VALUES ({},'{}','{}','{}','{}','{}');".format(str(i),'last'+str(i+10),'first'+str(i+20),'addr'+str(i+30),'000'+str(i),str(i)+'@company.com')+"\n")

# Let's create one million rows
createFile(1000000)

Data will be some nonsense data for each line, but it will be adequate for our test. Now if we run this script like shown below, we should have file.sql file created with desired lines.

user@ubuntu:~# python3 add_lines.py

Let's print the first and last 10 lines from the file to verify the content.

user@ubuntu:~# head file.sql
INSERT INTO Person VALUES (0,last10,first20,addr30,0000,0@company.com);
INSERT INTO Person VALUES (1,last11,first21,addr31,0001,1@company.com);
INSERT INTO Person VALUES (2,last12,first22,addr32,0002,2@company.com);
INSERT INTO Person VALUES (3,last13,first23,addr33,0003,3@company.com);
INSERT INTO Person VALUES (4,last14,first24,addr34,0004,4@company.com);
INSERT INTO Person VALUES (5,last15,first25,addr35,0005,5@company.com);
INSERT INTO Person VALUES (6,last16,first26,addr36,0006,6@company.com);
INSERT INTO Person VALUES (7,last17,first27,addr37,0007,7@company.com);
INSERT INTO Person VALUES (8,last18,first28,addr38,0008,8@company.com);
INSERT INTO Person VALUES (9,last19,first29,addr39,0009,9@company.com);

user@ubuntu:~# tail file.sql
INSERT INTO Person VALUES (999990,last1000000,first1000010,addr1000020,000999990,999990@company.com);
INSERT INTO Person VALUES (999991,last1000001,first1000011,addr1000021,000999991,999991@company.com);
INSERT INTO Person VALUES (999992,last1000002,first1000012,addr1000022,000999992,999992@company.com);
INSERT INTO Person VALUES (999993,last1000003,first1000013,addr1000023,000999993,999993@company.com);
INSERT INTO Person VALUES (999994,last1000004,first1000014,addr1000024,000999994,999994@company.com);
INSERT INTO Person VALUES (999995,last1000005,first1000015,addr1000025,000999995,999995@company.com);
INSERT INTO Person VALUES (999996,last1000006,first1000016,addr1000026,000999996,999996@company.com);
INSERT INTO Person VALUES (999997,last1000007,first1000017,addr1000027,000999997,999997@company.com);
INSERT INTO Person VALUES (999998,last1000008,first1000018,addr1000028,000999998,999998@company.com);
INSERT INTO Person VALUES (999999,last1000009,first1000019,addr1000029,000999999,999999@company.com);

Next we will run the script file for our database. We'll use mysql command line client on Ubuntu on this demo.

user@ubuntu:~# mysql -u root -p -h localhost testing < file.sql
Enter password:

With this script root user was used, password will be checked (parameter -p), host (parameter -h) is localhost meaning that database is running on local computer and database called testing is used. Running this script took around 30 minutes to complete.

After the successful import of the data we want to check that there is adequate amount of data in the database.

MariaDB [testing]> SELECT COUNT(*) FROM Person;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.423 sec)

Let's run our first test by selecting one row from lastname column which has not been indexed.

MariaDB [testing]> SELECT lastname FROM Person where lastname = "last255555";
+------------+
| lastname   |
+------------+
| last255555 |
+------------+
1 row in set (0.593 sec)

As can be seen from the example it took a bit over half a second to retrieve the result. Now let's create an index for this column.

MariaDB [testing]> CREATE INDEX i_lastname ON Person(lastname);
Query OK, 0 rows affected (1.480 sec)
Records: 0  Duplicates: 0  Warnings: 0

The existence of this index can now be verified using SHOW INDEX command.

MariaDB [testing]> SHOW INDEX FROM Person;
+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Person |          0 | PRIMARY    |            1 | ID          | A         |     1000000 |     NULL | NULL   |      | BTREE      |         |               |
| Person |          0 | email      |            1 | email       | A         |     1000000 |     NULL | NULL   | YES  | BTREE      |         |               |
| Person |          1 | i_lastname |            1 | lastname    | A         |     1000000 |     NULL | NULL   |      | BTREE      |         |               |
+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

As can be seen from the output, i_lastname has been added as a non-unique index for the Person table. Let's run the previous query again to see the effect of the index.

MariaDB [testing]> SELECT lastname FROM Person WHERE lastname = "last255555";
+------------+
| lastname   |
+------------+
| last255555 |
+------------+
1 row in set (0.003 sec)

The performance increase is really noticeable with this as the query retrieval took previously 0.593 seconds and now only 0.003 seconds.

Regular Expression

  • Regular expression makes it possible to execute more complex queries against the SQL database.
  • Regular expression is used with REGEXP clause followed by the string pattern.
  • Wildcards can be utilized as a part of the pattern.
  • Possible wildcards are presented in the table below.
Wildcard Description
% Zero or more characters
_ One character
[character list] List of characters
[^character list] or [!character list] List of characters that will not be included
  • Two tables are presented below that describe some of the wildcards that can be used with REGEXP.
Wildcard Description
* Zero or more characters (similar to %)
+ Search string with a length of one or more characters
^ Start of the string
$ End of the string
. Any individual character (similar to _)
[...] Any character or character range inside square brackets (for example a-z, A-Z or 0-9)
[^...] Any character not listed inside square brackets
  • In addition, the type of character can be taken into account.
  • Possible options for this are shown in the table below.
Wildcard Description
[[:alnum:]] Any letter or number (same as [a-z], [A-Z] and [0-9])
[[:alpha:]] Any letter
[[:blank:]] Space or tab
[[:cntrl:]] ASCII Control Character (see list here)
[[:digit:]] Any number
[[:graph:]] Any character except space
[[:lower:]] Any small letter ([a-z])
[[:print:]] Any printable character
[[:punct:]] Punctuations not belonging to [[:cntrl:]] or [[:alnum:]] groups
[[:space:]] Any white space character (space, tab, new line ...)
[[:upper:]] Any uppercase letter ([A-Z])
[[:xdigit:]] Any hexadecimal value ([A-F], [a-f] and [0-9])
  • Below are some examples of using REGEXP clause as a part of SELECT statement.

Example 1

Select employees whose firstname includes a string 'sa'.

SELECT * FROM EMPS WHERE gname REGEXP 'sa' ORDER BY gname;

Example 2

Select employees whose firstname starts with letter a, b, c or d.

SELECT * FROM EMPS WHERE gname REGEXP '^[abcd]' ORDER BY gname;

Example 3

Select employees whose firstname does not start with letters m, n, o and p.

SELECT * FROM EMPS WHERE gname REGEXP '^[^m-p]' ORDER BY gname;

Example 4

Select employees whose firstname has 'a' as second last letter and last letter between m-z.

SELECT * FROM EMPS WHERE gname REGEXP 'a[m-z]$' ORDER BY gname;

Example 5

Select employees whose address has number as a type of the third letter.

SELECT * FROM EMPS WHERE address REGEXP '^..[[:digit:]]' ORDER BY address;

Composite key

  • Composite key makes it possible to create one primary key from two or more key values.
  • Composite key can be created also using key values from other tables.
  • Below is an example where customer bank account ownership is formed from customerID and account number.

Example

Customer bank account data needs to be stored into the database. This example has the following tables:

  • Customer: Bank customers
  • Bank_account: Bank accounts
  • Ownership: Information about who has the ownership for the bank account
Composite key example

In order to insert a new row into the Ownership table, both CustomerID and Account_number forming the composite key must exist. The creation of Customer, Bank_account and Ownership tables is executed as presented below.

Customer table
CREATE TABLE Customer (
  CustomerID INT NOT NULL AUTO_INCREMENT,
  Firstname VARCHAR(30),
  Lastname VARCHAR(50),
  Address VARCHAR(30),
  Postal_code INT,
  City VARCHAR(30),
  Email VARCHAR(50),
  Phone VARCHAR(12),
  PRIMARY KEY (CustomerID)
);
Bank account
CREATE TABLE Bank_account (
  Account_number VARCHAR(20) NOT NULL,
  Type CHAR(1),
  Conditions VARCHAR(1000),
  PRIMARY KEY (Account_number)
);
Ownership
CREATE TABLE Ownership (
CustomerID INT NOT NULL,
Account_number VARCHAR(20) NOT NULL,
Creation_date TIMESTAMP,
Additional_information VARCHAR(2000),
PRIMARY KEY (CustomerID,Account_number),
UNIQUE INDEX (CustomerID,Account_number),
CONSTRAINT fk_customerID FOREIGN KEY (CustomerID)
REFERENCES Customer(CustomerID),
CONSTRAINT fk_account_number FOREIGN KEY (Account_number)
REFERENCES Bank_account(Account_number)
);

Next we'll add example data for all three tables.

INSERT INTO Customer (firstname, lastname, address, postal_code, city, email, phone)
VALUES ('Jaakko','Seppänen','Luhtatie 3 B 10','10900','Hanko','jaakko.seppanen@company.com','+3581111111'),
('Jyrki','Malinen','Puistokatu 10','10900','Hanko','jykke@company.com','+3582222222'),
('Salla','Lehtonen','Matarakatu 1 A 1','40100','Jyväskylä','salla.lehtonen@company.com','+3584444222');
INSERT INTO Bank_account (account_number, type, conditions)
VALUES ('FI201010103345','B','Savings account. Each withdraw will be charged.'),
('FI202020111111','A','Deposit account. Three free withdraws allowed for each month.'),
('FI202020323232','A','Deposit account. Three free withdraws allowed for each month.');
INSERT INTO Ownership (customerID, account_number, creation_date, additional_information)
VALUES (1,'FI202020111111','2016-03-15 10:36:45','Deposit account of Jaakko'),
(2,'FI202020323232','2016-03-15 11:12:12','Deposit account of Jyrki'),
(3,'FI201010103345','2016-03-15 12:12:12','Savings account of Salla');

Let's verify the content of Ownership table at this point.

SELECT * FROM Ownership;
+------------+----------------+---------------------+---------------------------+
| CustomerID | Account_number | Creation_date       | Additional_information    |
+------------+----------------+---------------------+---------------------------+
|          1 | FI202020111111 | 2016-03-15 10:36:45 | Deposit account of Jaakko |
|          2 | FI202020323232 | 2016-03-15 11:12:12 | Deposit account of Jyrki  |
|          3 | FI201010103345 | 2016-03-15 12:12:12 | Savings account of Salla  |
+------------+----------------+---------------------+---------------------------+

As can be seen from the output, all three rows are stored in the joint table. Each combination of CustomerID and Account_number should be unique meaning that one customer cannot have same account number stored twice. In other words, CustomerID and Account_number columns are now considered as the composite key. We can verify this by trying to insert a new row with existing combination of CustomerID and Account_number column values.

INSERT INTO Ownership (CustomerID,account_number,additional_information) VALUES (3,'FI201010103345','Another savings account of Salla');
ERROR 1062 (23000): Duplicate entry '3-FI201010103345' for key 'PRIMARY'

When trying to insert this row, error message is returned indicating that primary key '3-FI201010103345' already exists in Ownership table. In practice this key value is a combination of CustomerID value of '3' and Account_number value of 'FI201010103345'.

Database system commands

  • There are many useful system commands that can be used to retrieve information from database structure.
  • This information is retrieved with queries like any other data.
  • Below is a table describing some useful system commands.
Command Description
SHOW TABLES Database tables
SHOW COLUMNS FROM Table columns
SHOW INDEX Table indexes
SHOW STATUS Information about the status of the database server
SHOW TABLE STATUS Technical information about the tables in database
DESCRIBE Table columns
SHOW CREATE TABLE CREATE TABLE query of the table