Data Types

Choosing the correct data type can lead to a better performing database, for example comparing numeric types takes less time than comparing character strings types because character string types have character set and collation considerations, also the smaller the data the faster it will be processed and less I/O is used making queries perform even better.

MySQL includes many of the ISO SQL 2003 standard data types and adds more data types of its own, the data types can be categorized by the following

I am only going to give a quick discussion on each as it is best to refer to the latest MySQL documentation or to a specific version that you have, I will try and give examples but I do not have examples for every possibility.

MySQL Tables

Before we begin a quick review on how to create a table in MySQL, which can come in a number of flavors

Standard table create table table_name ( ... );
Temporary table create temporary table_name ( ... );
Memory table create table table_name ( ... ) ENGINE = MEMORY;

You can also display how the table was created

Display table construction show create table table_name;

Just some things that you should be aware of regarding tables

Lastly on tables is the contraints which I discuss in my indexes section.

Data Types

We start with character types, I have already discussed data types in my Oracle data types section if you wish to compare.

String Type Character string type
Size

Example

Fixed-width strings character(length)
create table string_type (
  fixwidth1 CHARACTER(10) NOT NULL DEFAULT '',
  fixwidth2 CHAR(10) NOT NULL DEFAULT'',
  fixwidth3 CHAR(10) CHARACTER SET utf16 COLLATE utf16_general_ci,
  varilength1 CHARACTER VARYING(10) NOT NULL DEFAULT '',
  varilength2 CHAR VARYING(10) NOT NULL DEFAULT '',
  varilength3 VARCHAR(10) NUL NULL DEFAULT ''
);
char(length)
0-255
variable-length strings character varying(length)
char varying(length)
varchar(length)
0-65,535
tinytext
0-255
 
text
0-64Kb
 
meduimtext
0-16Mb
 
longtext
0-4Gb
 
Character objects character largeobject
 
char large object
 
CLOB
 

Remember that the length of the string does not mean the size of the string as this depends on the character set used, for example latin1 character set uses one byte to store each character whereas the utf8 character set uses up to four bytes per character, double check the MySQL version that you are using as the number of bytes may vary and could have an impact on the size of your database. The system parameter pad_char_to_full_length changes the behavior of retrieving fixed-width strings from a table, if set the string is retrieved is the full length of the field.

find the length of a string and checking the padding select concat('/',fixwidth1,'/'), char_length(fixwidth1) from string_type;

National character sets are pretty much the same as above except that the character set uses the utf8

String Type Character string type
Size

Example

Fixed-width strings national character(length)
create table string_type (
  fixwidth1 NATIONAL CHARACTER(10) NOT NULL DEFAULT '',
  fixwidth2 NCHAR(10) NOT NULL DEFAULT'',
  varilength1 NCHAR(10) NOT NULL DEFAULT '',
);
national char(length)  
nchar(length)
0-255
variable-length strings national character varying(length)
national char varying(length)
nchar(length)
0-65,535
Character objects national character largeobject
 
nchar large object
 
NCLOB
 

Binary large objects is the least restrictive data type, they are byte strings and are ordered by each bytes value.

Binary type
Size

Example

binary 0-255 create table string_type (
  bin1 BINARY(100) NOT NULL DEFAULT '',
  blob1 BLOB(100) NOT NULL DEFAULT'',
);
varbinary 0-65,532
tinyblob 0-255
blob 0-65,532
meduimblob 0-16Mb
longblob 0-4Gb

Numeric types is probably the most vast, as there are many different types, these types all affect data precision and depending on the application type this category is the one that you will send more time on, I will categorize them in exact number type and approximate numeric type

Numeric Type Type signed range unsigned range
Size (bytes)
Example
Exact numeric(g,f) maximum limits depend on the hardware and O/S
max for g = 65, max for f = 30
depends
create table num_test (
  bt1 BIT(10),
  fl1 FLOAT(10),
  dl DECIMAL(5,4),
  uty1 TINYINT(10) unsigned NOT NULL AUTO_INCREMENT
);
decimal(g,f) maximum limits depend on the hardware and O/S
max for g = 65, max for f = 30
depends
smallint -32,768 to 32,767 0 to 65,535
2
integer

-2,147,483,648 to
2,147,483,647

0 to 4,294,967,295
4
bigint -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 0 to 18,446,744,073,709,551,615
8
tinyint -128 to 127 0 to 255
1
meduimint -8,388,608 to 8,388,607 0 to 16,777,215
3
bit(x)
1 to 64
1-64
serial
alias for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE KEY
n/a
approximate float(p) maximum limits depend on the hardware and O/S
max for p = 24
4
real
alias for DOUBLE or FLOAT
n/a
double(g,f) maximum limits depend on the hardware and O/S
max for g = 53
max for f = 30
8

Boolean data types have two options the standard BOOLEAN and the MySQL version called BOOL which is an alias for TINYINT(1);

Datetime types have the following, you can specify any of the valid dates as below

you can also put microseconds in as well

Datetime type Range
Size (bytes)
Zero value
date '1000-01-01' to '9999-12-31'
3
'0000-00-00'
datetime '1000-01-01 00:00:00' to '9999-12-31 23:59:59'
8
'0000-00-00 00:00:00'
timestamp '1970-01-01 00:00:00' to '2038-01-18 22:14:07'
4
'0000-00-00 00:00:00'
time '-838:59:59' to '838:59:59'
3
'00:00:00'
year(2) 00 to 99
1
'00'
year(4) 1901 to 2155
1
'0000'

You need to be aware of a number of date, time and numeric conversion issues, here I point you to the MySQL documentation.

Lastly we come to the interval data types, these strictly speaking are not data types that can be stored in a table, they are used as part of an expression in a time based calculation.

Interval expressions

# below would translate to '2012-04-08'
select date_add('2012-03-08', interval 1 year_month) as test;

# below would translate to 2017-07-08
select date_add('2012-03-08', interval '5-4' year_month) as test;

Interval data types
Name
Format
Example
microsecond
n
INTERVAL 5 MICROSECOND
second
n
INTERVAL 5 SECOND
minute
n
INTERVAL 5 MINUTE
hour
n
INTERVAL 5 HOUR
day
n
INTERVAL 5 DAY
week
n
INTERVAL 5 WEEK
month
n
INTERVAL 5 MONTH
quarter
n
INTERVAL 5 QUARTER
year
n
INTERVAL 5 YEAR
second_microsecond
'n.n'
INTERVAL '5.4' SECOND_MICROSECOND
minute_microsecond
'n.n'
INTERVAL '5.4' MINUTE_MICROSECOND
minute_second
'n:n'
INTERVAL '5:4' MINUTE_SECOND
hour_microsecond
'n.n'
INTERVAL '5.4' HOUR_MICROSECOND
hour_second
'n:n:n'
INTERVAL '5:4:3' HOUR_SECOND
hour_minute
'n:n'
INTERVAL '5:4' HOUR_MINUTE
day_microsecond
'n.n'
INTERVAL '5.4' DAY_MICROSECOND
day_second
'n n:n:n'
INTERVAL '5 4:3:2' DAY_SECOND
day_minute
'n n:n'
INTERVAL '5 4:3' DAY_MINUTE
day_hour
'n n'
INTERVAL '5 4' DAY_HOUR
year_month
'n-n'
INTERVAL '5-4' YEAR_MONTH

ENUM and SET types

MySQL has two data types that are very different to the above data types, they are similar to foreign keys in that the values inserted must be one of the field values set

ENUM this is a enumerated list of 1 to 65,535 strings which indicate the allowed values for the field, only one of the values can be stored in the list
SET this is a enumerated list of 1 to 64 strings which indicate the allowed values for the field, any combination of the strings in the enumerated list can be stored as a comma-delimited list.

All trailing spaces will be removed from enum and set data types, the system parameter sys_mode can affect the data types, see below for more information on sql modes.

sql_mode system parameter

 

This parameter affects both the enum and set data types, if the parameter is set to any of the below then an error will occur if a duplicate entry during a field creation, otherwise you will get a warning

  • TRADITIONAL
  • STRICT_ALL_TABLES
  • STRICT_TRANS_TABLE

here are some examples on how to use these data types

ENUM example

Example One
=====================================================
CREATE TABLE Countries (
  name char(30),
  continent ENUM('Asia', 'Europe', 'North America', 'Africa', 'Oceania','Antarctica', 'South America')
);

INSERT INTO Countries (name,continent) VALUES('Kenya','Africa');
INSERT INTO Countries (name,continent) VALUES('England','Europe');
INSERT INTO Countries (name,continent) VALUES('Brazil','South America');

select name,continent,continent+0 'Index position' from countries;

Example Two
=====================================================
CREATE TABLE Bike
(
  ID SMALLINT UNSIGNED,
  Model VARCHAR(40),
  Color ENUM('red', 'blue', 'green', 'yellow'),
  Options SET('rack', 'light', 'helmet', 'lock')
);

insert into bike values ('0001', 'chopper', 'red', 'rack,light');
insert into bike values ('0002', 'tomahawk', 'blue', 'lock,helmet,rack');
insert into bike values ('0003', 'grifter', 'green', 'light,helmet,rack');
insert into bike values ('0004', 'chopper', 'red', 'rack,light');
insert into bike values ('0005', 'grifter', 'red', 'rack,light,helmet,lock');

select id, model, color, options from bike;

SET example CREATE TABLE allergy (symptom SET('sneezing','runny nose','stuffy head','red eyes') );

INSERT INTO allergy (symptom) VALUES('');
INSERT INTO allergy (symptom) VALUES('stuffy head');
INSERT INTO allergy (symptom) VALUES('sneezing,red eyes');
INSERT INTO allergy (symptom) VALUES('sneezing,stuffy head,red eyes');

select symptom,symptom+0 'Index position' from allergy;

There are number of things that you should understand about these data types

Not many developers use enum and set mainly because they are hard to alter and do require table rebuilds if altered, most developers will use foreign keys but I reference them here so that you know that you have other options.

Finally the below is a quick table detailing what attributes a data type can use, also check the MySQL documentation of the version you are using.

Data Type Attributes that can be used
Character Strings NOT NULL, NULL, DEFAULT and BINARY
National Character Strings NOT NULL, NULL, DEFAULT and BINARY
Binary NOT NULL, NULL and BINARY (only binary can use this)
Numeric data NOT NULL, NULL, DEFAULT, AUTO_INCREMENT, SIGNED, UNSIGNED, ZEROFILL, SERIAL DEFAULT VALUE
DateTime NOT NULL, NULL and DEFAULT
Enum and Sets NOT NULL, NULL and DEFAULT

SQL Modes

MySQL has a system parameter that allows you to change the default behavior of data validation and allowed SQL syntax, it can be set on a global level or on a session level, by default the sql_mode is set to '' (blank).

sql_mode

# You can set it at a global or sessions level

show global variables like 'sql_mode';
show session variables like 'sql_mode';

SET GLOBAL sql_mode=<value>,<value>,<value>....
SET SESSION sql_mode=<value>,<value>,<value>....

# There are sort of four groups to choose from

    Getting rid of silent features, silent conversions and silently allowing invalid data

  • ALLOW_INVALID_DATES - any date and time is allowed
  • ERROR_FOR_DIVISION_BY_ZERO - division or modulo 0 functions return NULL with no warnings
  • NO_AUTO_CREATE_USER - a GRANT statement only creates a new user automatically if a non-empty password is specified
  • NO_AUTO_VALUE_ON_ZERO - a numeric data type with the AUTO_INCREMENT property will issue the next number in the sequence if 0 or a NULL is inserted
  • NO_ENGINE_SUBSTITION - an ALTER TABLE or CREATE TABLE that specifies a disabled or unavailable storage engine throws an error
  • NO_ZERO_DATE - a warning is generated if a date field has a zero date inserted or updated
  • NO_ZERO_IN_DATE - partial or total zero dates are allowed
  • STRICT_ALL_TABLES - invalid data values are reject in all tables, an error is thrown
  • STRICT_TRANS_TABLES - invalid data values are rejected in transactional tables only, an error is thrown

    Other changes to default behavior
  • HIGH_NOT_PROCEDENCE - the NOT operator has a higher precedence
  • IGNORE_SPACE - allows a space between a built-in function and the open parenthesis
  • NO_BACKSLASH_ESCAPES - the backslash becomes a ordinary character
  • NO_DIR_IN_CREATE - ignores options DATA DICTIONARY and INDEX DICTIONARY when using the CREATE TABLE statement
  • PAD_CHAR_TO_FULL_LENGTH - the behavior of retrieving a fixed-width string from a table is changed

    Portability
  • ANSI QUOTES - a double quote is now treated as an identifier quote
  • NO_FIELD_OPTIONS - the output of SHOW CREATE TABLE will not display any field definition options and properties
  • NO_KEY_OPTIONS - the output of SHOW CREATE TABLE will not display any index definition options and properties
  • NO_TABLE_OPTIONS - the output of SHOW CREATE TABLE will not display any table definition options and properties
  • NO_UNSIGNED_SUBSTRACTION - the result of a subtraction is always a signed numeric value
  • ONLY_FULL_GROUP_BY - a GROUP BY query requires the fields in the SELECT and HAVING clauses to contain only aggregated fields and the fields in the GROUP BY clause
  • PIPES_AS_CONCAT - || is set as an alias for OR
  • REAL_AS_FLOAT - REAL is an alias for DOUBLE

    Act like other database systems (this are self explanatory)
  • ANSI
  • DB2
  • MAXDB
  • MSSQL
  • MYSQL323
  • MYSQL40
  • ORACLE
  • POSTGRESQL
  • TRADITIONAL

By default MySQL will allow you to enter invalid data and will change it to make it valid, this of course means that is may invalidate data integrity, lets see an example

invalid data entry

create table test_sql_mode (
   sm1 tinyint not null
);

## Lets enter some valid data
insert into test_sql_mode (sm1) values (126);
insert into test_sql_mode (sm1) values (127);

## Lets enter some invalid data, remember a tinyint values are -128 to 127
insert into test_sql_mode (sm1) values (128);

## Did we get any warning
show warnings;

## Lets see whats in the table, as you can see the last value we inserted was truncated
select * from test_sql_mode

As you can see in the example above, values that are too large to fit in a field are truncated to the closest value that does fit, similarly for string data if they are too large. So what you need MySQL is to throw an error instead of a warning this were the sql_mode comes into play.

Using NULL

You have to be careful when using NULL values in tables, something's to beware of are

Remember that NULL means no value has been determined it does not mean zero or an empty field, it means unknown.

Optimal Data Types

Using the PROCEDURE ANALYSE statement you can obtain the table information regarding used space on each field, this will be displayed in the optimal_fieldtype field of the output, you may then want to reduce certain columns to optimize them, when analyzing tables make sure that you have a good sample size.

optimal data types

# The PROCEDURE ANALYSE takes two options

  • The first argument is the maximum numbers of values to consider
  • The second argument is the amount of memory in bytes to use per field to determine an optimal data type

## Lets create a simple table using a high value for varchar thus we will be wasting space
create table accounts (
  accountid int,
  accountname varchar(255)
);

## Now insert some data, the data will be well within the limits of the varchar
insert into accounts values (1,'Company one LTD');
insert into accounts values (2,'Company two LTD');
insert into accounts values (3,'Company three LTD');
insert into accounts values (4,'Company four LTD');
insert into accounts values (5,'Company five LTD');
insert into accounts values (6,'Company six LTD');
insert into accounts values (7,'Company seven LTD');
insert into accounts values (8,'Company eight LTD');
insert into accounts values (9,'Company nine LTD');
insert into accounts values (10,'Company ten LTD');

insert into accounts values (11,'top shop inc 11');
insert into accounts values (12,'top shop inc 12');
insert into accounts values (13,'top shop inc 13');
insert into accounts values (14,'top shop inc 14');
insert into accounts values (15,'top shop inc 15');
insert into accounts values (16,'top shop inc 16');
insert into accounts values (17,'top shop inc 17');
insert into accounts values (18,'top shop inc 18');
insert into accounts values (19,'top shop inc 19');
insert into accounts values (20,'top shop inc 20');

insert into accounts values (21,'top shop ltd');
insert into accounts values (22,'top shop plc');
insert into accounts values (23,'top shop inc');
insert into accounts values (24,'top shop partners');
insert into accounts values (25,'top shop corp');
insert into accounts values (26,'bottom shop ltd');
insert into accounts values (27,'bottom shop plc');
insert into accounts values (28,'bottom shop partners');

## Lets insert some NULL's
insert into accounts values (29,NULL);
insert into accounts values (30,NULL);

## Lets insert some empty fields
insert into accounts values (31,'');
insert into accounts values (32,'');

## Now run the procedure analyse and see what we get, from the output we can see the NULL,s and the
## empties fields, also the analyzer has picked up that the size of the accountname fields is too large
## and recommended's we shrink the size (see the Optimal_fieldtype). Normally I would add on a few bytes
## to make sure just in case a longer company name comes along. Other information is the min and max
## values

## One note is that if you increase the first argument to the procedure analyze eventually MySQL
## recommended's that you use a enum data type, this is because you are asking for more values to
## consider


select * from accounts procedure analyse(5,256) \G

*************************** 1. row ***************************
Field_name: datadisk.accounts.accountid
Min_value: 1
Max_value: 32
Min_length: 1
Max_length: 2
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 16.5000
Std: 9.2331
Optimal_fieldtype: TINYINT(2) UNSIGNED NOT NULL

*************************** 2. row ***************************
Field_name: datadisk.accounts.accountname
Min_value: bottom shop ltd
Max_value: top shop plc
Min_length: 12
Max_length: 20
Empties_or_zeros: 2
Nulls: 2
Avg_value_or_avg_length: 14.1667
Std: NULL
Optimal_fieldtype: VARCHAR(20)
2 rows in set (0.03 sec)

This tool can be useful if you are planning to migrate to a new system and want to reduce table column sizes so that not to waste space, if you already have a system you can always alter the table providing it is not to large or you have an enough time to alter the table. Be careful when using the procedure analyse function it does not always get it right, sometimes you are the best person to know what is right for your data optimization as you would know the data.

Table Maintenance

There are a number of tasks that you can perform on an tables and indexes which are available in MySQL

check table

check table tablename [option]

## You can use one of the following options

  • EXTENDED - performs a full lookup on all keys and indexes, checking for 100% consistency
  • MEDIUM - for evey table calculate a checksum for the indexes on each data row, comparing the final result to the checksum of the index rows, also verify that deleted liniks are valid
  • CHANGED - only check a table if it has changed since the last time is was checked, or of the table was not closed properly, if checks are performed they are the same as MEDUM
  • FAST - only check if the table was not closed properly, if checks are performed they are the same a MEDIUM
  • QUICK - for evey table calculate a checksum for the indexes on each data row, comparing the final result to the checksum of the index rows, without the verifcation for deleted links
  • FOR UPGRADE - checks to see if the table is out of date due to a server upgrade, a MEDIUM check will be performed
repair table

repair table tablename [option]

## You can use one of the following options

  • QUICK - only repair of the index tree is attempted
  • EXTENDED - instead of attempting to fix indexes by doing a REPAIR BY SORT on one entire index at a time, the index is rebuilt one row at a time
  • USE_FRM - uses the .frm file to rebuilt the index, disregarding the existing .MYI index file
checksum table

checksum table tablename [option]

## You can use one of the following options

  • QUICK - returns the live checksum if supported by the table, otherwise returns NULL
  • EXTENDED - calculates a checksum of the table even if the live checksum is supported
analyze table analyze table tablename
optimize table optimize table tablename