Title Page
Title Page
Copyright
Copyright
MySQL 8 for Big Data
MySQL 8 for Big Data
Credits
Credits
About the Authors
About the Authors
About the Reviewers
About the Reviewers
www.PacktPub.com
www.PacktPub.com
Why subscribe?
Why subscribe?
Customer Feedback
Customer Feedback
Preface
Preface
What this book covers
What this book covers
What you need for this book
What you need for this book
Who this book is for
Who this book is for
Conventions
Conventions
Reader feedback
Reader feedback
Customer support
Customer support
Downloading the example code
Downloading the example code
Downloading the color images of this book
Downloading the color images of this book
Errata
Errata
Piracy
Piracy
Questions
Questions
Introduction to Big Data and MySQL 8
Introduction to Big Data and MySQL 8
The importance of Big Data
The importance of Big Data
Social media
Social media
Politics
Politics
Science and research
Science and research
Power and energy
Power and energy
Fraud detection
Fraud detection
Healthcare
Healthcare
Business mapping
Business mapping
The life cycle of Big Data
The life cycle of Big Data
Volume
Volume
Variety
Variety
Velocity
Velocity
Veracity
Veracity
Phases of the Big Data life cycle
Phases of the Big Data life cycle
Collect
Collect
Store
Store
Analyze
Analyze
Governance
Governance
Structured databases
Structured databases
Basics of MySQL
Basics of MySQL
MySQL as a relational database management system
MySQL as a relational database management system
Licensing
Licensing
Reliability and scalability
Reliability and scalability
Platform compatibility
Platform compatibility
Releases
Releases
New features in MySQL 8
New features in MySQL 8
Transactional data dictionary
Transactional data dictionary
Roles
Roles
InnoDB auto increment
InnoDB auto increment
Supporting invisible indexes
Supporting invisible indexes
Improving descending indexes
Improving descending indexes
SET PERSIST
SET PERSIST
Expanded GIS support
Expanded GIS support
The default character set
The default character set
Extended bit-wise operations
Extended bit-wise operations
InnoDB Memcached
InnoDB Memcached
NOWAIT and SKIP LOCKED
NOWAIT and SKIP LOCKED
Benefits of using MySQL
Benefits of using MySQL
Security
Security
Scalability
Scalability
An open source relational database management system
An open source relational database management system
High performance
High performance
High availability
High availability
Cross-platform capabilities
Cross-platform capabilities
Installing MySQL 8
Installing MySQL 8
Obtaining MySQL 8
Obtaining MySQL 8
MySQL 8 installation
MySQL 8 installation
MySQL service commands
MySQL service commands
Evolution of MySQL for Big Data
Evolution of MySQL for Big Data
Acquiring data in MySQL
Acquiring data in MySQL
Organizing data in Hadoop
Organizing data in Hadoop
Analyzing data
Analyzing data
Results of analysis
Results of analysis
Summary
Summary
Data Query Techniques in MySQL 8
Data Query Techniques in MySQL 8
Overview of SQL
Overview of SQL
Database storage engines and types
Database storage engines and types
InnoDB
InnoDB
Important notes about InnoDB
Important notes about InnoDB
MyISAM
MyISAM
Important notes about MyISAM tables
Important notes about MyISAM tables
Memory
Memory
Archive
Archive
Blackhole
Blackhole
CSV
CSV
Merge
Merge
Federated
Federated
NDB cluster
NDB cluster
Select statement in MySQL 8
Select statement in MySQL 8
WHERE clause
WHERE clause
Equal To and Not Equal To
Equal To and Not Equal To
Greater than and Less than
Greater than and Less than
LIKE
LIKE
IN/NOT IN
IN/NOT IN
BETWEEN
BETWEEN
ORDER BY clause
ORDER BY clause
LIMIT clause
LIMIT clause
SQL JOINS
SQL JOINS
INNER JOIN
INNER JOIN
LEFT JOIN
LEFT JOIN
RIGHT JOIN
RIGHT JOIN
CROSS JOIN
CROSS JOIN
UNION
UNION
Subquery
Subquery
Optimizing SELECT statements
Optimizing SELECT statements
Insert, replace, and update statements in MySQL 8
Insert, replace, and update statements in MySQL 8
Insert
Insert
Update
Update
Replace
Replace
Transactions in MySQL 8
Transactions in MySQL 8
Aggregating data in MySQL 8
Aggregating data in MySQL 8
The importance of aggregate functions
The importance of aggregate functions
GROUP BY clause
GROUP BY clause
HAVING clause
HAVING clause
Minimum
Minimum
Maximum
Maximum
Average
Average
Count
Count
Sum
Sum
JSON
JSON
JSON_OBJECTAGG
JSON_OBJECTAGG
JSON_ARRAYAGG
JSON_ARRAYAGG
Summary
Summary
Indexing your data for High-Performing Queries
Indexing your data for High-Performing Queries
MySQL indexing
MySQL indexing
Index structures
Index structures
Bitmap indexes
Bitmap indexes
Sparse indexes
Sparse indexes
Dense indexes
Dense indexes
B-Tree indexes
B-Tree indexes
Hash indexes
Hash indexes
Creating or dropping indexes
Creating or dropping indexes
UNIQUE | FULLTEXT | SPATIAL
UNIQUE | FULLTEXT | SPATIAL
Index_col_name
Index_col_name
Index_options
Index_options
KEY_BLOCK_SIZE
KEY_BLOCK_SIZE
With Parser
With Parser
COMMENT
COMMENT
VISIBILITY
VISIBILITY
index_type
index_type
algorithm_option
algorithm_option
lock_option
lock_option
When to avoid indexing
When to avoid indexing
MySQL 8 index types
MySQL 8 index types
Defining a primary index
Defining a primary index
Primary indexes
Primary indexes
Natural keys versus surrogate keys
Natural keys versus surrogate keys
Unique keys
Unique keys
Defining a column index
Defining a column index
Composite indexes in MySQL 8
Composite indexes in MySQL 8
Covering index
Covering index
Invisible indexes
Invisible indexes
Descending indexes
Descending indexes
Defining a foreign key in the MySQL table
Defining a foreign key in the MySQL table
RESTRICT
RESTRICT
CASCADE
CASCADE
SET NULL
SET NULL
NO ACTION
NO ACTION
SET DEFAULT
SET DEFAULT
Dropping foreign keys
Dropping foreign keys
Full-text indexing
Full-text indexing
Natural language fulltext search on InnoDB and MyISAM
Natural language fulltext search on InnoDB and MyISAM
Fulltext indexing on InnoDB
Fulltext indexing on InnoDB
Fulltext search in Boolean mode
Fulltext search in Boolean mode
Differentiating full-text indexing and like queries
Differentiating full-text indexing and like queries
Spatial indexes
Spatial indexes
Indexing JSON data
Indexing JSON data
Generated columns
Generated columns
Virtual generated columns
Virtual generated columns
Stored generated columns
Stored generated columns
Defining indexes on JSON
Defining indexes on JSON
Summary
Summary
Using Memcached with MySQL 8
Using Memcached with MySQL 8
Overview of Memcached
Overview of Memcached
Setting up Memcached
Setting up Memcached
Installation
Installation
Verification
Verification
Using of Memcached
Using of Memcached
Performance tuner
Performance tuner
Caching tool
Caching tool
Easy to use
Easy to use
Analyzing data stored in Memcached
Analyzing data stored in Memcached
Memcached replication configuration
Memcached replication configuration
Memcached APIs for different technologies
Memcached APIs for different technologies
Memcached with Java
Memcached with Java
Memcached with PHP
Memcached with PHP
Memcached with Ruby
Memcached with Ruby
Memcached with Python
Memcached with Python
Summary
Summary
Partitioning High Volume Data
Partitioning High Volume Data
Partitioning in MySQL 8
Partitioning in MySQL 8
What is partitioning?
What is partitioning?
Partitioning types
Partitioning types
Horizontal partitioning
Horizontal partitioning
Vertical partitioning
Vertical partitioning
Horizontal partitioning in MySQL 8
Horizontal partitioning in MySQL 8
Range partitioning
Range partitioning
List partitioning
List partitioning
Hash partitioning
Hash partitioning
Column partitioning
Column partitioning
Range column partitioning
Range column partitioning
List column partitioning
List column partitioning
Key partitioning
Key partitioning
Sub partitioning
Sub partitioning
Vertical partitioning
Vertical partitioning
Splitting data into multiple tables
Splitting data into multiple tables
Data normalization
Data normalization
First normal form
First normal form
Second normal form
Second normal form
Third normal form
Third normal form
Boyce-Codd normal form
Boyce-Codd normal form
Fourth normal form
Fourth normal form
Fifth normal form
Fifth normal form
Pruning partitions in MySQL
Pruning partitions in MySQL
Pruning with list partitioning
Pruning with list partitioning
Pruning with key partitioning
Pruning with key partitioning
Querying on partitioned data
Querying on partitioned data
DELETE query with the partition option
DELETE query with the partition option
UPDATE query with the partition option
UPDATE query with the partition option
INSERT query with the partition option
INSERT query with the partition option
Summary
Summary
Replication for building highly available solutions
Replication for building highly available solutions
High availability
High availability
MySQL replication
MySQL replication
MySQL cluster
MySQL cluster
Oracle MySQL cloud service
Oracle MySQL cloud service
MySQL with the Solaris cluster
MySQL with the Solaris cluster
Replication with MySQL
Replication with MySQL
Benefits of replication in MySQL 8
Benefits of replication in MySQL 8
Scalable applications
Scalable applications
Secure architecture
Secure architecture
Large data analysis
Large data analysis
Geographical data sharing
Geographical data sharing
Methods of replication in MySQL 8
Methods of replication in MySQL 8
Replication using binary logs
Replication using binary logs
Replication using global transaction identifiers
Replication using global transaction identifiers
Replication configuration
Replication configuration
Replication with binary log file
Replication with binary log file
Replication master configuration
Replication master configuration
Replication slave configuration
Replication slave configuration
Replication with GTIDs
Replication with GTIDs
Global transaction identifiers
Global transaction identifiers
The gtid_executed table
The gtid_executed table
GTID master's side configurations
GTID master's side configurations
GTID slave's side configurations
GTID slave's side configurations
MySQL multi-source replication
MySQL multi-source replication
Multi-source replication configuration
Multi-source replication configuration
Statement-based versus row-based replication
Statement-based versus row-based replication
Group replication
Group replication
Requirements for group replication
Requirements for group replication
Group replication configuration
Group replication configuration
Group replication settings
Group replication settings
Choosing a single master or multi-master
Choosing a single master or multi-master
Host-specific configuration settings
Host-specific configuration settings
Configuring a Replication User and enabling the Group Replication Plugin
Configuring a Replication User and enabling the Group Replication Plugin
Starting group replication
Starting group replication
Bootstrap node
Bootstrap node
Summary
Summary
MySQL 8 Best Practices
MySQL 8 Best Practices
MySQL benchmarks and configurations
MySQL benchmarks and configurations
Resource utilization
Resource utilization
Stretch your timelines of benchmarks
Stretch your timelines of benchmarks
Replicating production settings
Replicating production settings
Consistency of throughput and latency
Consistency of throughput and latency
Sysbench can do more
Sysbench can do more
Virtualization world
Virtualization world
Concurrency
Concurrency
Hidden workloads
Hidden workloads
Nerves of your query
Nerves of your query
Benchmarks
Benchmarks
Best practices for MySQL queries
Best practices for MySQL queries
Data types
Data types
Not null
Not null
Indexing
Indexing
Search fields index
Search fields index
Data types and joins
Data types and joins
Compound index
Compound index
Shorten up primary keys
Shorten up primary keys
Index everything
Index everything
Fetch all data
Fetch all data
Application does the job
Application does the job
Existence of data
Existence of data
Limit yourself
Limit yourself
Analyze slow queries
Analyze slow queries
Query cost
Query cost
Best practices for the Memcached configuration
Best practices for the Memcached configuration
Resource allocation
Resource allocation
Operating system architecture
Operating system architecture
Default configurations
Default configurations
Max object size
Max object size
Backlog queue limit
Backlog queue limit
Large pages support
Large pages support
Sensitive data
Sensitive data
Restrict exposure
Restrict exposure
Failover
Failover
Namespaces
Namespaces
Caching mechanism
Caching mechanism
Memcached general statistics
Memcached general statistics
Best practices for replication
Best practices for replication
Throughput in group replication
Throughput in group replication
Infrastructure sizing
Infrastructure sizing
Constant throughput
Constant throughput
Contradictory workloads
Contradictory workloads
Write scalability
Write scalability
Summary
Summary
NoSQL API for Integrating with Big Data Solutions
NoSQL API for Integrating with Big Data Solutions
NoSQL overview
NoSQL overview
Changing rapidly over time
Changing rapidly over time
Scaling
Scaling
Less management
Less management
Best for big data
Best for big data
NoSQL versus SQL
NoSQL versus SQL
Implementing NoSQL APIs
Implementing NoSQL APIs
NoSQL with the Memcached API layer
NoSQL with the Memcached API layer
Prerequisites
Prerequisites
NoSQL API with Java
NoSQL API with Java
NoSQL API with PHP
NoSQL API with PHP
NoSQL API with Python
NoSQL API with Python
NoSQL API with Perl
NoSQL API with Perl
NDB Cluster API
NDB Cluster API
NDB API for NodeJS
NDB API for NodeJS
NDB API for Java
NDB API for Java
NDB API with C++
NDB API with C++
Summary
Summary
Case study: Part I - Apache Sqoop for exchanging data between MySQL and Hadoop
Case study: Part I - Apache Sqoop for exchanging data between MySQL and Hadoop
Case study for log analysis
Case study for log analysis
Using MySQL 8 and Hadoop for analyzing log
Using MySQL 8 and Hadoop for analyzing log
Apache Sqoop overview
Apache Sqoop overview
Integrating Apache Sqoop with MySQL and Hadoop
Integrating Apache Sqoop with MySQL and Hadoop
Hadoop
Hadoop
MapReduce
MapReduce
Hadoop distributed file system
Hadoop distributed file system
YARN
YARN
Setting up Hadoop on Linux
Setting up Hadoop on Linux
Installing Apache Sqoop
Installing Apache Sqoop
Configuring MySQL connector
Configuring MySQL connector
Importing unstructured data to Hadoop HDFS from MySQL
Importing unstructured data to Hadoop HDFS from MySQL
Sqoop import for fetching data from MySQL 8
Sqoop import for fetching data from MySQL 8
Incremental imports using Sqoop
Incremental imports using Sqoop
Loading structured data to MySQL using Apache Sqoop
Loading structured data to MySQL using Apache Sqoop
Sqoop export for storing structured data from MySQL 8
Sqoop export for storing structured data from MySQL 8
Sqoop saved jobs
Sqoop saved jobs
Summary
Summary
Case study: Part II - Real time event processing using MySQL applier
Case study: Part II - Real time event processing using MySQL applier
Case study overview
Case study overview
MySQL Applier
MySQL Applier
SQL Dump and Import
SQL Dump and Import
Sqoop
Sqoop
Tungsten replicator
Tungsten replicator
Apache Kafka
Apache Kafka
Talend
Talend
Dell Shareplex
Dell Shareplex
Comparison of Tools
Comparison of Tools
MySQL Applier overview
MySQL Applier overview
MySQL Applier installation
MySQL Applier installation
libhdfs
libhdfs
cmake
cmake
gcc
gcc
FindHDFS.cmake
FindHDFS.cmake
Hive
Hive
Real-time integration with MySQL Applier
Real-time integration with MySQL Applier
Organizing and analyzing data in Hadoop
Organizing and analyzing data in Hadoop
Summary
Summary