
Spark SQL analyzes Nginx access logs


Introduction: This course systematically explains the core knowledge of Spark SQL and applies it hands-on to a practical project: analyzing Nginx access logs.

Chapter 1 Course Introduction

This chapter introduces the position and role of Spark SQL in the Spark ecosystem, giving you an overview of the knowledge framework the course will cover.

 

1-1 Course Introduction

Chapter 2 Understanding Spark SQL

This chapter introduces how Spark SQL works under the hood, the conversions among DataFrame, Dataset, and RDD and the scenarios where each applies, and explains how to use the Parquet format.
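As a rough sketch of the conversions this chapter covers, the snippet below moves the same records among an RDD, a Dataset, and a DataFrame. The `AccessLog` case class and sample values are illustrative assumptions, not from the course:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative record type; the encoder for it comes from spark.implicits._
case class AccessLog(ip: String, status: Int)

object ConversionDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("demo").master("local[*]").getOrCreate()
    import spark.implicits._

    // RDD -> Dataset: toDS() requires an implicit Encoder
    val rdd = spark.sparkContext.parallelize(Seq(AccessLog("1.2.3.4", 200)))
    val ds  = rdd.toDS()

    // Dataset -> DataFrame: drop the static type
    val df  = ds.toDF()

    // DataFrame -> Dataset: re-attach the type with as[T]
    val ds2 = df.as[AccessLog]

    // DataFrame/Dataset -> RDD: the .rdd accessor
    val backToRdd = ds2.rdd

    spark.stop()
  }
}
```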

 

2-1 Spark SQL and Hive

 

2-2 Operating Principles of Spark SQL

 

2-3 The Connection between DataFrame, Dataset and RDD

 

2-4 Parquet columnar storage
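In practice, Parquet usage largely reduces to a write and a read; a minimal sketch, assuming an active `SparkSession` named `spark`, a DataFrame `df`, and a hypothetical output path:

```scala
// Write as Parquet: columnar, compressed, and schema-preserving
df.write
  .mode("overwrite")              // replace any existing data at the path
  .parquet("/tmp/logs.parquet")   // hypothetical path

// Read it back; the schema is recovered from the Parquet file footer
val back = spark.read.parquet("/tmp/logs.parquet")
back.printSchema()
```

Because Parquet stores data by column, queries that touch only a few columns read far less data than they would from row-oriented text files.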

Chapter 3 Common Operations of DataFrame/Dataset

This chapter introduces filtering, grouping, and sorting with DataFrames, adding, deleting, and modifying columns, and how to optimize joins.

 

3-1 General Operations: Search and Filter
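The lesson above covers basic search and filter calls. As a sketch (assuming `spark`, `import spark.implicits._`, and an illustrative access-log DataFrame `df` with `ip`, `url`, and `status` columns):

```scala
import spark.implicits._

// Column-expression style
df.select("ip", "url", "status")
  .filter($"status" === 404)
  .show()

// Equivalent SQL-fragment style; where() is an alias of filter()
df.where("status >= 500")
  .orderBy($"ip".asc)
  .show()
```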

 

3-2 Aggregation Operations: groupBy and agg
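A typical `groupBy`/`agg` pipeline computes several aggregates in one pass; a sketch under the same illustrative schema assumption (`ip`, `bytes`, `status` columns):

```scala
import org.apache.spark.sql.functions._

df.groupBy("ip")
  .agg(
    count("*").as("requests"),        // hits per client
    sum("bytes").as("total_bytes"),   // traffic per client
    max("status").as("worst_status")  // highest HTTP status seen
  )
  .orderBy(desc("requests"))
  .show()
```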

 

3-3 Single-Table Operations: Adding, Deleting, and Modifying Columns, and Handling Nulls
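The column and null operations in this lesson chain naturally; a sketch with illustrative column names:

```scala
df.withColumn("is_error", $"status" >= 400)   // add a derived column
  .withColumnRenamed("bytes", "body_bytes")   // modify (rename) a column
  .drop("referer")                            // delete a column
  .na.fill(Map("agent" -> "unknown"))         // replace nulls per column
  .na.drop(Seq("ip"))                         // drop rows where ip is null
```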

 

3-4 Multi-table Operations: join (1)

 

3-5 Multi-table Operations: join (2)
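The join lessons above can be summarized in two calls. The `logs`, `users`, and `ipRegions` DataFrames and their key columns are illustrative assumptions; the broadcast hint is one standard join optimization in Spark:

```scala
import org.apache.spark.sql.functions.broadcast

// Plain equi-join on a shared key column
val joined = logs.join(users, Seq("user_id"), "left")

// Optimization: broadcast a small dimension table so the large fact
// table is never shuffled. Spark also does this automatically for
// tables under spark.sql.autoBroadcastJoinThreshold (10 MB by default).
val fast = logs.join(broadcast(ipRegions), Seq("ip"))
```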

Chapter 4 Custom Functions and Window Functions

This chapter explains how to write custom functions and how window functions solve per-group Top-N calculations.

 

4-1 Custom Function: UDF
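A UDF wraps ordinary Scala logic so it can run per row. A sketch (the `statusClass` function and `logs` view are illustrative, not from the course):

```scala
import org.apache.spark.sql.functions.udf

// Usable from the DataFrame API: map 404 -> "4xx", 502 -> "5xx", etc.
val statusClass = udf((status: Int) => s"${status / 100}xx")
df.withColumn("status_class", statusClass($"status"))

// The same logic registered for SQL queries
spark.udf.register("status_class", (status: Int) => s"${status / 100}xx")
spark.sql(
  "SELECT status_class(status) AS cls, count(*) FROM logs GROUP BY status_class(status)")
```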

 

4-2 Custom Aggregation Function: UDAF
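A UDAF folds many rows into one value via a buffer. The sketch below uses the Spark 3.x `Aggregator` API (older Spark versions use `UserDefinedAggregateFunction` instead); the average-of-bytes example is illustrative:

```scala
import org.apache.spark.sql.{Encoder, Encoders, functions}
import org.apache.spark.sql.expressions.Aggregator

// Buffer holds (sum, count); the final output is the mean.
object AvgBytes extends Aggregator[Long, (Long, Long), Double] {
  def zero: (Long, Long) = (0L, 0L)
  def reduce(b: (Long, Long), a: Long): (Long, Long) = (b._1 + a, b._2 + 1)
  def merge(x: (Long, Long), y: (Long, Long)): (Long, Long) =
    (x._1 + y._1, x._2 + y._2)
  def finish(b: (Long, Long)): Double =
    if (b._2 == 0) 0.0 else b._1.toDouble / b._2
  def bufferEncoder: Encoder[(Long, Long)] =
    Encoders.tuple(Encoders.scalaLong, Encoders.scalaLong)
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

spark.udf.register("avg_bytes", functions.udaf(AvgBytes))
spark.sql("SELECT ip, avg_bytes(bytes) FROM logs GROUP BY ip")
```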

 

4-3 Window Function: row_number()
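`row_number()` over a window is the standard way to get Top-N per group; a sketch assuming an illustrative `dailyUrlHits` DataFrame with `day`, `url`, and `hits` columns:

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{row_number, desc}

// Number rows within each day, most-hit URLs first,
// then keep the top 3 per day.
val w = Window.partitionBy("day").orderBy(desc("hits"))
dailyUrlHits
  .withColumn("rn", row_number().over(w))
  .filter($"rn" <= 3)
```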

Chapter 5 Comprehensive Practical Analysis of Nginx Access Logs

This chapter walks through a complete project that analyzes Nginx access logs with Spark SQL, covering data cleaning, storage, monitoring, and optimization.

 

5-1 Project Scenario Introduction and Analysis

 

5-2 First Data Cleaning: Formatting the Raw Log Data
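A first cleaning pass typically parses each raw line and drops anything malformed. The regex below is a hypothetical combined-log-format pattern and would need adjusting to the actual Nginx `log_format`; `rawLines` is assumed to be an `RDD[String]`:

```scala
// ip - - [time] "METHOD url PROTO" status bytes ...
val LogPattern =
  """^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\d+|-).*""".r

val formatted = rawLines.flatMap {
  case LogPattern(ip, time, method, url, status, bytes) =>
    // Normalize "-" (no response body) to 0 and emit one clean record
    val size = if (bytes == "-") 0L else bytes.toLong
    Some(s"$ip\t$time\t$method\t$url\t$status\t$size")
  case _ => None  // malformed lines are skipped, not allowed to crash the job
}
```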

 

5-3 Second Data Cleaning: Parsing the Data and Storing It in Parquet by Day
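Storing by day usually means deriving a date column and letting Spark partition the output on it; a sketch with a hypothetical path, assuming `cleaned` has a parsed timestamp column `time`:

```scala
import org.apache.spark.sql.functions.to_date

cleaned
  .withColumn("day", to_date($"time"))  // derive the partition column
  .write
  .mode("append")                        // each run appends its day's data
  .partitionBy("day")                    // one directory per day: day=2024-01-01/
  .parquet("/warehouse/nginx_logs")      // hypothetical warehouse path
```

Partitioning by day lets later queries that filter on a date range skip every other day's files entirely.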

 

5-4 Batch-Writing the Analysis Results into MySQL
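Writing results to MySQL goes through Spark's JDBC data source; the host, database, table, and credentials below are placeholder assumptions, and `batchsize` controls how many rows each JDBC batch insert carries:

```scala
resultDf.write
  .mode("append")
  .format("jdbc")
  .option("url", "jdbc:mysql://db-host:3306/logstats")  // hypothetical host/db
  .option("dbtable", "daily_url_topn")                  // hypothetical table
  .option("user", "spark")                              // placeholder credentials
  .option("password", "secret")
  .option("batchsize", "1000")  // rows per JDBC batch insert
  .save()
```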

 

5-5 Performance Monitoring and Optimization

Chapter 6 Course Summary

This chapter will summarize all the knowledge of this course in the form of a mind map and highlight the key points again.

 

6-1 Course Summary