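Python Data Processing
This course uses Python to give readers the ability to process data and to appreciate how much the representation of data matters, so that they can present the characteristics of the data visually, in a form that is easier to understand.
Chapter 1 Python Basics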
References
- Python Documentation Contents, https://docs.python.org/zh-cn/3.7/contents.html
- PEP 397 – Python launcher for Windows, https://www.python.org/dev/peps/pep-0397/
- What Is the Python Launcher?, https://blog.csdn.net/wuShiJingZuo/article/details/103535381
- Getting Started with Python in VS Code, https://code.visualstudio.com/docs/python/python-tutorial
- Using Python environments in VS Code, https://code.visualstudio.com/docs/python/environments
- Shallow Copy and Deep Copy, https://zhuanlan.zhihu.com/p/56741046
- Several Methods for Finding the Greatest Common Divisor, https://so.html5.qq.com/page/real/search_news?docid=70000021_7675e858c2215914
Chapter 2 Python Data Tools
2-02 Array Indexing and Slicing
References
- NumPy Tutorial, https://www.runoob.com/numpy/numpy-tutorial.html
- Python 3 Tutorial, https://www.runoob.com/python3/python3-tutorial.html
- pandas documentation, https://pandas.pydata.org/pandas-docs/stable/index.html
- Installing pandas, https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html
- Python Package Index, https://pypi.org/
- Installing the OpenCV Library for Python: Process and Common Problems, https://www.cnblogs.com/BIXIABUMO/p/12440634.html
- Links for opencv-python, https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple/opencv-python/
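Chapter 3 Data Processing
3-02 Data Loading
3-03 Data Cleaning and Merging
Data preprocessing comprises data cleansing and feature engineering. This section covers data cleansing, whose main purpose is to convert raw data into a tidy, well-organized form that the subsequent feature engineering can use. Data cleansing involves many kinds of work, for example:
- Basic operations (Basic) - selecting, filtering, and deleting duplicates.
- Sampling (Sampling) - absolute, relative, or probability-based.
- Data partitioning (Data Partitioning) - dividing a dataset into training, validation, and test sets.
- Binning (Binning) - a technique for reducing the effect of small observation errors; histograms are a common application.
- Transformations (Transformations) - such as normalization, standardization, scaling, and rotation.
- Data replacement (Data Replacement) - cutting, splitting, and merging.
- Imputation (Imputation) - replacing missing observations by means of statistical algorithms.
- Weighting (Weighting) - attribute weighting.
This section covers filtering, finding missing values, and deleting duplicates from the basic operations, as well as cutting, splitting, and merging from data replacement.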
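Three of the operations listed above (data partitioning, binning, and imputation) can be illustrated with the standard library alone. The following is a minimal sketch, not the course's own implementation: the function names `partition`, `equal_width_bins`, and `impute_mean`, the 60/20/20 split ratio, and the sample ages are all assumptions made for illustration.

```python
import random
import statistics


def partition(rows, train=0.6, valid=0.2, seed=0):
    """Shuffle a copy of rows and split it into train/validation/test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_valid = int(len(shuffled) * valid)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )


def equal_width_bins(values, n_bins):
    """Assign each value a 0-based index into n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1  # fall back to 1 when all values are equal
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    mean = statistics.mean(v for v in values if v is not None)
    return [mean if v is None else v for v in values]


# Hypothetical ages with two missing observations.
ages = [22, 38, None, 35, 54, 2, None, 27, 14, 58]
filled = impute_mean(ages)
bins = equal_width_bins(filled, n_bins=4)
train, valid, test = partition(filled)
```

Mean imputation is only the simplest choice; the appropriate statistic depends on the data.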
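As a rough preview of these operations, the sketch below applies them to a small pandas DataFrame. The passenger and fare records are made up for illustration, and the reading of "cutting" as selecting a subset of columns and "splitting" as dividing rows by a condition is an assumption, since the outline does not define those terms.

```python
import pandas as pd

# Hypothetical passenger records; names and values are invented for illustration.
passengers = pd.DataFrame({
    "id": [1, 2, 2, 3, 4],
    "name": ["Ann", "Bob", "Bob", "Cid", "Dee"],
    "age": [22.0, 38.0, 38.0, None, 54.0],
})
fares = pd.DataFrame({"id": [1, 2, 3], "fare": [7.25, 71.28, 8.05]})

# Delete duplicates: identical rows are dropped, keeping the first occurrence.
unique = passengers.drop_duplicates()

# Find missing values: isna() builds a boolean mask of rows without an age.
missing_age = unique[unique["age"].isna()]

# Filter: keep rows whose age is above 30 (a missing age compares as False).
over_30 = unique[unique["age"] > 30]

# Cut (assumed meaning): keep only a subset of the columns.
names_only = unique[["id", "name"]]

# Split (assumed meaning): separate the rows that have a known age.
known_age = unique[unique["age"].notna()]

# Merge: a left join keeps every passenger, even one without a fare record.
merged = unique.merge(fares, on="id", how="left")
```

A left join is used here so that no passenger is silently dropped; an inner join would discard the passenger who has no fare record.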
References
- Pandas Chinese Tutorial, https://www.w3cschool.cn/hyspo/
- Pandas cookbook, https://github.com/jvns/pandas-cookbook
- pandas.read_csv Function Parameters Explained, https://zhuanlan.zhihu.com/p/129858983
- Data Preprocessing vs. Data Wrangling in Machine Learning Projects, https://www.infoq.com/articles/ml-data-processing/
- Data Preparation, https://rapidminer.com/products/studio/feature-list/#data_prep
- Titanic: Machine Learning from Disaster, https://www.kaggle.com/c/titanic
- DataFrame - pandas 1.1.4 documentation, https://pandas.pydata.org/pandas-docs/stable/reference/frame.html
- Merge, join, concatenate and compare, https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
Chapter 4 Python Data Visualization
4-03 Pandas
References
- Pandas Visualization, https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html
- Matplotlib: Visualization with Python, https://matplotlib.org/
- seaborn: statistical data visualization, http://seaborn.pydata.org/
- Top 50 matplotlib Visualizations – The Master Plots, https://www.machinelearningplus.com/plots/top-50-matplotlib-visualizations-the-master-plots-python/
- Datasets collected from R packages, https://github.com/selva86/datasets
- Midwest demographics, https://ggplot2.tidyverse.org/reference/midwest.html#midwest-demographics
- midwest: Midwest demographics, https://rdrr.io/github/SahaRahul/ggplot2/man/midwest.html
- mtcars: mtcars, https://rdrr.io/github/matthewhirschey/bespokelearnr/man/mtcars.html
- seaborn dataset, https://github.com/mwaskom/seaborn-data
- seaborn: statistical data visualization, https://github.com/mwaskom/seaborn
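Chapter 5 Porting Python Projects
When you finish a Python project and hand the code over to someone else, problems can arise in three areas:
- Python interpreter: it may not be installed, or it may be a different version.
- Dependencies: the code needs certain packages to be installed.
- Operating system: the target environment may differ, for example Windows, macOS, or Linux.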
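The three areas above can be checked from inside Python before a project is handed over. The following is a minimal sketch using only the standard library; the function name `describe_environment`, the minimum version of 3.7, and the package names passed to it are assumptions for illustration.

```python
import importlib.util
import platform
import sys


def describe_environment(required_packages, min_version=(3, 7)):
    """Report the interpreter version, missing packages, and operating system."""
    return {
        "python_version": platform.python_version(),
        # Compare only (major, minor) against the assumed minimum version.
        "python_ok": sys.version_info[:2] >= min_version,
        # find_spec returns None when a top-level package cannot be located.
        "missing_packages": [
            name for name in required_packages
            if importlib.util.find_spec(name) is None
        ],
        # Typically "Windows", "Darwin" (macOS), or "Linux".
        "os": platform.system(),
    }


# "json" ships with Python; the second name is deliberately nonexistent.
report = describe_environment(["json", "package_that_does_not_exist_xyz"])
print(report)
```

A check like this only reports differences. Virtual environments, PyInstaller, and Docker, which the references below cover, are ways of removing them.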
References
- PEP 0 – Index of Python Enhancement Proposals (PEPs), https://www.python.org/dev/peps/
- Virtual Environment, https://book.pythontips.com/en/latest/virtual_environment.html
- Installing packages using pip and virtual environments, https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/
- venv — Creation of virtual environments, https://docs.python.org/3/library/venv.html
- PEP 405 – Python Virtual Environments, https://www.python.org/dev/peps/pep-0405/
- PyInstaller, http://www.pyinstaller.org/
- How to Install PyInstaller, https://pyinstaller.readthedocs.io/en/latest/installation.html
- PyInstaller Manual, https://pyinstaller.readthedocs.io/en/stable/
- GUI Applications, https://pythonguidecn.readthedocs.io/zh/latest/scenarios/gui.html
- Usage - Matplotlib 2.0.2 documentation, https://matplotlib.org/faq/usage_faq.html
- Packaging PyQt5 & PySide2 applications for Windows, with PyInstaller, https://www.learnpyqt.com/tutorials/packaging-pyqt5-pyside2-applications-windows-pyinstaller/
- The Hitchhiker’s Guide to Python!, https://docs.python-guide.org/en/latest/
- Installing Tk on Windows, https://tkdocs.com/tutorial/install.html#installwin
- Freezing Your Code, https://docs.python-guide.org/shipping/freezing/
- Install Docker Desktop on Windows, https://docs.docker.com/docker-for-windows/install/
- python Docker Official Images, https://hub.docker.com/_/python?tab=tags&page=1&ordering=last_updated
- Windows Subsystem for Linux Installation Guide (Windows 10), https://docs.microsoft.com/zh-cn/windows/wsl/install-win10#step-4---download-the-linux-kernel-update-package
- The base command for the Docker CLI, https://docs.docker.com/engine/reference/commandline/docker/
- Docker Command Reference, https://www.runoob.com/docker/docker-command-manual.html
- [Day 15] Docker (1), https://ithelp.ithome.com.tw/articles/10206556
- Is Python Development on Windows Too Painful? Take a Look at Docker, https://zhuanlan.zhihu.com/p/50864774
- Day5: Writing Your First Dockerfile, https://ithelp.ithome.com.tw/articles/10191016
- AWS Lambda, https://aws.amazon.com/cn/lambda/?nc1=h_ls
- AWS Lambda Deployment Packages in Python, https://docs.aws.amazon.com/zh_cn/lambda/latest/dg/python-package.html#python-package-venv
- Building Applications with Dependencies, https://docs.aws.amazon.com/zh_cn/serverless-application-model/latest/developerguide/serverless-sam-cli-using-build.html
- Creating New AWS Lambda Layer For Python Pandas Library, https://medium.com/@qtangs/creating-new-aws-lambda-layer-for-python-pandas-library-348b126e9f3e