Introduction to Machine Learning 1, Translation: Machine Learning: What It Is and Why It Matters

曹伟泽
2023-12-01

1. Preface

1.0 Will be taken down on request if it infringes

1.0.1 First published 2020-09-20

1.0.2 A conscientious translation

For example:

Machine learning is a method of data analysis that automates analytical model building

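Machine translation: 机器学习是一种使分析模型构建自动化的数据分析方法 (word for word: "machine learning is a data analysis method that makes analytical model building automated")

Nth revision: 机器学习是一种能够自动构建分析模型用于数据分析的方法。 (more natural: "machine learning is a method that automatically builds analytical models for data analysis")

Translation unfinished; more to come.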

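1.1 About this series

  • I will keep posting translations of articles on machine learning and related topics. A translation may take one of three forms:
    1. Literal translation
    2. Free translation
    3. Literal translation with free-translation asides
  • This article is a free translation; corrections of anything wrong or ill-judged are welcome.
  • Heading 1 has little substance, just casual remarks. Heading 2 is the translation, and heading 3 is the original text.
  • Why translate: I am embarrassed to say my English is rather poor, so this is a way to hold myself to account and to encourage us all.

1.2 A few words on machine learning

1.2.1 Definitions of machine learning

  1. The definition in the translated article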

Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.

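Machine learning is a method that automatically builds analytical models for data analysis. It is a branch of artificial intelligence, resting on the idea that a system can, with as little human intervention as possible, learn from data and then identify patterns (image recognition, for example) or make decisions.

  2. What is Machine Learning? A definition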

Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

In translation: machine learning is an application of artificial intelligence (AI) that lets systems learn automatically and improve from experience without explicit programming. It concentrates on developing computer programs that can access data and learn on their own.

  3. Coursera machine-learning

Machine learning is the science of getting computers to act without being explicitly programmed.

In translation: machine learning is the science of getting computers to run without being explicitly programmed.

  4. Wikipedia

Machine learning (ML) is the study of computer algorithms that improve automatically through experience.[1][2] It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to do so.[3] Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks.

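In translation: machine learning (ML) is the study of computer algorithms that improve automatically through experience. It is regarded as a subset of artificial intelligence. Machine learning algorithms build a mathematical model from sample data, called "training data", so that they can make predictions or decisions without being explicitly programmed to do so. They are widely used in applications such as email filtering and computer vision, where developing a conventional algorithm for the task is difficult or infeasible.

1.2.2 Similarities and differences among knowledge discovery, machine learning and artificial intelligence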

Machine learning is closely related to computational statistics, which focuses on making predictions using computers. The study of mathematical optimization delivers methods, theory and application domains to the field of machine learning. Data mining is a related field of study, focusing on exploratory data analysis through unsupervised learning.[5][6] In its application across business problems, machine learning is also referred to as predictive analytics.

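To be continued.

2. Machine learning: what it is and why it matters

Machine learning is a method that automatically builds analytical models for data analysis. It is a branch of artificial intelligence, resting on the idea that a system can, with as little human intervention as possible, learn from data and then identify patterns (image recognition, for example) or make decisions.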

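Evolution of machine learning

Thanks to new computing technologies (advances in computers and algorithms, and the accumulation of data), machine learning today is not what it used to be. It grew out of pattern recognition and the theory that computers can learn without being programmed to perform specific tasks. Researchers interested in artificial intelligence wanted to know whether computers could learn from data the way people do. Iteration is an important aspect of machine learning, because models adapt independently as they are exposed to new data. They learn from previous computations to produce reliable, repeatable decisions and results. It is not a new science, but one that has gained fresh momentum.
Although many machine learning algorithms have been around for a long time, the ability to apply complex mathematical calculations to big data automatically, over and over and faster and faster, is a recent development. Here are some well-known examples of machine learning:

  • (Self-driving) The heavily hyped Google self-driving car: the essence of machine learning.
  • (Personalized recommendation) Online recommendations from platforms such as Amazon and Netflix: machine learning applications in everyday life.
  • (NLP and production systems) Knowing what customers are saying about you on Twitter: machine learning combined with linguistic rule creation.
  • (Not sure where this fits; email filtering? image recognition?) Fraud detection: one of the more obvious and important uses in the world today.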

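Why machine learning matters

Interest in machine learning is resurging because of the same factors that have made data mining and Bayesian analysis more popular than ever: growing volumes and varieties of available data, cheaper and more powerful computational processing, and affordable data storage.

All of this means it is possible to quickly and automatically produce models that can analyze bigger, more complex data and deliver faster, more accurate results, even on a very large scale. By building precise models, an organization has a better chance of identifying profitable opportunities or avoiding unknown risks.

Machine learning in today's world

Who is using it?

Most industries that work with large amounts of data have recognized the value of machine learning technology. By gleaning insights from this data, often in real time, organizations can work more efficiently or gain an advantage over competitors.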

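  • Financial services
    Banks and other businesses in the financial industry use machine learning technology for two main purposes: identifying important insights in data and preventing fraud. The insights can identify investment opportunities or help investors know when to trade. Data mining can also identify clients with high-risk profiles, or use cybersurveillance to pinpoint warning signs of fraud.

  • Government
    Government agencies such as public safety and utilities have a particular need for machine learning because they have multiple sources of data that can be mined for insights. Analyzing sensor data, for example, identifies ways to increase efficiency and save money. Machine learning can also help detect fraud and minimize identity theft.

  • Health care
    Machine learning is a fast-growing trend in the health care industry, thanks to wearable devices and sensors that can use data to assess a patient's health in real time. The technology can also help medical experts analyze data to identify trends or red flags that may lead to improved diagnoses and treatment.

  • Retail
    Websites that recommend items you might like based on previous purchases are using machine learning to analyze your buying history. Retailers rely on machine learning to capture data, analyze it and use it to personalize the shopping experience, run marketing campaigns, optimize prices, plan merchandise supply and gain customer insights.

  • Oil and gas
    Finding new energy sources. Analyzing minerals in the ground. Predicting refinery sensor failure. Streamlining oil distribution to make it more efficient and cost-effective. The number of machine learning use cases in this industry is vast and still growing.

  • Transportation
    Analyzing data to identify patterns and trends is key to the transportation industry, which relies on making routes more efficient and predicting potential problems to increase profitability. The data analysis and modeling aspects of machine learning are important tools for delivery companies, public transportation and other transportation organizations.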

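What are some popular machine learning methods?

The most widely adopted machine learning methods are supervised learning and unsupervised learning, though there are others as well. Here is an overview of the four most popular types.

  • Supervised learning trains algorithms on labeled examples, that is, inputs for which the desired output is known. For example, a piece of equipment could have data points labeled either "F" (failed) or "R" (runs). The learning algorithm receives a set of inputs along with the corresponding correct outputs, and it learns by comparing its actual output with the correct output to find errors. It then modifies the model accordingly. Through methods such as classification, regression, prediction and gradient boosting, supervised learning uses patterns to predict the label values of additional unlabeled data. It is commonly used in applications where historical data predicts likely future events. For example, it can anticipate when a credit card transaction is likely to be fraudulent, or which insurance customer is likely to file a claim.
  • Unsupervised learning is used on data that has no historical labels. The system is not told the "right answer"; the algorithm must figure out what it is being shown. The goal is to explore the data and find some structure in it. Unsupervised learning works well on transactional data. For example, it can identify segments of customers with similar attributes, who can then be treated similarly in marketing campaigns. Or it can find the main attributes that separate customer segments from one another. Popular techniques include self-organizing maps, nearest-neighbor mapping, k-means clustering and singular value decomposition. These algorithms are also used to segment text topics, recommend items and identify data outliers.
  • Semisupervised learning is used for the same applications as supervised learning. However, it trains on both labeled and unlabeled data, typically a small amount of labeled data with a large amount of unlabeled data (because unlabeled data is cheaper and takes less effort to acquire). This type of learning can be used with methods such as classification, regression and prediction. Semisupervised learning is useful when labeling costs too much to allow a fully labeled training process. Early examples include identifying a person's face on a webcam.
  • Reinforcement learning is often used for robotics, gaming and navigation. With reinforcement learning, the algorithm discovers through trial and error which actions yield the greatest rewards. This type of learning has three primary components: the agent (the learner or decision maker), the environment (everything the agent interacts with) and actions (what the agent can do). The agent's objective is to choose actions that maximize the expected reward over a given amount of time. By following a good policy, the agent reaches the goal much faster. So the goal in reinforcement learning is to learn the best policy.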
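
The supervised learning loop described above (compare the actual output with the correct output, then modify the model) can be sketched with a perceptron, one of the simplest algorithms that works this way. This is only an illustration under stated assumptions: the article does not name a specific algorithm, and the (temperature, vibration) readings, the function names and the epoch cap are all made up here.

```python
# Minimal perceptron sketch for the "F" (failed) / "R" (runs) example.
# All readings below are invented for illustration.

def train_perceptron(samples, labels, max_epochs=10_000):
    """Learn weights by comparing each prediction with the known label."""
    weights = [0, 0]
    bias = 0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), label in zip(samples, labels):
            target = 1 if label == "F" else -1
            score = weights[0] * x1 + weights[1] * x2 + bias
            if target * score <= 0:          # actual output disagrees with the correct output
                weights[0] += target * x1    # so the model is modified accordingly
                weights[1] += target * x2
                bias += target
                mistakes += 1
        if mistakes == 0:                    # every labeled example is now predicted correctly
            break
    return weights, bias


def predict(model, reading):
    """Return "F" when the score is positive, otherwise "R"."""
    weights, bias = model
    score = weights[0] * reading[0] + weights[1] * reading[1] + bias
    return "F" if score > 0 else "R"


# Hypothetical (temperature, vibration) readings with known outcomes.
readings = [(2, 1), (3, 2), (1, 4), (4, 3), (8, 9), (9, 7), (7, 8), (6, 9)]
outcomes = ["R", "R", "R", "R", "F", "F", "F", "F"]

model = train_perceptron(readings, outcomes)
# Predict labels for two readings that were not part of the training data.
new_reading_labels = [predict(model, r) for r in [(2, 2), (8, 8)]]
```

The perceptron only works when a straight line can separate the two classes, which holds for this toy data; real equipment data would usually call for one of the richer methods the article lists.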

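Humans can typically create one or two good models a week; machine learning can create thousands of models a week.
Thomas H. Davenport, analytics thought leader
excerpt from The Wall Street Journal

How it works

To get the most value from machine learning, you have to know how to pair the best algorithms with the right tools and processes. SAS combines a rich, sophisticated heritage in statistics and data mining with new architectural advances to ensure your models run as fast as possible, even in huge enterprise environments.

Algorithms: SAS graphical user interfaces help you build machine learning models and implement an iterative machine learning process. You do not have to be an advanced statistician. Our comprehensive selection of machine learning algorithms can help you quickly get value from your big data and is included in many SAS products. SAS machine learning algorithms include: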

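  1. Neural networks
  2. Decision trees
  3. Random forests
  4. Associations and sequence discovery
  5. Gradient boosting and bagging
  6. Support vector machines
  7. Nearest-neighbor mapping
  8. k-means clustering
  9. Self-organizing maps
  10. Local search optimization techniques (e.g., genetic algorithms)
  11. Expectation maximization
  12. Multivariate adaptive regression splines
  13. Bayesian networks
  14. Kernel density estimation
  15. Principal component analysis
  16. Singular value decomposition
  17. Gaussian mixture models
  18. Sequential covering rule building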

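Tools and processes: As we know by now, it is not just about the algorithms. Ultimately, the secret to getting the most value from your big data lies in pairing the best algorithms for the task at hand with:

  1. Comprehensive data quality and management
  2. GUIs for building models and process flows
  3. Interactive data exploration and visualization of model results
  4. Comparisons of different machine learning models to quickly identify the best one
  5. Automated ensemble model evaluation to identify the best performers
  6. Easy model deployment so you can get repeatable, reliable results quickly
  7. An integrated, end-to-end platform for automating the data-to-decision process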

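From the boxed sidebars

The difference between machine learning and artificial intelligence

Artificial intelligence (AI) is the broad science of mimicking human abilities, while machine learning is a specific subset of AI that trains a machine how to learn. Watch this video to better understand the relationship between AI and machine learning. You will see how these two technologies work, with useful examples and a few funny asides.

What does it take to create a good machine learning system?

  • Data preparation capabilities.
  • Algorithms, basic and advanced.
  • Automation and iterative processes.
  • Scalability.
  • Ensemble modeling.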

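Further notes

  • In machine learning, a target is called a label.
  • In statistics, a target is called a dependent variable.
  • A variable in statistics is called a feature in machine learning.
  • A transformation in statistics is called feature creation in machine learning.

What are the differences between data mining, machine learning and deep learning?

Although all of these methods share the same goal, extracting insights, patterns and relationships that can be used to make decisions, they differ in approach and capability.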

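  • Data mining
    Data mining can be considered a superset of many different methods for extracting insights from data. It may involve traditional statistical methods and machine learning. Data mining applies methods from many different areas to identify previously unknown patterns in data. This can include statistical algorithms, machine learning, text analytics, time series analysis and other areas of analytics. Data mining also includes the study and practice of data storage and data manipulation.
  • Machine learning
    The main difference with machine learning is this: as with statistical models, the goal is to understand the structure of the data, fitting well-understood theoretical distributions to it. With statistical models, then, there is a theory behind the model that is mathematically proven, but this also requires the data to meet certain strong assumptions. Machine learning developed from the ability to use computers to probe data for structure, even when we have no theory of what that structure looks like. The test of a machine learning model is its validation error on new data, not a theoretical test that proves a null hypothesis. Because machine learning often learns from data iteratively, the learning is easy to automate. Passes are run over the data until a robust pattern is found.
  • Deep learning
    Deep learning combines advances in computing power with special types of neural networks to learn complicated patterns in large amounts of data. Deep learning techniques are currently the state of the art for identifying objects in images and words in sounds. Researchers are now looking to apply these successes in pattern recognition to more complex tasks such as automatic language translation, medical diagnosis and many other important social and business problems.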

3. Machine learning: what it is and why it matters

Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.

Evolution of machine learning

Because of new computing technologies, machine learning today is not like machine learning of the past. It was born from pattern recognition and the theory that computers can learn without being programmed to perform specific tasks; researchers interested in artificial intelligence wanted to see if computers could learn from data. The iterative aspect of machine learning is important because as models are exposed to new data, they are able to independently adapt. They learn from previous computations to produce reliable, repeatable decisions and results. It’s a science that’s not new – but one that has gained fresh momentum.
While many machine learning algorithms have been around for a long time, the ability to automatically apply complex mathematical calculations to big data – over and over, faster and faster – is a recent development. Here are a few widely publicized examples of machine learning applications you may be familiar with:

  • The heavily hyped, self-driving Google car? The essence of machine learning.
  • Online recommendation offers such as those from Amazon and Netflix? Machine learning applications for everyday life.
  • Knowing what customers are saying about you on Twitter? Machine learning combined with linguistic rule creation.
  • Fraud detection? One of the more obvious, important uses in our world today.

Why is machine learning important?

Resurging interest in machine learning is due to the same factors that have made data mining and Bayesian analysis more popular than ever. Things like growing volumes and varieties of available data, computational processing that is cheaper and more powerful, and affordable data storage.

All of these things mean it’s possible to quickly and automatically produce models that can analyze bigger, more complex data and deliver faster, more accurate results – even on a very large scale. And by building precise models, an organization has a better chance of identifying profitable opportunities – or avoiding unknown risks.

Machine learning in today’s world

By using algorithms to build models that uncover connections, organizations can make better decisions without human intervention. Learn more about the technologies that are shaping the world we live in.

Who’s using it?

Most industries working with large amounts of data have recognized the value of machine learning technology. By gleaning insights from this data – often in real time – organizations are able to work more efficiently or gain an advantage over competitors.

  • Financial services
    Banks and other businesses in the financial industry use machine learning technology for two key purposes: to identify important insights in data, and prevent fraud. The insights can identify investment opportunities, or help investors know when to trade. Data mining can also identify clients with high-risk profiles, or use cybersurveillance to pinpoint warning signs of fraud.

  • Government
    Government agencies such as public safety and utilities have a particular need for machine learning since they have multiple sources of data that can be mined for insights. Analyzing sensor data, for example, identifies ways to increase efficiency and save money. Machine learning can also help detect fraud and minimize identity theft.

  • Health care
    Machine learning is a fast-growing trend in the health care industry, thanks to the advent of wearable devices and sensors that can use data to assess a patient’s health in real time. The technology can also help medical experts analyze data to identify trends or red flags that may lead to improved diagnoses and treatment.

  • Retail
    Websites recommending items you might like based on previous purchases are using machine learning to analyze your buying history. Retailers rely on machine learning to capture data, analyze it and use it to personalize a shopping experience, implement a marketing campaign, price optimization, merchandise supply planning, and for customer insights.

  • Oil and gas
    Finding new energy sources. Analyzing minerals in the ground. Predicting refinery sensor failure. Streamlining oil distribution to make it more efficient and cost-effective. The number of machine learning use cases for this industry is vast – and still expanding.

  • Transportation
    Analyzing data to identify patterns and trends is key to the transportation industry, which relies on making routes more efficient and predicting potential problems to increase profitability. The data analysis and modeling aspects of machine learning are important tools to delivery companies, public transportation and other transportation organizations.

What are some popular machine learning methods?

Two of the most widely adopted machine learning methods are supervised learning and unsupervised learning – but there are also other methods of machine learning. Here’s an overview of the most popular types.

  • Supervised learning algorithms are trained using labeled examples, such as an input where the desired output is known. For example, a piece of equipment could have data points labeled either “F” (failed) or “R” (runs). The learning algorithm receives a set of inputs along with the corresponding correct outputs, and the algorithm learns by comparing its actual output with correct outputs to find errors. It then modifies the model accordingly. Through methods like classification, regression, prediction and gradient boosting, supervised learning uses patterns to predict the values of the label on additional unlabeled data. Supervised learning is commonly used in applications where historical data predicts likely future events. For example, it can anticipate when credit card transactions are likely to be fraudulent or which insurance customer is likely to file a claim.
  • Unsupervised learning is used against data that has no historical labels. The system is not told the “right answer.” The algorithm must figure out what is being shown. The goal is to explore the data and find some structure within. Unsupervised learning works well on transactional data. For example, it can identify segments of customers with similar attributes who can then be treated similarly in marketing campaigns. Or it can find the main attributes that separate customer segments from each other. Popular techniques include self-organizing maps, nearest-neighbor mapping, k-means clustering and singular value decomposition. These algorithms are also used to segment text topics, recommend items and identify data outliers.
  • Semisupervised learning is used for the same applications as supervised learning. But it uses both labeled and unlabeled data for training – typically a small amount of labeled data with a large amount of unlabeled data (because unlabeled data is less expensive and takes less effort to acquire). This type of learning can be used with methods such as classification, regression and prediction. Semisupervised learning is useful when the cost associated with labeling is too high to allow for a fully labeled training process. Early examples of this include identifying a person’s face on a web cam.
  • Reinforcement learning is often used for robotics, gaming and navigation. With reinforcement learning, the algorithm discovers through trial and error which actions yield the greatest rewards. This type of learning has three primary components: the agent (the learner or decision maker), the environment (everything the agent interacts with) and actions (what the agent can do). The objective is for the agent to choose actions that maximize the expected reward over a given amount of time. The agent will reach the goal much faster by following a good policy. So the goal in reinforcement learning is to learn the best policy.
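
The agent, environment, actions and policy described in the reinforcement learning item can be made concrete with a small tabular Q-learning sketch. Q-learning is one common reinforcement learning algorithm, chosen here as an assumption; the article does not name one. The five-cell corridor, the reward of 1 at the goal, and every constant and function name below are invented for illustration.

```python
import random

# Tabular Q-learning on a made-up five-cell corridor; every name and number
# here is an illustrative assumption, not something taken from the article.
N_STATES = 5            # environment: cells 0..4, the goal is cell 4
ACTIONS = (-1, +1)      # actions: step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2


def step(state, action):
    """Environment: move the agent and pay a reward of 1 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def choose_action(q, state, rng):
    """Agent: mostly exploit the best known action, sometimes explore."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(q[state].values())
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=200, seed=0):
    """Trial and error: run episodes from cell 0 and update action values."""
    rng = random.Random(seed)
    q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = choose_action(q, state, rng)
            next_state, reward = step(state, action)
            # Move the estimate toward reward plus discounted future value.
            target = reward + GAMMA * max(q[next_state].values())
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# The learned policy: the highest-valued action in each non-goal cell.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

In this toy environment the learned policy is to step right in every cell, which is the shortest route to the goal. The discount factor `GAMMA` is what makes shorter routes worth more than longer ones.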

Humans can typically create one or two good models a week; machine learning can create thousands of models a week.
Thomas H. Davenport, Analytics thought leader
excerpt from The Wall Street Journal

How it works

To get the most value from machine learning, you have to know how to pair the best algorithms with the right tools and processes. SAS combines rich, sophisticated heritage in statistics and data mining with new architectural advances to ensure your models run as fast as possible – even in huge enterprise environments.

Algorithms: SAS graphical user interfaces help you build machine learning models and implement an iterative machine learning process. You don’t have to be an advanced statistician. Our comprehensive selection of machine learning algorithms can help you quickly get value from your big data and are included in many SAS products. SAS machine learning algorithms include:

  1. Neural networks
  2. Decision trees
  3. Random forests
  4. Associations and sequence discovery
  5. Gradient boosting and bagging
  6. Support vector machines
  7. Nearest-neighbor mapping
  8. k-means clustering
  9. Self-organizing maps
  10. Local search optimization techniques (e.g., genetic algorithms)
  11. Expectation maximization
  12. Multivariate adaptive regression splines
  13. Bayesian networks
  14. Kernel density estimation
  15. Principal component analysis
  16. Singular value decomposition
  17. Gaussian mixture models
  18. Sequential covering rule building
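
To give a feel for one entry in the list above, here is a minimal sketch of k-means clustering in plain Python. It is not SAS code and makes no claim about how SAS implements the algorithm. The points and starting centroids are invented, and the centroids are picked by hand for reproducibility, whereas practical implementations usually choose them randomly or with a seeding heuristic.

```python
def kmeans(points, centroids, max_iters=100):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iters):
        # Assignment step: index of the closest centroid (squared Euclidean distance).
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))
            for point in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid where it was
        if new_centroids == centroids:     # nothing moved, so the clustering is stable
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D data with two visibly separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids = kmeans(points, centroids=[points[0], points[-1]])
```

No labels are supplied anywhere: the two groups emerge from the distances alone, which is what makes this an unsupervised method.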

Tools and Processes: As we know by now, it’s not just the algorithms. Ultimately, the secret to getting the most value from your big data lies in pairing the best algorithms for the task at hand with:

  1. Comprehensive data quality and management
  2. GUIs for building models and process flows
  3. Interactive data exploration and visualization of model results
  4. Comparisons of different machine learning models to quickly identify the best one
  5. Automated ensemble model evaluation to identify the best performers
  6. Easy model deployment so you can get repeatable, reliable results quickly
  7. An integrated, end-to-end platform for the automation of the data-to-decision process

Do you need some basic guidance on which machine learning algorithm to use for what? This blog by Hui Li, a data scientist at SAS, provides a handy cheat sheet.

From the boxed sidebars

Machine Learning and Artificial Intelligence

While artificial intelligence (AI) is the broad science of mimicking human abilities, machine learning is a specific subset of AI that trains a machine how to learn. Watch this video to better understand the relationship between AI and machine learning. You’ll see how these two technologies work, with useful examples and a few funny asides.

What’s required to create good machine learning systems?

  • Data preparation capabilities.
  • Algorithms – basic and advanced.
  • Automation and iterative processes.
  • Scalability.
  • Ensemble modeling.

Did you know?

  • In machine learning, a target is called a label.
  • In statistics, a target is called a dependent variable.
  • A variable in statistics is called a feature in machine learning.
  • A transformation in statistics is called feature creation in machine learning.
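
The vocabulary mapping above can be shown on a tiny invented dataset. The record fields (`income`, `age`, `defaulted`) and the choice of a base-10 logarithm as the transformation are assumptions made only for this sketch.

```python
import math

# Hypothetical records: in statistics these columns are "variables".
records = [
    {"income": 1000.0, "age": 25, "defaulted": 0},
    {"income": 10000.0, "age": 40, "defaulted": 0},
    {"income": 100.0, "age": 31, "defaulted": 1},
]

# Machine learning calls the input variables "features" ...
features = [[r["income"], r["age"]] for r in records]
# ... and the target (the dependent variable in statistics) the "label".
labels = [r["defaulted"] for r in records]

# A transformation in statistics is "feature creation" in machine learning:
# here a new feature is derived as the base-10 logarithm of income.
log_income = [math.log10(r["income"]) for r in records]
```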

What are the differences between data mining, machine learning and deep learning?

Although all of these methods have the same goal – to extract insights, patterns and relationships that can be used to make decisions – they have different approaches and abilities.

  • Data Mining
    Data mining can be considered a superset of many different methods to extract insights from data. It might involve traditional statistical methods and machine learning. Data mining applies methods from many different areas to identify previously unknown patterns from data. This can include statistical algorithms, machine learning, text analytics, time series analysis and other areas of analytics. Data mining also includes the study and practice of data storage and data manipulation.
  • Machine Learning
    The main difference with machine learning is that just like statistical models, the goal is to understand the structure of the data – fit theoretical distributions to the data that are well understood. So, with statistical models there is a theory behind the model that is mathematically proven, but this requires that data meets certain strong assumptions too. Machine learning has developed based on the ability to use computers to probe the data for structure, even if we do not have a theory of what that structure looks like. The test for a machine learning model is a validation error on new data, not a theoretical test that proves a null hypothesis. Because machine learning often uses an iterative approach to learn from data, the learning can be easily automated. Passes are run through the data until a robust pattern is found.
  • Deep learning
    Deep learning combines advances in computing power and special types of neural networks to learn complicated patterns in large amounts of data. Deep learning techniques are currently state of the art for identifying objects in images and words in sounds. Researchers are now looking to apply these successes in pattern recognition to more complex tasks such as automatic language translation, medical diagnoses and numerous other important social and business problems.
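
The point in the machine learning item, that a model is tested by its validation error on new data, can be sketched as follows. The example fits a straight line to part of an invented dataset and measures the error on points held back from fitting. The data, the every-fourth-point split and the function names are assumptions for illustration, and least squares simply stands in for whatever model is being tested.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, xs, ys):
    """Average squared gap between the model's predictions and the observed values."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data: y = 2x + 1 plus a small deterministic wobble standing in for noise.
xs = list(range(20))
ys = [2 * x + 1 + (0.3 if x % 2 == 0 else -0.3) for x in xs]

# Hold out every fourth point; the model never sees these during fitting.
train = [(x, y) for x, y in zip(xs, ys) if x % 4 != 0]
held_out = [(x, y) for x, y in zip(xs, ys) if x % 4 == 0]

model = fit_line(*zip(*train))
train_error = mean_squared_error(model, *zip(*train))
validation_error = mean_squared_error(model, *zip(*held_out))
```

On this data the validation error (about 0.16) comes out higher than the training error (about 0.08), which is why the error on unseen data, not the fit to the training data, is the figure to judge a model by.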