






2011-08-03 08:52:34 | Category: Reading notes


4 Perspectives



Pattern recognition deals with discovering, distinguishing, detecting or characterizing patterns present in the surrounding world. It relies on the extraction and representation of information from the observed data such that, after integration with background knowledge, it ultimately leads to the formulation of new knowledge and concepts. The result of learning is that the knowledge already captured in some formal terms is used to describe the present interdependencies, such that the relations between patterns are better understood (interpreted) or used for generalization. The latter means that a concept, e.g. of a class of objects, is formalized such that it can be applied to unseen examples of the same domain, inducing new information, e.g. the class label of a new object. In this process, new examples should obey the same deduction process as applied to the original examples.



In the next subsections we will first recapitulate the elements of logical reasoning that contribute to learning. Next, this will be related to the Platonic and Aristotelian scientific approaches discussed in Section 2. Finally, two novel pattern recognition paradigms are placed in this view.



4.1 Learning by Logical Reasoning



Learning from examples is an active process of concept formation that relies on abstraction (focus on important characteristics or reduction of detail) and analogy (comparison between different entities or relations, focussing on some aspect of their similarity). Learning often requires dynamical, multilevel (seeing the details leading to unified concepts, which further build higher-level concepts) and possibly multi-strategy actions (e.g. in order to support good predictive power as well as interpretability). A learning task is basically defined by input data (design set), background knowledge or problem context, and a learning goal [52]. Many inferential strategies need to be synergetically integrated to reach this goal. The most important ones are the inductive, deductive and abductive principles, which are briefly presented next. More formal definitions can be found in the literature on formal logic and philosophy, or e.g. in [23, 40, 52, 83].



Inductive reasoning is the synthetic inference process of arriving at a conclusion or a general rule from a limited set of observations. This relies on the formation of a concept or a model, given the data. Although such a derived inductive conclusion cannot be proved, its reliability is supported by empirical observations. As long as the related deductions are not in contradiction with experiments, the inductive conclusion remains valid. If, however, future observations lead to a contradiction, either an adaptation or a new inference is necessary to find a better rule. To make it more formal, induction learns a general rule R (concerning A and B) from numerous examples of A and B. In practice, induction is often realized in a quantitative way. Its strength then relies on probability theory and the law of large numbers: given a large number of cases, one can describe their properties in the limit and the corresponding rate of convergence.
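
The quantitative side of induction can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name empirical_rate, the "true" rate of 0.8 and the sample sizes are all made up for illustration and do not come from the text. It estimates how often B accompanies A from a growing number of examples and shows the estimate settling as the law of large numbers suggests.

```python
import random

def empirical_rate(p_true, n, seed=0):
    """Estimate the rate at which B accompanies A from n simulated examples.

    p_true is a hypothetical ground-truth rate used only to generate data;
    a real learner would never see it.
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.random() < p_true)
    return hits / n

# The induced rule ("A is usually followed by B") becomes more reliable
# as the number of observed examples grows.
for n in (10, 100, 1000, 10000, 100000):
    estimate = empirical_rate(0.8, n)
    print(f"n={n:6d}  estimated rate={estimate:.4f}  error={abs(estimate - 0.8):.4f}")
```

The induced rate is never proved, only supported: a later batch of contradicting observations would force the estimate, and the rule built on it, to be revised.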



Deductive reasoning is the analytic inference process in which existing knowledge of known facts or agreed-upon rules is used to derive a conclusion. Such a conclusion does not yield ‘new’ knowledge, since it is a logical consequence of what was already known, if only implicitly (it is not of a greater generality than the premises). Deduction, therefore, uses a logical argument to make explicit what has been hidden. It is also a valid form of proof, provided that one starts from true premises. It has predictive power, which makes it complementary to induction. In a pattern recognition system, both evaluation and prediction rely on deductive reasoning. To make it more formal, let us assume that A is a set of observations, B is a conclusion and R is a general rule. Let B be a logical consequence of A and R, i.e. (A ∧ R) |= B, where |= denotes entailment. In deductive reasoning, given A and using the rule R, the consequence B is derived.



Abductive reasoning is the constructive process of deriving the most likely or best explanations of known facts. This is a creative process, in which possible and feasible hypotheses are generated for further evaluation. Since both abduction and induction deal with incomplete information, induction may be viewed in some aspects as abduction and vice versa, which leads to some confusion between the two [23, 52]. Here, we assume they are different. Concerning the entailment (A ∧ R) |= B, having observed the consequence B in the context of the rule R, A is derived to explain B.
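
The difference between deduction and abduction over the same entailment (A ∧ R) |= B can be made concrete with a brute-force truth-table check. This is a minimal sketch, not a general logic engine: the atoms (rain, sprinkler, wet), the rule and the helper names entails, satisfiable and abduce are all hypothetical and chosen only for illustration.

```python
from itertools import product

def worlds(atoms):
    """Yield every truth assignment over the given atoms."""
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))

def entails(premises, conclusion, atoms):
    """True if every world satisfying all premises also satisfies the conclusion."""
    return all(conclusion(w) for w in worlds(atoms) if all(p(w) for p in premises))

def satisfiable(premises, atoms):
    """True if at least one world satisfies all premises."""
    return any(all(p(w) for p in premises) for w in worlds(atoms))

atoms = ["rain", "sprinkler", "wet"]

# R: a general rule (assumed for illustration): rain -> wet and sprinkler -> wet.
R = lambda w: (not w["rain"] or w["wet"]) and (not w["sprinkler"] or w["wet"])
A = lambda w: w["rain"]   # an observation
B = lambda w: w["wet"]    # a conclusion

# Deduction: given A and the rule R, B follows.
deduced = entails([A, R], B, atoms)

def abduce(rule, observation, candidates, atoms):
    """Abduction: keep the hypotheses that are consistent with the rule and explain the observation."""
    return [name for name, hyp in candidates.items()
            if satisfiable([hyp, rule], atoms) and entails([hyp, rule], observation, atoms)]

candidates = {
    "rain": lambda w: w["rain"],
    "sprinkler": lambda w: w["sprinkler"],
    "not rain": lambda w: not w["rain"],
}
explanations = abduce(R, B, candidates, atoms)

print("deduction holds:", deduced)
print("abduced explanations for 'wet':", explanations)
```

Deduction returns a single guaranteed consequence, whereas abduction returns a set of feasible hypotheses that still have to be evaluated against further evidence.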



In all learning paradigms there is an interplay between inductive, abductive and deductive principles. Both deduction and abduction make it possible to conceptually understand a phenomenon, while induction verifies it. More precisely, abduction generates or reformulates new (feasible) ideas or hypotheses, induction justifies the validity of these hypotheses with observed data, and deduction evaluates and tests them. Concerning pattern recognition systems, abduction explores data, transforms the representation and suggests feasible classifiers for the given problem. It also generates new classifiers or reformulates the old ones. Abduction is present in an initial exploratory step or in the Adaptation stage; see Fig. 1. Induction trains the classifier in the Generalization stage, while deduction predicts the final outcome (such as a label) for the test data by applying the trained classifier in the Evaluation stage.
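
The division of labour between the three principles in a recognition system can be sketched on toy one-dimensional data. Everything here is an illustrative assumption: the data, the two candidate representations, the nearest-mean classifier and the function names are invented for the example and are not taken from the text. Abduction is reduced to proposing a pre-specified set of representations, induction to estimating class means, and deduction to applying the trained rule.

```python
def train_nearest_mean(samples):
    """Induction: derive a general rule (one mean per class) from labelled examples."""
    means = {}
    for label in sorted({y for _, y in samples}):
        xs = [x for x, y in samples if y == label]
        means[label] = sum(xs) / len(xs)
    return means

def predict(means, x):
    """Deduction: apply the learned rule to a new object."""
    return min(means, key=lambda label: abs(x - means[label]))

def accuracy(means, samples):
    return sum(predict(means, x) == y for x, y in samples) / len(samples)

train = [(-3.1, "outer"), (-2.9, "outer"), (3.0, "outer"), (3.2, "outer"),
         (-0.2, "inner"), (0.1, "inner"), (0.3, "inner"), (-0.4, "inner")]
validation = [(-3.0, "outer"), (2.8, "outer"), (-0.1, "inner"), (0.5, "inner")]

# Abduction (here only a pre-specified set of hypotheses): candidate
# representations that might explain the structure seen in the data.
representations = {"raw": lambda x: x, "abs": lambda x: abs(x)}

scores = {}
models = {}
for name, transform in representations.items():
    mapped_train = [(transform(x), y) for x, y in train]
    mapped_val = [(transform(x), y) for x, y in validation]
    models[name] = train_nearest_mean(mapped_train)
    scores[name] = accuracy(models[name], mapped_val)

best_name = max(scores, key=scores.get)
print("validation accuracy per representation:", scores)
print("selected representation:", best_name)
print("label for x=2.7:", predict(models[best_name], representations[best_name](2.7)))
```

In this toy setting the raw representation cannot separate the classes with a nearest-mean rule, while the absolute-value representation can, which is the kind of reformulation the text attributes to abduction.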



Since abduction is hardly emphasized in learning, we will give some more insights. In abduction, a peculiarity or an artifact is observed and a hypothesis is then created to explain it. Such a hypothesis is suggested based on existing knowledge or may extend it, e.g. by using analogy. So, the abductive process is creative and works towards new discovery. In data analysis, visualization facilitates the abductive process. In response to visual observations of irregularities or bizarre patterns, a researcher is inspired to look for clues that can be used to explain such unexpected behavior. Mistakes and errors can therefore serve the purpose of discovery when strange results are examined with a critical mind. Note, however, that this process is very hard to implement in automatic recognition systems, as it would require encoding not only the detailed domain knowledge, but also techniques that are able to detect ‘surprises’ as well as strategies for their possible use. In fact, this requires a conscious interaction. Ultimately, only a human analyst can respond interactively in such cases, so abduction fits well into semi-automatic systems. In traditional pattern recognition systems, abduction is usually defined in terms of the data and works over a pre-specified set of transformations, models or classifiers.



4.2 Logical Reasoning Related to Scientific Approaches



If pattern recognition (learning from examples) is merely understood as a process of concept formation from a set of observations, the inductive principle is the most appealing for this task. Indeed, it is the most widely emphasized in the literature, in which ‘learning’ is implicitly understood as ‘inductive learning’. Such reasoning leads to inferring new knowledge (a rule or model) which is hopefully valid not only for the known examples, but also for novel, unseen objects. Various validation measures or adaptation steps are taken to support the applicability of the determined model. Additionally, care has to be taken that the unseen objects obey the same assumptions as the original objects used in training. If this does not hold, such an empirical generalization becomes invalid. One should therefore exercise critical thinking while designing a complete learning system. This means that one has to be conscious of which assumptions are made and be able to quantify their sensibility, usability and validity with respect to the learning goal.



On the other hand, deductive reasoning plays a significant role in the Platonic approach. This top-down scenario starts from a set of rules derived from expert knowledge of the problem domain or from a degree of belief in a hypothesis. The existing prior knowledge is first formulated in appropriate terms. These are further used to generate inductive inferences regarding the validity of the hypotheses in the presence of observed examples. So, deductive formalism (description of the object’s structure) or deductive predictions (based on the Bayes rule) precede inductive principles. A simple example in Bayesian inference is the well-known Expectation-Maximization (EM) algorithm used in problems with incomplete data [13]. The EM algorithm iterates between the E-step and the M-step until convergence. In the E-step, given a current (or initial) estimate of the unknown variable, a conditional expectation is found, which is maximized in the M-step to derive a new estimate. The E-step is based on deduction, while the M-step relies on induction. In the case of Bayesian nets, which model a set of concepts (provided by an expert) through a network of conditional dependencies, predictions (deductions) are made from the (initial) hypotheses (beliefs over conditional dependencies) using the Bayes theorem. Then, inductive inferences regarding the hypotheses are drawn from the data. Note also that if the existing prior knowledge is captured in some rules, learning may become a simplification of these rules such that their logical combinations describe the problem.
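
The alternation between the E-step and the M-step can be sketched for one of the simplest incomplete-data problems, a mixture of two one-dimensional Gaussians whose component memberships are unobserved. This is an illustrative sketch only: the unit variances, the initialisation at the data extremes, the fixed iteration count and the function name em_two_gaussians are assumptions made for the example, not details from the text or from [13].

```python
import math
import random

def em_two_gaussians(data, iterations=50):
    """EM for a mixture of two unit-variance 1-D Gaussians (a simplifying assumption).

    Returns the two estimated means and the weight of the second component.
    """
    mu0, mu1 = min(data), max(data)   # crude initial estimate
    weight1 = 0.5
    for _ in range(iterations):
        # E-step: given the current parameters, compute the expected
        # membership of each point in component 1 (the source reads this as deduction).
        resp = []
        for x in data:
            p0 = (1.0 - weight1) * math.exp(-0.5 * (x - mu0) ** 2)
            p1 = weight1 * math.exp(-0.5 * (x - mu1) ** 2)
            resp.append(p1 / (p0 + p1))
        # M-step: re-estimate the parameters that maximise the expected
        # log-likelihood (the source reads this as induction from the data).
        n1 = sum(resp)
        n0 = len(data) - n1
        mu0 = sum((1.0 - r) * x for r, x in zip(resp, data)) / n0
        mu1 = sum(r * x for r, x in zip(resp, data)) / n1
        weight1 = n1 / len(data)
    return mu0, mu1, weight1

rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(5.0, 1.0) for _ in range(300)]
mu0, mu1, weight1 = em_two_gaussians(data)
print(f"estimated means: {mu0:.2f}, {mu1:.2f}; weight of second component: {weight1:.2f}")
```

A practical implementation would also estimate the variances and stop on a convergence criterion rather than after a fixed number of iterations.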



In the Aristotelian approach to pattern recognition, observation of particulars and their explanation are essential for deriving a concept. As we already know, abduction plays a role here, especially for data exploration and characterization to explain or suggest a modification of the representation or an adaptation of the given classifier. Aristotelian learning often relies on Occam’s razor, the principle which advocates choosing the simplest model or hypothesis among otherwise equivalent ones, and which can be implemented in a number of ways [8].



In summary, the Platonic scenario is dominantly inductive-deductive, while the Aristotelian scenario is dominantly inductive-abductive. Both frameworks have different merits and shortcomings. The strength of the Platonic approach lies in the proper formulation and use of subjective beliefs and expert knowledge, and in the possibility to encode the internal structural organization of objects. It is model-driven. In this way, however, the inductive generalization becomes limited, as there may be little freedom in the description to explore and discover new knowledge. The strength of the Aristotelian approach lies in numerical induction and a well-developed mathematical theory of vector spaces in which the actual learning takes place. It is data-driven. The weakness, however, lies in the difficulty of incorporating expert or background knowledge about the problem. Moreover, in many practical applications, it is known that the implicit assumptions of representative training sets, independent and identically distributed (iid) samples, and stationary distributions do not hold.



4.3 Two New Pattern Recognition Paradigms



Two far-reaching novel paradigms have been proposed that deal with the drawbacks of the Platonic and Aristotelian approaches. In the Aristotelian scenario, Vapnik has introduced transductive learning [73], while in the Platonic scenario, Goldfarb has advocated a new structural learning paradigm [31, 32]. We think these are two major perspectives of the science of pattern recognition.



Vapnik [73] formulated the main learning principle as: ‘If you possess a restricted amount of information for solving some problem, try to solve the problem directly and never solve a more general problem as an intermediate step.’ In the traditional Aristotelian scenario, the learning task is often transformed into the problem of function estimation, in which a decision function is determined globally for the entire domain (e.g. for all possible examples in a feature vector space). This is, however, a solution to a more general problem than is necessary to arrive at a conclusion (output) for specific input data. Consequently, the application of this common-sense principle requires a reformulation of the learning problem such that novel (unlabeled) examples are considered in the context of the given training set. This leads to the transductive principle, which aims at estimating the output for a given input only when required, and the estimate may differ from instance to instance. The training sample, considered either globally or in the local neighborhoods of test examples, is actively used to determine the output. As a result, this leads to confidence measures of single predictions instead of globally estimated classifiers. It provides ways to overcome the difficulty of iid samples and stationary distributions. More formally, in transductive reasoning, given an entailment A |= (B ∪ C), if the consequence B is observed as the result of A, then the consequence C becomes more likely.
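
The idea of estimating an output only for the input at hand, together with a confidence for that single prediction, can be loosely illustrated with a local nearest-neighbour vote. This is a swapped-in stand-in, not Vapnik's formulation of transduction: the k-nearest-neighbour rule, the toy data and the function name local_predict are assumptions chosen for the example. No global decision function is fitted; each query consults the training sample directly.

```python
def local_predict(train, x, k=3):
    """Predict a label for one specific input x from its k nearest training examples.

    Returns the winning label and the fraction of neighbours that voted for it,
    used here as a crude per-prediction confidence.
    """
    neighbours = sorted(train, key=lambda sample: abs(sample[0] - x))[:k]
    votes = {}
    for _, label in neighbours:
        votes[label] = votes.get(label, 0) + 1
    label = max(votes, key=votes.get)
    return label, votes[label] / k

train = [(0.0, "a"), (0.2, "a"), (0.4, "a"), (1.0, "b"), (1.2, "b"), (1.4, "b")]

for query in (0.1, 0.75):
    label, confidence = local_predict(train, query)
    print(f"x={query}: label={label}, confidence={confidence:.2f}")
```

The price is the one noted in the text: the whole decision process is repeated for every new input, which is more expensive than applying a classifier trained once.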



The truly transductive principle requires an active synergy of inductive, deductive and abductive principles in a conscious decision process. We believe it is practised by people who analyze complex situations, deduce and validate possible solutions and make decisions in novel ways. Examples are medical doctors, financial advisers, strategy planners or leaders of large organizations. In the context of automatic learning, transduction has applications to learning from partially labeled sets and otherwise missing information, information retrieval, active learning and all types of diagnostics. Some proposals can be found e.g. in [34, 46, 47, 73]. Although many researchers recognize the importance of this principle, many also remain reluctant. This may be caused by unfamiliarity with the idea, by the small number of existing procedures, or by the accompanying computational costs, as a complete decision process has to be constantly inferred anew.



In the Platonic scenario, Goldfarb and his colleagues have developed structural inductive learning, realized by the so-called evolving transformation systems (ETS) [31, 32]. Goldfarb first noticed the intrinsic, insurmountable inadequacy of vector spaces for truly learning from examples [30]. The reason is that such quantitative representations lose all information on object structure; there is no way an object can be generated given its numeric encoding. The second crucial observation was that all objects in the universe have a formative history. This led Goldfarb to the conclusion that an object representation should capture the object’s formative evolution, i.e. the way the object is created through a sequence of suitable transformations in time. The creation process is only possible through structural operations. So, ‘the resulting representation embodies temporal structural information in the form of a formative, or generative, history’ [31]. Consequently, objects are treated as evolving structural processes, and a class is defined by structural processes which are ‘similar’. This is an inductive structural/symbolic class representation, the central concept in ETS. This representation is learnable from a (small) set of examples and has the capability to generate objects from the class.



The generative history of a class starts from a single progenitor and is encoded as a multi-level hierarchical system. On a given level, the basic structural elements are defined together with their structural transformations, such that both are used to constitute a new structural element on a higher level. This new element becomes meaningful on that level. Similarity plays an important role, as it is used as a basic quality for a class representation as a set of similar structural processes. The similarity measure is learned in training to induce the optimal finite set of weighted structural transformations that are necessary on the given level, such that the similarity of an object to the class representation is large. ‘This mathematical structure allows one to capture dynamically, during the learning process, the compositional structure of objects/events within a given inductive, or evolutionary, environment’ [31].



Goldfarb’s ideas bear some similarity to those of Wolfram, presented in his book on ‘a new kind of science’ [80]. Wolfram considers computation as the primary concept in nature; all processes are the results of cellular-automata-type computational processes, and thereby inherently numerical. (Cellular automata are discrete dynamical systems that operate on a regular lattice in space and time, and are characterized by ‘local’ interactions.) He observes that the repetitive use of simple computational transformations can cause very complex phenomena, especially if computational mechanisms are used at different levels. Goldfarb also discusses dynamical systems, in which complexity is built from simpler structures through hierarchical folding up (or enrichment). The major difference is that he considers structure to be of primary interest, which leads to evolving temporal structural processes instead of computational ones.
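
How the repeated application of a simple local transformation can produce a complex pattern is easy to see in a one-dimensional, two-state cellular automaton. The sketch below uses the elementary rule commonly numbered 30 as an example; the choice of rule, the wrap-around boundary and the function names step and run are assumptions made for illustration and are not specified in the text.

```python
def step(cells, rule=30):
    """Apply one update of an elementary cellular automaton with wrap-around edges.

    Each new cell depends only on its left neighbour, itself and its right
    neighbour; the three bits index into the binary expansion of `rule`.
    """
    n = len(cells)
    return [
        (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def run(width=31, steps=15, rule=30):
    """Start from a single live cell in the middle and collect all generations."""
    cells = [0] * width
    cells[width // 2] = 1
    rows = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        rows.append(cells)
    return rows

for row in run():
    print("".join("#" if c else "." for c in row))
```

The update rule is purely local and fits in a single integer, yet the printed pattern quickly becomes irregular, which is the kind of behaviour the paragraph above refers to.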



In summary, Goldfarb proposes a revolutionary paradigm: an ontological model of a class representation in an epistemological context, as it is learnable from examples. This is a truly unique unification. We think it is the most complete and challenging approach to pattern recognition to date, a breakthrough. By including the formative history of objects in their representation, Goldfarb attributes to them some aspects of human consciousness. The far-reaching consequence of his ideas is a generalized measurement process that will one day be present in sensors. Such sensors will be able to measure ‘in structural units’ instead of numerical units (say, meters) as is currently done. The inductive process over a set of structural units lies at the foundation of a new inductive informatics. The difficulty, however, is that the current formalism in mathematics and related fields is not yet prepared for adopting these far-reaching ideas. We believe, however, that they will pave the way and be found anew or rediscovered in the coming decades.

