Data Mining Technology for Decision-Making in the Insurance Industry

Date: 2020-06-10

1 Introduction

With the rapid development of database technology and the widespread use of database management systems, every industry is accumulating more and more data. A great deal of important information lies hidden in this surging volume of data, and people want to analyze it at a higher level in order to make better use of it. Current database systems handle data entry, query, and statistics efficiently, but they cannot discover the relationships and rules that exist in the data, nor can they predict future trends from existing data. The lack of tools for mining the knowledge hidden in the data has led to the phenomenon of "data explosion but knowledge poverty".

With the development of computer and network technology, obtaining the information relevant to a particular industry has become feasible. For large volumes of data covering a wide range of subjects, traditional statistical methods that rely on simple summaries and on analysis against a pre-specified model are no longer adequate. An intelligent information analysis technology, data mining, therefore came into being.

Data mining is the process of extracting implicit, previously unknown, but potentially useful information and knowledge from large volumes of incomplete, noisy, fuzzy, and random data. It discovers new and meaningful associations, patterns, and trends by mining the large amounts of data stored in a data warehouse. As a new business information processing technology, data mining extracts, transforms, analyzes, and models the business data held in large commercial databases in order to obtain the key data that supports business decisions, helping enterprises seize opportunities in fierce market competition. In the insurance industry there is currently broad market demand for it.

2 Project Description

The project developed the "Insurance Industry Decision System V1.0". The main operating interface is programmed in ASP and provides data preprocessing, customer insurance purchase analysis, customer buying habits analysis, and result output functions. The back-end database is implemented in SQL Server 2005, and the mining tool is SPSS Clementine 11.0. During the experimental research stage, the Apriori algorithm was improved to address its two major drawbacks, "storage complexity" and "a large number of redundant rules": a pattern tree structure is used to reduce the storage complexity of Apriori while also reducing the number of redundant rules.

The system consists of four main functional modules: data preprocessing, customer insurance purchase analysis, customer buying habits analysis, and analysis result output.

(1) The "data preprocessing" module includes upload, data platform, data processing, statistics, and data set generation functions.

● Upload: uploads the data from all branches of the insurance company.

● Data platform: lets the user choose the data platform before uploading data.

● Data processing: cleans the data, converts formats, and performs related operations.

● Statistics: analyzes the preprocessed data and extracts the valid data.

● Generate data sets: builds the valid data set from the data extracted in the statistics step, giving data mining a higher-quality data source.
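
The preprocessing steps above can be illustrated with a minimal Python sketch. It is not the system's ASP and SQL Server implementation. The field names customer_id, product, and date, the two accepted date layouts, and the function names are all assumptions made for illustration.

```python
from datetime import datetime

def clean_records(raw_records):
    """Drop incomplete rows, normalise formats, and remove exact duplicates."""
    seen = set()
    cleaned = []
    for row in raw_records:
        customer_id = str(row.get("customer_id", "")).strip()
        product = str(row.get("product", "")).strip().lower()
        if not customer_id or not product:
            continue  # cleaning: skip records with missing key fields
        # Format conversion: accept two hypothetical date layouts, emit ISO dates.
        date_text = str(row.get("date", "")).strip()
        iso_date = None
        for layout in ("%Y-%m-%d", "%d/%m/%Y"):
            try:
                iso_date = datetime.strptime(date_text, layout).date().isoformat()
                break
            except ValueError:
                continue
        key = (customer_id, product, iso_date)
        if key in seen:
            continue  # the same purchase was uploaded more than once
        seen.add(key)
        cleaned.append({"customer_id": customer_id, "product": product, "date": iso_date})
    return cleaned

def build_transactions(cleaned):
    """Group products per customer: the data set handed to the mining step."""
    transactions = {}
    for row in cleaned:
        transactions.setdefault(row["customer_id"], set()).add(row["product"])
    return transactions

raw = [
    {"customer_id": " C1 ", "product": "Life", "date": "2020-01-05"},
    {"customer_id": "C1", "product": "life", "date": "05/01/2020"},   # duplicate after conversion
    {"customer_id": "", "product": "auto", "date": "2020-01-01"},     # missing customer id
    {"customer_id": "C1", "product": "Auto", "date": "bad"},          # unparseable date kept as None
]
cleaned = clean_records(raw)
transactions = build_transactions(cleaned)
```

Grouping purchases per customer at the end gives the mining step one item set per customer, which is the shape the association analysis in module (2) expects.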

(2) The "customer insurance purchase analysis" module includes data import, parameter setting, and result analysis functions.

● Data import: in this interface, the user selects a data platform and imports the data sets generated by "data preprocessing".

● Parameter setting: in this interface, the user sets "support", "confidence", and other parameters, and filters the data records by value range so that the data set can be analyzed effectively.

● Result analysis: this interface displays the final results of the "customer insurance purchase analysis" as reports and charts. The results identify customers who buy several of the company's insurance products (sub-lines), giving the business a basis for decisions on winning customers.
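
The "support" and "confidence" parameters follow the standard association rule definitions: support is the fraction of transactions that contain an item set, and confidence is the support of the whole rule divided by the support of its antecedent. A minimal Python sketch, in which the product names and function names are hypothetical:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base

# One set of purchased products per customer (illustrative data only).
transactions = [
    {"life", "auto"},
    {"life", "health"},
    {"life", "auto", "health"},
    {"auto"},
]
rule_support = support(transactions, {"life", "auto"})          # 2 of 4 customers
rule_confidence = confidence(transactions, {"life"}, {"auto"})  # 2 of the 3 life customers
```

A rule such as "life implies auto" is reported only if both values reach the thresholds chosen in the parameter setting interface.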

(3) The "customer buying habits analysis" module includes data import, parameter setting, and result analysis functions.

● Data import: this operation is the same as "Data import" in module (2), "customer insurance purchase analysis".

● Parameter setting: here the user sets the "input parameters" (basic customer information such as age, gender, and occupation) and the "output parameters" (the insurance products the customer has purchased).

● Result analysis: this interface presents the customer buying habits analysis, giving the business a basis for decisions on retaining customers.
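
The text does not say which model relates the input parameters to the output parameters. As one simple illustration, the Python sketch below tabulates purchased products per customer profile. The age bands, field names, and function names are assumptions and do not describe the system's actual method.

```python
from collections import Counter, defaultdict

def age_band(age):
    """Bucket ages into coarse, hypothetical bands."""
    if age < 30:
        return "under-30"
    if age < 50:
        return "30-49"
    return "50+"

def purchase_profile(customers):
    """Map (age band, gender, occupation) to a Counter of purchased products."""
    table = defaultdict(Counter)
    for customer in customers:
        key = (age_band(customer["age"]), customer["gender"], customer["occupation"])
        table[key].update(customer["products"])
    return table

customers = [
    {"age": 34, "gender": "F", "occupation": "teacher", "products": ["health", "life"]},
    {"age": 41, "gender": "F", "occupation": "teacher", "products": ["health"]},
    {"age": 27, "gender": "M", "occupation": "driver", "products": ["auto"]},
]
profile = purchase_profile(customers)
top_for_teachers = profile[("30-49", "F", "teacher")].most_common(1)
```

Here the most frequent product for the ("30-49", "F", "teacher") profile is "health", which is the kind of habit summary a retention decision could start from.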

(4) The "analysis result output" module prints the results of the "customer insurance purchase analysis" and the "customer buying habits analysis".

3 The Project's Improved Fast Algorithm

The Apriori algorithm has two major defects: its time and space complexity are high, and it produces a large number of redundant rules. This project therefore uses a pattern tree structure to reduce the storage complexity of Apriori while reducing the number of redundant rules.
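
For reference, the cost being addressed can be seen in a compact sketch of textbook level-wise Apriori, written in Python. This is the baseline, not the project's improved algorithm. The function name, the returned scan counter, and the sample data are assumptions added for illustration.

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return ({itemset: count}, number_of_database_scans) for count >= min_sup."""
    transactions = [frozenset(t) for t in transactions]
    level = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    scans = 0
    k = 1
    while level:
        scans += 1  # every level costs one full pass over the database
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        kept = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(kept)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = {
            c for c in candidates
            if all(frozenset(s) in kept for s in combinations(c, k - 1))
        }
    return frequent, scans

transactions = [
    {"life", "auto"},
    {"life", "health"},
    {"life", "auto", "health"},
    {"auto"},
]
frequent, scans = apriori(transactions, min_sup=2)
```

Each level holds a full candidate set in memory and rescans every transaction, which is the storage and time cost the pattern tree is meant to reduce.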

3.1 Pattern Tree Structure

The pattern tree consists of a root labeled "null", a set of item-prefix subtrees as the children of the root, and an item header table. Each node in the tree contains four fields: user_id, count, node_link, and node_next. user_id is the user tag, which uniquely identifies a user; count is the number of paths that reach the node from its parent; node_link points to the next node in the tree with the same user_id, and is null when no such node exists; node_next points to the node's child nodes in the tree. Each entry in the item header table contains three fields: user_id, count, and head of node. user_id has the same meaning as in the tree; count is the total count of all nodes in the tree with that user_id; head of node is a pointer to the first node in the tree with that user_id.
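
The node and header table layout described above resembles the FP-tree used by FP-growth. A minimal Python sketch of the two record types follows. The class names, the children dictionary (standing in for node_next), and the linked_nodes helper are assumptions, not the project's code.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional

@dataclass(eq=False)
class PatternNode:
    user_id: Optional[str]          # None plays the role of the "null" root label
    count: int = 0                  # number of paths reaching this node from its parent
    node_link: Optional["PatternNode"] = field(default=None, repr=False)
    # Assumption: a dict keyed by user_id stands in for the node_next child pointer.
    children: Dict[str, "PatternNode"] = field(default_factory=dict, repr=False)

@dataclass(eq=False)
class HeaderEntry:
    user_id: str
    count: int = 0                  # total count over all tree nodes with this user_id
    head_of_node: Optional[PatternNode] = None

def linked_nodes(entry: HeaderEntry) -> Iterator[PatternNode]:
    """Walk the node_link chain that starts at the header entry."""
    node = entry.head_of_node
    while node is not None:
        yield node
        node = node.node_link

# Two tree nodes carry the same user_id "A"; the header entry points at the first.
second = PatternNode("A", count=1)
first = PatternNode("A", count=2, node_link=second)
entry = HeaderEntry("A", count=3, head_of_node=first)
chain_total = sum(node.count for node in linked_nodes(entry))
```

Following node_link from the header entry reaches every node with the same user_id, so the counts along the chain add up to the count stored in the header table.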

3.2 Creating the Pattern Tree

The algorithm is as follows. Let the transaction database be A and let Ai denote one of its item sets.

Algorithm: Patterntree(T, p), constructs the pattern tree

Input: transaction database A

Output: pattern tree T

Procedure Patterntree(T, p)
{
    Create_tree(T);    // create the root node of the pattern tree, labeled "null"
    t = T;             // t is the current node
    while A <> null do
    {
        read the next item set Ai from the transaction database, with p pointing to its first item;
        while p != null do
        {
            if p.user_id == n.user_id for some ancestor n of t then
            {
                n.count = n.count + 1;
                t = n;
            }
            else if p.user_id == c.user_id for some child c of t then
            {
                c.count = c.count + 1;
                t = c;
            }
            else
                insert_Patterntree(T, p);    // insert p into the tree as a new child of the current node
            p = p.next;
        }
    }
}
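
The construction procedure can be sketched in Python as follows. This is a simplified reading rather than the project's implementation: it keeps only the child-matching branch (the ancestor-matching branch is omitted), restarts from the root for every item set, and sorts each item set so that shared prefixes line up. The names Node, build_pattern_tree, parent, children, and the layout of the header dictionary are assumptions; children stands in for node_next.

```python
class Node:
    def __init__(self, user_id, parent=None):
        self.user_id = user_id
        self.count = 0
        self.parent = parent      # assumption: kept so prefixes can be read upward
        self.children = {}        # user_id -> Node
        self.node_link = None     # next node in the tree with the same user_id

def build_pattern_tree(transactions):
    """Insert every item set as a path from the root and maintain the header table."""
    root = Node(None)             # the "null" root
    header = {}                   # user_id -> {"count", "head", "tail"}
    for itemset in transactions:
        current = root
        for item in sorted(itemset):
            child = current.children.get(item)
            if child is None:
                # No matching child: create the node and append it to the link chain.
                child = Node(item, parent=current)
                current.children[item] = child
                entry = header.setdefault(item, {"count": 0, "head": None, "tail": None})
                if entry["head"] is None:
                    entry["head"] = child
                else:
                    entry["tail"].node_link = child
                entry["tail"] = child
            child.count += 1
            header[item]["count"] += 1
            current = child
    return root, header

transactions = [
    ["auto", "life"],
    ["health", "life"],
    ["auto", "health", "life"],
    ["auto"],
]
root, header = build_pattern_tree(transactions)
```

The database is read once, and transactions that share a prefix share nodes, so "auto" is stored in a single node with count 3 instead of three times.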

3.3 Pruning the Pattern Tree

Once the pattern tree is built, it may contain a large number of redundant branches. To keep the noise from these branches from affecting the mining results, the tree must be pruned to remove the noise.

Algorithm: SPT(Tree, a), prunes the pattern tree

// SPT is the supported pattern tree (Supported Access Pattern Tree); a is the item header table

Input: pattern tree PatternTree; Min_Sup (minimum support for the pattern tree)

Output: the pruned supported pattern tree SPT; pattern set B = {bi | i = 1, 2, 3, ..., n}

SPT(Tree, a)
{
    i = 1;
    while (ai != null)    // for each entry ai of the item header table a
    {
        if (ai.count >= Min_Sup) then
        {
            pattern bi = ai.head of node;
            p = ai.head of node;    // p points to the position of ai in the pattern tree
            while (p != null and ai.count >= Min_Sup)
            {
                find the prefix base of p and join p with its prefix base to form pattern b;
                if (bi.count >= Min_Sup) then
                {
                    // bi.count is the minimum count over p and the prefix base of p in pattern b
                    retain p and its prefix base in pattern bi;
                    bi = bi.node_link;
                }
                else
                {
                    delete from PatternTree the nodes corresponding to p and its prefix base in pattern b,
                    re-link the child nodes to the parent node, and update header table entry ai;
                }
                p = p.node_next;    // p moves to the next position in the pattern tree
            }
        }
        else
        {
            update header table entry ai;
            delete the corresponding nodes and their prefix bases from the pattern tree,
            and re-link the parent and child nodes;
        }
        i++;
    }
}
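
The SPT procedure is only partly specified, so the Python sketch below implements one simplified reading of it: every node whose item falls below Min_Sup in the header counts is removed, and its children are re-linked to its parent, merging counts where the parent already has a child with the same user_id. It does not build the pattern set B or maintain node_link chains. All names are assumptions.

```python
from collections import Counter

class Node:
    def __init__(self, user_id=None):
        self.user_id, self.count, self.children = user_id, 0, {}

def insert(root, itemset):
    """Add one item set as a path, sorted so that shared prefixes line up."""
    node = root
    for item in sorted(itemset):
        if item not in node.children:
            node.children[item] = Node(item)
        node = node.children[item]
        node.count += 1

def item_counts(node, totals=None):
    """Header-table counts: total count of all nodes carrying each user_id."""
    totals = Counter() if totals is None else totals
    for child in node.children.values():
        totals[child.user_id] += child.count
        item_counts(child, totals)
    return totals

def merge_child(parent, orphan):
    """Re-link orphan under parent, merging with an existing same-id child."""
    existing = parent.children.get(orphan.user_id)
    if existing is None:
        parent.children[orphan.user_id] = orphan
        return
    existing.count += orphan.count
    for grandchild in orphan.children.values():
        merge_child(existing, grandchild)

def prune(node, counts, min_sup):
    """Bottom-up removal of nodes whose item has support below min_sup."""
    for key in list(node.children):
        child = node.children[key]
        prune(child, counts, min_sup)
        if counts[key] < min_sup:
            del node.children[key]
            for grandchild in child.children.values():
                merge_child(node, grandchild)

root = Node()
for itemset in [["auto", "life"], ["health", "life"], ["auto", "health", "life"],
                ["auto"], ["auto", "boat", "life"]]:
    insert(root, itemset)
counts = item_counts(root)
prune(root, counts, min_sup=2)
```

In this example "boat" occurs once, so its node is removed and the "life" node below it is merged into the existing "auto" then "life" path, whose count rises from 1 to 2.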

Building the pattern tree avoids multiple scans of the transaction database. The count field records how often each item set occurs, which avoids generating a large number of frequent item sets and helps reduce time and space complexity. The tree structure also avoids a large number of redundant rules.

Pruning removes the many redundant branches produced while the pattern tree is generated, which reduces space complexity. Rules can then be produced from the output pattern set B, which avoids generating large numbers of frequent item sets and reduces time complexity.

4 Conclusion

The project improves the Apriori algorithm with a pattern tree structure to make up for its defects. The method improves on Apriori in both time complexity and space complexity while avoiding the generation of intermediate rules. The study shows that using a pattern tree structure to reduce the storage complexity of Apriori, while reducing the number of redundant rules, is an effective way to improve the algorithm.

