UK Assignment Writing: Computer Science Di
Date: 2020-01-13
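This is the brief for a ghostwritten computer science assignment on distributed column-oriented databases. The requirements are as follows: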
Assessment Summary
Weighting: 15%
Due Date: 11pm Sun 29 May (End of Week 11)
Submission
One Word document containing all your answers
Assignment Overview
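In this assignment, you are asked to write a report showing (1) how HBase can be used to store the movie title dataset you used in Assignment 1, (2) how the HBase model can be beneficial compared with the relational model, and (3) the relationship between HBase and the Hadoop Distributed File System (HDFS).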
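Such a report provides key information that helps an organization decide whether a new system such as HBase suits its business. Note that it is unreasonable to trial a new system in business operations without prior analysis. This task serves as that kind of critical analysis.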
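The materials you may use include the lecture slides, the recommended videos and tutorials (Week 8 folder), and any other Internet resources and/or books you can find. Note that you must not copy these materials; otherwise you will be committing plagiarism, and the University's formal plagiarism procedure will be applied.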
Application and Requirements
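You will be given a Word document as a template. Rename the document and write your answers in it. Your answers should cover the following parts.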
1. Identify 10 movies in the movie title dataset you used in Assignment 1. The 10 movies should be structurally representative of the movies in the dataset. The data of these 10 movies is called the sample data.
Include the sample data as part of the report.
2. Design a relational representation for the selected data by showing a table with column headings and the tuples of the sample data.
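As an illustration of what a relational representation looks like in practice, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `movie` and the columns `movie_id`, `title`, `year`, `genre` and `rating` are hypothetical assumptions, since the attributes of the Assignment 1 dataset are not described here, and the tuples are placeholders rather than real sample data.

```python
import sqlite3

# In-memory database; the schema below is a hypothetical example, not the real dataset's.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE movie (
        movie_id TEXT PRIMARY KEY,
        title    TEXT NOT NULL,
        year     INTEGER,
        genre    TEXT,
        rating   REAL
    )
    """
)

# Placeholder tuples (one row per movie), not actual sample data.
rows = [
    ("m01", "Title A", 1999, "Drama", 7.5),
    ("m02", "Title B", 2004, "Comedy", 6.8),
    ("m03", "Title C", 2010, "Drama", 8.1),
]
conn.executemany("INSERT INTO movie VALUES (?, ?, ?, ?, ?)", rows)

# Search query: the year of a specific movie.
year = conn.execute("SELECT year FROM movie WHERE title = ?", ("Title B",)).fetchone()[0]

# Aggregate query: average rating per genre.
avg_by_genre = dict(
    conn.execute("SELECT genre, AVG(rating) FROM movie GROUP BY genre ORDER BY genre")
)
print(year, avg_by_genre)
```

Every tuple has the same fixed set of columns, so a movie with no known genre would still occupy a `genre` slot holding NULL.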
3. Design a logical HBase schema for the movie title dataset and show the sample data together with the schema.
The schema should include a row key and one or more column families. Present the sample data as attribute-value pairs within each column family.
Justify your choice of row key and column families.
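HBase's logical data model can be pictured as a sorted, nested map: row key, then column family, then column qualifier, then value. The minimal Python sketch below models that idea, leaving out cell timestamps/versions and using strings where HBase stores raw bytes. The row keys (`m01`...), the families `info` and `stats`, and the qualifiers are hypothetical choices for illustration, and `get`/`scan` are simplified helpers named after HBase's Get and Scan operations. They are not the HBase client API.

```python
from typing import Dict, Iterator, Optional, Tuple

# Logical model: row key -> column family -> qualifier -> value (versions omitted).
Row = Dict[str, Dict[str, str]]
Table = Dict[str, Row]

# Placeholder rows with hypothetical keys, families and qualifiers (not real sample data).
movies: Table = {
    "m02": {"info": {"title": "Title B", "year": "2004"},
            "stats": {"rating": "6.8"}},
    "m01": {"info": {"title": "Title A", "year": "1999", "genre": "Drama"},
            "stats": {"rating": "7.5", "votes": "1200"}},
    # Sparse row: no year or genre cell is stored at all (no NULL placeholders).
    "m03": {"info": {"title": "Title C"},
            "stats": {"rating": "8.1"}},
}


def get(table: Table, row_key: str, family: str, qualifier: str) -> Optional[str]:
    """Point lookup by row key, in the spirit of an HBase Get; None if the cell is absent."""
    return table.get(row_key, {}).get(family, {}).get(qualifier)


def scan(table: Table, family: str) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (row_key, cells) in row-key order, touching only one column family."""
    for row_key in sorted(table):
        cells = table[row_key].get(family)
        if cells:
            yield row_key, cells


print(get(movies, "m01", "info", "year"))
print([key for key, _ in scan(movies, "stats")])
```

Two points that a justification of the design could build on: rows need not share the same columns (`m03` simply has no `year` cell), and a query touching only rating data needs to read only the `stats` family.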
4. Show the HTables and region files for the sample data. Assume that each region can hold the data of 3-4 movies for one column family. Show each of these in a separate table for clarity.
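HBase partitions a table horizontally into regions, each covering a contiguous row-key range. Within a region, each column family is kept in its own store with its own files. The sketch below applies the brief's simplifying assumption of at most 4 movies per region, with region sizes kept as even as possible. Real HBase splits regions by data size rather than row count. The function `split_into_regions` and the placeholder keys are hypothetical.

```python
import math
from typing import Dict, List

Table = Dict[str, Dict[str, Dict[str, str]]]


def split_into_regions(table: Table, families: List[str], max_rows: int = 4) -> List[dict]:
    """Split a table into row-key ranges of at most max_rows rows, sizes as even as possible.
    Each region keeps one store per column family, mirroring HBase's per-family files."""
    keys = sorted(table)
    if not keys:
        return []
    n_regions = math.ceil(len(keys) / max_rows)
    base, extra = divmod(len(keys), n_regions)
    regions, start = [], 0
    for r in range(n_regions):
        size = base + (1 if r < extra else 0)
        chunk = keys[start:start + size]
        start += size
        # End key is exclusive: the first key of the next region, None for the last one.
        end_key = keys[start] if start < len(keys) else None
        stores = {f: {k: table[k][f] for k in chunk if f in table[k]} for f in families}
        regions.append({"start_key": chunk[0], "end_key": end_key, "stores": stores})
    return regions


# Ten placeholder movies; zero-padded keys keep byte order equal to numeric order.
sample = {f"m{i:02d}": {"info": {"title": f"Title {i}"}, "stats": {"rating": str(5 + i)}}
          for i in range(1, 11)}

regions = split_into_regions(sample, ["info", "stats"], max_rows=4)
for n, reg in enumerate(regions, 1):
    print(f"R{n}: [{reg['start_key']}, {reg['end_key']})", list(reg["stores"]["info"]))
```

The zero padding matters because HBase orders row keys lexicographically as bytes. Without it, `m10` would sort before `m2`.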
5. Given an HDFS cluster with two racks of nodes, each rack containing three slave computers, draw a diagram showing one way in which the HBase region files could be stored in HDFS.
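One way to reason about the diagram is HDFS's default rack-aware placement for a replication factor of 3. The first replica goes on the writer's node (when the writer runs on a DataNode). The second goes on a node in a different rack, and the third goes on another node in that same remote rack. The Python sketch below simulates that policy for the two-rack, six-node cluster. The node names, the mapping of region files to the DataNodes hosting their region servers, and the one-block-per-file simplification are all hypothetical assumptions.

```python
import random
from typing import Dict, List

# Part 5 cluster: two racks, three slave (DataNode) computers each; names are hypothetical.
CLUSTER: Dict[str, List[str]] = {
    "rack1": ["dn1", "dn2", "dn3"],
    "rack2": ["dn4", "dn5", "dn6"],
}


def rack_of(node: str, cluster: Dict[str, List[str]] = CLUSTER) -> str:
    for rack, nodes in cluster.items():
        if node in nodes:
            return rack
    raise ValueError(f"unknown node {node}")


def place_replicas(writer: str, cluster=CLUSTER, rng=random) -> List[str]:
    """Simplified HDFS default placement, replication factor 3:
    1st replica on the writer's node, 2nd on a node in a different rack,
    3rd on another node in that same remote rack."""
    remote_rack = rng.choice([r for r in cluster if r != rack_of(writer, cluster)])
    second = rng.choice(cluster[remote_rack])
    third = rng.choice([n for n in cluster[remote_rack] if n != second])
    return [writer, second, third]


# Assumption: each region/family file fits in one HDFS block, and the region server
# hosting the region runs on the listed DataNode.
hfiles = {"R1/info": "dn1", "R1/stats": "dn1",
          "R2/info": "dn2", "R2/stats": "dn2",
          "R3/info": "dn4", "R3/stats": "dn4"}

rng = random.Random(42)  # fixed seed so the layout is reproducible
layout = {name: place_replicas(writer, rng=rng) for name, writer in hfiles.items()}
for name, replicas in layout.items():
    print(name, replicas)
```

Under this policy every file survives the loss of a whole rack, and the region server usually has a local copy of its own files to read from.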
6. Identify two example queries and analyze how they can benefit from the HBase design above in comparison with the relational model. One query should be a search query (like the one shown below) and the other must be an aggregate query (using SUM, AVG, etc.).
An example search query is “find the year of a specific movie”.
To determine whether a query benefits from HBase, explain which part of the data will be retrieved, with reference to your answers to Parts 4 and 5 above, and how the final answer is calculated given that the data is distributed. Then compare this with how the relational model from Part 2 would process the query. The analysis of the relational database also depends on how many records a disk block can store. Assume that the relational database is stored centrally. The comparison should consider measures such as disk reading time, calculation time, data transportation time/cost, and any other measures you consider meaningful.
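The arithmetic behind such a comparison can be sketched in a few lines. On the relational side, a full scan of a centrally stored table reads ceil(records / records per block) blocks. On the HBase side, an aggregate such as AVG can be computed next to the data, for example by an HBase coprocessor or a MapReduce job. In that case each region returns only a (sum, count) pair, and the client merges the pairs. The function names and the placeholder runtime values below are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def blocks_to_read(n_records: int, records_per_block: int) -> int:
    """Disk blocks read by a full scan of a centrally stored relational table."""
    return math.ceil(n_records / records_per_block)


def partial_sum_count(values: List[int]) -> Tuple[int, int]:
    """Run next to the data (one call per region); only two numbers leave the node."""
    return sum(values), len(values)


def combine_avg(partials: List[Tuple[int, int]]) -> Optional[float]:
    """Client-side merge: AVG = total sum / total count (not the mean of region averages)."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Placeholder values of one numeric column (e.g. runtime in minutes) for regions R1..R3.
runtimes_by_region = [[120, 95, 110, 100], [90, 130, 105], [100, 115, 85]]

partials = [partial_sum_count(r) for r in runtimes_by_region]
print("partials shipped:", partials)  # 3 pairs instead of 10 raw values
print("AVG:", combine_avg(partials))
print("relational full scan, 4 records/block:", blocks_to_read(10, 4), "blocks")
```

Merging (sum, count) pairs is necessary because regions hold different numbers of rows. Averaging the three region averages here would give about 104.86 instead of the correct 105.0.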
You may use tables and diagrams to make the presentation more readable.