hadoop

hadoop

 英

  • 网络分布式计算;分布式计算平台;分布式文件系统

例句

If I'm a developer using Hadoop and want to look at a bit of data, it will let me run some reports against the file system.

如果使用Hadoop开发者想要查看一些数据那么可以通过文件系统报表达成

We're trying to follow the path taken by the Hadoop project concentrating on robustness, scaling, correctness, and community-building first.

我们追随Hadoop项目采取路线首先精力集中健壮性扩展正确性以及社区建立

As the hadoop-0. 20 is one of your primary interfaces to the Hadoop cluster, you'll see this utility used quite a bit through this article.

因为hadoop-0.20Hadoop集群主要接口之一看到本文多次使用这个实用程序

Now that I've coded my map and reduce implementations, all that's left to do is link everything up into a Hadoop Job.

现在已经mapreduce实现进行编码接下来所有一切链接到一个HadoopJob

From this article, it's easy to see how Hadoop makes distributed computing simple for processing large datasets.

通过本文容易看出Hadoop显著简化处理大型数据集分布式计算

All that's needed is a representation of the data in a vector form that the Hadoop infrastructure can use.

所有一切需要就是矢量格式表达Hadoop基础设施可以使用数据

Well, as you've probably guessed, Hadoop makes that easy to do.

当然已经猜到Hadoop可以轻松做到

But from the previous discussion, it's easy to see how Hadoop provides parallel processing of work.

但是通过前面讨论容易看出Hadoop如何提供并行处理

Not to be outdone, commercial Hadoop pioneer Cloudera announced an HDFS partnership of its own yesterday.

商业Hadoop先驱Cloudera不甘示弱昨天发布自己HDFS合作伙伴计划

A key part of the announcement was that Yahoo would make available a Hadoop enabled super computing data center named M45.

声明关键Yahoo建立一个使用Hadoop超级计算数据中心M45。

Alas, there are several things that Hadoop does not do, at least when accessed through the MapReduce interface.

几件事情Hadoop至少通过MapReduce访问接口

Now you have set up the Hadoop Cluster on the cloud, and it's ready to run the MapReduce applications.

现在已经设置Hadoop集群运行MapReduce应用程序

Since we are going to be connecting to the hadoop file system, we might as well test that as well.

因为我们连接hadoop文件系统我们不妨测试

One particularly handy aspect of Hadoop is that it handles the raw parsing of an input file, so that you can deal with one line at a time.

Hadoop可以输入文件进行原始解析一点特别有用这样可以每次处理一行

For all the other settings, keep the defaults or choose the same values as you did for the Hadoop Master node.

对于所有其他设置保留默认或者选择HadoopMaster节点相同

It is assumed that the Hadoop slave node has been configured a priori in such a manner that it registers with the Hadoop master node.

这里假设Hadoop节点已经之前配置完成也就是已经注册Hadoop节点

Now that you have installed Hadoop and tested the basic interface to its file system, it's time to test Hadoop in a real application.

既然已经安装Hadoop测试文件系统基本接口现在真实应用程序测试Hadoop

This article introduces you to the important configurable parameters of Hadoop and the method for analyzing and tuning performance.

本文介绍重要Hadoop配置参数以及分析调优性能方法

That magically seems to work, indicating that we can, indeed, connect to another machine and run the hadoop commands.

魔法似乎工作表明我们可以事实上连接另一台机器运行hadoop命令

The Hadoop runtime will split up the data (log files) that needs to be processed and give each node in your cluster a chunk of data.

Hadoop运行时分割需要处理数据一些日志文件集群每个节点分配一个数据

data format designed to support data-intensive applications, and provides support for this format in a variety of programming languages.

Avro[1]最近加入ApacheHadoop家族项目之一支持数据密集型应用定义一种数据格式多种编程语言支持这种格式

One irony of this code and the Hadoop framework is that the input files do not have to be in the same format.

一个讽刺代码Hadoop框架输入文件需要相同格式

Hadoop is really designed to run in a distributed manner where it handles the coordination of various nodes running map and reduce.

Hadoop设计旨在一种分布式方式运行处理运行mapreduce各个节点之间协调性

You can perform a couple of tests to ensure that Hadoop is up and running normally (at least the namenode).

可以通过几个检查确认Hadoop至少namenode已经启动正常运行

Thanks to the cloud and Hadoop, it is now possible to handle large amounts of structured or unstructured data in a timely manner.

由于Hadoop出现及时处理大量结构结构数据目前成为可能

So over the past 2 weekends, I've worked on a hobby project, which lets you turn your Hudson cluster into a Hadoop cluster.

所以过去周末一直从事一个业余爱好项目这个项目可以Hudson集群转化Hadoop集群

Run the clustering algorithm of choice using one of the many Hadoop-ready driver programs available in Mahout.

使用Mahout可用Hadoop就绪驱动程序运行集群算法

The two core components are the Hadoop Distributed File System for storing data and Hadoop MapReduce for writing parallel-processing jobs.

其中两个核心组件用于存储数据HadoopDistributedFileSystemHadoop分布式文件系统用于写入并行处理任务HadoopMapReduce

The company employs many of the core Hadoop contributors and intends to provide support and training.

公司雇佣了众多Hadoop项目核心人员提供相应支持培训

Open source software designed by IBM to help students develop programs for clusters running Hadoop.

IBM设计开源软件帮助学生们运行Hadoop集群开发程序

You could just use the raw output from Hadoop (a name and value on each line, separated by a space).

可以只是使用来自Hadoop原始输出一个名称空格分隔)。

As a distributed framework, Hadoop enables many applications that benefit from parallelization of data processing.

作为分布式框架Hadoop许多应用程序能够受益并行数据处理

Standalone Mode: By default, Hadoop is configured to run in a non-distributed standalone mode.

单独模式默认情况Hadoop分布单独模式运行

If not what is the plan in terms of moving it from an experimental technology to a core infrastructure component.

如果没有什么计划Hadoop一个实验产品核心基础组件迁移

developed Hadoop, permits AI systems to run data and algorithms across multiple servers simultaneously.

结合可以AI系统多个服务器同时运行数据算法

This flexibility can open new opportunities for Hadoop in a richer set of applications.

更加丰富应用程序集中灵活性可以Hadoop创造机会

feel this would be a big boost to both performance and utility, and it would leverage the power already provided by the Hadoop framework.

觉得一个巨大鼓舞作用表现用途影响作用力量已经提供Hadoop框架

Those log files can be huge, but the work will be split up among the machines (nodes) in your Hadoop cluster.

那些日志文件可能但是挖掘工作Hadoop集群多个机器节点之间分配

Instead, Hadoop can be viewed as a way to distribute both data and algorithms to hosts for faster parallel processing.

相反Hadoop可以视为一种可以同时数据算法分配主机获得快速并行处理速度方法

The next article in this series will explore how to configure Hadoop in a multi-node cluster with additional examples. See you then!

系列篇文章通过更多示例讨论如何节点集群配置Hadoop