hadoop

网络：分布式计算；分布式计算平台；分布式文件系统

例句

If I'm a developer using Hadoop and want to look at a bit of data, it will let me run some reports against the file system.

如果我是个使用Hadoop的开发者，想要查看一些数据，那么就可以通过文件系统报表达成所愿。

We're trying to follow the path taken by the Hadoop project, concentrating on robustness, scaling, correctness, and community-building first.

我们将追随Hadoop项目所采取的路线，首先把精力集中在健壮性、扩展性、正确性以及社区建立上。

As hadoop-0.20 is one of your primary interfaces to the Hadoop cluster, you'll see this utility used quite a bit throughout this article.

因为hadoop-0.20是Hadoop集群的主要接口之一，您会看到本文中多次使用这个实用程序。

Now that I've coded my map and reduce implementations, all that's left to do is link everything up into a Hadoop Job.

现在我已经编写好了map和reduce的实现，剩下要做的就是把它们全部组装成一个Hadoop Job。

From this article, it's easy to see how Hadoop makes distributed computing simple for processing large datasets.

通过本文很容易看出Hadoop显著简化了处理大型数据集的分布式计算。

All that's needed is a representation of the data in a vector form that the Hadoop infrastructure can use.

所需要的只是以Hadoop基础设施可以使用的向量形式来表示数据。

Well, as you've probably guessed, Hadoop makes that easy to do.

当然，您已经猜到了，Hadoop可以轻松地做到。

But from the previous discussion, it's easy to see how Hadoop provides parallel processing of work.

但是，通过前面的讨论很容易看出Hadoop如何提供并行处理。

Not to be outdone, commercial Hadoop pioneer Cloudera announced an HDFS partnership of its own yesterday.

商业Hadoop的先驱Cloudera也不甘示弱，于昨天发布了自己的HDFS合作伙伴计划。

A key part of the announcement was that Yahoo would make available a Hadoop-enabled supercomputing data center named M45.

该声明的关键是Yahoo将建立一个使用Hadoop的超级计算数据中心，名为M45。

Alas, there are several things that Hadoop does not do, at least when accessed through the MapReduce interface.

可惜，有几件事情Hadoop做不到，至少在通过MapReduce接口访问时是这样。

Now you have set up the Hadoop cluster on the cloud, and it's ready to run the MapReduce applications.

现在，已经在云中设置了Hadoop集群，该运行MapReduce应用程序了。

Since we are going to be connecting to the hadoop file system, we might as well test that too.

既然我们要连接到hadoop文件系统，不妨也测试一下它。

One particularly handy aspect of Hadoop is that it handles the raw parsing of an input file, so that you can deal with one line at a time.

Hadoop可以对输入文件进行原始解析，这一点特别有用，这样您就可以每次处理一行。

For all the other settings, keep the defaults or choose the same values as you did for the Hadoop Master node.

对于所有其他设置，保留其默认值或者选择与Hadoop Master节点相同的值。

It is assumed that the Hadoop slave node has been configured a priori in such a manner that it registers with the Hadoop master node.

这里假设Hadoop从节点已经在之前配置完成，也就是它已经注册到Hadoop主节点中。

Now that you have installed Hadoop and tested the basic interface to its file system, it's time to test Hadoop in a real application.

既然已经安装了Hadoop并测试了文件系统的基本接口，现在就该在真实的应用程序中测试Hadoop了。

This article introduces you to the important configurable parameters of Hadoop and the method for analyzing and tuning performance.

本文介绍重要的Hadoop可配置参数以及分析和调优性能的方法。

That magically seems to work, indicating that we can, indeed, connect to another machine and run the hadoop commands.

这似乎神奇地奏效了，表明我们确实可以连接到另一台机器并运行hadoop命令。

The Hadoop runtime will split up the data (log files) that needs to be processed and give each node in your cluster a chunk of data.

Hadoop运行时将分割需要处理的数据（一些日志文件）并向您的集群中的每个节点分配一个数据块。

Avro is one of the projects recently added to Apache's Hadoop family. It defines a data format designed to support data-intensive applications, and provides support for this format in a variety of programming languages.

Avro[1]是最近加入到Apache的Hadoop家族的项目之一。为支持数据密集型应用，它定义了一种数据格式并在多种编程语言中支持这种格式。

One irony of this code and the Hadoop framework is that the input files do not have to be in the same format.

这段代码和Hadoop框架有一点颇具讽刺意味：输入文件不必采用相同的格式。

Hadoop is really designed to run in a distributed manner where it handles the coordination of various nodes running map and reduce.

Hadoop的设计旨在以一种分布式方式运行，处理运行map和reduce的各个节点之间的协调性。

You can perform a couple of tests to ensure that Hadoop is up and running normally (at least the namenode).

可以通过几个检查确认Hadoop（至少是namenode）已经启动并正常运行。

Thanks to the cloud and Hadoop, it is now possible to handle large amounts of structured or unstructured data in a timely manner.

由于云和Hadoop的出现，及时处理大量的结构化或非结构化数据目前已成为可能。

So over the past two weekends, I've worked on a hobby project that lets you turn your Hudson cluster into a Hadoop cluster.

所以在过去的两个周末里，我一直在从事一个业余爱好项目，这个项目可以把Hudson集群转化成Hadoop集群。

Run the clustering algorithm of choice using one of the many Hadoop-ready driver programs available in Mahout.

使用Mahout中提供的众多支持Hadoop的驱动程序之一，运行所选的聚类算法。

The two core components are the Hadoop Distributed File System for storing data and Hadoop MapReduce for writing parallel-processing jobs.

其中两个核心组件是用于存储数据的Hadoop Distributed File System（Hadoop分布式文件系统）和用于编写并行处理作业的Hadoop MapReduce。

The company employs many of the core Hadoop contributors and intends to provide support and training.

该公司雇佣了许多Hadoop核心贡献者，并打算提供支持和培训。

Open source software designed by IBM to help students develop programs for clusters running Hadoop.

IBM设计了开源软件去帮助学生们为运行Hadoop的集群开发程序。

You could just use the raw output from Hadoop (a name and value on each line, separated by a space).

您可以只是使用来自Hadoop的原始输出（每行上有一个名称和值，用空格分隔）。

As a distributed framework, Hadoop enables many applications that benefit from parallelization of data processing.

作为分布式框架，Hadoop让许多应用程序能够受益于并行数据处理。

Standalone Mode: By default, Hadoop is configured to run in a non-distributed standalone mode.

单独模式：在默认情况下，Hadoop以非分布的单独模式运行。

If not, what is the plan in terms of moving it from an experimental technology to a core infrastructure component?

如果还没有，有什么计划让Hadoop从一个实验性的产品向核心基础组件迁移？

developed Hadoop, permits AI systems to run data and algorithms across multiple servers simultaneously.

的结合，可以让AI系统在多个服务器上同时的运行数据和算法。

This flexibility can open new opportunities for Hadoop in a richer set of applications.

在更加丰富的应用程序集中此灵活性可以为Hadoop创造新的机会。

I feel this would be a big boost to both performance and utility, and it would leverage the power already provided by the Hadoop framework.

我觉得这将大大提升性能和实用性，而且可以充分利用Hadoop框架已经提供的能力。

Those log files can be huge, but the work will be split up among the machines (nodes) in your Hadoop cluster.

那些日志文件可能很大，但是挖掘工作将在您的Hadoop集群中的多个机器（节点）之间分配。

Instead, Hadoop can be viewed as a way to distribute both data and algorithms to hosts for faster parallel processing.

相反地，Hadoop可以被视为一种可以同时将数据和算法分配到主机以获得更快速的并行处理速度的方法。

The next article in this series will explore how to configure Hadoop in a multi-node cluster with additional examples. See you then!

本系列中的下一篇文章通过更多示例讨论如何在多节点集群中配置Hadoop。
