Hadoop – HDP 2.3: a major step toward Open Enterprise Hadoop

HDP (Hortonworks Data Platform) 2.3 represents another important step toward Hadoop as an enterprise data platform. This release incorporates the latest innovations in Hadoop and its ecosystem of projects. HDP 2.3 brings more than one hundred new features across all of our existing projects. Every component has been updated, and we have added several new technologies and capabilities to HDP 2.3.

Key highlights of HDP 2.3:

Breakthrough Usability for Hadoop

HDP 2.3 eliminates much of the complexity of administering Hadoop and improves developer productivity

HDP 2.3 leverages the Ambari Views Framework to deliver new user views and a breakthrough user experience for both cluster operators and developers.

For Hadoop Operators…

  • Smart Configuration: An entirely new user experience within Ambari that is guided, opinionated, and easier to digest when configuring HDFS, YARN, HBase, and Hive.
  • YARN Capacity Scheduler: Configure shared access to large clusters through a much easier web interface to the YARN Capacity Scheduler.
  • Customized Dashboards: Create a tailored dashboard and keep an eye on the metrics you value most.
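To make the Capacity Scheduler item more concrete: the scheduler is configured through properties such as `yarn.scheduler.capacity.root.queues` (the child queues of a queue) and `yarn.scheduler.capacity.<queue-path>.capacity` (a queue's share of its parent, in percent), and the capacities of sibling queues at each level must add up to 100. The Python sketch below is a hypothetical illustration, not Ambari's implementation: the queue names, values, and the `validate` and `child_queues` helpers are made up, and the properties are assumed to be already flattened into a dict rather than parsed from capacity-scheduler.xml.

```python
"""Hypothetical sanity check for YARN Capacity Scheduler queue capacities."""

PREFIX = "yarn.scheduler.capacity."

# Made-up example configuration, assumed to be flattened from capacity-scheduler.xml.
# Property names follow the Capacity Scheduler naming scheme; queues and values are illustrative.
EXAMPLE_PROPS = {
    "yarn.scheduler.capacity.root.queues": "default,analytics",
    "yarn.scheduler.capacity.root.default.capacity": "40",
    "yarn.scheduler.capacity.root.analytics.capacity": "60",
    "yarn.scheduler.capacity.root.analytics.queues": "adhoc,etl",
    "yarn.scheduler.capacity.root.analytics.adhoc.capacity": "30",
    "yarn.scheduler.capacity.root.analytics.etl.capacity": "70",
}


def child_queues(props: dict[str, str], path: str) -> list[str]:
    """Return the child queue names declared for the queue at `path` (e.g. "root")."""
    raw = props.get(f"{PREFIX}{path}.queues", "")
    return [name.strip() for name in raw.split(",") if name.strip()]


def validate(props: dict[str, str], path: str = "root") -> list[str]:
    """Recursively check that sibling capacities under each parent sum to 100%.

    A missing capacity property is treated as 0 (an assumption of this sketch).
    Returns a list of error messages; an empty list means no problems were found.
    """
    errors: list[str] = []
    children = child_queues(props, path)
    if not children:
        return errors  # leaf queue: nothing to check below it
    total = sum(float(props.get(f"{PREFIX}{path}.{c}.capacity", "0")) for c in children)
    if abs(total - 100.0) > 1e-6:
        errors.append(f"{path}: child capacities sum to {total:g}, expected 100")
    for c in children:
        errors.extend(validate(props, f"{path}.{c}"))
    return errors


if __name__ == "__main__":
    print(validate(EXAMPLE_PROPS))  # the example configuration is consistent

    broken = dict(EXAMPLE_PROPS)
    broken["yarn.scheduler.capacity.root.analytics.etl.capacity"] = "60"
    print(validate(broken))  # the analytics children now add up to 90
```

Running it prints an empty list for the example configuration and a single error for the modified copy, where the children of `analytics` add up to 90 instead of 100.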

…and for developers

  • Fast and easy SQL Editor for Hive: An integrated experience that supports SQL query building, displays a visual "explain plan", and provides an extended debugging experience when using the Tez execution engine.
  • Easy Pig editor and web-based HDFS browser: In addition to the SQL builder, a Pig Latin editor brings a modern browser-based IDE experience to Pig. There is also a File Browser for HDFS.
  • An entirely new user experience for Apache Falcon: A web-forms-based approach enables rapid development of feeds and processes. The new Falcon UI also lets you search and browse processes that have executed, visualize lineage, and set up mirroring jobs to replicate files and databases between clusters or to cloud storage such as Microsoft Azure Storage.

Impressive improvements across all data access engines

Consolidating data access with YARN as the architectural center

As organizations strive to store their data efficiently in a single repository and interact with it simultaneously in different ways, they need SQL, streaming, data science, batch and more, all in the same cluster. HDP 2.3 adds new and enhanced engines, including:

Enhanced SQL Semantics in Apache Hive

Hive adds support for time intervals and UNION semantics, delivers 2.5x performance improvements and improved query scheduling, and gains a more streamlined user interface within Ambari.

Solr on YARN

The Solr search engine is being built to run on YARN and is now in technical preview. This critical advancement allows customers to reduce their total cost of ownership by deploying Solr within the same cluster as other workloads – eliminating the need for a “side cluster” dedicated to indexing data and delivering search results.

New capabilities for feature-rich Spark applications

Apache Spark on YARN is enhanced with the new DataFrame API, new machine learning algorithms such as clustering and frequent pattern mining, and a technical preview of SparkSQL.

Advances towards comprehensive security and governance

Centralized Authorization
Security administrators can now define and manage security policies and capture security audit information for HDFS, Hive, HBase, Knox, Storm and now Solr, Kafka and YARN.
HDFS DARE (Data At Rest Encryption)
Introduces encryption of data stored in HDFS files and, through the open source Hadoop KMS (Key Management Server) embedded in Apache Ranger, gives security administrators the ability to manage encryption keys and key authorization policies.
Audit Optimization and Scalable Storage
Provides a framework to optimize audit creation and storage, with interactive query powered by Solr. Users now have the ability to combine security audit with data lineage in Apache Atlas for a comprehensive view of their data.

Introducing Apache Atlas

A common approach to Hadoop data governance from the open source community

As enterprises across all major industries deploy Hadoop into corporate data and processing environments, a common approach to working with metadata and data governance becomes a necessity.

Apache Atlas was created by a consortium of enterprises and Hortonworks to meet this need. Atlas enhances governance capabilities in Hadoop for both prescriptive and forensic models enriched by taxonomical metadata. Atlas, at its core, is designed to exchange metadata with other tools and processes within and outside of the Hadoop stack. Atlas enables platform-agnostic governance controls that effectively address enterprise compliance requirements.

Hortonworks SmartSense

Proactive monitoring and maintenance with your HDP Support Subscription

Deploy HDP with proactive and intelligent support. Hortonworks SmartSense gathers insight, provides recommendations, and helps optimize cluster utilization and health. Hortonworks SmartSense is included with every HDP Support Subscription.

Faster Support Case Resolution
By easily capturing log files and metrics for insight and resolution.
Proactive Cluster Configuration
Via an intelligent stream of cluster analytics and data-driven recommendations.
Long-Range Cluster Optimization
Through a proactive view into a customer's cluster utilization that can be used to drive capacity planning.

About the author


We are a Business Intelligence and Data Warehousing consultancy operating since 2000, guiding companies to turn their data into valuable information that transforms their businesses.
