Posts

Showing posts from April, 2020

Problem Statement of Crime

Analyze and categorize data into different kinds of crimes

The SF Police Department wants to categorize crimes that were registered in different police departments. The data provided by the SF Police Department has the following columns:

Data Fields
·        Dates - timestamp of the crime incident
·        Category - category of the crime incident (only in train.csv). This is the target variable you are going to predict.
·        Descript - detailed description of the crime incident (only in train.csv)
·        DayOfWeek - the day of the week
·        PdDistrict - name of the Police Department District
·        Resolution - how the crime incident was resolved (only in train.csv)
·        Address - the approximate street address of the crime incident
·        X - Longitude
·        Y - Latitude

·        First load the data into Pig and process it further for the next step
·        Work it out in Hive
·        Pig: send the end data into HBase
·        Spark Core & Spark SQL
·        Scala

Try to write multiple use cases based on the datasets and scenarios.

Solution of Real Estate Use Case

SOLUTION OF REAL ESTATE USE CASE

create table real1(street string, city string, zip int, state string, beds int, baths int, sq_ft int, type string, sale_date string, price int, latitude string, longitude string)
row format delimited
fields terminated by ',';

load data local inpath '/home/cloudera/hadoopData/real state.csv' into table real1;

1. Problem Statement: City wise, list all the villas priced at not less than ten thousand.

select city from real1 where ((price >= 10000) and (type == 'villa')) group by city;

2. Problem Statement: In GALT city, which residential types have more than 800 sq_ft? Display their respective details: street, sq_ft, sale_date, city.

select street, sq_ft, sale_date, city from real1 where ((city == 'GALT') AND (sq_ft > 800));
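The filter logic of the two queries can be checked locally before running them on the cluster. The following is only a sketch: it uses Python's built-in sqlite3 module instead of Hive, the sample rows are hypothetical and not taken from the real CSV, and only the columns the queries touch are created. It reads "not less than ten thousand" as price >= 10000.

```python
import sqlite3

# Hypothetical sample rows (NOT from the real dataset), only for checking the logic.
# Columns: street, city, sq_ft, type, sale_date, price
SAMPLE_ROWS = [
    ("1 Example St", "GALT", 950, "villa", "2008-05-21", 12000),
    ("2 Example St", "GALT", 700, "condo", "2008-05-21", 9000),
    ("3 Example St", "SACRAMENTO", 1200, "villa", "2008-05-22", 8000),
    ("4 Example St", "ELK GROVE", 1500, "villa", "2008-05-22", 10000),
]


def run_queries(rows):
    """Run both problem statements against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE real1 (street TEXT, city TEXT, sq_ft INTEGER, "
        "type TEXT, sale_date TEXT, price INTEGER)"
    )
    conn.executemany("INSERT INTO real1 VALUES (?, ?, ?, ?, ?, ?)", rows)

    # Problem 1: cities that have at least one villa priced at 10000 or more.
    villa_cities = [
        row[0]
        for row in conn.execute(
            "SELECT city FROM real1 "
            "WHERE price >= 10000 AND type = 'villa' "
            "GROUP BY city ORDER BY city"
        )
    ]

    # Problem 2: properties in GALT larger than 800 sq ft.
    galt_large = conn.execute(
        "SELECT street, sq_ft, sale_date, city FROM real1 "
        "WHERE city = 'GALT' AND sq_ft > 800"
    ).fetchall()

    conn.close()
    return villa_cities, galt_large


if __name__ == "__main__":
    cities, galt = run_queries(SAMPLE_ROWS)
    print(cities)
    print(galt)
```

SQLite and Hive differ in types and string-comparison details, so this only confirms the WHERE and GROUP BY conditions, not Hive behavior.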

Problem Statement Of Real Estate Use Cases

Hive Use Case – Real Estate Analysis

As an industry, real estate activity is defined as any economic dealing associated with the acquisition, sale, owner-operation or lease of property. This includes income-generating residential properties, such as flats, buildings and single-room rentals. Real estate services are not included in the sector; examples of such services are brokerages, property management, appraisers, investment property analysts and other consultants. Analysts working with Big Data use Hive or a similar tool to query datasets and get results with ease. Although other query languages exist, Hive offers a variety of new features compared to traditional approaches. So, in response to growing consumer demand in the real estate field, a filtered set of data was collected and handed over to the data analyst team.

Load data into Hive.

Problem Statement: City wise list all the Villa which is not

Tableau

As a leading data visualization tool, Tableau has many desirable and unique features. Its powerful data discovery and exploration application allows you to answer important questions in seconds. You can use Tableau's drag and drop interface to visualize any data, explore different views, and even combine multiple databases easily. It does not require any complex scripting. Anyone who understands the business problems can address them with a visualization of the relevant data. After analysis, sharing with others is as easy as publishing to Tableau Server.

Tableau Features

Tableau provides solutions for all kinds of industries, departments, and data environments. Following are some unique features which enable Tableau to handle diverse scenarios.

Speed of Analysis − As it does not require a high level of programming expertise, any user with access to data can start using it to derive value from the data.

Self-Reliant − Tableau does not need a complex software setup. The desktop version

Flume

What is Flume?

Apache Flume is a tool/service/data ingestion mechanism for collecting, aggregating, and transporting large amounts of streaming data, such as log files and events, from various sources to a centralized data store. Flume is a highly reliable, distributed, and configurable tool. It is principally designed to copy streaming data (log data) from various web servers to HDFS.

Applications of Flume

Assume an e-commerce web application wants to analyze customer behavior from a particular region. To do so, it would need to move the available log data into Hadoop for analysis. Here, Apache Flume comes to our rescue. Flume is used to move the log data generated by application servers into HDFS at a higher speed.

Advantages of Flume

Here are the advantages of using Flume −

Using Apache Flume we can store the data into any of the centralized stores (HBase, HDFS). When the rate of incoming data exceeds the rate at which data can be written to the destination, Flume acts as

MapReduce

Introduction To MapReduce:

·        MapReduce is a computing model that decomposes larger manipulation jobs into individual tasks.
·        These tasks can be executed in parallel across the cluster.
·        The results of the tasks are joined together to form the final result.
·        MapReduce is the data processing component of Hadoop.
·        MapReduce transforms a list of input data elements into a list of output data elements.
·        MapReduce is the heart of Hadoop. It is designed for processing huge amounts of data.
·        There are two different processing layers:
1.     Map
2.     Reduce

Different Phases in MapReduce:

Map:
·        Map takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key, value pairs).
·        Here data can be in structured or unstructured format.
·        The key is a reference to the input value (IntWritable, LongWritable).
·        Value is a dataset on
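
The map and reduce layers described above can be sketched with a word count, the usual introductory example. This is a minimal single-process illustration in plain Python, not the Hadoop API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and on a real cluster the framework performs the grouping step between the two phases.

```python
from collections import defaultdict


def map_phase(line):
    """Map: break one input line into (key, value) tuples, here (word, 1)."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

Each call to map_phase depends only on its own line, which is what allows the map tasks to run in parallel across a cluster.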