Excel

Language/Tool: Excel (PowerPivot)
Main Purpose: The basic, best-known way to process data for simple analysis
Benefits of this resource: the best-known and most widely used tool
Limitations: limits the number of observations; works with Microsoft Excel 2010 only (Excel 2013 support is in progress)
Compatible with (file types?): xls(x)
Level of Expertise Required: Low
Where to Access Tool/Software: free add-in, available for download at http://www.microsoft.com/en-us/download/details.aspx?id=7609
Sources to Learn: http://technet.microsoft.com/en-us/library/gg399093.aspx
 


R - OPEN SOURCE

Language/Tool: R
Main Purpose: working with large datasets (1M+ rows); applying statistical methods to datasets of any size
Benefits of this resource:
ggplot2 creates many different types of useful data visualizations
robust ecosystem: if you want to do something, there’s almost certainly a package that makes it easy
open source and free
Limitations:
no GUI; a little programming is required
slow on very large datasets; does not handle them as easily as SAS or pandas (Python)
Compatible with (file types?):
probably all common formats; definitely CSV, XLS(X), TXT, and PSV (pipe-separated values)
Level of Expertise Required: High
comfort with programming (R itself is not that hard to program in, but you need to be willing to write commands instead of using a GUI)
Where to Access Tool/Software: R is free at http://cran.rstudio.com; RStudio is free at http://www.rstudio.com/products/rstudio/download/
Sources to learn:
 


Python

Language/Tool: Python
Main Purpose: web/app development, automation, data mining and analytics, data visualization
Benefits of this resource: More readable than many programming languages, and widely used among amateur and self-taught programmers. There are also countless add-on library packages: often someone else has already done the work of writing a tricky piece of code, and all you have to do is import it.
Limitations: slower than compiled languages such as C; because the libraries are open source, you may occasionally run into bugs
Compatible with (file types?): .py
Level of Expertise Required: High
Where to Access Tool/Software: free at https://www.python.org/downloads/; for a more complete distribution that bundles many libraries and a choice of command shells for easier coding and debugging, see https://store.continuum.io/cshop/anaconda/
Sources to learn:
Codecademy (http://www.codecademy.com/learn)
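To illustrate the "all you have to do is import it" point, here is a minimal sketch that uses two modules shipped with Python, csv and statistics, to summarize a column of numbers. The sample data, the column name sales, and the function name summarize_sales are made up for this example.

```python
import csv
import io
import statistics

# Hypothetical sample data; in practice this would be read from a file.
SAMPLE_CSV = """region,sales
North,120
South,95
East,143
West,110
"""


def summarize_sales(csv_text):
    """Return (mean, median) of the 'sales' column in the given CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    sales = [float(row["sales"]) for row in reader]
    # The statistics module does the arithmetic; nothing is hand-written.
    return statistics.mean(sales), statistics.median(sales)


if __name__ == "__main__":
    mean, median = summarize_sales(SAMPLE_CSV)
    print(f"mean={mean:.1f} median={median:.1f}")
```

Third-party packages are imported the same way once they are installed.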
 
Ruby

Language/Tool: Ruby
Main Purpose: mostly web and application development with Rails
Benefits of this resource: Perhaps the most intuitive popular programming language; Rails makes it easy to build large-scale applications in minutes
Limitations: slow; for non-Rails tasks such as scripting, Python has the larger ecosystem
Compatible with (file types?): N/A
Level of Expertise Required: High
Where to Access Tool/Software: free at http://rubyonrails.org/download/
Sources to learn:
Codecademy (http://www.codecademy.com/learn)
 
SQL

Language/Tool: SQL
Main Purpose: Pulling data from databases and, in limited cases, manipulating that data
Benefits of this resource:
Processes extremely large datasets
Widespread implementation--used by a huge percentage of organizations to store data
Limitations: Very few built-in tools for data analysis
Compatible with (file types?): .sql
Level of Expertise Required: Moderate
Where to Access Tool/Software: MySQL Community Server is free at http://dev.mysql.com/downloads/mysql/
phpMyAdmin is a web-based interface for MySQL that can be used to practice SQL
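As a rough sketch of what "pulling data from a database" looks like, the example below runs a SQL query from Python. It uses the sqlite3 module bundled with Python and an in-memory database as a stand-in for a MySQL server; the orders table, its columns, and the function names are invented for this illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, made-up orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("Ana", 40.0), ("Ben", 15.0), ("Ana", 25.0), ("Cy", 60.0)],
    )
    conn.commit()
    return conn


def top_customers(conn, min_total):
    """Pull customers whose summed order amounts reach min_total."""
    query = """
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
        HAVING SUM(amount) >= ?
        ORDER BY total DESC
    """
    # The ? placeholder keeps the value separate from the SQL text.
    return conn.execute(query, (min_total,)).fetchall()


if __name__ == "__main__":
    connection = build_demo_db()
    for customer, total in top_customers(connection, 50):
        print(customer, total)
```

The SELECT ... GROUP BY statement is standard SQL and should carry over to MySQL largely unchanged; only the connection code is specific to SQLite.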
Sources to learn:
 
SAS

Language/Tool: SAS
Main Purpose: Clean and combine large datasets for easier use; perform statistical analysis of the data
Benefits of this resource:
Processes extremely large datasets
Easy to clean, manipulate, and merge datasets
Limitations: Poor visualization
Compatible with (file types?): XLS(X), CSV, TXT, ACCDB, ...
Level of Expertise Required: High
Where to Access Tool/Software: available by paid license only (expensive)
Sources to learn:
The Little SAS Book: A Primer by Lora Delwiche and Susan Slaughter
 
Tableau

Language/Tool: Tableau
Main Purpose: Visualize data
Benefits of this resource: easy to use, especially if you know how to use pivot tables; makes a wide variety of dynamic visuals, including location-based ones
Limitations: certain calculated variables are not easy to create
Compatible with (file types?): XLS(X), CSV
Level of Expertise Required: Low
Where to Access Tool/Software: free for students at http://www.tableausoftware.com/academic/students
Sources to learn:
http://www.tableausoftware.com/learn/training
 
Crystal Ball (by Oracle)

Language/Tool: Crystal Ball (by Oracle)
Main Purpose: Spreadsheet-based application for predictive modeling, forecasting, simulation, and optimization
Benefits of this resource:
Builds on Monte Carlo and predictive modeling tools
Provides optimization and calculation capabilities
Limitations: Primarily used for predictive and optimization analytics. Excel-based.
Compatible with (file types?): XLS(X)
Level of Expertise Required: Low
Where to Access Tool/Software: TBD; NYU Stern students are offered the program for free
Sources to Learn: http://www.oracle.com/us/products/applications/crystalball/resources/index.html (under the resources tab)
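For readers unfamiliar with Monte Carlo simulation, the sketch below shows the general idea in Python: draw uncertain inputs at random many times and summarize the resulting spread of outcomes. It does not use Crystal Ball or reproduce its interface; the profit model, the distributions, and every number in it are assumptions chosen only for illustration.

```python
import random
import statistics


def simulate_profit(trials=10_000, seed=42):
    """Simulate profit = demand * (price - unit_cost) under uncertain inputs."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same draws
    price = 10.0               # assumed fixed selling price
    profits = []
    for _ in range(trials):
        # Assumed demand: between 800 and 1200 units, most likely 1000.
        demand = rng.triangular(800, 1200, 1000)
        # Assumed unit cost: anywhere from 4 to 6, equally likely.
        unit_cost = rng.uniform(4.0, 6.0)
        profits.append(demand * (price - unit_cost))
    return profits


def summarize(profits):
    """Return the mean and the approximate 5th and 95th percentiles."""
    ordered = sorted(profits)
    n = len(ordered)
    return {
        "mean": statistics.mean(ordered),
        "p05": ordered[int(0.05 * n)],
        "p95": ordered[int(0.95 * n)],
    }


if __name__ == "__main__":
    print(summarize(simulate_profit()))
```

The range between the 5th and 95th percentiles gives a rough sense of how much the outcome could vary, which is the kind of question a simulation tool is meant to answer.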
 
Hadoop

Language/Tool: Hadoop
Main Purpose: Apache™ Hadoop® is an open source software project that enables distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software's ability to detect and handle failures at the application layer.
Benefits of this resource:
Incredibly robust for large sets of data
Does not require high-end hardware
Open source; vendors such as Cloudera and IBM (BigInsights) offer more sophisticated software built on the Hadoop platform
Limitations: Built for storing and processing big data rather than visualizing it. Not SQL based on its own; SQL-style querying requires add-on tools such as Hive.
Level of Expertise Required: Medium
Where to Access Tool/Software:
Apache Hadoop: http://hadoop.apache.org/releases.html
Cloudera: http://www.cloudera.com/content/cloudera/en/downloads.html
SAS: http://www.sas.com/en_us/insights/big-data/hadoop.html
IBM: http://www-01.ibm.com/software/data/infosphere/hadoop/products.html
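The "distributed processing" described above is classically done with Hadoop's MapReduce model, which the text does not name but which the sketch below imitates in plain Python. Everything here runs in a single process and uses none of Hadoop's actual APIs; the function names and sample text are made up to show the map, shuffle, and reduce steps.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    """Count words across chunks using map, shuffle, and reduce steps."""
    pairs = []
    # On a real cluster, each chunk would be mapped on a different machine.
    for chunk in chunks:
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["big data big", "data tools"]))
```

Because each chunk is mapped independently, the work can be spread over many machines, and a chunk whose machine fails can be re-run elsewhere.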
