Group 1 Notes

Present: Dominic, Dean, Peter, Mberry
Note-taker: Dominic
Date: 3/21/2017
Topic: Data science discussion continuation

Data Science
∘ Data visualization is an aspect of data science that aids in analysis and can showcase interesting aspects of a data set
∘ Data science is a continually emerging and growing field, especially with the advent of big data and the challenges it brings
∘ The benefits of data science in recent years have brought it to the forefront of modern business
∘ Data science has greatly improved the development of AI
--‣ where is data for AI stored?
--• cloud
--• cluster like Hadoop
--‣ data needs to be stored where the machine can access it, so that its learning algorithm (unsupervised) can build on the decisions it has previously made
--‣ Peter - a combination of supervised and unsupervised algorithms worked extremely well for AI
∘ Peter - Random Forest
--‣ algorithm that builds hundreds of decision trees, each on a sample of the data - the more trees the better
--‣ the trees are put into a forest; a new sample fed to the forest is passed through each decision tree, and the trees vote on the status of the data (each tree is built from a random sampling of the data)
--—• Peter's experience - determining if blood sample is contaminated or not (binary) decision, status of contamination is the result of decision tree voting
∘ Mary thinks data science is the future of the world
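
Sketch of the random forest voting described in Peter's notes above. This is a minimal illustration under stated assumptions, not the method Peter actually used: the two measurements per blood sample and all values in SAMPLES are made up, the function names are hypothetical, and each "tree" is simplified to a single split (a stump) to keep the example short. It shows the three ideas from the discussion: each tree is built on a random sampling of the data, a new sample is passed through every tree, and the majority vote decides contaminated (1) or clean (0).

```python
import random
from collections import Counter

# Hypothetical training data: ((measurement_a, measurement_b), label)
# label 0 = clean, 1 = contaminated. All values are invented for illustration.
SAMPLES = [
    ((1.0, 0.20), 0), ((1.2, 0.10), 0), ((0.9, 0.30), 0), ((1.1, 0.25), 0),
    ((3.0, 0.90), 1), ((3.2, 0.80), 1), ((2.8, 1.00), 1), ((3.1, 0.95), 1),
]

def train_stump(rows, rng):
    """Fit a one-split tree on one randomly chosen feature."""
    feature = rng.randrange(len(rows[0][0]))
    best = None
    for threshold in sorted({x[feature] for x, _ in rows}):
        left = [y for x, y in rows if x[feature] <= threshold]
        right = [y for x, y in rows if x[feature] > threshold]
        # Each side of the split predicts its most common label.
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label

def train_forest(rows, n_trees=100, seed=0):
    """Build n_trees stumps, each on a random sampling (with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]
        forest.append(train_stump(sample, rng))
    return forest

def predict(forest, features):
    """Pass the sample through every tree and return the majority vote."""
    votes = Counter()
    for feature, threshold, left_label, right_label in forest:
        label = left_label if features[feature] <= threshold else right_label
        votes[label] += 1
    return votes.most_common(1)[0][0]

forest = train_forest(SAMPLES, n_trees=100, seed=0)
print(predict(forest, (0.8, 0.05)))  # low readings, like the clean samples
print(predict(forest, (3.5, 1.20)))  # high readings, like the contaminated ones
```

A real project would more likely use a library implementation with full-depth trees; the stumps here only make the sampling and voting steps easy to see.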