Cloud Network

Saturday, April 13, 2019

AWS (Amazon Web Services) Interview Question and Answer for Fresher and Experience - 2019

Cloud Network April 13, 2019

Q: – What is AWS ?

Ans:-

AWS (Amazon Web Services) is a platform to provide secure cloud services, database storage, offerings to compute power, content delivery, and other services to help business level and develop.

AWS stands for Amazon Web Services. AWS is a platform that provides on-demand resources for hosting web services, storage, networking, databases and other resources over the internet with a pay-as-you-go pricing.

AWS stands for Amazon Web Service; it is a collection of remote computing services also known as a cloud computing platform. This new realm of cloud computing is also known as IaaS or Infrastructure as a Service.

It is the acronym for Amazon Web Service. It is a comprehensive, evolving cloud-computing platform of Amazon. It is also known as Infrastructure as a Service (IaaS).

Q: – What is AWS ?

This is one of the most basic AWS interview questions and can be answered in both simple and complex structures, depending on the interviewer.

According to the terminology, AWS or Amazon Web Services is defined as a platform which is designed to provide secure cloud services, computing power to clients, database storage options, content delivery and many other services which are all intended towards business development and growth.

AWS stands for Amazon Web Services and is a platform that provides database storage, secure cloud services, offering to compute power, content delivery, and many other services to develop business levels.

Q: – What is the importance of buffer in Amazon Web Services?

Ans:- An Elastic Load Balancer ensures that the incoming traffic is distributed optimally across various AWS instances. A buffer will synchronize different components and makes the arrangement additional elastic to a burst of load or traffic. The components are prone to work in an unstable way of receiving and processing the requests.

The buffer creates the equilibrium linking various apparatus and crafts them effort at the identical rate to supply more rapid services.

Q: – What do you know about Buffer in AWS?

Ans:- A buffer is necessary in any cloud computing technology in order to maintain seamless integration across a huge flow of traffic and loads. The Elastic Load Balancer in Amazon Web Services has been designed in a way to ensure that all the incoming traffic is optimally distributed across all channels of AWS instances.

The presence of a buffer enables the components to work in an unstable situation and receive and process requests as it gets them. Essentially the presence of a buffer is needed to create an equilibrium between all the apparatus and provide them with an identical ability to supply more rapid services

Q: – How is buffer used in AWS?

Ans:-It is used to make the system more robust and manage traffic by synchronizing different components.The component processes the requests in an imbalanced way.Using buffer, the components work at the same speed for faster services and will also be balanced.

Q: – What are the components of AWS?

Ans:- EC2 – Elastic Compute Cloud, S3 – Simple Storage Service, Route53, EBS – Elastic Block Store, Cloudwatch, Key-Paris are few of the components of AWS.

Q: – List the components required to build Amazon VPC?

Ans:- Subnet, Internet Gateway, NAT Gateway, HW VPN Connection, Virtual Private Gateway, Customer Gateway, Router, Peering Connection, VPC Endpoint for S3, Egress-only Internet Gateway.

Q: – Mention what are the key components of AWS are?

Ans:- The key components of AWS are

Route 53:A DNS web service
Simple E-mail Service:It allows sending e-mail using RESTFUL API call or via regular SMTP
Identity and Access Management:It provides enhanced security and identity management for your AWS account
Simple Storage Device or (S3):It is a storage device and the most widely used AWS service
Elastic Compute Cloud (EC2): It provides on-demand computing resources for hosting applications. It is handy in case of unpredictable workloads
Elastic Block Store (EBS):It offers persistent storage volumes that attach to EC2 to allow you to persist data past the lifespan of a single Amazon EC2 instance
CloudWatch: To monitor AWS resources, It allows administrators to view and collect key Also, one can set a notification alarm in case of trouble.

Q: –Name the basic components of Amazon Web Services?

Ans:- Amazon Web Services or AWS consists of 4 main components that are as listed below:

Amazon S3: This component has been designed to enable one to retrieve information which has been occupied in creating the cloud structural design and also retrieve the produced information as a consequence of the specified key.
Amazon EC2 instance: This component has been designed in order to run automatic parallelization and also achieve job scheduling. This instance is immensely helpful in running a large distributed system on the Hadoop Cluster.
Amazon SimpleDB: This component helps in the storage of the transitional positional log and also run the errands when they are executed by the client or the consumer.
Amazon SQS: This component has been mainly designed to act as a mediator between different controllers. This is an additional cushioning for the managers at Amazon.

Q: – List out the main components of AWS.

Ans:- Similar to other Cloud Services in the industry, AWS too has been designed in a structured manner and has several key components. Mentioned below is the list of the same:

Amazon Route 53: This is a DNS (Domain Name Service) web service.

Easy Email Service: This service allows customers and clients to address email utilization through normal SMTP or RESTFUL API.

Access Management and Identity: This has been designed in order to provide heightened identity control and protection for a client’s AWS account.

S3 or Simple Storage Device: It is a very well-known utility among all the AWS Services and is mainly used in warehouse equipment.

EC2 or Elastic Compute Cloud: This is a utility designed to manage variable workloads and gives clients the ability to afford on-demand computing sources for hosting.

EBS or Elastic Block Store: This particular utility is used to expand beyond EC2 and is designed to connect EC2 to enable the lifespan of data beyond the capacities of EC2.

Cloud Watch: This is mainly a crisis management utility and is designed to help managers inspect and obtain additional resources in the light of a crisis.

Q: – List the Features of Amazon cloud search?

Ans:- The Amazon cloud search features:

Range searches
Prefix Searches
Boolean Searches
Entire text search with language specific text processing
Highlighting
AutoComplete advice
Faceting term boosting

Q: – What is auto-scaling?

Ans:- Auto-scaling is a feature of AWS which allows you to configure and automatically provision and spin-up new instances without the need for your intervention.

Q: – Define auto-scaling ?

Ans:- Auto- scaling is one of the remarkable features of AWS where it permits you to arrange and robotically stipulation and spin up fresh examples without the requirement for your involvement. This can be achieved by setting brinks and metrics to watch. If those entrances are overcome, a fresh example of your selection will be configured, spun up and copied into the weight planner collection.

It is one of the outstanding features of AWS, which permits the arrangement and stipulation robotically and also the spin up fresh example without the user’s involvement.This feature can be achieved by setting metrics and brinks to the watch.

A fresh example of the user’s selection is configured, spinup and copied to the weight planner collection if we overcome all those entrances.

Q: – What do you mean by classic link?

Ans:- The Amazon virtual private cloud classic link will permit EC2 instances in the EC2 classic platform. This occurs so that it can communicate with the instances that are present in the virtual private cloud. The communication occurs with the help of private IP addresses. In order to use a classic link it is important that you enable it to for virtual private cloud in your account. Then you will need to associate a security group with an instance in the EC2 classic. This security group is from the VPC for which you enabled the classic link in your account. Each and every rule that is there for the VPC security group is applicable for the communications between the instances in EC2 classic and those instances in the VPC.

Q: –Explain what is Amazon S3 ?

Ans:-S3 stands for Simple Storage Service. You can use S3 interface to store and retrieve any amount of data, at any time and from anywhere on the web. For S3, the payment model is “pay as you go.”

S3 stands for Simple Storage Service. It is a storage service that provides an interface that you can use to store any amount of data, at any time, from anywhere in the world. With S3 you pay only for what you use and the payment model is pay-as-you-go.

Amazon S3 (Simple Storage Service) is an object storage with a simple web service interface to store and retrieve any amount of data from anywhere on the web.

S3 stands for Simple Storage Service, which is an interface to store and retrieve any amount of data from anywhere and at any time on the web.”Pay as you go” is the payment model for S3.

Q: – What are the pricing models for EC2 instances?

Ans:- The different pricing model for EC2 instances are as below,

On-demand
Reserved
Spot
Scheduled
Dedicated

Q: – What are key-pairs?

Ans:- Key-pairs are secure login information for your instances/virtual machines. To connect to the instances we use key-pairs that contain a public-key and private-key.

Q: – What is the way to secure data for carrying in the cloud?

Ans:- One thing must be ensured that no one should seize the information in the cloud while data is moving from point one to another and also there should not be any leakage with the security key from several storerooms in the cloud. Segregation of information from additional companies’ information and then encrypting it by means of approved methods is one of the options.

Amazon Web Services offers you a secure way of carrying data in the cloud

Q: – Name the several layers of Cloud Computing.

Ans:- Here is the list of layers of the cloud computing

PaaS – Platform as a Service

IaaS – Infrastructure as a Service

SaaS – Software as a Service

Q: – What type of performance can you expect from Elastic Block Storage? How do you back it up and enhance the performance ?

Ans:-Performance of an elastic block storage varies i.e. it can go above the SLA performance level and after that drop below it. SLA provides an average disk I/O rate which can at times frustrate performance experts who yearn for reliable and consistent disk throughput on a server. Virtual AWS instances do not behave this way. One can backup EBS volumes through a graphical user interface like elasticfox or use the snapshot facility through an API call. Also, the performance can be improved by using Linux software raid and striping across four volumes.

Q: – Imagine that you have an AWS application that requires 24x7 availability and can be down only for a maximum of 15 minutes. How will you ensure that the database hosted on your EBS volume is backed up?

Ans:- Automated backup are the key processes here as they work in the background without requiring any manual intervention. Whenever there is a need to back up the data, AWS API and AWS CLI play a vital role in automating the process through scripts. The best way is to prepare for a timely backup of EBS of the EC2 instance. The EBS snapshot should be stored on Amazon S3 and can be used for recovery of the database instance in case of any failure or downtime.

Q: – Difference between Security Groups and ACLs in a VPC?

Ans:- A Security Group defines which traffic is allowed TO or FROM EC2 instance. Whereas ACL, controls at the SUBNET level, scrutinize the traffic TO or FROM a Subnet.

Q: – What does an AMI include?

Ans:- An AMI includes the following things

A template for the root volume for the instance

Launch permissions decide which AWS accounts can avail the AMI to launch instances

A block device mapping that determines the volumes to attach to the instance when it is launched

Q: – What is Amazon Machine Image (AMI) ?

Ans:- AMI stands for Amazon Machine Image. It’s a template that provides the information (an operating system, an application server, and applications) required to launch an instance, which is a copy of the AMI running as a virtual server in the cloud. You can launch instances from as many different AMIs as you need.

Amazon Machine Image is the full form of AMI.It is actually a template that provides the information of the operating system, server, applications etc., required to launch an instance that is the replica of the AMI running in the cloud as a virtual server.

An instance can be launched from as many different AMIs as per the requirement.

A Machine Image on Amazon (AMI) contains a software configuration information like OS information, app server, and app information. We can even launch multiple instances of an AMI.

Q: – What is an AMI?

Ans:- AMI (Amazon Machine Image) is a snapshot of the root filesystem.

Q: – An organization wants to deploy a two-tier web applications on AWS. The application requires complex query processing and table joins. However, the company has limited resources and requires high availability. Which is the best configuration that company can opt for based on the requirements ?

Ans:- DynamoDB deals with core problems of database scalability, management, reliability, and performance but does not have the functionalities of a RDBMS. DynamoDB does not render support for complex joins or query processing or complex transactions. You can run a relational engine on Amazon RDS or EC2 for this kind of a functionality.

Q: – If you have half of the workload on public cloud while the other half is on local storage, what kind of architecture will you use for this ?

Ans:- Hybrid Cloud Architecture

Q: – Is it possible to cast-off S3 with EC2 instances ? If yes, how ?

Ans:- It is possible to cast-off S3 with EC2 instances using root approaches backed by native occurrence storage.

Q: – How many EC2 instances can be used in a VPC ?

Ans:- There is a limit of running up to a total of 20 on-demand instances across the instance family , you can purchase 20 reserved instances and request spot instances as per your dynamic spot limit region.

Tuesday, April 9, 2019

Machine Learning (ML) Interview Questions and Answers with Advanced

Cloud Network April 09, 2019

Interview Questions and Answers

Q: – What is Machine learning?
Ans:- Machine learning is a branch of computer science which deals with system programming in order to automatically learn and improve with experience.
For example: Robots are programmed so that they can perform the task based on data they gather from sensors. It automatically learns programs from data.

Machine learning is the field of study that gives the computer the ability to learn and improve from experience without explicitly taught or programmed.
In traditional programs, the rules are coded for a program to make decisions, but in machine learning, the program learns based on the data to make decisions.

Q: – What do you understand by Machine Learning?
Ans:- Machine learning is an application of artificial intelligence that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.

Machine learning focuses on the development of computer programs that can access data and use it learn for themselves.

Q: – What are Different types of learning in machine learning ?
Ans:-
Supervised learning
Unsupervised learning
Reinforcement learning
Semi supervised learning

Q: – What is the difference between supervised and unsupervised machine learning?
Ans:- By definition, the supervised and unsupervised learning algorithms are categorized based on the supervision required while training. Supervised learning algorithms work on data which are labelled i.e. the data has a desired solution or a label. On the other hand, unsupervised learning algorithms work on unlabeled data, meaning that the data does not contain the desired solution for the algorithm to learn from.

Variance is error due to too much complexity in the learning algorithm you’re using. This leads to the algorithm being highly sensitive to high degrees of variation in your training data, which can lead your model to overfit the data. You’ll be carrying too much noise from your training data for your model to be very useful for your test data.

The bias-variance decomposition essentially decomposes the learning error from any algorithm by adding the bias, the variance and a bit of irreducible error due to noise in the underlying dataset.

Essentially, if you make the model more complex and add more variables, you’ll lose bias but gain some variance — in order to get the optimally reduced amount of error, you’ll have to tradeoff bias and variance.

You don’t want either high bias or high variance in your model.

Supervised algorithm examples:

Linear Regression
Logistic Regression
Random Forest
Naive bayes
Neural Networks/Deep Learning
Decision Trees
Support Vector Machine (SVM)
K-Nearest neighbours (KNN)
Classifications
Speech recognition
Regression
Predict time series
Annotate strings

Unsupervised algorithm examples:

Clustering Algorithms – K-means, Hierarchical Clustering Analysis (HCA)
Visualization and Dimensionality reduction – Principal component reduction (PCA)
Association rule learning
Find clusters of the data of the data
Find low-dimensional representations of the data
Find interesting directions in data
Interesting coordinates and correlations
Find novel observations
K-means
t-SNE Clustering
DBSCAN Clustering
Anomaly detection

Q: – Is recommendation supervised or unsupervised learning?
Ans:-
Recommendation algorithms are interesting as some of these are supervised and some are unsupervised. The recommendations based on your profile, previous purchases, page views fall under supervised learning. But there are recommendations based on hot selling products, country/location-based recommendations, which are unsupervised learning.

Q: – What is Reinforcement learning ?
Ans:- Reinforcement learning is an important type of Machine Learning where an agent learn how to behave in a environment by performing actions and seeing the results.

Q: – What are best algorithms for Reinforcement learning ?
Ans:- Q-Learning

Q: – Define what is Fourier Transform in a single sentence?
Ans:- A process of decomposing generic functions into a superposition of symmetric functions is considered to be a Fourier Transform.

Q: – What is deep learning?
Ans:- Deep learning is a process where it is considered to be a subset of machine learning process.

Q: – What is the F1 score?
Ans:- The F1 score is defined as a measure of a model’s performance

Q: – How is F1 score is used?
Ans:- The average of Precision and Recall of a model is nothing but F1 score measure. Based on the results, the F1 score is 1 then it is classified as best and 0 being the worst.

Q: – How can you ensure that you are not overfitting with a particular model?
Ans:- In Machine Learning concepts, they are three main methods or processes to avoid overfitting:
Firstly, keep the model simple
Must and should use cross validation techniques
It is mandatory to use regularization techniques, for example, LASSO.

Q: – What is the difference between Type I and Type II error?
Ans:- Type I error is false positive, while Type II is false negative. Type I error is claiming something has happened when it hasn’t. For instance, telling a man he is pregnant. On the other hand, Type II error means you claim nothing is happened but in fact something is. To exemplify, you tell a pregnant lady she isn’t carrying baby.

Q: – You are given a data set. The data set has missing values which spread along 1 standard deviation from the median. What percentage of data would remain unaffected? Why?
Ans:- This question has enough hints for you to start thinking! Since, the data is spread across median, let’s assume it’s a normal distribution. We know, in a normal distribution, ~68% of the data lies in 1 standard deviation from mean (or mode, median), which leaves ~32% of the data unaffected. Therefore, ~32% of the data would remain unaffected by missing values.

Q: – Why overfitting happens?
Ans:-The possibility of overfitting exists as the criteria used for training the model is not the same as the criteria used to judge the efficacy of a model.

Q: – Explain what is precision and Recall?
Ans:- Recall:
It is known as a true positive rate. The number of positives that your model has claimed compared to the actual defined number of positives available throughout the data.
Precision:
It is also known as a positive predicted value. This is more based on the prediction. It is a measure of a number of accurate positives that the model claims when compared to the number of positives it actually claims.

Q: – Why do ensembles typically have higher scores than individual models?
Ans:- An ensemble is the combination of multiple models to create a single prediction. The key idea for making better predictions is that the models should make different errors. That way the errors of one model will be compensated by the right guesses of the other models and thus the score of the ensemble will be higher

We need diverse models for creating an ensemble. Diversity can be achieved by:
Using different ML algorithms. For example, you can combine logistic regression, k-nearest neighbors, and decision trees.

Using different subsets of the data for training. This is called bagging.
Giving a different weight to each of the samples of the training set. If this is done iteratively, weighting the samples according to the errors of the ensemble, it’s called boosting
.
Many winning solutions to data science competitions are ensembles. However, in real-life machine learning projects, engineers need to find a balance between execution time and accuracy

Q: – What is Supervised learning?
Ans:- When the data is labeled in during training process it’s calling supervised learning

Q: – What is Unsupervised learning?
Ans:- When the data is not labeled in during training process it’s calling supervised learning

Q: –What’s the trade-off between bias and variance?
Ans:-
Bias is error due to over simplistic assumptions in the learning algorithm that you are using, which can lead to model under fitting your data and making it hard for model to give accurate predictions.

Variance, on the other hand, is error due to way too much complexity in your learning algorithm. Due to this complexity, the algorithm is highly sensitive to high degrees of variation, which can lead your model to over fit the data. In addition, you will be carrying too much noise from your training data for your model to be useful.

The bias-variance decomposition essentially decomposes the learning error from any algorithm by adding the bias, the variance and a bit of irreducible error due to noise in the underlying data-set. Essentially, if you make the model more complex and add more variables, you’ll lose bias but gain some variance — in order to get the optimally reduced amount of error, you’ll have to trade-off bias and variance. You don’t want either high bias or high variance in your model.

Q: – What is Bayes’ Theorem? How is it useful in a machine learning context?
Ans:- Bayes’ Theorem gives you the posterior probability of an event given what is known as prior knowledge.

Mathematically, it’s expressed as the true positive rate of a condition sample divided by the sum of the false positive rate of the population and the true positive rate of a condition. Say you had a 60% chance of actually having the flu after a flu test, but out of people who had the flu, the test will be false 50% of the time, and the overall population only has a 5% chance of having the flu. Would you actually have a 60% chance of having the flu after having a positive test?
Bayes’ Theorem says no. It says that you have a (.6 * 0.05) (True Positive Rate of a Condition Sample) / (.6*0.05)(True Positive Rate of a Condition Sample) + (.5*0.95) (False Positive Rate of a Population) = 0.0594 or 5.94% chance of getting a flu.

Q: – What are the basic differences between Machine Learning and Deep Learning?
Ans:-

Q: – You are given a train data set having 1000 columns and 1 million rows. The data set is based on a classification problem. Your manager has asked you to reduce the dimension of this data so that model computation time can be reduced. Your machine has memory constraints. What would you do? (You are free to make practical assumptions.)
Ans:- Processing a high dimensional data on a limited memory machine is a strenuous task, your interviewer would be fully aware of that. Following are the methods you can use to tackle such situation:
Since we have lower RAM, we should close all other applications in our machine, including the web browser, so that most of the memory can be put to use.
We can randomly sample the data set. This means, we can create a smaller data set, let’s say, having 1000 variables and 300000 rows and do the computations.
To reduce dimensionality, we can separate the numerical and categorical variables and remove the correlated variables. For numerical variables, we’ll use correlation. For categorical variables, we’ll use chi-square test.

Also, we can use PCA and pick the components which can explain the maximum variance in the data set.

Using online learning algorithms like Vowpal Wabbit (available in Python) is a possible option.
Building a linear model using Stochastic Gradient Descent is also helpful.
We can also apply our business understanding to estimate which all predictors can impact the response variable. But, this is an intuitive approach, failing to identify useful predictors might result in significant loss of information.

Q: – mention the difference between Data Mining and Machine learning?
Ans:- Machine learning relates with the study, design and development of the algorithms that give computers the capability to learn without being explicitly programmed. While, data mining can be defined as the process in which the unstructured data tries to extract knowledge or unknown interesting patterns. During this process machine, learning algorithms are used.

Q: – How is KNN different from k-means clustering?
Ans:- K-Nearest Neighbors is a supervised classification algorithm, while k-means clustering is an unsupervised clustering algorithm. While the mechanisms may seem similar at first, what this really means is that in order for K-Nearest Neighbors to work, you need labeled data you want to classify an unlabeled point into (thus the nearest neighbor part). K-means clustering requires only a set of unlabeled points and a threshold: the algorithm will take unlabeled points and gradually learn how to cluster them into groups by computing the mean of the distance between different points.
The critical difference here is that KNN needs labeled points and is thus supervised learning, while k-means doesn’t — and is thus unsupervised learning.

The crucial difference between both is, K-Nearest Neighbor is a supervised classification algorithm, whereas k-means is an unsupervised clustering algorithm. While the procedure may seem similar at first, what it really means is that in order to K-Nearest Neighbors to work, you need labelled data which you want to classify an unlabeled point into. In k-means clustering it requires set of unlabeled points and a threshold only. The algorithm will take that unlabeled data and will learn how to cluster them into groups by computing the mean of the distance between different points.

Q: –Give an example that explains Machine Learning in industry.
Ans:-  Robots are replacing humans in many areas. It is because robots are programmed such that they can perform the task based on data they gather from sensors. They learn from the data and behaves intelligently.

Q: – Is rotation necessary in PCA? If yes, Why? What will happen if you don’t rotate the components?
Ans:-  Yes, rotation (orthogonal) is necessary because it maximizes the difference between variance captured by the component. This makes the components easier to interpret. Not to forget, that’s the motive of doing PCA where, we aim to select fewer components (than features) which can explain the maximum variance in the data set. By doing rotation, the relative location of the components doesn’t change, it only changes the actual coordinates of the points.
If we don’t rotate the components, the effect of PCA will diminish and we’ll have to select more number of components to explain variance in the data set.

Q: –  What is ‘Overfitting’ in Machine learning?
Ans:-   In machine learning, when a statistical model describes random error or noise instead of underlying relationship ‘overfitting’ occurs. When a model is excessively complex, overfitting is normally observed, because of having too many parameters with respect to the number of training data types. The model exhibits poor performance which has been overfit.

Q: – What is the difference between Bias and Variance?
Ans:-
Bias:Bias can be defined as a situation where an error has occurred due to use of assumptions in the learning algorithm.
Variance:Variance is an error caused because of the complexity of the algorithm that is been used to analyze the data.

Q: – What is stratified cross-validation and when should we use it?
Ans:-
cross-validation is a technique for dividing data between training and validation sets. On typical cross-validation this split is done randomly. But in stratified cross-validation, the split preserves the ratio of the categories on both the training and validation datasets.

For example, if we have a dataset with 10% of category A and 90% of category B, and we use stratified cross-validation, we will have the same proportions in training and validation. In contrast, if we use simple cross-validation, in the worst case we may find that there are no samples of category A in the validation set. Stratified cross-validation may be applied in the following scenarios:

On a dataset with multiple categories. The smaller the dataset and the more imbalanced the categories, the more important it will be to use stratified cross-validation.
On a dataset with data of different distributions. For example, in a dataset for autonomous driving, we may have images taken during the day and at night. If we do not ensure that both types are present in training and validation, we will have generalization problems.

Q: – What are the different Algorithm techniques in Machine Learning?
Ans:-
• Reinforcement Learning
• Supervised Learning
• Unsupervised Learning
• Semi-supervised Learning
• Transduction
• Learning to Learn

Q: – Why is naive Bayes so naive?
Ans:- Naive Bayes is so naive because it assumes that all of the features in a dataset are equally important and independent.

Q: – What is Semi supervised learning ?
Ans:- Semi-supervised learning is a class of machine learning tasks and techniques that also make use of unlabeled data for training – typically a small amount of labeled data with a large amount of unlabeled data. Semi-supervised learning falls between unsupervised learning and supervised learning

Q: – You are given a data set on cancer detection. You’ve build a classification model and achieved an accuracy of 96%. Why shouldn’t you be happy with your model performance? What can you do about it?
Ans:- If you have worked on enough data sets, you should deduce that cancer detection results in imbalanced data. In an imbalanced data set, accuracy should not be used as a measure of performance because 96% (as given) might only be predicting majority class correctly, but our class of interest is minority class (4%) which is the people who actually got diagnosed with cancer.

Hence, in order to evaluate model performance, we should use Sensitivity (True Positive Rate), Specificity (True Negative Rate), F measure to determine class wise performance of the classifier.

If the minority class performance is found to to be poor, we can undertake the following steps:

We can use undersampling, oversampling or SMOTE to make the data balanced.
We can alter the prediction threshold value by doing probability caliberation and finding a optimal threshold using AUC-ROC curve.
We can assign weight to classes such that the minority classes gets larger weight.
We can also use anomaly detection.

Q: –How can you avoid overfitting ?
Ans:-  By using a lot of data overfitting can be avoided, overfitting happens relatively as you have a small dataset, and you try to learn from it. But if you have a small database and you are forced to come with a model based on that. In such situation, you can use a technique known as cross validation. In this method the dataset splits into two section, testing and training datasets, the testing dataset will only test the model while, in training dataset, the datapoints will come up with the model.
In this technique, a model is usually given a dataset of a known data on which training (training data set) is run and a dataset of unknown data against which the model is tested. The idea of cross validation is to define a dataset to “test” the model in the training phase.

Q: –What Deep Learning is exactly?
Ans:-  Most people don’t know this but Machine Learning and Deep Learning is not two different things, but Deep learning is a subset of Machine learning. It mostly deals with neural networks: how to use back propagation and other certain principles from neuroscience to more accurately model large sets of unlabeled data. In a nutshell, Deep Learning represents unsupervised learning algorithm that learns data representation mainly through neural networks. Explore a little about neural nets to answer deep learning interview questions effectively.

Q: – What’s the difference between a generative and discriminative model?
Ans:-  A discriminating model will learn the distinction between different categories of data, while A generative model will learn categories of data. Discriminative models will predominantly outperform generative models on classification tasks.

Q: – What is your favorite algorithm and also explain the algorithm in briefly in a minute?
Ans:-  This type of questions is very common and asked by the interviewers to understand the candidate skills and assess how well he can communicate complex theories in the simplest language.
This one is a tough question and usually, individuals are not at all prepared for this situation so please be prepared and have a choice of algorithms and make sure you practice a lot before going into any sort of interviews.

Q: – What is the difference between Type 1 and Type 2 errors?
Ans:-
Type 1 error is classified as a false positive. I.e. This error claims that something has happened but the fact is nothing has happened. It is like a false fire alarm. The alarm rings but there is no fire.
Type 2 error is classified as a false negative. I.e. This error claims that nothing has happened but the fact is that actually, something happened at the instance.
The best way to differentiate a type 1 vs type 2 error is:
Calling a man to be pregnant- This is Type 1 example
Calling pregnant women and telling that she isn’t carrying any baby- This is type 2 example

Q: – What are the advantages of Naive Bayes?
Ans:-  The advantages of Naive Bayes are:
The classifier will converge quicker than discriminative models
It cannot learn the interactions between features
Let us move to the next Machine Learning Interview Questions.

Q: – What are the disadvantages of Naive Bayes?
Ans:-  The disadvantages of Naive Bayes are:
It is because the problem arises for continuous features
It makes a very strong assumption on the shape of your data distribution.
It can also happen because of data scarcity

Q: – What is Anomaly detection ?
Ans:-   Anomaly detection is the identification of data points, items, observations or events that do not conform to the expected pattern of a given group. These anomalies occur very infrequently but may signify a large and significant threat such as cyber intrusions or fraud.

Q: – What is Bias, Variance and Trade-off ?
Ans:-  Bias is the simplifying assumptions made by the model to make the target function easier to approximate. Variance is the amount that the estimate of the target function will change given different training data. Trade-off is tension between the error introduced by the bias and the variance.

Q: – What is Overfitting in Machine Learning?
Ans:-   This is the popular Machine Learning Interview Questions asked in an interview. Overfitting in Machine Learning is defined as when a statistical model describes random error or noise instead of the underlying relationship or when a model is excessively complex.

Q: –What are the conditions when Overfitting happens?
Ans:- One of the important reason and possibility of overfitting is because the criteria used for training the model is not the same as the criteria used to judge the efficacy of a model.

Q: – How can you avoid overfitting?
Ans:-  We can avoid overfitting by using:
• Lots of data
• Cross-validation

Q: – What are the five popular algorithms for Machine Learning?
Ans:-  Below is the list of five popular algorithms of Machine Learning:
• Decision Trees
• Probabilistic networks
• Nearest Neighbor
• Support vector machines
• Neural Networks

Q: – What are the different use cases where machine learning algorithms can be used?
Ans:-
The different use cases where machine learning algorithms can be used are as follows:
• Fraud Detection
• Face detection
• Natural language processing
• Market Segmentation
• Text Categorization
• Bioinformatics

Q: –What are parametric models and Non-Parametric models?
Ans:-
Parametric models are those with a finite number of parameters and to predict new data, you only need to know the parameters of the model.
Non Parametric models are those with an unbounded number of parameters, allowing for more flexibility and to predict new data, you need to know the parameters of the model and the state of the data that has been observed.

Q: – Explain what is the function of ‘Supervised Learning’?
Ans:-
a) Classifications
b) Speech recognition

Q: – What are the three stages to build the hypotheses or model in machine learning?
Ans:-
This is the frequently asked Machine Learning Interview Questions in an interview. The three stages to build the hypotheses or model in machine learning are:
1. Model building
2. Model testing
3. Applying the model

Q: – What is not Machine Learning?
Ans:-
a) Artificial Intelligence
b) Rule based inference

Q: – What is classification ? When we will use this method ?
Ans:-  In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.

Q: – What is clustering ? When we will use this method ?
Ans:-  Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups

Q: – What is regularization ?
Ans:- This is a form of regression, that constrains/ regularizes or shrinks the coefficient estimates towards zero. In other words, this technique discourages learning a more complex or flexible model, so as to avoid the risk of overfitting.

Q: – What is Difference between l1 regularization and l2 regularization ?
Ans:-  Mathematically speaking, it adds a regularization term in order to prevent the coefficients to fit so perfectly to overfit. The difference between the L1and L2 is just that L2 is the sum of the square of the weights, while L1 is just the sum of the weights

Q: – What ensemble learning ?
Ans:- Ensemble learning is the process by which multiple models, such as classifiers or experts, are strategically generated and combined to solve a particular computational intelligence problem.

Q: – Why is naive Bayes so ‘naive’ ?
Ans:-   naive Bayes is so ‘naive’ because it assumes that all of the features in a data set are equally important and independent. As we know, these assumption are rarely true in real world scenario.

Q: – Explain prior probability, likelihood and marginal likelihood in context of naiveBayes algorithm?
Ans:- Prior probability is nothing but, the proportion of dependent (binary) variable in the data set. It is the closest guess you can make about a class, without any further information. For example: In a data set, the dependent variable is binary (1 and 0). The proportion of 1 (spam) is 70% and 0 (not spam) is 30%. Hence, we can estimate that there are 70% chances that any new email would be classified as spam.

Likelihood is the probability of classifying a given observation as 1 in presence of some other variable. For example: The probability that the word ‘FREE’ is used in previous spam message is likelihood. Marginal likelihood is, the probability that the word ‘FREE’ is used in any message.

Q: – How is True Positive Rate and Recall related? Write the equation.
Ans:- True Positive Rate = Recall. Yes, they are equal having the formula (TP/TP + FN).

Q: – What is inductive machine learning?
Ans:-  The inductive machine learning involves the process of learning by examples, where a system, from a set of observed instances tries to induce a general rule.

Q: – What are the five popular algorithms of Machine Learning?
Ans:-
Decision Trees
Neural Networks (back propagation)
Probabilistic networks
Nearest Neighbor
Support vector machines
-----------------------------------------------------------------------------

Thanks for Watching...

Please "SUBSCRIBE!" to Cloud Network Channel and Share my Channel and Videos to Friends/Relatives.

Suggest me How can i growth my channel views, videos, content, likes, subscriber .

Let me know

What can i video make and upload to viewers and anyone want to join or help they are free to join email me on itcloudnet@gmail.com for more discussion.

Cloud Network

Archive

Buymecoffe

Saturday, April 13, 2019

AWS (Amazon Web Services) Interview Question and Answer for Fresher and Experience - 2019

Tuesday, April 9, 2019

Machine Learning (ML) Interview Questions and Answers with Advanced

Recent

Popular

Comments

Tags

Categories

Contact Form