108 Data Science Interview Questions from Google, Microsoft, and Other Tech Giants
2017-04-05


A fresh scrape from Glassdoor gives us a good idea about what applicants are asked during a data scientist interview at some of the top companies. Unfortunately for us, almost every company has their interviewees sign NDAs. Since Glassdoor allows anonymity, a few brave souls have given us some fantastic examples of what they were asked during the interview process at top companies like Facebook, Google, and Microsoft.


One. General Questions
Apple

Suppose you’re given millions of users that each have hundreds of transactions and these millions of transactions are for tens of thousands of products. How would you group the users together in meaningful segments?

如果你有幾百萬用戶,每個用戶都會發(fā)生數(shù)百筆交易,這些交易存在于數(shù)十種產(chǎn)品中。你該如何把這些用戶細(xì)分成有意義的幾類?

Microsoft

Describe a project you’ve worked on and how it made a difference.

描述一個你曾經(jīng)參與的項目,以及它的優(yōu)點。

How would you approach a categorical feature with high-cardinality?

如何處理具有高基數(shù)(high-cardinality)的類屬特征?

What would you do to summarize a Twitter feed?

如果想要給 Twitter feed 寫 summarize,你要怎么辦?

What are the steps for wrangling and cleaning data before applying machine learning algorithms?

在應(yīng)用機器學(xué)習(xí)算法之前糾正和清理數(shù)據(jù)的步驟是什么?

How do you measure distance between data points?

如何測量數(shù)據(jù)點之間的距離?

Define variance.

請定義一下方差。

Describe the differences between and use cases for box plots and histograms.

請描述箱形圖(box plot)和直方圖(histogram)之間的差異,以及它們的用例。

Twitter

What features would you use to build a recommendation algorithm for users?

你會使用什么功能來為用戶構(gòu)建推薦算法?

Uber

Pick any product or app that you really like and describe how you would improve it.

選擇任何一個你真正喜歡的產(chǎn)品或應(yīng)用程序,并描述如何改善它。

How would you find an anomaly in a distribution?

如何在分布中發(fā)現(xiàn)異常?

How would you go about investigating if a certain trend in a distribution is due to an anomaly?

如何檢查分布中的某個趨勢是否是由于異常產(chǎn)生的?

How would you estimate the impact Uber has on traffic and driving conditions?

如何估算 Uber 對交通和駕駛環(huán)境造成的影響?

What metrics would you consider using to track if Uber’s paid advertising strategy to acquire new customers actually works? How would you then approach figuring out an ideal customer acquisition cost?

你會考慮用什么指標(biāo)來跟蹤 Uber 付費廣告策略在吸引新用戶上是否有效?然后,你想用什么辦法估算出理想的客戶購置成本?

LinkedIn

(Big Data Engineer) Can you explain what REST is?

(大數(shù)據(jù)工程師)請解釋 REST 是什么。


Two. Machine Learning Questions 
Google

Why do you use feature selection?

為什么要使用特征選擇(feature selection)?

What is the effect on the coefficients of logistic regression if two predictors are highly correlated? What are the confidence intervals of the coefficients?

如果兩個預(yù)測變量高度相關(guān),它們對邏輯回歸系數(shù)的影響是什么?系數(shù)的置信區(qū)間是什么?

What’s the difference between Gaussian Mixture Model and K-Means?

高斯混合模型(Gaussian Mixture Model)和 K-Means 之間有什么區(qū)別?

How do you pick k for K-Means?

在 K-Means 中如何拾取 k?

How do you know when Gaussian Mixture Model is applicable?

你如何知道高斯混合模型是不是適用的?

Assuming a clustering model’s labels are known, how do you evaluate the performance of the model?

假設(shè)聚類模型的標(biāo)簽是已知的,你如何評估模型的性能?

Microsoft

What’s an example of a machine learning project you’re proud of?

你有哪些引以為豪的機器學(xué)習(xí)項目?

Choose any machine learning algorithm and describe it.

隨意選擇一個機器學(xué)習(xí)算法,并描述它。

Describe how Gradient Boosting works.

請解釋 Gradient Boosting 是如何工作的。

(Data Mining) Describe the decision tree model.

(數(shù)據(jù)挖掘工程師)請解釋決策樹模型。

(Data Mining) What is a neural network?


Explain the Bias-Variance Tradeoff.

請解釋偏差方差權(quán)衡(Bias-Variance Tradeoff)。

How do you deal with unbalanced binary classification?

如何處理不平衡二進制分類?

What’s the difference between L1 and L2 regularization?

L1 和 L2 正則化之間有什么區(qū)別?

Uber

What sort of features could you give an Uber driver to predict if they will accept a ride request or not? What supervised learning algorithm would you use to solve the problem, and how would you compare the results of the algorithm?

你會通過哪種特征來預(yù)測 Uber 司機是否會接受訂單請求?你會使用哪種監(jiān)督學(xué)習(xí)算法來解決這個問題,如何比較算法的結(jié)果?

LinkedIn

Name and describe three different kernel functions and in what situation you would use each.

點出及描述三種不同的內(nèi)核函數(shù),在哪些情況下使用哪種?

Describe a method used in machine learning.

隨意解釋機器學(xué)習(xí)里的一種方法。

How do you deal with sparse data?

如何應(yīng)付稀疏數(shù)據(jù)?

IBM

How do you prevent overfitting?

如何防止過擬合(overfitting)?

How do you deal with outliers in your data?

如何處理數(shù)據(jù)中的離群值?

How do you analyze the performance of the predictions generated by regression models versus classification models?

如何評估邏輯回歸與簡單線性回歸模型預(yù)測的性能?

How do you assess logistic regression versus simple linear regression models?

如何確定邏輯回歸與簡單線性回歸模型?

What’s the difference between supervised learning and unsupervised learning?

監(jiān)督學(xué)習(xí)和無監(jiān)督學(xué)習(xí)有什么區(qū)別?

What is cross-validation and why would you use it?

什么是交叉驗證(cross-validation),為什么要使用它?

What’s the name of the matrix used to evaluate predictive models?

用于評估預(yù)測模型的矩陣的稱為什么?

What relationships exist between a logistic regression’s coefficient and the Odds Ratio?

邏輯回歸系數(shù)和勝算比(Odds Ratio)之間存在怎樣的關(guān)聯(lián)?

What’s the relationship between Principal Component Analysis (PCA) and Linear & Quadratic Discriminant Analysis (LDA & QDA)?

主成分分析(PCA)與線性判別分析(LDA)、二次判別分析(QDA)之間存在怎樣的關(guān)聯(lián)?

If you had a categorical dependent variable and a mixture of categorical and continuous independent variables, what algorithms, methods, or tools would you use for analysis?

如果你有一個因變量分類,又有一個連續(xù)自變量的混合分類,你將使用什么算法,方法或工具進行分析?

(Business Analytics) What’s the difference between logistic and linear regression? How do you avoid local minima?

(行業(yè)分析師)邏輯與線性回歸有什么區(qū)別?如何避免局部極小值?

Salesforce

What data and models would you use to measure attrition/churn? How would you measure the performance of your models?

你會使用哪些數(shù)據(jù)和模型來測量損耗/流失?如何測試模型性能?

Explain a machine learning algorithm as if you’re talking to a non-technical person.

請嘗試向非技術(shù)人員解釋一種機器學(xué)習(xí)算法。

Capital One

How would you build a model to predict credit card fraud?

如何構(gòu)建一個模型來預(yù)測信用卡詐騙?

How do you handle missing or bad data?

如何處理丟失或不良數(shù)據(jù)?

How would you derive new features from features that already exist?

如何從已存在的特征中導(dǎo)出新的特征?

If you’re attempting to predict a customer’s gender, and you only have 100 data points, what problems could arise?

如果你試圖預(yù)測客戶的性別,但只有 100 個數(shù)據(jù)點,可能會出現(xiàn)什么問題?

Suppose you were given two years of transaction history. What features would you use to predict credit risk?

在擁有兩年交易歷史的情況下,哪些特征可以用來預(yù)測信用風(fēng)險?

Design an AI program for Tic-tac-toe.

設(shè)計一個用來下井字棋的人工智能程序。

Zillow

Explain overfitting and what steps you can take to prevent it.

請解釋過度擬合,以及如何防止過度擬合。

Why does SVM need to maximize the margin between support vectors?

為什么 SVM 需要在支持向量之間最大化邊緣?


Three. Hadoop
Twitter

How would you use Map/Reduce to split a very large graph into smaller pieces and parallelize the computation of edges according to the fast/dynamic change of data?

如何使用 Map/Reduce 將非常大的圖形分割成更小的塊,并根據(jù)數(shù)據(jù)的快速/動態(tài)變化并行計算它們的邊緣?

(Data Engineer) Given a list of followers in the format: 123, 345; 234, 678; 345, 123; … where column one is the ID of the follower and column two is the ID of the followee, find all mutual following pairs (the pair 123, 345 in the example above). How would you use Map/Reduce to solve the problem when the list does not fit in memory?

(數(shù)據(jù)工程師)給定一個列表:123, 345; 234, 678; 345, 123; …其中第一列是粉絲的 ID,第二列是被粉者的 ID。查找所有相互后續(xù)對(上面的示例中的對是 123,345)。當(dāng)列表超出內(nèi)存時,如何使用 Map / Reduce 來解決問題?
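One possible shape for an answer, sketched in plain Python rather than on a real Hadoop cluster: the map step keys each edge by its unordered pair of IDs, and the reduce step reports a pair only when both directions were seen. The names `map_edge`, `reduce_pair`, and `mutual_pairs` are illustrative, and the in-memory dictionary merely stands in for the shuffle phase that a framework would perform on disk.

```python
from collections import defaultdict

def map_edge(follower, followee):
    # Key on the unordered pair so that (a, b) and (b, a) reach the same reducer.
    key = (min(follower, followee), max(follower, followee))
    yield key, (follower, followee)

def reduce_pair(key, edges):
    # A pair is mutual only if both directions appear among its edges.
    if len(set(edges)) == 2:
        yield key

def mutual_pairs(edges):
    groups = defaultdict(list)  # stand-in for the framework's shuffle/sort phase
    for follower, followee in edges:
        for key, value in map_edge(follower, followee):
            groups[key].append(value)
    return sorted(pair for key, values in groups.items()
                  for pair in reduce_pair(key, values))

print(mutual_pairs([(123, 345), (234, 678), (345, 123)]))  # [(123, 345)]
```

Because each reducer only sees the edges for one pair, no single machine needs the whole list in memory.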

Capital One

(Data Engineer) What is Hadoop serialization?

(對數(shù)據(jù)工程師)什么是 Hadoop 序列化(serialization)?

Explain a simple Map/Reduce problem.

闡述一個簡單的 Map / Reduce 問題。


Four. Hive
LinkedIn

(Data Engineer) Write a Hive UDF that returns a sentiment score. For example, if good = 1, bad = -1, and average = 0, and a restaurant review states “Good food, bad service,” your score might be 1 - 1 = 0.

(數(shù)據(jù)工程師)請編寫返回情感分?jǐn)?shù)的 Hive UDF。例如,假如好=1,壞=-1,平均數(shù)=0,那么對餐廳做評價時因為「食物好,服務(wù)差」,你的分?jǐn)?shù)可能為 1 - 1 = 0


Five. Spark
Capital One

(Data Engineer) Explain how RDDs work with Scala in Spark.

(數(shù)據(jù)工程師)闡釋使用 Scala 語言時RDD 在 Spark 中是如何工作的?


Six. Statistics & Probability Questions
Google

Explain Cross-validation as if you’re talking to a non-technical person.

請嘗試向非技術(shù)人員闡釋交叉驗證(Cross-validation)。

Describe a non-normal probability distribution and how to apply it.

請描述一下非正態(tài)概率分布以及該如何應(yīng)用?

Microsoft

(Data Mining) Explain what heteroskedasticity is and how to solve it.

(數(shù)據(jù)挖掘)請解釋異方差(heteroskedasticity)是什么,以及如何解決它。

Twitter

Given Twitter user data, how would you measure engagement?

在給定 Twitter 用戶數(shù)據(jù)的情況下,你該如何衡量參與度?

Uber

What are some different Time Series forecasting techniques?

時間序列預(yù)測技術(shù)有什么不同?

Explain Principal Component Analysis (PCA) and the equations PCA uses.

解釋原理組件分析(PCA)及其 使用的方程。

How do you solve Multicollinearity?

如何解決多重共線性(Multicollinearity)?

(Analyst) Write an equation that would optimize the ad spend between Twitter and Facebook.

(分析師)請嘗試列出優(yōu)化我們在 推特和臉書上的廣告費用支出的方程。

Facebook

What’s the probability you’ll draw two cards of the same suit from a single deck?

在一副牌中抽取兩張,出現(xiàn)同一花色的概率是多少?
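A quick way to check an answer: once the first card is drawn, 12 of the remaining 51 cards share its suit. The sketch below assumes a standard 52-card deck (4 suits of 13 cards) and drawing without replacement; the function name is illustrative.

```python
from fractions import Fraction

def same_suit_probability(suits: int = 4, cards_per_suit: int = 13) -> Fraction:
    """Probability that two cards drawn without replacement share a suit."""
    deck = suits * cards_per_suit
    # The first card can be anything; the second must match its suit.
    return Fraction(cards_per_suit - 1, deck - 1)

print(same_suit_probability())  # 4/17
```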

IBM

What are p-values and confidence intervals?

什么是 p-value 和置信區(qū)間?

Capital One

(Data Analyst) If you have 70 red marbles, and the ratio of green to red marbles is 2 to 7, how many green marbles are there?

(數(shù)據(jù)分析師)如果你有 70 個紅色彈珠,綠色和紅色彈珠的比例是 2 :7,有多少綠色彈珠?

What would the distribution of daily commutes in New York City look like?

紐約市的通勤數(shù)據(jù)看起來應(yīng)該遵從什么分布?

Given a die, would it be more likely to get a single 6 in six rolls, at least two 6s in twelve rolls, or at least one-hundred 6s in six-hundred rolls?

一個骰子,在扔 6 次的情況下出現(xiàn) 1 個 6 的幾率,與扔 12 次的情況下出現(xiàn)至少兩個 6 的幾率,和扔 600 次出現(xiàn)至少 100 次 6 的幾率相比哪個大?
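The three cases can be compared with exact binomial tail probabilities. This sketch assumes a fair die and reads "a single 6 in six rolls" as "at least one 6", which matches the other two cases; if exactly one 6 is meant, the first number changes. `prob_at_least` is an illustrative helper name.

```python
from fractions import Fraction
from math import comb

def prob_at_least(k: int, n: int, p: Fraction = Fraction(1, 6)) -> Fraction:
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1 - below

for k, n in [(1, 6), (2, 12), (100, 600)]:
    print(f"at least {k} sixes in {n} rolls: {float(prob_at_least(k, n)):.4f}")
```

Under these assumptions the probability falls as the number of rolls grows, so the six-roll case is the most likely of the three.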

PayPal

What’s the Central Limit Theorem, and how do you prove it? What are its applications?

什么是中心極限定理(Central Limit Theorem),如何證明它?它的應(yīng)用方向是什么?


Seven. Programming & Algorithms
Google

(Data Analyst) Write a program that can determine the height of an arbitrary binary tree.

(數(shù)據(jù)分析師)請寫一個程序可以判定二叉樹的高度。
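A minimal recursive sketch, assuming height is counted in nodes along the longest root-to-leaf path (so an empty tree has height 0). The `Node` class is a hypothetical representation, since the question does not specify one.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def height(root: Optional[Node]) -> int:
    """Number of nodes on the longest path from the root down to a leaf."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))

tree = Node(1, Node(2, Node(3)), Node(4))
print(height(tree))  # 3
```

For very deep, unbalanced trees an iterative traversal would avoid Python's recursion limit.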

Microsoft

Create a function that checks if a word is a palindrome.

請創(chuàng)建一個函數(shù)檢查一個詞是否具有回文結(jié)構(gòu)。
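One straightforward answer compares the word with its reverse. The sketch assumes the comparison should ignore letter case, which the question leaves open.

```python
def is_palindrome(word: str) -> bool:
    """True if the word reads the same forwards and backwards, ignoring case."""
    normalized = word.casefold()
    return normalized == normalized[::-1]

print(is_palindrome("Level"))  # True
print(is_palindrome("data"))   # False
```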

Twitter

Build a power set.

請構(gòu)建一個冪集(power set)。
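A short sketch using the standard library: the power set is the collection of all combinations of every size from 0 to n. Subsets are returned as tuples here, which is a choice made for illustration.

```python
from itertools import chain, combinations

def power_set(items):
    """All subsets of items, as tuples, ordered by size."""
    items = list(items)
    return list(chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    ))

print(power_set([1, 2, 3]))
# [(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]
```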

How do you find the median of a very large dataset?

請問如何在一個巨大的數(shù)據(jù)集中找到中值?

Uber

(Data Engineer) Code a function that calculates the square root (2-point precision) of a given number. Follow up: Avoid redundant calculations by now optimizing your function with a caching mechanism.

(數(shù)據(jù)工程師)編寫一個函數(shù)用來計算給定數(shù)字的平方根(精確到百分位)。隨后:避免冗余計算,現(xiàn)在使用緩存機制優(yōu)化你的功能。
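A sketch of one possible answer, assuming "2-point precision" means rounding to two decimal places. It uses Newton's method for the root and `functools.lru_cache` as the caching mechanism; other methods (bisection, a hand-written dictionary cache) would serve equally well. The name `sqrt_2dp` is illustrative.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # repeated calls with the same input skip the iteration
def sqrt_2dp(x: float) -> float:
    """Square root of x, rounded to two decimal places, via Newton's method."""
    if x < 0:
        raise ValueError("square root of a negative number is not real")
    if x == 0:
        return 0.0
    guess = x if x >= 1 else 1.0  # start at or above the true root
    while True:
        better = 0.5 * (guess + x / guess)
        if better >= guess:  # iterates stopped decreasing: converged
            return round(guess, 2)
        guess = better

print(sqrt_2dp(2))  # 1.41
print(sqrt_2dp(9))  # 3.0
```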

Facebook

Suppose you’re given two binary strings. Write a function that adds them together without using any built-in string-to-int conversion or parsing tools. For example, if you give your function binary strings 100 and 111, it should return 1011. What’s the space and time complexity of your solution?

假設(shè)給定兩個二進制字符串,寫一個函數(shù)將它們添加在一起,而不使用任何內(nèi)置的字符串到 int 轉(zhuǎn)換或解析工具。例如:如果給函數(shù)二進制字符串 100 和 111,它應(yīng)該返回 1011。你的解決方案的空間和時間復(fù)雜性如何?
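A sketch of the schoolbook approach: walk both strings from the right, adding digits and a carry. Digits are looked up in a small table so that no built-in string-to-int parsing is used. The function name is illustrative.

```python
def add_binary(a: str, b: str) -> str:
    """Add two binary strings digit by digit, right to left."""
    digit = {"0": 0, "1": 1}  # lookup table instead of int() parsing
    i, j, carry = len(a) - 1, len(b) - 1, 0
    out = []
    while i >= 0 or j >= 0 or carry:
        total = carry
        if i >= 0:
            total += digit[a[i]]
            i -= 1
        if j >= 0:
            total += digit[b[j]]
            j -= 1
        out.append("1" if total % 2 else "0")
        carry = total // 2
    return "".join(reversed(out)) or "0"

print(add_binary("100", "111"))  # 1011
```

Each digit is visited once, so time and space are both O(n) in the length of the longer input.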

Write a function that accepts two already sorted lists and returns their union in a sorted list.

編寫一個函數(shù),它接受兩個已排序的列表,并在排序列表中返回它們的并集。
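A two-pointer sketch, assuming "union" means each value appears once in the result even if it occurs in both lists or is repeated within one. The function name is illustrative.

```python
def sorted_union(a, b):
    """Merge two sorted lists into one sorted list without duplicates."""
    result = []

    def push(x):
        # Inputs are sorted, so a duplicate can only equal the last value added.
        if not result or result[-1] != x:
            result.append(x)

    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            push(a[i])
            i += 1
        else:
            push(b[j])
            j += 1
    for x in a[i:]:
        push(x)
    for x in b[j:]:
        push(x)
    return result

print(sorted_union([1, 2, 4, 4], [2, 3, 5]))  # [1, 2, 3, 4, 5]
```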

LinkedIn

(Data Engineer) Write some code that will determine if brackets in a string are balanced.

(數(shù)據(jù)工程師)請編寫一些代碼來確定字符串中的左右括號是否是平衡的?
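The usual approach is a stack: push each opening bracket and check that every closing bracket matches the most recent unmatched opener. This sketch assumes three bracket types, (), [] and {}, and ignores all other characters.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}

def is_balanced(text: str) -> bool:
    """True if every bracket in text is closed in the correct order."""
    stack = []
    for ch in text:
        if ch in PAIRS.values():
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack  # leftover openers mean the string is unbalanced

print(is_balanced("{[()]}"))  # True
print(is_balanced("([)]"))    # False
```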

How do you find the second largest element in a Binary Search Tree?

如何找到二叉搜索樹中第二大的元素?
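One way to reason about it: the largest element is the rightmost node; the second largest is either the maximum of that node's left subtree or, if it has none, its parent. The sketch assumes distinct keys and a hypothetical `Node` class, and returns `None` when the tree has fewer than two nodes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def second_largest(root: Optional[Node]) -> Optional[int]:
    """Second-largest key in a BST with distinct keys, or None if absent."""
    if root is None or (root.left is None and root.right is None):
        return None
    parent, node = None, root
    while node.right is not None:  # walk to the largest node
        parent, node = node, node.right
    if node.left is not None:      # predecessor is the max of the left subtree
        node = node.left
        while node.right is not None:
            node = node.right
        return node.value
    return parent.value            # otherwise the parent is the predecessor

print(second_largest(Node(5, Node(3), Node(8, Node(7)))))  # 7
```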

Write a function that takes two sorted vectors and returns a single sorted vector.

請編寫一個函數(shù),它接受兩個排序的向量,并返回一個排序的向量。

If you have an incoming stream of numbers, how would you find the most frequent numbers on-the-fly?

 如果你有一個輸入的數(shù)字流,如何在運行過程中找到最頻繁出現(xiàn)的數(shù)字?

Write a function that raises one number to another number, i.e. the pow() function.

編寫一個函數(shù),將一個數(shù)字增加到另一個數(shù)字,就像 pow()函數(shù)一樣。
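A sketch using exponentiation by squaring, which needs O(log n) multiplications instead of n. It assumes an integer exponent; fractional exponents are out of scope here. The name `power` is illustrative.

```python
def power(base: float, exponent: int) -> float:
    """Raise base to an integer exponent using repeated squaring."""
    if exponent < 0:
        return 1 / power(base, -exponent)
    result = 1
    while exponent:
        if exponent & 1:   # current binary digit of the exponent is 1
            result *= base
        base *= base       # square for the next binary digit
        exponent >>= 1
    return result

print(power(2, 10))  # 1024
print(power(2, -2))  # 0.25
```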

Split a large string into valid words and store them in a dictionary. If the string cannot be split, return false. What’s your solution’s complexity?

將大字符串拆分成有效字段并將它們存儲在 dictionary 中。如果字符串不能拆分,返回 false。你的解決方案的復(fù)雜性如何?
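A dynamic-programming sketch of one reading of the question: given a known vocabulary (an assumption, since the question does not say where valid words come from), find a split of the string into vocabulary words or return `False`. The result is returned as a list; storing it in another container is a small change.

```python
def split_words(text: str, vocabulary: set):
    """Return a list of vocabulary words that concatenate to text, or False."""
    n = len(text)
    # cut[end] = start index of the last word in a valid split of text[:end]
    cut = [None] * (n + 1)
    cut[0] = 0  # the empty prefix is trivially splittable
    for end in range(1, n + 1):
        for start in range(end):
            if cut[start] is not None and text[start:end] in vocabulary:
                cut[end] = start
                break
    if cut[n] is None:
        return False
    words, end = [], n
    while end > 0:  # walk the cut points backwards to recover the words
        start = cut[end]
        words.append(text[start:end])
        end = start
    return words[::-1]

print(split_words("datascience", {"data", "science", "sci"}))  # ['data', 'science']
print(split_words("datax", {"data", "science"}))               # False
```

The double loop performs O(n²) substring lookups, and each lookup costs up to O(n) for slicing and hashing.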

Salesforce

What’s the computational complexity of finding a document’s most frequently used words?

查找文檔最常用的詞的計算復(fù)雜性是什么?

If you’re given 10 TB of unstructured customer data, how would you go about extracting valuable information from it?

如果給你10 TBs的非結(jié)構(gòu)化客戶數(shù)據(jù),你會如何發(fā)現(xiàn)提取有價值的信息呢?

Capital One

(Data Engineer) How would you ‘disjoin’ two arrays (like JOIN for SQL, but the opposite)?

(對數(shù)據(jù)工程師)如何「拆散」兩個數(shù)列(就像 SQL 中的 JOIN 反過來)?

Create a function that does addition where the numbers are represented as two linked lists.

請創(chuàng)建一個用于添加的函數(shù),數(shù)字表示為兩個鏈表。

Create a function that calculates matrix sums.

請創(chuàng)建一個計算矩陣的函數(shù)。

How would you use Python to read a very large tab-delimited file of numbers to count the frequency of each number?

如何使用 Python 讀取一個非常大的制表符分隔的數(shù)字文件,來計算每個數(shù)字出現(xiàn)的頻率?
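A streaming sketch: iterate over the file line by line so that only the counts, not the file, are held in memory. It assumes the number of distinct values fits in memory and counts values as the strings that appear in the file (so "1" and "1.0" are different keys). The file name in the comment is hypothetical.

```python
from collections import Counter
from typing import Iterable

def count_numbers(lines: Iterable[str]) -> Counter:
    """Count how often each tab-separated field occurs across all lines."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        counts.update(field for field in fields if field)
    return counts

# With a real file (hypothetical name), the file object is passed directly:
#     with open("numbers.tsv") as handle:
#         counts = count_numbers(handle)
print(count_numbers(["1\t2\t2\n", "3\t1\n"]))
```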

PayPal

Write a function that takes a sentence and prints out the same sentence with each word backwards in O(n) time.

請編寫一個函數(shù),讓它能在 O(n)的時間內(nèi)取一個句子并逆向打印出來。
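A compact sketch, assuming words are separated by single spaces and that word order should be preserved while the letters of each word are reversed. Every character is touched a constant number of times, so the running time is O(n).

```python
def reverse_each_word(sentence: str) -> str:
    """Reverse the characters of every word while keeping the word order."""
    return " ".join(word[::-1] for word in sentence.split(" "))

print(reverse_each_word("hello world"))  # olleh dlrow
```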

Write a function that takes an array, splits the array into every possible set of two arrays, and prints out the max difference between the two arrays’ minima in O(n) time.

 請編寫一個函數(shù),從一個數(shù)組中拾取,將它們分成兩個可能的數(shù)組,然后打印兩個數(shù)組之間的最大差值(在 O(n) 時間內(nèi))。

Write a program that does merge sort.

請編寫一個執(zhí)行合并排序的程序。
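A top-down sketch that returns a new sorted list rather than sorting in place; the in-place and bottom-up variants are equally valid answers.

```python
def merge_sort(items):
    """Return a new list with the elements of items in ascending order."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal elements in original order
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```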


Eight. SQL Questions
Microsoft

(Data Analyst) Define and explain the differences between clustered and non-clustered indexes.

(數(shù)據(jù)分析師)定義和解釋聚簇索引和非聚簇索引之間的差異。

(Data Analyst) What are the different ways to return the row count of a table?

(數(shù)據(jù)分析師)返回表的行計數(shù)有哪些不同的方法?

Facebook

(Data Engineer) If you’re given a raw data table, how would you perform ETL (Extract, Transform, Load) with SQL to obtain the data in a desired format?

(數(shù)據(jù)工程師)如果給定一個原始數(shù)據(jù)表,如何使用 SQL 執(zhí)行 ETL(提取,轉(zhuǎn)換,加載)以獲取所需格式的數(shù)據(jù)?

How would you write a SQL query to compute a frequency table of a certain attribute involving two joins? What changes would you need to make if you want to ORDER BY or GROUP BY some attribute? What would you do to account for NULLs?

如何編寫 SQL 查詢來計算涉及兩個連接的某個屬性的頻率表?如果你想要 ORDER BY 或 GROUP BY 一些屬性,你需要做什么變化?你該怎么解釋 NULL?

LinkedIn

(Data Engineer) How would you improve ETL (Extract, Transform, Load) throughput?

(數(shù)據(jù)工程師)如何改進 ETL(提取,轉(zhuǎn)換,加載)的吞吐量?


Nine. Brain Teasers & Word Problems
Google

Suppose you have ten bags of marbles with ten marbles in each bag. If one bag weighs differently than the other bags, and you could only perform a single weighing, how would you figure out which one is different?

假設(shè)你有 10 包彈球,每包里面都是 10 個彈球。如果其中一包的重量和其他的不同,但你只能進行一次稱重,你該用什么辦法?

Facebook

You are about to hop on a plane to Seattle and want to know if you should carry an umbrella. You call three friends of yours that live in Seattle and ask each, independently, if it’s raining. Each of your friends will tell you the truth 2/3 of the time and mess with you by lying 1/3 of the time. If all three friends answer “Yes, it’s raining,” what is the probability that it is actually raining in Seattle?

你打算坐飛機去西雅圖,想知道是不是需要帶傘,于是你分別打電話給三位在西雅圖的朋友。每個朋友都有 2/3 的幾率說真話,1/3 的幾率在騙你。如果他們都說「會下雨」,西雅圖下雨的概率是多少?
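The answer depends on the prior probability of rain, which the question does not give. The sketch below applies Bayes' rule with the prior as a parameter and, purely as an assumption, defaults it to 1/2; the function name is illustrative.

```python
from fractions import Fraction

def rain_given_all_yes(prior: Fraction = Fraction(1, 2),
                       truth: Fraction = Fraction(2, 3),
                       friends: int = 3) -> Fraction:
    """P(rain | every friend says yes), assuming independent friends."""
    yes_and_rain = prior * truth ** friends            # all friends tell the truth
    yes_and_dry = (1 - prior) * (1 - truth) ** friends  # all friends lie
    return yes_and_rain / (yes_and_rain + yes_and_dry)

print(rain_given_all_yes())                # 8/9
print(rain_given_all_yes(Fraction(1, 4)))  # 8/11
```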

Uber

Imagine you are working with a hospital. Patients arrive at the hospital in a Poisson Distribution, and the doctors attend to the patients in a Uniform Distribution. Write a function or code block that outputs the patient’s average wait time and total number of patients that are attended to by doctors on a random day.

想象一下你在一家醫(yī)院工作?;颊邅砭驮\的頻率符合泊松分布,而醫(yī)生照顧患者的頻率符合均勻分布。請寫一個函數(shù)或一段代碼來輸出患者的平均等待時間和醫(yī)生在某日的參與度。

Facebook

Imagine there is an ant on each of the three corners of an equilateral triangle, and each ant randomly picks a direction and starts traversing the edge of the triangle. What’s the probability that none of the ants collide? What about if there are N ants sitting on the N corners of an equilateral polygon?

假如在一個等邊三角形的三個角上都有一只螞蟻,每只隨機選擇方向然后直走一直到另一個邊緣,三只螞蟻互相不交匯的幾率是多少?如果有 n 只螞蟻在 n 角形中,概率又是多少?
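A brute-force check, assuming all ants move at the same constant speed and each picks clockwise or counter-clockwise with equal probability. Under those assumptions a collision happens exactly when some ant heads toward a neighbour that is heading back toward it, so the sketch enumerates all direction choices and counts the ones with no such pair.

```python
from fractions import Fraction
from itertools import product

def no_collision_probability(n: int) -> Fraction:
    """Probability that n ants on an n-gon never meet, by enumeration."""
    safe = 0
    for dirs in product((+1, -1), repeat=n):  # +1 = clockwise, -1 = counter
        head_on = any(dirs[i] == +1 and dirs[(i + 1) % n] == -1
                      for i in range(n))
        safe += not head_on
    return Fraction(safe, 2 ** n)

print(no_collision_probability(3))  # 1/4
print(no_collision_probability(4))  # 1/8
```

Only the two "everyone goes the same way" choices are safe, which gives 2 / 2^N = 1 / 2^(N-1) in general.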

How many trailing zeros are in 100 factorial (i.e. 100!)?

在 100! 的結(jié)果里有多少個零?
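Each trailing zero comes from a factor of 10 = 2 × 5, and factors of 5 are the scarce ones, so it is enough to count how many times 5 divides the numbers 1 through n. A short sketch, with an illustrative function name:

```python
def trailing_zeros_of_factorial(n: int) -> int:
    """Count trailing zeros of n! by counting factors of 5 in 1..n."""
    count, power = 0, 5
    while power <= n:
        count += n // power  # multiples of 5, 25, 125, ... each add one factor
        power *= 5
    return count

print(trailing_zeros_of_factorial(100))  # 24
```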

LinkedIn

Imagine you’re climbing a staircase that contains n stairs, and you can take any number k of steps at a time. How many distinct ways can you reach the top of the staircase? (This is a modification of the original stair step problem.)

你正在攀爬一個 n 階的樓梯,你可以采取任何數(shù)量的 k 個步驟。你到達(dá)樓梯頂部有多少不同的方式?(這是樓梯問題的修改版)
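A dynamic-programming sketch, assuming a single move may cover any number of stairs from 1 up to n. The number of ways to reach stair i is then the sum of the ways to reach every lower stair, which works out to 2^(n-1) for n ≥ 1.

```python
def count_ways(n: int) -> int:
    """Ways to climb n stairs when one move may cover any number of stairs."""
    ways = [1] + [0] * n  # ways[0] = 1: standing at the bottom
    for i in range(1, n + 1):
        ways[i] = sum(ways[:i])  # the last move can start from any lower stair
    return ways[n]

print(count_ways(4))  # 8
```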

This article is reposted from 蝸牛讀寫 (chuhanread); permission is required to republish it.
