Final Exam
Instructor: Amrit Singh Bedi
Instructions
This exam is worth a total of 100 points. Please answer all questions clearly
and concisely. Show all your work and justify your answers.
• For Questions 1 and 2, please submit the PDF version of your solution
via Webcourses. You can either write it in LaTeX or do it on paper and
submit a scanned version. If you do it on paper and scan it, you are
responsible for ensuring it is readable and properly scanned. There will
be zero marks if it is not clearly written or scanned.
• The total time to complete the exam is 24 hours, and it is due at 4:00
pm EST, Friday (April 25th, 2025). This is a take-home exam. Please
do not use AI tools like ChatGPT to complete the exam. There will be
zero marks if such use is found (believe me, we would know if you use it).
Question 1 (50 marks)
Context: In supervised learning, understanding the bias-variance tradeoff
is crucial for developing models that generalize well to unseen data.
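As background only (not part of any graded answer), the tradeoff can be made concrete with a small Monte Carlo sketch: retrain a model on many freshly sampled datasets and measure the offset and spread of its predictions at a fixed test point. Everything below (the "true" function, noise level, sample size, and polynomial degrees) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.sin          # assumed "true" function (illustrative choice)
sigma = 0.3         # noise std; sigma**2 is the irreducible error
x_test = 1.0        # fixed input at which bias and variance are estimated

def fit_predict(degree, n=30):
    """Train a polynomial model on one fresh dataset D, predict at x_test."""
    x = rng.uniform(0, 2 * np.pi, n)
    y = f(x) + rng.normal(0, sigma, n)      # y = f(x) + eps
    return np.polyval(np.polyfit(x, y, degree), x_test)

def estimate(degree, trials=500):
    """Monte Carlo estimates of Bias[f_hat(x)]^2 and Var[f_hat(x)]."""
    preds = np.array([fit_predict(degree) for _ in range(trials)])
    return (preds.mean() - f(x_test)) ** 2, preds.var()

for degree in (1, 10):
    bias2, var = estimate(degree)
    print(f"degree {degree}: bias^2 = {bias2:.4f}, variance = {var:.4f}")
```

Running this shows the expected pattern: the linear model has the larger squared bias, while the high-degree model has the larger variance.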
Problem 1 (10 marks)
Define the terms bias, variance, and irreducible error in the context of supervised learning. Explain how each contributes to the total expected error
of a model.
Problem 2 (20 marks)
Derive the bias-variance decomposition of the expected squared error for a
regression problem. That is, show that:

\[ \mathbb{E}_{D,\varepsilon}\big[(y - \hat{f}(x))^2\big] = \mathrm{Bias}[\hat{f}(x)]^2 + \mathrm{Var}[\hat{f}(x)] + \sigma^2 \]

where \(\hat{f}(x)\) is the prediction of the model trained on dataset \(D\), \(y = f(x) + \varepsilon\),
and \(\sigma^2\) is the variance of the noise \(\varepsilon\).

Hint: You can start by taking \(y = f(x) + \varepsilon\), where \(\mathbb{E}[\varepsilon] = 0\) and
\(\mathrm{Var}[\varepsilon] = \sigma^2\). Let \(\hat{f}(x)\) be a learned function from the training set \(D\). Then
proceed towards the derivation.
Problem 3 (10 marks)
Consider two models trained on the same dataset:
• Model A: A simple linear regression model.
• Model B: A 10th-degree polynomial regression model.
Discuss, in terms of bias and variance, the expected performance of each
model on training data and unseen test data. Which model is more likely
to overfit, and why?
Problem 4 (10 marks)
Explain how increasing the size of the training dataset affects the bias and
variance of a model. Provide reasoning for your explanation.
Question 2: Using Transformer Attention (50 marks)
Context. Consider a simplified Transformer with a vocabulary of six tokens:
• I (ID 0): embedding [1.0, 0.0]
• like (ID 1): embedding [0.0, 1.0]
• to (ID 2): embedding [1.0, 1.0]
• eat (ID 3): embedding [0.5, 0.5]
• apples (ID 4): embedding [0.6, 0.4]
• bananas (ID 5): embedding [0.4, 0.6]
All three projection matrices are the 2 × 2 identity:
WQ = WK = WV = I2.
When predicting the next token, the model uses masked self-attention: the
query comes from the last position, while keys and values come from all
previous tokens. (Note: show step-by-step calculations for all questions
below.)
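For orientation only (this is not a solution to any part below), the masked self-attention recipe just described can be sketched in a few lines of NumPy. The three embedding rows here are placeholder values, deliberately different from the ones in this question:

```python
import numpy as np

def softmax(z):
    z = z - z.max()                     # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Placeholder embeddings for a 3-token input (NOT the exam's vectors).
X = np.array([[0.2, 0.7],
              [0.9, 0.1],
              [0.4, 0.4]])
W_q = W_k = W_v = np.eye(2)             # identity projections, as in the setup

Q, K, V = X @ W_q, X @ W_k, X @ W_v     # with identity weights, Q = K = V = X
d_k = K.shape[1]

# Next-token prediction: the query is the last position; keys and values
# cover all positions up to and including it (never future ones).
q = Q[-1]
scores = q @ K.T                        # raw attention scores, shape (3,)
weights = softmax(scores / np.sqrt(d_k))
context = weights @ V                   # context vector: weighted sum of values
print(weights, context)
```

The same sequence of steps — project, score, scale, softmax, weighted sum — is what parts (a) through (c) ask you to carry out by hand.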
(a) (10 marks) For the input sequence [I, like, to] (IDs [0, 1, 2]),
compute the query, key and value vectors for each token.
(b) (15 marks) Let q be the query of the last token and K, V the keys
and values of all three tokens.
• Compute the row vector of raw attention scores qK⊤, where K is the
3 × 2 matrix of keys.
• Scale by √dk (with dk = 2) and apply softmax to obtain the attention
weights.
• Compute the context vector as the weighted sum of the values.
(c) (15 marks) Given the context vector c ∈ R² from part (b), compute
the unnormalized score for each vocabulary embedding via c · embed(w),
i.e. the dot product.
• Apply softmax over these six scores to get a probability distribution.
• Which token has the highest probability? [Note: Because the six
embeddings are synthetic and not trained on real text, the token
that receives the highest probability may look ungrammatical in
normal English; this is an artifact of the toy setup.]
(d) (10 marks) Explain why the model selects the token you found in
(c). In your answer, discuss:
• How the attention weights led to that choice.
• Why keys/values may include the current token but never future
tokens.