10 papers on code generation, covering code evaluation, code search, code generation, surveys, and code/bug classification
Resource file list:

代码生成论文_20241021/
代码生成论文_20241021/代码或bug分类/
代码生成论文_20241021/代码生成/
代码生成论文_20241021/代码评估/
代码生成论文_20241021/代码搜索/
代码生成论文_20241021/代码模型survey/
代码生成论文_20241021/代码或bug分类/LLMBRC A large language model-based bug report classification framework.pdf 2.56MB
代码生成论文_20241021/代码评估/Program Code Generation with Generative AIs.pdf 480.83KB
代码生成论文_20241021/代码评估/A_Comparison_of_the_Effectiveness_of_ChatGPT_and_Co-Pilot_for_Generating_Quality_Python_Code_Solutions.pdf 352.52KB
代码生成论文_20241021/代码评估/Comparing large language models and human programmers for generating programming code.pdf 2.04MB
代码生成论文_20241021/代码搜索/Multimodal Representation for Neural Code Search.pdf 1019.4KB
代码生成论文_20241021/代码模型survey/A Survey on Large Language Models for Code Generation .pdf 2.33MB
Resource description (Title / Type / Venue / Summary):

Title: Comparing large language models and human programmers for generating programming code
Type: Code evaluation | Venue: arXiv
Summary: Evaluates the performance of seven LLMs at generating programming code, explores how different prompting strategies affect their coding performance, directly compares the programming ability of LLMs with that of human programmers, assesses the ability of LLMs to generate and translate code across programming languages, and examines their computational efficiency and capacity to learn from past mistakes.

Title: A Comparison of the Effectiveness of ChatGPT and Co-Pilot for Generating Quality Python Code
Type: Code evaluation | Venue: conference paper
Summary: Evaluates the effectiveness of ChatGPT and Copilot at solving LeetCode programming problems, and explores ChatGPT's ability to correct code after receiving feedback as well as its potential for improving code quality and performance.

Title: Program Code Generation with Generative AIs
Type: Code evaluation | Venue: Algorithms (MDPI, non-SCI)
Summary: Compares human-generated code ...
Software Quality Journal (2024) 32:985–1005
https://doi.org/10.1007/s11219-024-09675-3

RESEARCH

LLM-BRC: A large language model-based bug report classification framework

Xiaoting Du 1,2 · Zhihao Liu 3 · Chenglong Li 3 · Xiangyue Ma 3 · Yingzhuo Li 1 · Xinyu Wang 1

Accepted: 23 April 2024 / Published online: 24 May 2024
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024
Abstract

Deep learning frameworks serve as the cornerstone for constructing robust deep learning systems. However, bugs within these frameworks can have severe consequences, negatively affecting various applications. Accurately classifying and understanding these bugs is essential to ensure framework reliability: by doing so, developers can proactively take appropriate measures to mitigate the potential risks associated with specific bug types in both current and future software releases. Despite the significance of bug report classification, existing methods fall short in terms of performance, rendering them impractical for real-world applications. To address this limitation, we propose a bug report classification framework for deep learning frameworks, called LLM-BRC, which leverages OpenAI's latest embedding model, text-embedding-ada-002. Our LLM-BRC framework achieves an accuracy of 92% to 98.75% in bug report classification for three deep learning frameworks: TensorFlow, MXNET, and PaddlePaddle, a substantial improvement of 17.21% to 69.15% over existing methods. Furthermore, we conduct a comprehensive investigation into the impact of different bug report components and different models.

Keywords: Bug report classification · Deep learning framework · Large language model
* Xiaoting Du
duxiaoting@bupt.edu.cn

Chenglong Li
li_chenglong@buaa.edu.cn

1 School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing, China
2 Shanghai Key Laboratory of Trustworthy Computing (East China Normal University), Shanghai 200062, China
3 School of Automation Science and Electrical Engineering, Beihang University, Beijing, China

1 Introduction
Deep learning frameworks play a crucial role in building robust deep learning systems (Zhang et al., 2020). With the rapid advancement of deep learning technology, the demand for deep learning frameworks has grown exponentially (Guo et al., 2018). This expansion encompasses the incorporation of new interfaces, the enhancement of functionalities, and the optimization of compatibility with a wide array of hardware devices and underlying drivers. Throughout this evolutionary process, the continuous iteration of code and version updates inevitably introduces bugs into deep learning frameworks (Zhang et al., 2018). Bugs in deep learning frameworks can have a significant and wide-reaching impact on a larger user base than bugs in specific deep learning models. Particularly in safety- and security-critical domains like autonomous driving (Chen et al., 2015) and healthcare (Cai et al., 2014), the consequences of these bugs can be severe. Therefore, ensuring the reliability of deep learning frameworks is of utmost importance.
Numerous studies have been conducted to gain insights into the characteristics of bugs in deep learning frameworks and to assist in their resolution. For instance, Jia et al. (2021) analyzed bugs in TensorFlow based on 202 bug fixes, finding that TensorFlow bugs can be classified into 6 distinct categories by symptom and 11 distinct categories by root cause. Islam et al. (2019) examined five deep learning libraries, namely Caffe (Jia et al., 2014), Keras (Lux & Bertini, 2019), TensorFlow (Girija, 2016), Theano (Team et al., 2016), and Torch (Collobert et al., 2002). They analyzed 2,716 posts from Stack Overflow and 500 bug-fix commits from GitHub to identify commonly occurring bug types in deep learning frameworks; the resulting classification distinguishes five bug types: API bugs, coding bugs, data bugs, structural bugs, and non-model structural bugs. In Du et al. (2022), we classified bug reports in TensorFlow, MXNET, and PaddlePaddle based on fault-triggering conditions: bugs were categorized into Bohrbugs (BOHs) and Mandelbugs (MANs) according to the conditions of fault activation and error propagation, and, within the MAN category, bugs were further classified as either non-aging-related Mandelbugs (NAMs) or aging-related bugs (ARBs).
However, bug classification in the aforementioned studies was performed entirely manually. As the number of bug reports in deep learning frameworks continues to grow, manually classifying all of them becomes impractical, making automated bug report classification methods essential. Xia et al. (2014) represented bug reports with the bag-of-words model and classified them with machine learning classifiers. However, the bag-of-words model neglects the contextual semantic information present in bug reports, resulting in inadequate classification results.
To address this limitation and effectively utilize the semantic information embedded within bug reports, we proposed the DeepSIM method in Du et al. (2021). DeepSIM employed a word2vec semantic model trained on over two million bug reports. However, the effectiveness of DeepSIM is hindered by the constrained size of the training corpus used for the semantic model. To address these issues, we propose a Large Language Model-based Bug Report Classification framework (LLM-BRC) for deep learning frameworks. Large language models (LLMs), particularly GPT-3 and GPT-4 (Brown et al., 2020; Radford et al., 2018, 2019), have proven transformative in numerous fields, making remarkable contributions in domains ranging from mathematics (Frieder et al., 2023) and communication (Guo et al., 2023) to medicine (Nov et al., 2023). In particular, the prowess of LLMs lies in their ability to revolutionize text processing across diverse tasks, substantially propelling the fields of natural language understanding and generation to new heights (Ray, 2023). One of the core strengths of LLMs is their mastery of language representation through dense vector embeddings. By capturing intricate semantic meaning and contextual information, these embeddings allow for a more nuanced understanding of language and context-aware language processing.

In our framework, we leverage the text-embedding-ada-002 model, the second-generation embedding model announced by OpenAI on December 15, 2022, to represent bug reports and facilitate bug report classification. With this model, bug reports can be transformed into embeddings with a dimension of 1,536. These embedding vectors are then fed into a feed-forward neural network (FFN) for bug report classification. Unlike traditional machine learning classifiers, an FFN excels at capturing intricate patterns and dependencies within the data, enabling it to learn highly representative and discriminative features. This allows for enhanced bug report classification accuracy and the ability to handle high-dimensional input data efficiently. Finally, the effectiveness of LLM-BRC is evaluated on bug reports from three deep learning frameworks.
In summary, this article makes the following main contributions.

1. We present LLM-BRC, a Large Language Model-based Bug Report Classification framework that combines a large language model with a deep learning classifier. With this method, we achieve accurate classification of bugs in deep learning frameworks, with accuracy ranging from 92% to 98.75%.
2. We explore the factors influencing classification results, including information from different components of bug reports and the type of language model, to further promote the practical application of this method.
3. To facilitate bug report classification research, we have open-sourced both the data and the method at: https://sites.google.com/view/llmbp/.
The rest of the paper is organized as follows. Section 2 presents the proposed approach. Section 3 provides an overview of the experimental setup. Section 4 describes the evaluation and analysis of the results. In Section 5, we discuss the threats to validity. Section 6 presents the related work. Finally, the last section concludes the paper.
2 Our approach
In this section, we propose a bug report classification framework called LLM-BRC. The overall procedure of LLM-BRC is depicted in Fig. 1. As shown in the figure, LLM-BRC comprises three sequential steps: data preparation, LLM-based bug report representation, and bug report classification. In the data preparation phase, we start by extracting information from bug reports in the GitHub repositories of deep learning frameworks, using a custom-designed web crawling tool. Next, the preprocessed bug reports are fed into OpenAI's text-embedding-ada-002 model, which transforms the natural language text into dense embedding vector representations. These embeddings capture the semantic meaning and contextual information present in the bug reports. Finally, an FFN is constructed and trained using labeled bug reports. The FFN uses the learned embeddings to perform the bug report classification task. In the subsequent parts of this section, we provide a detailed explanation of each step of LLM-BRC.

2.1 Data preparation
We initiate the data preparation process by crawling bug reports, identified by their Bug-ID, from the GitHub repositories of TensorFlow, MXNET, and PaddlePaddle. This crawling phase covers a total of 3,110 bug reports from these three deep learning frameworks, which were labeled in our previous work (Du et al., 2022). Since text is the dominant feature of bug reports, we collect natural language information including the title, description, and comments of each bug report. The title provides a concise summary of the entire bug report. The description contains a detailed account of the issue, including observed software anomalies, the software runtime environment, reproduction steps, and other relevant details. Finally, the comment section comprises discussions among developers, the report submitter, and other interested parties; these comments provide valuable insights and additional information related to the reported issue.
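
To make the crawling step concrete, the sketch below shows one way to fetch the title, description, and comments of a single issue through the GitHub REST API. This is a minimal illustration, not the authors' crawler: the repository name, issue number, and token handling are hypothetical, and the paper's custom tool may work differently.

```python
import requests

API = "https://api.github.com/repos/{repo}/issues/{num}"

def fetch_bug_report(repo: str, num: int, token: str | None = None) -> dict:
    """Fetch the title, description (body), and comments of one GitHub issue."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:  # unauthenticated requests are heavily rate-limited
        headers["Authorization"] = f"Bearer {token}"
    issue = requests.get(API.format(repo=repo, num=num), headers=headers).json()
    comments = requests.get(issue["comments_url"], headers=headers).json()
    return {
        "title": issue.get("title", ""),
        "description": issue.get("body") or "",
        "comments": [c.get("body", "") for c in comments],
    }

# Hypothetical usage: one TensorFlow bug report, addressed by its issue number.
report = fetch_bug_report("tensorflow/tensorflow", 12345)
```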
2.2 LLM-based bug report representation

After extracting bug reports, we obtain a corpus of text data. To represent these texts effectively, we utilize a powerful pre-trained large language model, text-embedding-ada-002. By applying text-embedding-ada-002 to the texts, we obtain dense, low-dimensional embedding vectors that serve as compact representations of the original bug reports.
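
As an illustration of this step, the following sketch obtains a 1,536-dimensional embedding for each bug report text through the OpenAI API. It assumes the v1-style OpenAI Python SDK with an API key in the environment; the exact client interface depends on the SDK version in use.

```python
from openai import OpenAI  # assumes the v1-style OpenAI Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed_reports(texts: list[str]) -> list[list[float]]:
    """Map bug report texts to 1,536-dimensional ada-002 embeddings."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return [item.embedding for item in resp.data]

vectors = embed_reports(["Segfault when calling tf.matmul on an empty tensor ..."])
assert len(vectors[0]) == 1536  # dimension stated in the paper
```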
Specifically, the text-embedding-ada-002 model employs the Transformer architecture (Ashish et al., 2017) to convert its input into a 1,536-dimensional vector. First, each input bug report is segmented into tokens. Next, the tokens pass through 96 decoder layers, each comprising a masked multi-head self-attention mechanism and a feed-forward neural network. The multi-head self-attention layer computes self-attention over the input sequence, generating feature representations for each position. The feed-forward network performs fully connected calculations on the feature vector at each position, producing new feature representations; its crucial role is to provide nonlinear transformations.
Fig. 1 Detailed structure of LLM-BRC

The decoder layers start by applying $h$ different linear projections to the Query, Key, and Value. The resulting attention value for each head $i$ is calculated as follows:

$$\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q},\, KW_i^{K},\, VW_i^{V}) \tag{1}$$

where $Q$, $K$, and $V$ represent the query vector, key vector, and value vector, respectively. The attention mechanism used in the Transformer employs scaled dot-product attention, which can be defined as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \tag{2}$$

where $d_k$ represents the dimension of the query/key vectors.

The resulting attention values from all the heads are concatenated into a single multi-head attention output:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O} \tag{3}$$

where $W^{O}$ is a weight matrix used to combine the multi-head attention outputs.

Additionally, the decoder includes a masked multi-head self-attention layer, which prevents the model from seeing future information during sequence prediction. Hence, the final output of a decoder layer can be represented as:

$$\mathrm{DecoderLayer}(y) = \mathrm{LN}\big(y + \mathrm{MMHA}(y) + \mathrm{MHA}(y, x) + \mathrm{FFN}(y)\big) \tag{4}$$

where $y$ represents the input sequential data, $x$ refers to the output sequence from the encoder, $\mathrm{MHA}$ denotes the multi-head self-attention layer, $\mathrm{FFN}$ the feed-forward layer, $\mathrm{LN}$ the layer normalization layer, and $\mathrm{MMHA}$ the masked multi-head self-attention layer.
Finally, the output of the attention layer is processed by a feed-forward neural network. The position-wise feed-forward network is a fully connected network through which the representation at each position passes independently; it essentially consists of two fully connected layers. After passing through all the decoder layers, the final output is produced by the last decoder layer. This output captures the contextual information of the bug report and serves as its ultimate embedding vector representation, which is used for the subsequent classification tasks.
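
To make Eqs. (1)-(4) concrete, here is a small NumPy sketch of scaled dot-product attention, its multi-head combination, and the two-layer position-wise feed-forward network. It is a didactic single-example implementation with randomly initialized weights and an assumed ReLU nonlinearity, not the actual internals of text-embedding-ada-002.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention(Q, K, V):
    # Eq. (2): softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    w = np.exp(scores - scores.max(-1, keepdims=True))  # stable softmax
    w /= w.sum(-1, keepdims=True)
    return w @ V

def multi_head(x, h=4, d_model=32):
    # Eqs. (1) and (3): h projected heads, concatenated and mixed by W_O
    d_k = d_model // h
    heads = []
    for _ in range(h):
        Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
        heads.append(attention(x @ Wq, x @ Wk, x @ Wv))  # Eq. (1)
    W_O = rng.normal(size=(d_model, d_model))
    return np.concatenate(heads, axis=-1) @ W_O          # Eq. (3)

def ffn(x, d_model=32, d_hidden=128):
    # Position-wise feed-forward network: two fully connected layers
    W1 = rng.normal(size=(d_model, d_hidden))
    W2 = rng.normal(size=(d_hidden, d_model))
    return np.maximum(0, x @ W1) @ W2  # ReLU is an assumption

x = rng.normal(size=(10, 32))     # 10 token positions, d_model = 32
out = x + multi_head(x) + ffn(x)  # residual mixing, cf. Eq. (4) without masking/LN
```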
2.3 Bug report classification
In this section, we conduct the bug report classification task at three levels, as depicted in Fig. 2. At the first level, we classify bug reports into two categories: bugs and non-bugs. As reported in Herzig et al. (2013), not all bug reports contain actual bugs. Therefore, bug reports related to requests for new features or enhancements, documentation issues (e.g., missing information, outdated documentation, or harmless warning outputs), compile-time issues (e.g., cmake errors or linking errors), operator errors, or duplicate reports are considered non-bugs and should be filtered out. At the second level, based on the complexity of fault activation and/or error propagation conditions, we classify bugs into Bohrbugs (BOHs) and Mandelbugs (MANs) (Grottke & Trivedi, 2005). Finally, within the MAN category, we further differentiate between aging-related bugs (ARBs) and non-aging-related Mandelbugs (NAMs).
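
The section does not spell out the FFN architecture at this point, but as a sketch, a classifier for any one of these levels could look like the following PyTorch module over the 1,536-dimensional embeddings; the hidden size, dropout, and training setup are assumptions for illustration.

```python
import torch
import torch.nn as nn

class BugReportFFN(nn.Module):
    """Feed-forward classifier over 1,536-dim ada-002 embeddings.

    Hidden size and dropout are illustrative assumptions; num_classes=2
    matches each binary level (bug/non-bug, BOH/MAN, NAM/ARB).
    """
    def __init__(self, in_dim=1536, hidden=256, num_classes=2, p_drop=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x):
        return self.net(x)

# One training step on a batch of labeled embeddings (illustrative data).
model = BugReportFFN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
emb = torch.randn(8, 1536)          # stand-in for ada-002 embeddings
labels = torch.randint(0, 2, (8,))  # stand-in labels (e.g., bug vs. non-bug)
loss = loss_fn(model(emb), labels)
loss.backward()
opt.step()
```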