Memorial de projetos : evolução e impactos das funções de ativação em redes neurais profundas

Graça, Pedro de Sousa Alves

Date

2026

Author

Graça, Pedro de Sousa Alves

Abstract

Resumo: As redes neurais artificiais formam a base das tecnologias modernas de inteligência artificial, sendo essenciais para tarefas avançadas como o reconhecimento de imagens e o processamento de linguagem natural. No entanto, para que esses modelos consigam aprender dados complexos do mundo real e não apenas realizar cálculos matemáticos simples, eles dependem inteiramente de um componente crucial chamado função de ativação. Sem essas funções, uma rede neural seria incapaz de resolver problemas difíceis, comportando-se apenas como um modelo linear básico. Este memorial apresenta uma revisão elaborada sobre a história e o desenvolvimento técnico dessas funções ao longo do tempo. Inicialmente, o texto explora as abordagens clássicas, como a função Sigmoide e a Tangente Hiperbólica, que foram muito populares nas primeiras décadas da inteligência artificial. Apesar de sua importância histórica, o estudo demonstra que essas funções antigas apresentam problemas graves quando usadas em redes com muitas camadas, pois elas tendem a "esmagar" os dados nas extremidades. Isso faz com que o sinal de erro diminua progressivamente até desaparecer durante o treinamento, um fenômeno conhecido como o problema do desvanecimento do gradiente, que impediu o avanço da área por anos. Em resposta a isso, o texto analisa a grande mudança causada pela introdução da Unidade Linear Retificada (ReLU). A ReLU tornou-se o padrão atual na indústria porque é computacionalmente leve e resolve a questão do desaparecimento do sinal, permitindo o treinamento de redes muito mais profundas e rápidas. Porém, como a ReLU pode, às vezes, fazer com que alguns neurônios "morram" e parem de funcionar totalmente, o texto também investiga as variantes mais novas e adaptativas, como a Leaky ReLU e a função Swish. Essas versões modernas buscam corrigir as falhas da ReLU original, garantindo que a rede continue aprendendo de forma eficiente em qualquer situação.

Abstract: Artificial neural networks form the foundation of modern artificial intelligence technologies and are essential for advanced tasks such as image recognition and natural language processing. However, for these models to successfully learn complex patterns from the real world rather than just performing simple mathematical calculations, they depend entirely on a critical component known as the activation function. Without these functions, a neural network would be unable to solve difficult problems, behaving effectively like a basic linear model. This text presents a detailed review of the history and technical development of these functions over time. Initially, the text explores classical approaches, such as the Sigmoid function and the Hyperbolic Tangent, which were very popular in the early decades of artificial intelligence. Despite their historical importance, the study demonstrates that these older functions present serious problems when used in networks with many layers, as they tend to "squash" data at the extremes. This causes the error signal to progressively decrease until it disappears during training, a phenomenon known as the vanishing gradient problem, which hindered progress in the field for years. In response to this, the text analyzes the major shift caused by the introduction of the Rectified Linear Unit (ReLU). ReLU has become the current industry standard because it is computationally lightweight and solves the issue of the disappearing signal, allowing for the training of much deeper and faster networks. However, since ReLU can sometimes cause certain neurons to "die" and stop working entirely, the text also investigates newer and more adaptive variants, such as Leaky ReLU and the Swish function. These modern versions seek to correct the flaws of the original ReLU, ensuring that the network continues to learn efficiently in any situation.
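
Illustrative note (not taken from the deposited work): the functions named in the abstract can be sketched in a few lines of plain Python. The sketch below assumes the standard textbook definitions, a hypothetical slope `alpha=0.01` for Leaky ReLU and `beta=1.0` for Swish, and a deliberately simplified chain of 20 layers whose weights are all fixed at 1, so that the backpropagated signal reduces to a product of local derivatives. The helper name `chained_gradient` is an assumption made for this example only.

```python
import math


def sigmoid(x):
    """Logistic function: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid; its maximum value is 0.25, reached at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu(x):
    """Rectified Linear Unit: passes positive inputs, zeroes the rest."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: keeps a small slope (alpha, assumed 0.01) for negative inputs."""
    return x if x > 0 else alpha * x


def swish(x, beta=1.0):
    """Swish: the input scaled by a sigmoid gate (beta assumed 1.0)."""
    return x * sigmoid(beta * x)


def chained_gradient(local_grad, pre_activation, depth):
    """Hypothetical helper: product of `depth` identical local derivatives.

    Simplification: every layer sees the same pre-activation and has weight 1.
    """
    return local_grad(pre_activation) ** depth


# Best case for the sigmoid (derivative 0.25 per layer) still shrinks rapidly.
sigmoid_signal = chained_gradient(sigmoid_grad, 0.0, 20)
# An active ReLU unit passes the gradient through unchanged.
relu_signal = chained_gradient(relu_grad, 1.0, 20)
# A ReLU unit with negative input blocks the gradient: the "dying" neuron case.
dead_signal = chained_gradient(relu_grad, -1.0, 20)

print(sigmoid_signal, relu_signal, dead_signal)
```

Under these assumptions the sigmoid chain shrinks by a factor of at least four per layer, the active ReLU chain keeps the gradient at 1, and the inactive ReLU chain yields exactly 0, which is the gap that Leaky ReLU's small negative slope is intended to close.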

URI

https://hdl.handle.net/1884/101853

Collections

Inteligência Artificial Aplicada [143]