The impact of the loss function on the training result #970
Hi @rzhli,

On the loss function, it makes sense that the choice between sum(abs2) and Flux.mse could significantly impact the training, especially if the scale of the errors differs greatly. Perhaps we could explore the distribution of the differences in the original data.

Regarding the callback function, feature-specific early stopping sounds like a useful idea; I wonder if there are existing libraries or techniques that facilitate this (a rough sketch follows below).

The difference in pre-processing order (splitting before or after standardization) is also worth considering, as it could affect information leakage between the train and test sets.

I'm keen to investigate these points further. Do you have any initial thoughts or suggestions on where I should start?
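A minimal sketch of what a feature-specific stopping check could look like, assuming the targets are stored one feature per row; the thresholds and the function name below are made up for illustration and are not taken from the example:

using Statistics

# Hypothetical per-feature loss thresholds, one per row of the target matrix
thresholds = [0.1, 0.1, 0.05, 0.2]

# Returns true once every feature's mean squared error falls below its threshold;
# in the Flux/SciML training loops a callback that returns true typically halts training.
function features_converged(ŷ, y)
    per_feature_mse = vec(mean(abs2.(ŷ .- y); dims = 2))
    return all(per_feature_mse .< thresholds)
end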
julia> y_mean
4×1 Matrix{Float64}:
25.27671416254674
61.30422906824019
7.194851184048889
1007.321897943855
julia> y_scale
4×1 Matrix{Float64}:
7.484413247165987
15.784237081943546
1.8812446147951691
7.75348347013801
function dimensionless(x)
    x_dimless = (x .- minimum(x)) ./ (maximum(x) - minimum(x))
    return x_dimless
end

This transforms each data point into the range [0, 1] without distortion, and is affected only by the maximum and minimum values of the feature.
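As a usage example, and assuming the features are stored one per row (as the 4×1 y_mean above suggests), the function could be applied to each feature separately; the raw matrix here is made-up data:

# Made-up matrix: 4 features (rows) × 100 time steps (columns)
raw = rand(4, 100) .* [10.0, 50.0, 5.0, 1000.0]

# Scale each feature (row) to [0, 1] independently of the others
scaled = mapslices(dimensionless, raw; dims = 2)

extrema(scaled)   # (0.0, 1.0): every row now spans exactly [0, 1]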
In the Weather forecasting example, you chose sum(abs2) as the loss function, but in Sebastian Callh's personal blog he uses Flux.mse as the loss function, and the resulting losses differ by orders of magnitude. The forecasting result is also less satisfying than the original one. Is this because of the different loss functions?

The callback function always returns false; can we set a different criterion for each Feature, so that training terminates once its loss is small enough?

All raw data was pre-processed as a whole in the original example, while in this example you split it into train and test sets first and then standardized each separately. This results in slightly different training data despite starting from the same dataset. How much impact does this have on the training and on the final test outcome?
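For reference, Flux.mse is the mean of the squared errors, so sum(abs2, ŷ .- y) differs from it only by a constant factor of length(y). The raw loss values therefore sit orders of magnitude apart even for identical predictions, and the scaling mainly interacts with the learning rate rather than changing the optimum. A quick check (the arrays below are placeholders, not the example's data):

using Flux

ŷ, y = rand(4, 100), rand(4, 100)   # placeholder predictions and targets

sse = sum(abs2, ŷ .- y)   # loss used in the Weather forecasting example
mse = Flux.mse(ŷ, y)      # loss used in Sebastian Callh's blog post

sse ≈ mse * length(y)     # true: the two losses differ only by a constant factor

On the standardization question, one way to avoid leakage is to compute the mean and scale on the training portion only and reuse them for the test portion. A sketch with made-up variable names and split point:

using Statistics

data = rand(4, 100)                            # placeholder: 4 features × 100 time steps
train, test = data[:, 1:80], data[:, 81:end]   # hypothetical 80/20 split along time

μ, σ = mean(train; dims = 2), std(train; dims = 2)
train_std = (train .- μ) ./ σ
test_std  = (test .- μ) ./ σ                   # reuse the training statistics only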