Open
Description
Problem
Linked list is one of the essential data structures beside an array. It allows you to add, insert, or remove elements in constant time, without re-allocating the whole structure.
Fortran doesn't have a linked list. There are 3rd party libraries, but no obvious go-to solution. Fortran stdlib should have a linked list. I would use it.
Examples
- FLIBS by @arjenmarkus
- fortran-list by @LadaF
- flist by @jacobwilliams
- PolyCon by @cmacmackin
- Petaca by @nncarlson
- Modern Fortran Explained by MRC has an implementation in Appendix C
- Many others?
What kind of data can the linked list hold?
There's various levels of capability we could pursue:
- Single type: Basically just like an array, but allows insertion in constant time;
- Elements can be of any intrinsic type in a single list;
- Can take intrinsic type and user-defined derived types (is this even possible in current Fortran?)
API
I don't know, something like this?
use stdlib_experimental_collections, only :: List
type(List) :: a = List()
call a % append(42)
call a % append(3.141)
call a % append('text')
print *, a % get(2) ! prints 3.141
call a % remove(3) ! a is now List([42, 3.141])
call a % insert(2, 'hello') ! a is now List([42, 'hello', 3.141])
a = List([1, 2, 3]) ! instantiate a list from an array
Activity
certik commentedon Jan 2, 2020
C++ has std::list for this. (I added Petaca to your Examples above.)
I would mention that I personally have never had a need for a
std::list
in C++, nor any linked list implementation in Fortran, because linked list is very slow (to create, traverse, destrogy, ...) compared to just a regular array orstd::vector
. The only operation that might be faster is insertion or deletion of individual items in the middle. In my use cases, I typically need to add elements to the end, in which case array works great.But since there are at least 6 different people who reimplemented this already in Fortran and given that C++ has it too in their standard library, I would say that this would be a good candidate to include in stdlib, so that if people want to use it, they can. So +1 from me.
milancurcic commentedon Jan 2, 2020
Me too, but how do you do it? I thought that appending to an array always re-allocates on heap, e.g.:
It's okay, for small-to-moderate arrays, but for very large ones, isn't it crippling?
certik commentedon Jan 2, 2020
The canonical way is to pre-allocate the array and then append to it, like this:
Then you use your actual application to figure out what the maximum size of the array is (100000000 in this example), and then you can either keep
a
as is (only use the first 1000 elements, as in this example), or you can copy it to a smaller array. A real world example is e.g. here: https://github.com/certik/hfsolver/blob/b4c50c1979fb7e468b1852b144ba756f5a51788d/src/sparse.f90#L111, theBj_
array is pre-allocated to the maximum size first (determined from the sparse arrays), and then downsized before returning to the user: https://github.com/certik/hfsolver/blob/b4c50c1979fb7e468b1852b144ba756f5a51788d/src/sparse.f90#L127. This is typically still much faster than a linked list implementation. If you don't know the size ahead of time, then you can set some maximum at compile time and fail the program if you go over it (real world example: https://github.com/certik/hfsolver/blob/b4c50c1979fb7e468b1852b144ba756f5a51788d/src/basis.f90#L230) --- many times this is fine, as you can recompile the code easily. But sometimes that's not appropriate, so then you can also do whatstd::vector
does --- it doubles the allocation every time you reach it, and copies the data. Here is a fast implementation of that that I use in LFortran (that's in C++, but one can do something similar in Fortran also): https://gitlab.com/lfortran/lfortran/blob/57d3b8077d884f0ff3945ad3a86b2da920e4b6b3/src/lfortran/parser/parser_stype.h#L22. All of these are fast options.But as I said, it's good to have linked list in stdlib, if people prefer that, so that they do not need to reimplement it.
rweed commentedon Jan 2, 2020
First I think we need to define which types of linked list we need. I prefer a circular double-linked list as the basic type since its the type I use most in FEM codes etc. I also think we would need a single-link list to implement stacks and queues. Also do we need some form of reference counting. As to @milancurcic question as to current Fortran support list that can contain both intrinsic and user defined types, yes it can. I've implemented both a circular list class and a single link class using unlimited polymorphic variables. They works but are not pretty and will probably have poor perfomance when compared to a type specific list generated by pre-processing/templating methods ala the
Fortran Template Library approach.
everythingfunctional commentedon Jan 3, 2020
Generic linked-list, or really any generic data structure, is really cumbersome with the current Fortran capabilities. They work, but you end up having to use a
select type
block every time you want to access the data. So for convenience you'd end up with some wrapper class or library, at which point you might as well have re-implemented for your specific use case. Until we get fully parameterized types or template capabilities I don't think these are a great idea.victorsndvg commentedon Jan 3, 2020
I think the supported data types should be wrapped with containers in order to be extendible.
In the main issue (#1) containers are mentioned, but I don't see any other specific issue.
I think FPL (https://github.com/victorsndvg/FPL) contains a smart implementation strategy for supporting native data types and allow to extend to other user defined data types. It contains lists, hash tables, etc. All of them depend on containers (aka wrappers) in order to manage different data types.
I agree that with this kind of data types you don't get performance, but amazing flexibility. This kind of data types (usually) are not for computation purposes.
Edit:
ivan-pi commentedon Jan 3, 2020
I think many of the projects in the list of popular projects contain linked list implementations. Perhaps it would be good to do a grep over all of those repositories to get a feeling for linked list usage in production codes (e.g. whether they use generic lists supporting multiple kinds or only specific ones for the intrinsic kinds and potentially derived types).
nshaffer commentedon Jan 3, 2020
I agree with @everythingfunctional on this issue. There's a ton of up-front labor in implementing fully polymorphic containers, and I'm not convinced that they're that much more useful than having generic (but homogeneous) containers. That is, I don't think it's worthwhile to support, say, linked lists where each element is of arbitrary type.
The more common use case I find is to need a linked list of
int32
, say. Or a binary tree ofclass(my_derived_t)
. These can be implemented withoutselect type
all over the place. There's still the labor of implementing all the intrinsic types, but this can be templated.Letting users make containers of derived types is tricker. The common solution is to provide an abstract base class that users need to extend in order to have containers of derived types. I think that solution kind of sucks, but I have an alternate idea... Just ship source code templates that implement each container for
class(__T__)
and then let users runsed s/__T__/mytype/g
on it to produce derived type containers on demand. (This will be slightly more involved for, e.g., mapping types, but just slightly).I confess I have not thought through if there is some great pitfall to this approach besides being slightly "icky" from a distribution p.o.v.
nncarlson commentedon Jan 3, 2020
I'm in agreement with @nshaffer here. I've done the linked-list-of
class(*)
variables in my own library which I use as a backend for some very specific things where such generality is needed. Otherwise it is incredibly clunky to use with all theselect type
and isn't an acceptable for general use, imo.Someone else seemed to suggest that perhaps performance shouldn't be a concern here. I think it would be a big mistake to ignore performance. Linked lists come with their intrinsic performance overhead that most would be aware of, but any implementation that significantly added to that I would find unacceptable to include in a standard library.
I think the best solution beyond intrinsic types, which could all have very performant implementations, would be, as @nshaffer suggested, to provide a literal template that a user could adapt for their particular case. In fact that's more or less what I do myself.
zbeekman commentedon Jan 3, 2020
A note on performance:
I think there is merit to providing classic data structures and algorithms. I would add hash tables to this list as well as binary-trees, octrees, K-D trees, and a number of others. Obviously they are not useful to all users and applications but having a decent implementation is worthwhile.
I agree that right now the
select type
combinatorial explosion makes unlimited polymorphics nearly useless, and very awkward. In my opinion better generic programming should be the highest priority for the next major standard revision.certik commentedon Jan 3, 2020
@zbeekman Generic programming will not make it to the next standard revision -- simply because there is no proposal that is ready. I think the latest most developed idea is pursued at j3-fortran/fortran_proposals#125, and we need everybody's help to help transform the idea into a solid proposal. Once we have a proposal that is community backed, I'll be happy to bring it to the committee and try to get it into the next standard.
zbeekman commentedon Jan 3, 2020
I know @rouson is working with Magne who leads the Bergen Language Design Lab and also @tclune on generics. They have something here but I don't know how up to date it is with their current efforts. Hopefully they can combine efforts and we can get something in, we'll see.
certik commentedon Jan 3, 2020
Yes, the issue j3-fortran/fortran_proposals#125 is the latest based on our discussion with Magne at the last meeting. Anyway, let's move the discussion about this there, I just wanted to point this out, that we need help.
tclune commentedon Jan 3, 2020
In the mean time I have a project https://github.com/Goddard-Fortran-Ecosystem/gFTL which provides (by far less elegant means) a generic container system. Currently it supports Vector and Map (ala C++ STL), but also has Set which is used under the hood.
gFTL uses the C preprocessor and requires explicit instantiation, but is still a real game changer for doing some common operations within Fortran. I have a separate project gFTL-shared that provides common instantiations.
But I do look forward to the day that this could be done much more elegantly through a proper generic facility. (And yes, I realize that other preprocessors could do what I have done more elegantly than the C preprocessor, but ... cpp is already integrated into the build systems for the other projects I work with.
gronki commentedon Jan 3, 2020
I agree here with @zbeekman that linked lists are essential and I think the approach to preallocate array is very ineffective (cause then you have to check for overflow and re-allocate it etc). I also sadly agree that this is undoable in the current Fortran. Gotta wait for generics (or hopefully an intrinsic highly-optimized types for lists and dicts).
83 remaining items