Skip to content

Linked list #68

Open
Open
@milancurcic

Description

@milancurcic

Problem

Linked list is one of the essential data structures beside an array. It allows you to add, insert, or remove elements in constant time, without re-allocating the whole structure.

Fortran doesn't have a linked list. There are 3rd party libraries, but no obvious go-to solution. Fortran stdlib should have a linked list. I would use it.

Examples

What kind of data can the linked list hold?

There's various levels of capability we could pursue:

  1. Single type: Basically just like an array, but allows insertion in constant time;
  2. Elements can be of any intrinsic type in a single list;
  3. Can take intrinsic type and user-defined derived types (is this even possible in current Fortran?)

API

I don't know, something like this?

use stdlib_experimental_collections, only :: List
type(List) :: a = List()

call a % append(42)
call a % append(3.141)
call a % append('text')
print *, a % get(2) ! prints 3.141
call a % remove(3) ! a is now List([42, 3.141])
call a % insert(2, 'hello') ! a is now List([42, 'hello', 3.141])

a = List([1, 2, 3]) ! instantiate a list from an array

Activity

certik

certik commented on Jan 2, 2020

@certik
Member

C++ has std::list for this. (I added Petaca to your Examples above.)

I would mention that I personally have never had a need for a std::list in C++, nor any linked list implementation in Fortran, because linked list is very slow (to create, traverse, destrogy, ...) compared to just a regular array or std::vector. The only operation that might be faster is insertion or deletion of individual items in the middle. In my use cases, I typically need to add elements to the end, in which case array works great.

But since there are at least 6 different people who reimplemented this already in Fortran and given that C++ has it too in their standard library, I would say that this would be a good candidate to include in stdlib, so that if people want to use it, they can. So +1 from me.

milancurcic

milancurcic commented on Jan 2, 2020

@milancurcic
MemberAuthor

In my use cases, I typically need to add elements to the end, in which case array works great.

Me too, but how do you do it? I thought that appending to an array always re-allocates on heap, e.g.:

integer :: i
integer, allocatable :: a(:)
a = [integer ::]
do i = 1, 100000000
  a = [a, i] ! re-allocates a on every append
end do

It's okay, for small-to-moderate arrays, but for very large ones, isn't it crippling?

certik

certik commented on Jan 2, 2020

@certik
Member

The canonical way is to pre-allocate the array and then append to it, like this:

integer :: i
integer, allocatable :: a(:)
allocate(a(100000000))
do i = 1, 1000
    a(i) = i
end do

Then you use your actual application to figure out what the maximum size of the array is (100000000 in this example), and then you can either keep a as is (only use the first 1000 elements, as in this example), or you can copy it to a smaller array. A real world example is e.g. here: https://github.com/certik/hfsolver/blob/b4c50c1979fb7e468b1852b144ba756f5a51788d/src/sparse.f90#L111, the Bj_ array is pre-allocated to the maximum size first (determined from the sparse arrays), and then downsized before returning to the user: https://github.com/certik/hfsolver/blob/b4c50c1979fb7e468b1852b144ba756f5a51788d/src/sparse.f90#L127. This is typically still much faster than a linked list implementation. If you don't know the size ahead of time, then you can set some maximum at compile time and fail the program if you go over it (real world example: https://github.com/certik/hfsolver/blob/b4c50c1979fb7e468b1852b144ba756f5a51788d/src/basis.f90#L230) --- many times this is fine, as you can recompile the code easily. But sometimes that's not appropriate, so then you can also do what std::vector does --- it doubles the allocation every time you reach it, and copies the data. Here is a fast implementation of that that I use in LFortran (that's in C++, but one can do something similar in Fortran also): https://gitlab.com/lfortran/lfortran/blob/57d3b8077d884f0ff3945ad3a86b2da920e4b6b3/src/lfortran/parser/parser_stype.h#L22. All of these are fast options.

But as I said, it's good to have linked list in stdlib, if people prefer that, so that they do not need to reimplement it.

rweed

rweed commented on Jan 2, 2020

@rweed

First I think we need to define which types of linked list we need. I prefer a circular double-linked list as the basic type since its the type I use most in FEM codes etc. I also think we would need a single-link list to implement stacks and queues. Also do we need some form of reference counting. As to @milancurcic question as to current Fortran support list that can contain both intrinsic and user defined types, yes it can. I've implemented both a circular list class and a single link class using unlimited polymorphic variables. They works but are not pretty and will probably have poor perfomance when compared to a type specific list generated by pre-processing/templating methods ala the
Fortran Template Library approach.

everythingfunctional

everythingfunctional commented on Jan 3, 2020

@everythingfunctional
Member

Generic linked-list, or really any generic data structure, is really cumbersome with the current Fortran capabilities. They work, but you end up having to use a select type block every time you want to access the data. So for convenience you'd end up with some wrapper class or library, at which point you might as well have re-implemented for your specific use case. Until we get fully parameterized types or template capabilities I don't think these are a great idea.

victorsndvg

victorsndvg commented on Jan 3, 2020

@victorsndvg

I think the supported data types should be wrapped with containers in order to be extendible.
In the main issue (#1) containers are mentioned, but I don't see any other specific issue.

I think FPL (https://github.com/victorsndvg/FPL) contains a smart implementation strategy for supporting native data types and allow to extend to other user defined data types. It contains lists, hash tables, etc. All of them depend on containers (aka wrappers) in order to manage different data types.

I agree that with this kind of data types you don't get performance, but amazing flexibility. This kind of data types (usually) are not for computation purposes.

Edit:

  • Note 1: FPL data types naming is probably not good
  • Note 2: Multiple dimensions management is another usual issue also treated by FPL
ivan-pi

ivan-pi commented on Jan 3, 2020

@ivan-pi
Member

I think many of the projects in the list of popular projects contain linked list implementations. Perhaps it would be good to do a grep over all of those repositories to get a feeling for linked list usage in production codes (e.g. whether they use generic lists supporting multiple kinds or only specific ones for the intrinsic kinds and potentially derived types).

nshaffer

nshaffer commented on Jan 3, 2020

@nshaffer
Contributor

I agree with @everythingfunctional on this issue. There's a ton of up-front labor in implementing fully polymorphic containers, and I'm not convinced that they're that much more useful than having generic (but homogeneous) containers. That is, I don't think it's worthwhile to support, say, linked lists where each element is of arbitrary type.

The more common use case I find is to need a linked list of int32, say. Or a binary tree of class(my_derived_t). These can be implemented without select type all over the place. There's still the labor of implementing all the intrinsic types, but this can be templated.

Letting users make containers of derived types is tricker. The common solution is to provide an abstract base class that users need to extend in order to have containers of derived types. I think that solution kind of sucks, but I have an alternate idea... Just ship source code templates that implement each container for class(__T__) and then let users run sed s/__T__/mytype/g on it to produce derived type containers on demand. (This will be slightly more involved for, e.g., mapping types, but just slightly).

I confess I have not thought through if there is some great pitfall to this approach besides being slightly "icky" from a distribution p.o.v.

nncarlson

nncarlson commented on Jan 3, 2020

@nncarlson
Contributor

I'm in agreement with @nshaffer here. I've done the linked-list-of class(*) variables in my own library which I use as a backend for some very specific things where such generality is needed. Otherwise it is incredibly clunky to use with all the select type and isn't an acceptable for general use, imo.

Someone else seemed to suggest that perhaps performance shouldn't be a concern here. I think it would be a big mistake to ignore performance. Linked lists come with their intrinsic performance overhead that most would be aware of, but any implementation that significantly added to that I would find unacceptable to include in a standard library.

I think the best solution beyond intrinsic types, which could all have very performant implementations, would be, as @nshaffer suggested, to provide a literal template that a user could adapt for their particular case. In fact that's more or less what I do myself.

zbeekman

zbeekman commented on Jan 3, 2020

@zbeekman
Member

A note on performance:

  1. Yes linked lists have some overhead compared with arrays
  2. They perform well for sorted data, and in instances where you're always manipulating one end of the list or the other, e.g., stacks, but, in general are NOT constant time lookup for random access read or insertion unless you're always operating on data "nearby"
  3. They are a building block component for hash tables which are in general constant time insertion and lookup.
  4. In some cases where storage needs vary greatly and dynamically in complex ways pre-allocating a huge array may not be feasible and you may want/need to use a linked list

I think there is merit to providing classic data structures and algorithms. I would add hash tables to this list as well as binary-trees, octrees, K-D trees, and a number of others. Obviously they are not useful to all users and applications but having a decent implementation is worthwhile.

I agree that right now the select type combinatorial explosion makes unlimited polymorphics nearly useless, and very awkward. In my opinion better generic programming should be the highest priority for the next major standard revision.

certik

certik commented on Jan 3, 2020

@certik
Member

@zbeekman Generic programming will not make it to the next standard revision -- simply because there is no proposal that is ready. I think the latest most developed idea is pursued at j3-fortran/fortran_proposals#125, and we need everybody's help to help transform the idea into a solid proposal. Once we have a proposal that is community backed, I'll be happy to bring it to the committee and try to get it into the next standard.

zbeekman

zbeekman commented on Jan 3, 2020

@zbeekman
Member

I know @rouson is working with Magne who leads the Bergen Language Design Lab and also @tclune on generics. They have something here but I don't know how up to date it is with their current efforts. Hopefully they can combine efforts and we can get something in, we'll see.

certik

certik commented on Jan 3, 2020

@certik
Member

Yes, the issue j3-fortran/fortran_proposals#125 is the latest based on our discussion with Magne at the last meeting. Anyway, let's move the discussion about this there, I just wanted to point this out, that we need help.

tclune

tclune commented on Jan 3, 2020

@tclune

In the mean time I have a project https://github.com/Goddard-Fortran-Ecosystem/gFTL which provides (by far less elegant means) a generic container system. Currently it supports Vector and Map (ala C++ STL), but also has Set which is used under the hood.

gFTL uses the C preprocessor and requires explicit instantiation, but is still a real game changer for doing some common operations within Fortran. I have a separate project gFTL-shared that provides common instantiations.

But I do look forward to the day that this could be done much more elegantly through a proper generic facility. (And yes, I realize that other preprocessors could do what I have done more elegantly than the C preprocessor, but ... cpp is already integrated into the build systems for the other projects I work with.

gronki

gronki commented on Jan 3, 2020

@gronki

I agree here with @zbeekman that linked lists are essential and I think the approach to preallocate array is very ineffective (cause then you have to check for overflow and re-allocate it etc). I also sadly agree that this is undoable in the current Fortran. Gotta wait for generics (or hopefully an intrinsic highly-optimized types for lists and dicts).

83 remaining items

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    topic: container(Abstract) data structures and containerstopic: utilitiescontainers, strings, files, OS/environment integration, unit testing, assertions, logging, ...

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      Linked list · Issue #68 · fortran-lang/stdlib