Is there a way to convert CSV columns into hierarchical relationships?

27

I have a CSV of 7 million biodiversity records where the taxonomy levels are stored as columns. For example:

RecordID,kingdom,phylum,class,order,family,genus,species
1,Animalia,Chordata,Mammalia,Primates,Hominidae,Homo,Homo sapiens
2,Animalia,Chordata,Mammalia,Carnivora,Canidae,Canis,Canis
3,Plantae,nan,Magnoliopsida,Brassicales,Brassicaceae,Arabidopsis,Arabidopsis thaliana
4,Plantae,nan,Magnoliopsida,Fabales,Fabaceae,Phaseoulus,Phaseolus vulgaris

I want to create a visualization in D3, but the data format has to be a network, where each distinct value of a column is a child of the corresponding value in the previous column. I need to go from the CSV to something like this:

{
  name: 'Animalia',
  children: [{
    name: 'Chordata',
    children: [{
      name: 'Mammalia',
      children: [{
        name: 'Primates',
        children: 'Hominidae'
      }, {
        name: 'Carnivora',
        children: 'Canidae'
      }]
    }]
  }]
}

I haven't figured out how to do this without using a thousand for loops. Does anyone have a suggestion on how to build this network in either Python or JavaScript?

— Andres Camilo Zuñiga Gonzalez

Not related to your question, but when I wrote my answer I saw a nan for the phylum containing Magnoliopsida. What is that nan? The phylum is Anthophyta, or alternatively Magnolia (it is the old phylum Angiospermae).

— Gerardo Furtado

16

To create the exact nested object you want, we'll use a mix of pure JavaScript and a D3 method named d3.stratify. However, keep in mind that 7 million rows (please see the postscript below) is a lot to compute.

It's very important to mention that, for this proposed solution, you will have to separate the kingdoms into different data arrays (for instance, using Array.prototype.filter). This restriction exists because we need a single root node, and in the Linnaean taxonomy there is no relationship between kingdoms (unless you create "Domain" as a top rank, which would be the root for all eukaryotes, but then you'd have the same problem for Archaea and Bacteria).
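For readers preparing the data in Python before handing it to D3, the per-kingdom split described above could look roughly like the sketch below. It is only an illustration: `split_by_kingdom` and the inline `CSV_TEXT` sample are hypothetical names, and it assumes the CSV has a `kingdom` column as in the question.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample; with real data, pass an open file object instead.
CSV_TEXT = """RecordID,kingdom,phylum,class,order,family,genus,species
1,Animalia,Chordata,Mammalia,Primates,Hominidae,Homo,Homo sapiens
2,Animalia,Chordata,Mammalia,Carnivora,Canidae,Canis,Canis
3,Plantae,nan,Magnoliopsida,Brassicales,Brassicaceae,Arabidopsis,Arabidopsis thaliana
"""


def split_by_kingdom(lines):
    """Group CSV rows into one list per kingdom, so each group has a single root."""
    groups = defaultdict(list)
    for row in csv.DictReader(lines):
        groups[row["kingdom"]].append(row)
    return dict(groups)


rows_by_kingdom = split_by_kingdom(io.StringIO(CSV_TEXT))
for kingdom, rows in rows_by_kingdom.items():
    print(kingdom, len(rows))
```

Each list in `rows_by_kingdom` can then be processed on its own, which keeps exactly one root per hierarchy.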

So, suppose you have this CSV (I added some more rows):

RecordID,kingdom,phylum,class,order,family,genus,species
1,Animalia,Chordata,Mammalia,Primates,Hominidae,Homo,Homo sapiens
2,Animalia,Chordata,Mammalia,Carnivora,Canidae,Canis,Canis latrans
3,Animalia,Chordata,Mammalia,Cetacea,Delphinidae,Tursiops,Tursiops truncatus
1,Animalia,Chordata,Mammalia,Primates,Hominidae,Pan,Pan paniscus

Based on that CSV, we'll create an array named tableOfRelationships which, as the name implies, holds the relationships between the ranks:

const data = d3.csvParse(csv);

const taxonomicRanks = data.columns.filter(d => d !== "RecordID");

const tableOfRelationships = [];

data.forEach(row => {
  taxonomicRanks.forEach((d, i) => {
    if (!tableOfRelationships.find(e => e.name === row[d])) tableOfRelationships.push({
      name: row[d],
      parent: row[taxonomicRanks[i - 1]] || null
    })
  })
});

For the data above, this is the tableOfRelationships:

+---------+----------------------+---------------+
| (Index) |         name         |    parent     |
+---------+----------------------+---------------+
|       0 | "Animalia"           | null          |
|       1 | "Chordata"           | "Animalia"    |
|       2 | "Mammalia"           | "Chordata"    |
|       3 | "Primates"           | "Mammalia"    |
|       4 | "Hominidae"          | "Primates"    |
|       5 | "Homo"               | "Hominidae"   |
|       6 | "Homo sapiens"       | "Homo"        |
|       7 | "Carnivora"          | "Mammalia"    |
|       8 | "Canidae"            | "Carnivora"   |
|       9 | "Canis"              | "Canidae"     |
|      10 | "Canis latrans"      | "Canis"       |
|      11 | "Cetacea"            | "Mammalia"    |
|      12 | "Delphinidae"        | "Cetacea"     |
|      13 | "Tursiops"           | "Delphinidae" |
|      14 | "Tursiops truncatus" | "Tursiops"    |
|      15 | "Pan"                | "Hominidae"   |
|      16 | "Pan paniscus"       | "Pan"         |
+---------+----------------------+---------------+

Have a look at null as the parent of Animalia: that's why I told you that you need to separate your dataset by kingdoms, because there can be only one null value in the whole table.
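As a rough Python counterpart of the loop above, the following sketch builds the same kind of name/parent table. It swaps the linear `find` lookup for a set of already-seen names, which is a different technique from the JavaScript shown and is meant only to keep duplicate checks cheap on large inputs. `build_relationships` and `CSV_TEXT` are hypothetical names introduced for illustration.

```python
import csv
import io

CSV_TEXT = """RecordID,kingdom,phylum,class,order,family,genus,species
1,Animalia,Chordata,Mammalia,Primates,Hominidae,Homo,Homo sapiens
2,Animalia,Chordata,Mammalia,Carnivora,Canidae,Canis,Canis latrans
3,Animalia,Chordata,Mammalia,Primates,Hominidae,Pan,Pan paniscus
"""


def build_relationships(lines, id_column="RecordID"):
    """Return a list of {'name', 'parent'} dicts, one per distinct taxon name."""
    reader = csv.DictReader(lines)
    ranks = [c for c in reader.fieldnames if c != id_column]
    seen = set()  # names already emitted, for constant-time duplicate checks
    table = []
    for row in reader:
        parent = None  # the first rank (kingdom) has no parent
        for rank in ranks:
            name = row[rank]
            if name not in seen:
                seen.add(name)
                table.append({"name": name, "parent": parent})
            parent = name
    return table


relationships = build_relationships(io.StringIO(CSV_TEXT))
roots = [r["name"] for r in relationships if r["parent"] is None]
print(len(relationships), roots)
```

Checking that `roots` holds exactly one name is a quick way to confirm that the single-root requirement is met before stratifying.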

Finally, based on that table, we create the hierarchy using d3.stratify():

const stratify = d3.stratify()
    .id(function(d) { return d.name; })
    .parentId(function(d) { return d.parent; });

const hierarchicalData = stratify(tableOfRelationships);

And here is the demo. Open your browser's console (the snippet's console is not very good for this task) and inspect the several levels (children) of the object:


const csv = `RecordID,kingdom,phylum,class,order,family,genus,species
1,Animalia,Chordata,Mammalia,Primates,Hominidae,Homo,Homo sapiens
2,Animalia,Chordata,Mammalia,Carnivora,Canidae,Canis,Canis latrans
3,Animalia,Chordata,Mammalia,Cetacea,Delphinidae,Tursiops,Tursiops truncatus
1,Animalia,Chordata,Mammalia,Primates,Hominidae,Pan,Pan paniscus`;

const data = d3.csvParse(csv);

const taxonomicRanks = data.columns.filter(d => d !== "RecordID");

const tableOfRelationships = [];

data.forEach(row => {
  taxonomicRanks.forEach((d, i) => {
    if (!tableOfRelationships.find(e => e.name === row[d])) tableOfRelationships.push({
      name: row[d],
      parent: row[taxonomicRanks[i - 1]] || null
    })
  })
});

const stratify = d3.stratify()
  .id(function(d) {
    return d.name;
  })
  .parentId(function(d) {
    return d.parent;
  });

const hierarchicalData = stratify(tableOfRelationships);

console.log(hierarchicalData);

<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/5.7.0/d3.min.js"></script>


PS: I don't know what kind of dataviz you'll create, but you really should avoid taxonomic ranks. The whole Linnaean taxonomy is outdated and we don't use ranks anymore: since phylogenetic systematics was developed in the mid-60s, we use only taxa, without any taxonomic rank (evolutionary biology teacher here). Also, I'm quite curious about these 7 million rows, since we have described just over 1 million species!

— Gerardo Furtado

3

@gerardo Thanks for your answer, I'll see whether it works on a sample of the 7M rows. The database contains repeated rows for many species, so the idea is to show how many records there are for a given taxonomic rank. The goal is to create something similar to Mike Bostock's Zoomable Icicle Tree.

— Andres Camilo Zuñiga Gonzalez
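Since the stated goal is to show how many records fall under each taxon, one possible preprocessing step is to count records per lineage prefix while streaming through the file, so that the 7M rows never need to be held in memory at once. This is a minimal sketch under that assumption; `count_records_per_taxon` and `CSV_TEXT` are hypothetical names, and keying by the full lineage tuple (rather than the bare name) is a choice made here so that equal names at different ranks stay separate.

```python
import csv
import io
from collections import Counter

CSV_TEXT = """RecordID,kingdom,phylum,class,order,family,genus,species
1,Animalia,Chordata,Mammalia,Primates,Hominidae,Homo,Homo sapiens
2,Animalia,Chordata,Mammalia,Primates,Hominidae,Homo,Homo sapiens
3,Animalia,Chordata,Mammalia,Carnivora,Canidae,Canis,Canis latrans
"""


def count_records_per_taxon(lines):
    """Count records under every lineage prefix (kingdom, kingdom+phylum, ...)."""
    counts = Counter()
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    for row in reader:
        lineage = row[1:]  # drop the record id
        for depth in range(1, len(lineage) + 1):
            counts[tuple(lineage[:depth])] += 1
    return counts


counts = count_records_per_taxon(io.StringIO(CSV_TEXT))
print(counts[("Animalia",)])
print(counts[("Animalia", "Chordata", "Mammalia", "Primates")])
```

The resulting counter has one entry per distinct node, which is typically far smaller than the raw record list and could be used to size the partitions of an icicle layout.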

9

It is easy to do exactly what you need using Python and the python-benedict library (it is open source on GitHub):

Installation: pip install python-benedict

from benedict import benedict as bdict

# data source can be a filepath or an url
data_source = """
RecordID,kingdom,phylum,class,order,family,genus,species
1,Animalia,Chordata,Mammalia,Primates,Hominidae,Homo,Homo sapiens
2,Animalia,Chordata,Mammalia,Carnivora,Canidae,Canis,Canis
3,Plantae,nan,Magnoliopsida,Brassicales,Brassicaceae,Arabidopsis,Arabidopsis thaliana
4,Plantae,nan,Magnoliopsida,Fabales,Fabaceae,Phaseoulus,Phaseolus vulgaris
"""
data_input = bdict.from_csv(data_source)
data_output = bdict()

ancestors_hierarchy = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
for value in data_input['values']:
    data_output['.'.join([value[ancestor] for ancestor in ancestors_hierarchy])] = bdict()

print(data_output.dump())
# if this output is ok for your needs, you don't need the following code

keypaths = sorted(data_output.keypaths(), key=lambda item: len(item.split('.')), reverse=True)

data_output['children'] = []
def transform_data(d, key, value):
    if isinstance(value, dict):
        value.update({ 'name':key, 'children':[] })
data_output.traverse(transform_data)

for keypath in keypaths:
    target_keypath = '.'.join(keypath.split('.')[:-1] + ['children'])
    data_output[target_keypath].append(data_output.pop(keypath))

print(data_output.dump())

The first printed output will be:

{
    "Animalia": {
        "Chordata": {
            "Mammalia": {
                "Carnivora": {
                    "Canidae": {
                        "Canis": {
                            "Canis": {}
                        }
                    }
                },
                "Primates": {
                    "Hominidae": {
                        "Homo": {
                            "Homo sapiens": {}
                        }
                    }
                }
            }
        }
    },
    "Plantae": {
        "nan": {
            "Magnoliopsida": {
                "Brassicales": {
                    "Brassicaceae": {
                        "Arabidopsis": {
                            "Arabidopsis thaliana": {}
                        }
                    }
                },
                "Fabales": {
                    "Fabaceae": {
                        "Phaseoulus": {
                            "Phaseolus vulgaris": {}
                        }
                    }
                }
            }
        }
    }
}

The second printed output will be:

{
    "children": [
        {
            "name": "Animalia",
            "children": [
                {
                    "name": "Chordata",
                    "children": [
                        {
                            "name": "Mammalia",
                            "children": [
                                {
                                    "name": "Carnivora",
                                    "children": [
                                        {
                                            "name": "Canidae",
                                            "children": [
                                                {
                                                    "name": "Canis",
                                                    "children": [
                                                        {
                                                            "name": "Canis",
                                                            "children": []
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                },
                                {
                                    "name": "Primates",
                                    "children": [
                                        {
                                            "name": "Hominidae",
                                            "children": [
                                                {
                                                    "name": "Homo",
                                                    "children": [
                                                        {
                                                            "name": "Homo sapiens",
                                                            "children": []
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        },
        {
            "name": "Plantae",
            "children": [
                {
                    "name": "nan",
                    "children": [
                        {
                            "name": "Magnoliopsida",
                            "children": [
                                {
                                    "name": "Brassicales",
                                    "children": [
                                        {
                                            "name": "Brassicaceae",
                                            "children": [
                                                {
                                                    "name": "Arabidopsis",
                                                    "children": [
                                                        {
                                                            "name": "Arabidopsis thaliana",
                                                            "children": []
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                },
                                {
                                    "name": "Fabales",
                                    "children": [
                                        {
                                            "name": "Fabaceae",
                                            "children": [
                                                {
                                                    "name": "Phaseoulus",
                                                    "children": [
                                                        {
                                                            "name": "Phaseolus vulgaris",
                                                            "children": []
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}

— Fabio Caccamo

5

var log = console.log;
var data = `
1,Animalia,Chordata,Mammalia,Primates,Hominidae,Homo,Homo sapiens
2,Animalia,Chordata,Mammalia,Carnivora,Canidae,Canis,Canis
3,Plantae,nan,Magnoliopsida,Brassicales,Brassicaceae,Arabidopsis,Arabidopsis thaliana
4,Plantae,nan,Magnoliopsida,Fabales,Fabaceae,Phaseoulus,Phaseolus vulgaris`;
//make array of rows with array of values
data = data.split("\n").map(v=>v.split(","));
//init tree
var tree = {};
data.forEach(row=>{
    //set current = root of tree for every row
    var cur = tree; 
    var id = false;
    row.forEach((value,i)=>{
        if (i == 0) {
            //set id and skip value
            id = value;
            return;
        }
        //If the branch does not exist yet, create it.
        //If this is the last value in the row, store the record id instead.
        if (!cur[value]) cur[value] = (i == row.length - 1) ? id : {};
        //Move the cursor one level down the hierarchy
        cur = cur[value];
    });
}); 
log("Tree:");
log(JSON.stringify(tree, null, "  "));

//Now you have the hierarchy in tree and can do anything with it.
var toStruct = function(obj) {
    let ret = [];
    for (let key in obj) {
        let child = obj[key];
        let rec = {};
        rec.name = key;
        if (typeof child == "object") rec.children = toStruct(child);
        ret.push(rec);
    }
    return ret;
}
var struct = toStruct(tree);
console.log("Struct:");
console.log(struct);


— Crutch Master

5

This seems straightforward, so maybe I'm not understanding your problem.

The data structure you want is a nested set of dictionaries, that is, key/value pairs. Your top-level dictionary has a key for each of your kingdoms, whose values are phylum dictionaries. A phylum dictionary (for one kingdom) has a key for each phylum name, and each key has a value that is a class dictionary, and so on.

To keep the code simple, your genus dictionaries will have a key for each species, but the values for the species will be empty dictionaries.

This should be what you want; no strange libraries required.

import csv

def read_data(filename):
    tree = {}
    with open(filename) as f:
        f.readline()  # skip the column headers line of the file
        for animal_cols in csv.reader(f):
            spot = tree
            for name in animal_cols[1:]:  # each name, skipping the record number
                if name in spot:  # The parent is already in the tree
                    spot = spot[name]  
                else:
                    spot[name] = {}  # creates a new entry in the tree
                    spot = spot[name]
    return tree

To test it, I used your data and pprint from the standard library.

from pprint import pprint
pprint(read_data('data.txt'))

which gives:

{'Animalia': {'Chordata': {'Mammalia': {'Carnivora': {'Canidae': {'Canis': {'Canis': {}}}},
                                        'Primates': {'Hominidae': {'Homo': {'Homo sapiens': {}}}}}}},
 'Plantae': {'nan': {'Magnoliopsida': {'Brassicales': {'Brassicaceae': {'Arabidopsis': {'Arabidopsis thaliana': {}}}},
                                       'Fabales': {'Fabaceae': {'Phaseoulus': {'Phaseolus vulgaris': {}}}}}}}}

Rereading your question, you may want a big table of pairs ('link from the more general group', 'link to the more specific group'). That is, 'Animalia' links to 'Animalia:Chordata', 'Animalia:Chordata' links to 'Animalia:Chordata:Mammalia', and so on. Unfortunately, the 'nan' in your data means that you need the full name in each link. If (parent, child) pairs are what you want, walk the tree this way:

def walk_children(tree, parent=''):
    for child in tree.keys():
        full_name = parent + ':' + child
        yield (parent, full_name)
        yield from walk_children(tree[child], full_name)

tree = read_data('data.txt')
for (parent, child) in walk_children(tree):
    print(f'parent="{parent}" child="{child}"')

giving:

parent="" child=":Animalia"
parent=":Animalia" child=":Animalia:Chordata"
parent=":Animalia:Chordata" child=":Animalia:Chordata:Mammalia"
parent=":Animalia:Chordata:Mammalia" child=":Animalia:Chordata:Mammalia:Primates"
parent=":Animalia:Chordata:Mammalia:Primates" child=":Animalia:Chordata:Mammalia:Primates:Hominidae"
parent=":Animalia:Chordata:Mammalia:Primates:Hominidae" child=":Animalia:Chordata:Mammalia:Primates:Hominidae:Homo"
parent=":Animalia:Chordata:Mammalia:Primates:Hominidae:Homo" child=":Animalia:Chordata:Mammalia:Primates:Hominidae:Homo:Homo sapiens"
parent=":Animalia:Chordata:Mammalia" child=":Animalia:Chordata:Mammalia:Carnivora"
parent=":Animalia:Chordata:Mammalia:Carnivora" child=":Animalia:Chordata:Mammalia:Carnivora:Canidae"
parent=":Animalia:Chordata:Mammalia:Carnivora:Canidae" child=":Animalia:Chordata:Mammalia:Carnivora:Canidae:Canis"
parent=":Animalia:Chordata:Mammalia:Carnivora:Canidae:Canis" child=":Animalia:Chordata:Mammalia:Carnivora:Canidae:Canis:Canis"
parent="" child=":Plantae"
parent=":Plantae" child=":Plantae:nan"
parent=":Plantae:nan" child=":Plantae:nan:Magnoliopsida"
parent=":Plantae:nan:Magnoliopsida" child=":Plantae:nan:Magnoliopsida:Brassicales"
parent=":Plantae:nan:Magnoliopsida:Brassicales" child=":Plantae:nan:Magnoliopsida:Brassicales:Brassicaceae"
parent=":Plantae:nan:Magnoliopsida:Brassicales:Brassicaceae" child=":Plantae:nan:Magnoliopsida:Brassicales:Brassicaceae:Arabidopsis"
parent=":Plantae:nan:Magnoliopsida:Brassicales:Brassicaceae:Arabidopsis" child=":Plantae:nan:Magnoliopsida:Brassicales:Brassicaceae:Arabidopsis:Arabidopsis thaliana"
parent=":Plantae:nan:Magnoliopsida" child=":Plantae:nan:Magnoliopsida:Fabales"
parent=":Plantae:nan:Magnoliopsida:Fabales" child=":Plantae:nan:Magnoliopsida:Fabales:Fabaceae"
parent=":Plantae:nan:Magnoliopsida:Fabales:Fabaceae" child=":Plantae:nan:Magnoliopsida:Fabales:Fabaceae:Phaseoulus"
parent=":Plantae:nan:Magnoliopsida:Fabales:Fabaceae:Phaseoulus" child=":Plantae:nan:Magnoliopsida:Fabales:Fabaceae:Phaseoulus:Phaseolus vulgaris"

— Charles Merriam

This does not return a nested dict with name and children as requested in the question.

— Fabio Caccamo

No, it doesn't. What was requested was "something like this"; I read that as an attempt to find the idea of the data structure. A custom structure could be built by walking the tree, a four-line exercise.

— Charles Merriam

3

In Python, one way to encode a tree is to use a dict, where the keys represent nodes and the associated value is the node's parent:

{'Homo sapiens': 'Homo',
 'Canis': 'Canidae',
 'Arabidopsis thaliana': 'Arabidopsis',
 'Phaseolus vulgaris': 'Phaseoulus',
 'Homo': 'Hominidae',
 'Arabidopsis': 'Brassicaceae',
 'Phaseoulus': 'Fabaceae',
 'Hominidae': 'Primates',
 'Canidae': 'Carnivora',
 'Brassicaceae': 'Brassicales',
 'Fabaceae': 'Fabales',
 'Primates': 'Mammalia',
 'Carnivora': 'Mammalia',
 'Brassicales': 'Magnoliopsida',
 'Fabales': 'Magnoliopsida',
 'Mammalia': 'Chordata',
 'Magnoliopsida': 'nan',
 'Chordata': 'Animalia',
 'nan': 'Plantae',
 'Animalia': None,
 'Plantae': None}

One advantage of this is that you ensure that the nodes are unique, since dicts cannot have duplicate keys.
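To illustrate how such a child-to-parent dict can be used, here is a small sketch that recovers the full lineage of a node by following parent links up to the root. `PARENT_OF` is a hand-written excerpt and `lineage` is a hypothetical helper, not part of the answer's code. Note the flip side of unique keys: a name that appears at two ranks (such as the species value "Canis" under the genus "Canis" in the question's data) is stored only once.

```python
# A small hand-written parent map (child -> parent); None marks a root.
PARENT_OF = {
    "Homo sapiens": "Homo",
    "Homo": "Hominidae",
    "Hominidae": "Primates",
    "Primates": "Mammalia",
    "Mammalia": "Chordata",
    "Chordata": "Animalia",
    "Animalia": None,
}


def lineage(node, parent_of):
    """Return the path from the root down to `node` by following parent links."""
    path = []
    while node is not None:
        path.append(node)
        node = parent_of[node]
    return path[::-1]


print(lineage("Homo sapiens", PARENT_OF))
```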

If you would instead like to encode a more general directed graph (i.e., nodes can have more than one parent), you can use lists for the values and have them represent the children (or the parents, I suppose):

{'Homo': ['Homo sapiens', 'ManBearPig'],
'Ursus': ['Ursus arctos', 'ManBearPig'],
'Sus': ['ManBearPig']}

If necessary, you can do something similar with Objects in JS, substituting Arrays for lists.
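The two encodings are easy to convert between. As a sketch, the hypothetical helper `children_lists` below inverts a child-to-parent dict into a parent-to-children dict of lists; the sample `PARENT_OF` map is made up for illustration.

```python
from collections import defaultdict

PARENT_OF = {
    "Homo sapiens": "Homo",
    "Pan paniscus": "Pan",
    "Homo": "Hominidae",
    "Pan": "Hominidae",
    "Hominidae": None,
}


def children_lists(parent_of):
    """Invert a child -> parent map into a parent -> [children] map."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        if parent is not None:  # roots have no parent to attach to
            children[parent].append(child)
    return dict(children)


CHILDREN_OF = children_lists(PARENT_OF)
print(CHILDREN_OF)
```

Leaves such as "Homo sapiens" simply do not appear as keys in the inverted map.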

Here is the Python code I used to create the first dict above:

import csv

ROWS = []
# Load file: tbl.csv
with open('tbl.csv', 'r') as in_file:
    csvreader = csv.reader(in_file)

    # Ignore leading row numbers
    ROWS = [row[1:] for row in csvreader]
    # Drop header row
    del ROWS[0]

# Build dict
mytree = {row[i]: row[i-1] for row in ROWS for i in range(len(row)-1, 0, -1)}
# Add top-level nodes
mytree = {**mytree, **{row[0]: None for row in ROWS}}

— dizzy77

2

Probably the simplest way to turn your data into a hierarchy is to use D3's built-in nesting operator d3.nest():

Nesting allows elements in an array to be grouped into a hierarchical tree structure;

By registering key functions via nest.key() you can easily specify the structure of your hierarchy. As Gerardo laid out in his answer, you can use the .columns property exposed on the data array after parsing your CSV to automate generating those key functions. The whole code boils down to the following lines:

const nester = d3.nest();                             // Create a nest operator
const [, ...taxonomicRanks] = data.columns;           // Get rid of the RecordID property
taxonomicRanks.forEach(r => nester.key(d => d[r]));   // Register key functions
const nest = nester.entries(data);                    // Calculate hierarchy

Note, however, that the resulting hierarchy does not exactly match the structure requested in your question, because the objects are { key, values } instead of { name, children }; by the way, this also holds true for Gerardo's answer. This does not hurt for either answer, though, since the results can be fed to d3.hierarchy() by specifying a children accessor function:

d3.hierarchy({ values: nest }, d => d.values)   // nest is an array, so wrap it in a root object; second argument is the children accessor

The following demo puts all the parts together:


const csv = `RecordID,kingdom,phylum,class,order,family,genus,species
1,Animalia,Chordata,Mammalia,Primates,Hominidae,Homo,Homo sapiens
2,Animalia,Chordata,Mammalia,Carnivora,Canidae,Canis,Canis latrans
3,Animalia,Chordata,Mammalia,Cetacea,Delphinidae,Tursiops,Tursiops truncatus
1,Animalia,Chordata,Mammalia,Primates,Hominidae,Pan,Pan paniscus`;

const data = d3.csvParse(csv);

const nester = d3.nest();
const [, ...taxonomicRanks] = data.columns;
taxonomicRanks.forEach(r => nester.key(d => d[r]));
const nest = nester.entries(data);

console.log(nest);

const hierarchy = d3.hierarchy({ values: nest }, d => d.values); // wrap the array so the accessor can reach the top-level entries

console.log(hierarchy);

<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/5.12.0/d3.js"></script>


You might also want to have a look at "d3.nest() key and values conversion to name and children", in case you feel the need to have exactly the structure you posted.

— altocumulus

Enjoy d3.nest while it lasts: it will be deprecated soon.

— Gerardo Furtado

@GerardoFurtado That was my own first thought. However, I could not find any reference supporting this belief. I thought I had read about its removal and was even surprised to still find it contained in the bundle. d3-collection is archived, yet there is no deprecation note on it. Do you have any reliable information on this matter?

— altocumulus

That's for v6, look here. See "d3-collection [Removed!]".

— Gerardo Furtado

@GerardoFurtado No, that was not the reference I had in mind. Still, it answers my question, sadly.

— altocumulus

1

A fun challenge. Try this JavaScript code. I use Lodash's set for simplicity.

import { set } from 'lodash'

const csvString = `RecordID,kingdom,phylum,class,order,family,genus,species
    1,Animalia,Chordata,Mammalia,Primates,Hominidae,Homo,Homo sapiens
    2,Animalia,Chordata,Mammalia,Carnivora,Canidae,Canis,Canis
    3,Plantae,nan,Magnoliopsida,Brassicales,Brassicaceae,Arabidopsis,Arabidopsis thaliana
    4,Plantae,nan,Magnoliopsida,Fabales,Fabaceae,Phaseoulus,Phaseolus vulgaris`

// First create a quick lookup map
const result = csvString
  .split('\n') // Split for Rows
  .slice(1) // Remove headers
  .reduce((acc, row) => {
    const path = row
      .split(',') // Split for columns
      .filter(item => item !== 'nan') // OPTIONAL: Filter 'nan'
      .slice(1) // Remove record id
    const species = path.pop() // Pull out species (last entry)
    set(acc, path, species)
    return acc
  }, {})

console.log(JSON.stringify(result, null, 2))

// Then convert to the name-children structure by recursively calling this function
const convert = (obj) => {
  // If we're at the end of our chain, end the chain (children is empty)
  if (typeof obj === 'string') {
    return [{
      name: obj,
      children: [],
    }]
  }
  // Else loop through each entry and add them as children
  return Object.entries(obj)
    .reduce((acc, [key, value]) => acc.concat({
      name: key,
      children: convert(value), // Recursive call
    }), [])
}

const result2 = convert(result)

console.log(JSON.stringify(result2, null, 2))

This produces the final result (similar to what) you want.

[
  {
    "name": "Animalia",
    "children": [
      {
        "name": "Chordata",
        "children": [
          {
            "name": "Mammalia",
            "children": [
              {
                "name": "Primates",
                "children": [
                  {
                    "name": "Hominidae",
                    "children": [
                      {
                        "name": "Homo",
                        "children": [
                          {
                            "name": "Homo sapiens",
                            "children": []
                          }
                        ]
                      }
                    ]
                  }
                ]
              },
              {
                "name": "Carnivora",
                "children": [
                  {
                    "name": "Canidae",
                    "children": [
                      {
                        "name": "Canis",
                        "children": [
                          {
                            "name": "Canis",
                            "children": []
                          }
                        ]
                      }
                    ]
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  },
  {
    "name": "Plantae",
    "children": [
      {
        "name": "Magnoliopsida",
        "children": [
          {
            "name": "Brassicales",
            "children": [
              {
                "name": "Brassicaceae",
                "children": [
                  {
                    "name": "Arabidopsis",
                    "children": [
                      {
                        "name": "Arabidopsis thaliana",
                        "children": []
                      }
                    ]
                  }
                ]
              }
            ]
          },
          {
            "name": "Fabales",
            "children": [
              {
                "name": "Fabaceae",
                "children": [
                  {
                    "name": "Phaseoulus",
                    "children": [
                      {
                        "name": "Phaseolus vulgaris",
                        "children": []
                      }
                    ]
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
]

— ZephDavies

1

Actually, @Charles Merriam's solution is very elegant.

If you want to produce the same result as in the question, try the following.

from io import StringIO
import csv


CSV_CONTENTS = """RecordID,kingdom,phylum,class,order,family,genus,species
1,Animalia,Chordata,Mammalia,Primates,Hominidae,Homo,Homo sapiens
2,Animalia,Chordata,Mammalia,Carnivora,Canidae,Canis,Canis
3,Plantae,nan,Magnoliopsida,Brassicales,Brassicaceae,Arabidopsis,Arabidopsis thaliana
4,Plantae,nan,Magnoliopsida,Fabales,Fabaceae,Phaseoulus,Phaseolus vulgaris
"""


def recursive(dict_data):
    lst = []
    for key, val in dict_data.items():
        children = recursive(val)
        lst.append(dict(name=key, children=children))
    return lst


def main():
    with StringIO() as io_f:
        io_f.write(CSV_CONTENTS)
        io_f.seek(0)
        io_f.readline()  # skip the column headers line of the file
        result_tree = {}
        for row_data in csv.reader(io_f):
            cur_dict = result_tree  # cursor, back to root
            for item in row_data[1:]:  # each item, skip the record number
                if item not in cur_dict:
                    cur_dict[item] = {}  # create new dict
                    cur_dict = cur_dict[item]
                else:
                    cur_dict = cur_dict[item]

    # change answer format
    result_list = []
    for cur_kingdom_name in result_tree:
        result_list.append(dict(name=cur_kingdom_name, children=recursive(result_tree[cur_kingdom_name])))

    # Optional
    import json
    from os import startfile
    output_file = 'result.json'
    with open(output_file, 'w') as f:
        json.dump(result_list, f)
    startfile(output_file)


if __name__ == '__main__':
    main()

— Carson Arucard