Sourcing data format from multiple different structures

Problem

I want to read the data into a dictionary with the following structure:

person = {
    'name': 'John Doe',
    'email': 'johndoe@email.com',
    'age': 50,
    'connected': False
}

The data comes in different formats:

Format A.

dict_a = {
    'name': {
        'first_name': 'John',
        'last_name': 'Doe'
    },
    'workEmail': 'johndoe@email.com',
    'age': 50,
    'connected': False
}

Format B.

dict_b = {
    'fullName': 'John Doe',
    'workEmail': 'johndoe@email.com',
    'age': 50,
    'connected': False
}

Additional sources with additional structures will be added in the future.
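
To make the goal concrete, here is a minimal sketch of the mapping described above, written as plain functions that each turn one source format into the target dictionary. The function names `from_format_a` and `from_format_b` are hypothetical and not part of the original post. Only the dictionary shapes come from the question.

```python
# Hypothetical helper names; only the dict shapes are taken from the question.

def from_format_a(source):
    """Map a Format A dict (nested 'name') to the target person dict."""
    name_parts = source.get('name') or {}
    # Join only the name parts that are actually present.
    full_name = ' '.join(
        part
        for part in (name_parts.get('first_name'), name_parts.get('last_name'))
        if part
    )
    return {
        'name': full_name,
        'email': source.get('workEmail'),
        'age': source.get('age'),
        'connected': source.get('connected'),
    }


def from_format_b(source):
    """Map a Format B dict (flat 'fullName') to the target person dict."""
    return {
        'name': source.get('fullName'),
        'email': source.get('workEmail'),
        'age': source.get('age'),
        'connected': source.get('connected'),
    }


dict_a = {
    'name': {'first_name': 'John', 'last_name': 'Doe'},
    'workEmail': 'johndoe@email.com',
    'age': 50,
    'connected': False,
}
dict_b = {
    'fullName': 'John Doe',
    'workEmail': 'johndoe@email.com',
    'age': 50,
    'connected': False,
}

person_a = from_format_a(dict_a)
person_b = from_format_b(dict_b)
```

Both sample inputs should end up in the same target shape, which is the property any future format would also need to satisfy.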

Background

For this specific case, I'm building a Scrapy spider that scrapes the data from different APIs and web pages. Scrapy's recommended way would be to use their Item or ItemLoader, but it's ruled out in my case.

There could potentially be 5-10 different structures from which the data will be read.

My ideas so far

One potential way to solve this is by taking advantage of polymorphism, where I create a Person class:

class Person:
    def __init__(self, name, email, age, connected):
        self.name = name
        self.email = email
        self.age = age
        self.connected = connected

and subclass it for each "data mapper" of a different data structure, e.g.

class FormatA(Person):
    def __init__(self, dict_a):
        self.name = ' '.join([dict_a.get('name').get(key) for key in ['first_name', 'last_name']])
        self.email = dict_a.get('workEmail')
        self.age = dict_a.get('age')
        self.connected = dict_a.get('connected')


class FormatB(Person):
    def __init__(self, dict_b):
        self.name = dict_b.get('fullName')
        self.email = dict_b.get('workEmail')
        self.age = dict_b.get('age')
        self.connected = dict_b.get('connected')
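
As a quick check of the subclassing idea, the sketch below repeats the classes and compares the attributes each mapper produces for the sample inputs. It makes two assumptions that are not in the original code: the subclasses delegate to `super().__init__` so that `Person` stays the single place where attributes are set, and a missing `name` entry falls back to an empty dict instead of raising.

```python
class Person:
    def __init__(self, name, email, age, connected):
        self.name = name
        self.email = email
        self.age = age
        self.connected = connected


class FormatA(Person):
    def __init__(self, dict_a):
        # Assumption: tolerate a missing 'name' entry by using an empty dict.
        name = dict_a.get('name') or {}
        super().__init__(
            name=' '.join(name.get(key, '') for key in ('first_name', 'last_name')),
            email=dict_a.get('workEmail'),
            age=dict_a.get('age'),
            connected=dict_a.get('connected'),
        )


class FormatB(Person):
    def __init__(self, dict_b):
        super().__init__(
            name=dict_b.get('fullName'),
            email=dict_b.get('workEmail'),
            age=dict_b.get('age'),
            connected=dict_b.get('connected'),
        )


dict_a = {
    'name': {'first_name': 'John', 'last_name': 'Doe'},
    'workEmail': 'johndoe@email.com',
    'age': 50,
    'connected': False,
}
dict_b = {
    'fullName': 'John Doe',
    'workEmail': 'johndoe@email.com',
    'age': 50,
    'connected': False,
}

# vars() returns the instance attributes as a dict, i.e. the instance __dict__.
mapped_a = vars(FormatA(dict_a))
mapped_b = vars(FormatB(dict_b))
```

With the sample data, both mappers should yield the same four attributes, so either instance can feed whatever consumes a `Person`.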

Now let's say I want to store these objects with SQLAlchemy

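from sqlalchemy import Column, Integer, String, Boolean
# SQLAlchemy 1.4+; older versions import this from sqlalchemy.ext.declarative
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Person(Base):
    __tablename__ = 'Person'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    email = Column(String)
    age = Column(Integer)
    connected = Column(Boolean)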

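So now I can unpack the __dict__ of a FormatA or FormatB instance to instantiate a new SQLAlchemy object like this:

# Assumes `session` is an already configured SQLAlchemy Session (not shown here).
person = Person(**FormatA(dict_a).__dict__)
session.add(person)
session.commit()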

Question

I have limited experience in Python programming, and I'm building my first scraper application for a data engineering project that needs to scale to storing millions of Persons from tens of different structures. I was wondering:

Is this a good solution? What could be the drawbacks and problems down the line?

Is there a better, industry-tested solution in Python for this kind of input data mapping?

asked 5 mins ago

Maivel

New contributor

python design-patterns scrapy
